How to alert on noise messages while preventing indexing for most of them

teedilo
Path Finder

We index logs from an application that is generally well behaved, but occasionally it gets into a state where it starts logging tons of messages. Fortunately, the messages are specific enough to pick out with a couple of regular expressions. Right now I have indexer-side filtering configured to keep these messages from being indexed at all. (There are already examples of how to do this in other posts and in the Splunk documentation, but I included the relevant genericized configuration file entries at the bottom of this post for completeness' sake.)

What I would like to do, however, is to be able to detect when the application gets into this state, yet prevent most messages from being indexed, mainly to avoid exceeding our Splunk indexing limits. My thoughts are to:

1) Configure a new index (separate from the main index) that has very limited space defined for it (using maxTotalDataSizeMB etc. in indexes.conf to control the index size).

2) Send the noise messages to the new index.

3) Create a saved search to send alerts when a lot of the noise messages are occurring.

4) Use the "splunk clean eventdata -index " CLI command to clean out all of the noise messages from the special index after the application has been restarted (which gets the app to behave properly once again until the next time it starts misbehaving).

I'm pretty sure I can fumble through the configuration file changes necessary to accommodate this plan (although I'd certainly welcome any examples! 🙂 ) My question is whether this sounds like the best way to accomplish what I'm trying to do. Thoughts?

CURRENT CONFIGURATION FILE ENTRIES FOR SENDING NOISE MESSAGES TO NULL

props.conf:

[noisyapplogs]
TRANSFORMS-null1 = setnull1
TRANSFORMS-null2 = setnull2

transforms.conf:

[setnull1]
REGEX = noise message 1
DEST_KEY = queue
FORMAT = nullQueue

[setnull2]
REGEX = noise message 2
DEST_KEY = queue
FORMAT = nullQueue
1 Solution

lguinn2
Legend

In some ways, this is a pretty good plan.

I don't think it is at all necessary to clean the eventdata. This will take care of itself if you size the index relatively small.

Since you will be indexing all the noisy data, this plan is NOT going to save you any of your Splunk license, unless the alert is quick enough to let you intervene manually and stop the inflow of noise.
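For reference, a scheduled alert along those lines might look roughly like the savedsearches.conf sketch below. The stanza name, the index name noiseindex, the five-minute window, the threshold of 1000 events, and the email address are all placeholders chosen for illustration, not values from this thread; tune them to your own noise volume.

```ini
# savedsearches.conf (sketch; all names and numbers are assumptions)
[Noisy app flood alert]
# Count events that were routed to the hypothetical noise index
search = index=noiseindex
dispatch.earliest_time = -5m
dispatch.latest_time = now
# Run every five minutes
enableSched = 1
cron_schedule = */5 * * * *
# Trigger when more than 1000 events arrived in the window
counttype = number of events
relation = greater than
quantity = 1000
# Placeholder recipient
action.email = 1
action.email.to = ops@example.com
```

The shorter the schedule interval, the sooner someone can react, but every noise event indexed before that point has already counted against the license.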

It is a good idea to isolate the noise from the "good data." This will save disk space and keep your good indexes from being filled with "junk" that could potentially slow searches or distort results.

This is the only change you need to make to your transforms.conf stanzas. Instead of sending the data to the nullQueue, the following stanza sends it to an index named NoiseIndex (make the same change to setnull2):

[setnull1]
REGEX = noise message 1
DEST_KEY = _MetaData:Index
FORMAT = NoiseIndex
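The target index also has to exist on the indexers and be capped in size. A minimal indexes.conf sketch follows, with the caveat that the 500 MB cap and the paths are illustrative assumptions. As far as I know, Splunk restricts index names to lowercase letters, digits, underscores, and hyphens, so the sketch uses the lowercase name noiseindex; the FORMAT value in transforms.conf would have to match whatever name you actually create.

```ini
# indexes.conf (sketch; index name, paths, and size are assumptions)
[noiseindex]
homePath   = $SPLUNK_DB/noiseindex/db
coldPath   = $SPLUNK_DB/noiseindex/colddb
thawedPath = $SPLUNK_DB/noiseindex/thaweddb
# Cap the whole index; once the cap is exceeded, the oldest buckets are frozen
# (deleted by default) to make room for new data
maxTotalDataSizeMB = 500
```

Because data is aged out by whole buckets, the index can briefly sit somewhat above or below the cap; the cap limits disk usage only and does not stop new noise from being indexed.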

teedilo
Path Finder

Thanks for confirming my suspicions. Unfortunate. One way to accomplish this might be to delete the noise index in a script using the REST interface that runs from a saved search that detects the noise, but I probably won't bother with anything that elaborate. As workarounds I might minimize the noise messages using SEDCMD to reduce their indexing footprint, or send the messages to null on half of the indexers. Thanks again.
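The SEDCMD idea could be sketched in props.conf roughly as follows. This assumes each noise message is a single-line event and that replacing the whole event with a short marker is acceptable; the class names shrinknoise1 and shrinknoise2 and the replacement strings are made up for illustration, and the regexes are the same genericized placeholders used earlier in the thread.

```ini
# props.conf (sketch; class names and replacement text are assumptions)
[noisyapplogs]
# Replace each matching single-line event with a short marker before it is indexed
SEDCMD-shrinknoise1 = s/^.*noise message 1.*$/noise1/
SEDCMD-shrinknoise2 = s/^.*noise message 2.*$/noise2/
```

The events would still be indexed, so an alert could still count them, just with a much smaller raw size per event.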

lguinn2
Legend

There is no way to stop the indexing.

Either you are indexing the input or you aren't. There is no concept of "filling an index" - when an index becomes full, the oldest data is eliminated to make room for the new data - and the new data still counts against your license.

As you noted, filling the volume will stop indexing to all indexes, not just the noise index.

teedilo
Path Finder

Thanks for the reply. I had considered the advantages of isolating the noise in a separate index to keep the main indexes from being filled with this noise. However, my main goal was to stop indexing automatically by configuring the "noise index" to be small enough such that indexing to that index would stop when the index was "full". Revisiting the documentation, I'm wondering whether this is readily feasible. I see that minFreeSpace (in server.conf) can be used to stop indexing when volume space drops below some minimum, but I can't use that if that will affect all indexes. Thoughts?
