Is there a mechanism to protect a Splunk server from hitting license limits by creating some sort of rate limiter?
We occasionally have a system that encounters an error, gets stuck in a loop, and ends up spewing out events at up to 25,000 per minute. I would like to create a rule to prevent occurrences like this from blocking the indexing of critical events.
E.g. if one host/sourcetype combination starts logging more than 1,000 events per minute, then stop indexing data from that host+sourcetype.
If this does happen, how can we remove these logs from the index and reclaim the license usage? It's data that we don't need and don't want counting towards our license limits.
Thanks.
AFAIK that is not possible through some configuration parameter specifically designed for this purpose.
However, there might be some ways to get near what you're after.
1) Use a forwarder on the occasionally misbehaving host, and set a cap on the network bandwidth it can utilize. It will require some tuning to arrive at the right number, so that you don't inadvertently throttle your known good traffic. See http://splunk-base.splunk.com/answers/29538/maxkbps-option-and-limiting-a-forwarders-rate-of-thruput
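As a sketch, the forwarder-side cap is set in limits.conf on the forwarder itself; the value below is purely illustrative and would need tuning for your environment:

```
# limits.conf on the forwarder, e.g. $SPLUNK_HOME/etc/system/local/limits.conf
[thruput]
# Cap the forwarder's output at 256 KB/s.
# Tune this high enough that normal traffic is never throttled.
maxKBps = 256
```

A restart of the forwarder is needed for the setting to take effect.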
2) Set up a scheduled search - say, every 5 minutes - for the number of events, and alert if the count is above a certain threshold.
index=myindex host=myhost sourcetype=mysourcetype earliest=-5m | stats c
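To watch all host/sourcetype combinations at once rather than a single known offender, a variant like the following could be scheduled instead (the 1,000-event threshold is illustrative):

```
index=myindex earliest=-5m
| stats count by host, sourcetype
| where count > 1000
```

Any rows returned identify the host+sourcetype pairs currently exceeding the threshold, which is a natural condition for triggering the alert.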
3) As for deleting, hopefully you can identify the messages based on content and/or timestamp, and use the delete command.
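For example, assuming the runaway events contain a distinctive error string (the text below is a placeholder), a user with the can_delete role could run:

```
index=myindex host=myhost sourcetype=mysourcetype "runaway error signature"
| delete
```

Note that delete only hides the events from search results; it does not reclaim disk space.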
With a little luck and timely investigation of the alerts generated, you'll be able to stay under your license limit.
ALTERNATIVE:
If the unwanted error messages contain a unique signature that does not occur otherwise, you can set up a nullQueue for those on the indexer. This makes the steps outlined above largely redundant: events sent to the nullQueue are never indexed, so there would be little point in throttling the forwarder (remember, the reason for throttling was to avoid license violations). If you have a heavy forwarder, you will have to do the nullQueueing there instead.
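As a sketch, assuming the loop events contain a unique string such as "LOOP ERROR SIGNATURE" (a placeholder, as is the sourcetype name), the props.conf/transforms.conf pair on the indexer (or heavy forwarder) would look like:

```
# props.conf
[mysourcetype]
TRANSFORMS-drop_loop = drop_loop_events

# transforms.conf
[drop_loop_events]
# Events whose raw text matches this regex are routed to the nullQueue
# and discarded before indexing, so they never count against the license.
REGEX = LOOP ERROR SIGNATURE
DEST_KEY = queue
FORMAT = nullQueue
```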
http://wiki.splunk.com/Where_do_I_configure_my_Splunk_settings
Hope this helps,
Kristian
Well, if you successfully set up a nullQueue for the offending events, they will be discarded before being indexed. That defeats the purpose of a scheduled search that alerts on the sheer number of events, since those events never reach the index.
However, as a mechanism for detecting that the application is acting up, you could instead schedule a search that checks for the presence of known good events (assuming these are not generated when the application goes into loop mode) and alerts when they are absent. Note that this may not be the most efficient way to detect the problem.
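For instance, a scheduled search along these lines (the event text is a placeholder) would return a count of 0 when the expected "known good" events stop arriving, which could be used as the alert condition:

```
index=myindex host=myhost sourcetype=mysourcetype "normal heartbeat message" earliest=-15m
| stats count
```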
And the threshold search with an alert is a quick win - easy to implement. Mark
Thanks. I might be able to get something to work with nullQueue.
Rate limiting is definitely not an approach we would consider; it doesn't seem like a good fit.
Deleting: deleting won't work, as the data that has come in will already have counted towards daily indexing volume, regardless of whether it is retained (http://splunk-base.splunk.com/answers/27390/how-to-remove-some-indexed-data) - see the comment about "it doesn't matter if you delete the data".
We need an approach that discards the data BEFORE it is indexed.
See the update above.