According to documentation, and generally speaking in action, hot buckets are named
hot_v1_<id>
... but I am noticing some of the hot directories labeled as
hot_quar_v1_<id>
What is the difference, and why?
The difference is that 'hot_v1_<id>' is a normal hot bucket, where data is inserted based on timestamp. 'hot_quar_v1_<id>' is a quarantine bucket. Quarantine buckets catch events whose timestamps are either further in the past or further in the future than indexes.conf allows. Putting those events into a quarantine bucket keeps the rest of the index from being polluted by old and/or future data.
http://docs.splunk.com/Documentation/Splunk/latest/admin/indexesconf
quarantinePastSecs = <positive integer>
* Events with timestamp of quarantinePastSecs older than "now" will be
dropped into quarantine bucket.
* Defaults to 77760000 (900 days).
* This is a mechanism to prevent the main hot buckets from being polluted with
fringe events.
quarantineFutureSecs = <positive integer>
* Events with timestamp of quarantineFutureSecs newer than "now" will be
dropped into quarantine bucket.
* Defaults to 2592000 (30 days).
* This is a mechanism to prevent main hot buckets from being polluted with
fringe events.
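As a rough illustration of the two settings quoted above, the following Python sketch classifies an event timestamp against the documented defaults. The helper name `is_quarantined` is hypothetical, and the strict comparison at the exact boundary is an assumption. This is not Splunk's actual implementation, only a model of the rule as the spec describes it.

```python
from datetime import datetime, timedelta, timezone

# Defaults quoted from the indexes.conf spec above.
QUARANTINE_PAST_SECS = 77760000    # 900 days * 86400 seconds
QUARANTINE_FUTURE_SECS = 2592000   # 30 days * 86400 seconds


def is_quarantined(event_time, now,
                   past_secs=QUARANTINE_PAST_SECS,
                   future_secs=QUARANTINE_FUTURE_SECS):
    """Hypothetical helper: True if the event would land in a quarantine bucket.

    Assumption: an event exactly on the boundary is NOT quarantined.
    """
    age_secs = (now - event_time).total_seconds()
    too_old = age_secs > past_secs        # older than "now" by more than past_secs
    too_new = -age_secs > future_secs     # newer than "now" by more than future_secs
    return too_old or too_new


if __name__ == "__main__":
    now = datetime(2020, 1, 1, tzinfo=timezone.utc)
    for delta_days in (-901, -10, 29, 31):
        ts = now + timedelta(days=delta_days)
        print(delta_days, is_quarantined(ts, now))
```

With the defaults, an event 901 days old or 31 days in the future is flagged, while one 10 days old or 29 days ahead is not.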
I found cases where events older than quarantinePastSecs slipped into normal hot buckets. When is this possible? Is this a bug, and how can it be prevented?
That is expected behavior. In this case, quarantinePastSecs needs to be set appropriately so that those events with far-apart past timestamps fall within it.
Adding to the comments: is there any configuration property that forces the Splunk indexer to include results from quarantine buckets in searches?
I'm curious what the retention of events in quarantine buckets is. I can find events that were indexed incorrectly because of US/European timestamp settings. There appears to be only one quar bucket per index, if there is one at all. I understand you want to keep the other buckets clean and tidy, but on the other hand we can't afford to miss events even though they're stored with a wrong timestamp in the past. It probably has the same max settings as the hot buckets, but after a restart (which often triggers a bucket roll) the quar bucket still seems to exist with all its events in it.
I figured as much; the documentation just didn't clearly spell it out anywhere I could find. Is there any quick way to isolate that data? Obviously I could scan for data outside the accepted date range, but I'm looking for something more straightforward.
According to the index metadata, my "latest" event is in the year 0468, which I'm having trouble turning into an actual date in Splunk...
The earliest date is Dec 31, 1969 7:00:00 PM (00:00:00 01 Jan 1970 UTC), which is easy enough to figure out...
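For reference, the "earliest" value above is simply epoch 0 rendered in a local timezone. A minimal Python sketch follows, assuming a UTC-5 offset (inferred from the 7:00 PM reading, not stated in the thread); the function name `epoch_to_local` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_to_local(epoch_secs, utc_offset_hours):
    """Hypothetical helper: render an epoch timestamp at a fixed UTC offset."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # Add the seconds to the UTC epoch, then shift into the requested offset.
    return (EPOCH_UTC + timedelta(seconds=epoch_secs)).astimezone(local_tz)


if __name__ == "__main__":
    # Assumed offset of -5 hours, matching the 7:00 PM display above.
    print(epoch_to_local(0, -5).isoformat())  # 1969-12-31T19:00:00-05:00
```

An event whose timestamp failed to parse and fell back to 0 would show up this way, which is consistent with the 1969 date reported in the index metadata.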