Our company has been collecting auditd logs since last summer, and our Splunk infrastructure is now using a lot of disk for the indexed auditd data. I can't delete this data either, since we require it for audits.
The solution I came up with was to start summarizing the _raw data, based on some previous examples I've seen:
index=audit | dedup _raw | rename _raw as orig_raw
Then I would verify the summarized results against the indexed results and expire data off colddb sooner than it expires now.
Is there a better solution out there? The main goal is to reduce index disk usage.
Splunk is already compressing the raw data. If your main goal is to reduce disk usage, then my first question is: must the data be always searchable? Or is it simply a requirement that the data must be retrievable if needed?
If you specify a cold-to-frozen directory and a shorter lifetime, Splunk will move "expired" buckets into the frozen directory. In the frozen directory, the buckets will be approximately 30% of their former size - because most of the index info is stripped away. Most folks then store the frozen buckets offline, but you don't have to.
However, frozen buckets are not searchable; you have to rebuild the bucket (e.g., with the splunk rebuild CLI command) to use its contents. But if the data is very rarely searched and is really just kept for compliance, this could be a good solution.
I don't think that dedup
is going to help you unless you truly have exact duplicates of a lot of your data.
You shouldn't need the coldToFrozenScript. Just make sure that the "Frozen archive path" is set to a real directory. Splunk will automatically strip off everything it can when it puts the compressed data into that directory.
In indexes.conf, the frozen archive path is set like this:
coldToFrozenDir = <path to frozen archive>
Note that the path cannot contain a volume reference.
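As a minimal sketch, a per-index stanza might look like the following. The index name, archive path, and retention period here are placeholders for illustration, not recommendations:

```ini
# indexes.conf -- illustrative stanza; index name, path, and the
# 90-day retention value are placeholders.
[audit]
# Roll buckets to frozen after 90 days (value is in seconds).
frozenTimePeriodInSecs = 7776000
# Archive frozen buckets here instead of deleting them.
# Note: this path cannot contain a volume reference.
coldToFrozenDir = /archive/splunk/audit_frozen
```

With coldToFrozenDir set, Splunk strips the index files and keeps the compressed raw data when it freezes each bucket, which is where the roughly 30% figure comes from.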
The data does not need to be searchable; retrievable upon request would work for us.
I've always used cold-to-frozen as our delete mechanism, so I suppose I'll have to use the coldToFrozenScript.
Is the default $SPLUNK_HOME/bin/coldToFrozenExample.py the script that will convert buckets to 30% of their normal size?