Hey guys,
So I'm looking at index-time field extraction as a possible fix for my search-time field extraction troubles. Any idea how I might measure whether it helps?
Background:
We process about 1 billion events a day in our Splunk instance. The first 4 characters of each server's hostname are our datacenter ID. The field extraction is therefore running tens of millions of times in any search.
1) This isn't going to change
2) We're using this field in hundreds of searches already
How would I know if this would help or not?
I am not an expert in this, but hopefully this answer will pop to the front of the queue and someone who is can correct me if I'm wrong.
It seems like what you are describing would be a good thing. Index-time extractions are generally not recommended, but there are definitely times when they're useful and good.
This will increase license usage. At 4 characters each and a billion events a day, that's 4 billion characters (roughly 4 GB at one byte per character) added to your daily license amount. Probably not a big issue in your environment, because I'd guess the rest of each event is far larger.
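To make that arithmetic easy to redo with your own numbers, here is a rough back-of-the-envelope sketch in Python. The event count and field width come from the question; the one-byte-per-character figure is my assumption, and it ignores any overhead for storing the field name. It only estimates added volume; whether that lands on license or just on index size is worth verifying for your setup.

```python
# Back-of-the-envelope size of an indexed datacenter-ID field.
EVENTS_PER_DAY = 1_000_000_000  # "about 1 billion events a day" from the question
CHARS_PER_VALUE = 4             # first 4 characters of the hostname
BYTES_PER_CHAR = 1              # assumption: single-byte (ASCII) characters


def added_bytes_per_day(events=EVENTS_PER_DAY,
                        chars=CHARS_PER_VALUE,
                        bytes_per_char=BYTES_PER_CHAR):
    """Return the extra bytes per day contributed by the field values alone."""
    return events * chars * bytes_per_char


added = added_bytes_per_day()
print(f"{added:,} bytes/day (~{added / 1e9:.0f} GB/day)")
```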
That being said, I don't see too much downside to just trying it, other than effort and time. I'd settle on good baseline "measurements" before trying, though, because you'll definitely want to measure the impact.
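One simple way to compare before and after, sketched in Python: run the same search several times under each setup, note the run time of each (for example, the duration Splunk reports for the job), and compare medians. The function name and the sample numbers below are purely hypothetical placeholders, not real measurements.

```python
from statistics import median


def median_speedup(before_secs, after_secs):
    """Return how many times faster the median 'after' run is than the median 'before' run."""
    if not before_secs or not after_secs:
        raise ValueError("need at least one timing in each list")
    return median(before_secs) / median(after_secs)


# Placeholder timings in seconds, for illustration only.
before = [10.0, 12.0, 11.0]  # same search, search-time extraction
after = [5.0, 6.0, 5.5]      # same search, index-time field
print(median_speedup(before, after))
```

Using the median rather than the mean keeps one unusually slow run (a busy indexer, a cold cache) from skewing the comparison.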
Another thought: are these all in the same index? How many data centers are there? Could you rework it so that each DC goes to its own index? Then you'd change the "DC" part of your searches to index=dc04
or index=dc04 OR index=dc55
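If you end up generating those search strings from a script, the index filter is easy to build. A minimal Python sketch, assuming one index per datacenter named after the datacenter ID as in the examples above (the helper name is made up):

```python
def index_clause(datacenters):
    """Build an SPL index filter such as 'index=dc04 OR index=dc55'."""
    if not datacenters:
        raise ValueError("need at least one datacenter ID")
    return " OR ".join(f"index={dc}" for dc in datacenters)


print(index_clause(["dc04"]))          # index=dc04
print(index_clause(["dc04", "dc55"]))  # index=dc04 OR index=dc55
```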