When I ran a search spanning an entire year it took 241 seconds. If I immediately rerun the search the time plummets to ~60 seconds. Why? Is this a Splunk or Disk optimization?
Background:
hot/warm sit on fast disk.
colddb resides on slower, larger disk.
Regardless of the search I run, the first time the data is polled the reply is always slower. When I rerun the same exact search over the same exact disks, the times drop considerably. Who's responsible? (Who can I thank?) Splunk or the disks... and is it that simple, or is it more complex? I understand that searching back into colddb will be slower than hot/warm. My question is more of a lower-level, backend one. But it's one I want to share with my user base when I advise them how to tune their searches and what will happen when they rerun a search.
I've looked through a lot of the Answers and on Splunk's site but can't really find the answer. This group is outstanding, so I'm leaning on you. Any insight is appreciated.
pstein
Splunk caches search results for a set period of time (configurable by the admin)
So if you run the exact same search while the data is still cached - the search will return results much faster the second time
But the search must truly be identical
It's not exactly for a set period of time, it's related to your user (and/or role) disk quota - https://docs.splunk.com/Documentation/Splunk/8.0.1/Admin/authorizeconf (though search results expiration time does factor in)
For ad-hoc searches, you need to thank the OS and its disk cache. There is no caching in Splunk that makes those reruns faster - and that's a good thing, otherwise you'd get old results.
For completeness' sake, there is a caching mechanism in dashboards; search docs.splunk.com/Documentation/Splunk/8.0.1/Viz/PanelreferenceforSimplifiedXML for "cache" to find out more.
I downvoted this post because your OS isn't going to magically cache things - especially in a clustered environment.
Your user/role disk quota/cache is what's factoring in here.
Repeated downvotes don't change facts... it's not magic either: https://en.wikipedia.org/wiki/Page_cache
You insisting on something that's irrelevant doesn't help your case
The OS doesn't magically cache extra data just because you wish it would
You can wish that all you want - doesn't change reality
What extra data?
The first search reads a bucket off disk, the cache keeps those files in memory.
The second search reads the same bucket again, cache serves files from memory much faster.
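A minimal sketch of that effect, in Python rather than SPL: read the same file twice and time each pass. The second read is typically served from the OS page cache. Note this is an illustration under assumptions - writing the file also warms the cache, so a truly cold first read would require dropping caches first (e.g. `echo 3 > /proc/sys/vm/drop_caches` as root on Linux), and timings will vary by OS and hardware.

```python
import os
import tempfile
import time

def timed_read(path):
    """Read the whole file; return (elapsed seconds, file contents)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return time.perf_counter() - start, data

# Write a reasonably large file so the read time is measurable.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(16 * 1024 * 1024))  # 16 MiB of throwaway data
    path = tmp.name

try:
    cold_time, first = timed_read(path)   # may hit disk (unless already cached)
    warm_time, second = timed_read(path)  # typically served from the page cache
    print(f"first read:  {cold_time:.4f}s")
    print(f"second read: {warm_time:.4f}s")
    # The cache changes speed, never content - both reads return identical bytes.
    assert first == second
finally:
    os.remove(path)
```

The same principle applies to Splunk's tsidx and rawdata files: the first search pulls bucket files off disk and the kernel keeps those pages in memory, so the rerun reads them from RAM.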
@martin_mueller
This also helps complete my question. Your link to the dashboard caching mechanism will help others as well. You too get an 'Accept', but I can only give out one. But in my heart you deserve one.
For even more insight, post screenshots of the top section of the job inspector for the first slow run and another for the second fast run.
Unfortunately, I'm unable to upload an image from my desktop to share.
It's ok. With the answers, I understand.
Your OS probably isn't caching that much - unless you happen to run identical searches frequently: i.e., ones with the same static time settings, or ones from summary indices.
The OS caches a ton of things: Your entire disks, if you have the memory for it.
I downvoted this post because your OS won't cache the whole disk even if it has the memory - OSes use some memory for caching, but they don't magically cache things that haven't been asked for (and won't keep around things that haven't been accessed in a long while).
I'm pretty sure this is not the case. You're probably seeing the effect of the OS caching tsidx files. Or, if you're on Splunk Cloud, you're seeing the effect of buckets in s3 getting localized to the search peers on the first search, so the second search doesn't need to copy buckets from s3.
I got clarification from engineering:
If your searches are EXACTLY the same, including the resolved time range (which is rare unless you use exact time ranges), then it will reuse.
Basically, if you run a search, wait 1 second, then run it again, we do not reuse.
With typical relative time ranges like earliest=-5m latest=now it will not reuse, but with snapped ranges like earliest=-2d@d latest=-1d@d it can end up reusing.
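A toy sketch of why that is, in Python rather than SPL: a now-relative range resolves to different epochs on every run, while a day-snapped range resolves identically for any two runs within the same day. `snap_to_day` and `resolve` below are illustrative helpers I've made up for this example, not Splunk's actual time parser.

```python
from datetime import datetime, timedelta

def snap_to_day(dt):
    """Mimic Splunk's '@d' snap modifier: truncate to midnight."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)

def resolve(days_back_earliest, days_back_latest, now, snap=False):
    """Toy resolver for '-Nd'/'-Nd@d'-style ranges (illustration only)."""
    earliest = now - timedelta(days=days_back_earliest)
    latest = now - timedelta(days=days_back_latest)
    if snap:
        earliest, latest = snap_to_day(earliest), snap_to_day(latest)
    return earliest, latest

run1 = datetime(2020, 1, 14, 10, 30, 0)
run2 = run1 + timedelta(seconds=1)  # the same search, rerun one second later

# Snapped (like earliest=-2d@d latest=-1d@d): both runs resolve to the
# same concrete range, so the searches really are identical -> reuse possible.
assert resolve(2, 1, run1, snap=True) == resolve(2, 1, run2, snap=True)

# Unsnapped (like earliest=-5m latest=now): the resolved epochs shift with
# every run, so the searches are never truly identical -> no reuse.
assert resolve(2, 1, run1, snap=False) != resolve(2, 1, run2, snap=False)
```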
My testing disagrees: running index=_internal earliest=-w@d latest=@d | stats count by component
twice (same search, same time range, no time-window creep happening), I still see significant time consumed. If reuse happened, it should finish in a fraction of a second.
Using the exact same time range: earliest=1/14/2019:00:00:00 latest=01/13/2020:24:00:00
From your note, I would be seeing the effect of the OS caching tsidx files on the on-prem cluster.
Thanks, @wmyersas
That's what I was looking to confirm. Appreciate your guidance and sharing the link for others to read up on.
Can you point at docs for configuring that?
Check the disk-quota settings in Authorize.conf - https://docs.splunk.com/Documentation/Splunk/8.0.1/Admin/authorizeconf
The disk quota stores search artifacts (= results), but those are not (re)used for future searches.