Solved: Cache indexes in memory?

smileyge · 09-26-2013

I have an odd one that I imagine most folks here would not want to do. I'm using Splunk as an analysis tool, not for alerting, dashboards, etc. I load data, run a bunch of queries, and then I'm generally done with it and might even delete the index. My issue is that every time I run a query, Splunk appears to load data into memory, return the result, and then release the memory. Repeat.

What I'm doing involves lots of searches run in series, one right after another. Is there a way to tell Splunk to keep more (all?) of an index and/or lookup table in memory, or to be much more aggressive with its caching strategy, so that search #2 doesn't take as long as #1? I'm not at all worried about system resources and would be quite happy if Splunk consumed everything there is. I have a dedicated box for this.

Ayn · 09-26-2013

There's nothing I know of that makes Splunk behave like this, other than that hot buckets are partially kept in memory for performance. One thing to consider, though: have you looked into post-processing? The idea is to run one query that retrieves and aggregates results, then feed those results into post-processing queries, depending on how you want to slice and analyze them. http://docs.splunk.com/Documentation/Splunk/5.0/AdvancedDev/PostProcess
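To make the post-processing idea concrete outside Splunk, here is a minimal Python sketch, assuming the base search's results fit in memory as a list of dicts. The field names (host, status) and the helpers run_base_search and post_process are hypothetical and are not Splunk's API. The sketch only shows the pattern: one expensive pass, then several cheap slices over its stored output.

```python
from collections import Counter
from typing import Callable

# Hypothetical events standing in for what a Splunk base search would return.
raw_events = [
    {"host": "a", "status": 200},
    {"host": "b", "status": 404},
    {"host": "a", "status": 500},
    {"host": "c"},  # no status field, so the base search drops it
]

base_search_runs = 0

def run_base_search(events: list[dict]) -> list[dict]:
    """The one expensive pass: filter the raw events once and keep the result."""
    global base_search_runs
    base_search_runs += 1
    return [e for e in events if "status" in e]

def post_process(results: list[dict], step: Callable[[list[dict]], Counter]) -> Counter:
    """A post-processing step only sees the stored base results, never the raw events."""
    return step(results)

base = run_base_search(raw_events)
by_status = post_process(base, lambda rs: Counter(r["status"] for r in rs))
by_host = post_process(base, lambda rs: Counter(r["host"] for r in rs))
errors = post_process(base, lambda rs: Counter(r["host"] for r in rs if r["status"] >= 500))
```

The base search runs once regardless of how many slices you take; the trade-off is that the base result set has to stay small enough to hold and re-slice.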

You could also use cached results from saved searches. You can specify how long the results of a saved search are retained, and then pull those stored results instead of issuing the original search all over again.
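The retention idea is essentially a result cache with an expiry time. Here is a minimal Python sketch of that pattern, assuming results are keyed by the exact search string; SearchResultCache and the example search string are hypothetical names for illustration, not Splunk internals. In Splunk itself, as far as I know, the retention period is the saved search's dispatch.ttl setting in savedsearches.conf, and the loadjob search command retrieves the stored results. Check the docs for your version.

```python
import time
from typing import Callable, Hashable

class SearchResultCache:
    """Keep each search's results for ttl seconds, keyed by the search string."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[Hashable, tuple[float, object]] = {}

    def get_or_run(self, key: Hashable, run: Callable[[], object]) -> object:
        now = self.clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # still within the retention window: reuse it
        result = run()  # missing or expired: run the search again
        self._store[key] = (now, result)
        return result

# Usage with a fake clock and a stand-in for an expensive search.
fake_now = [0.0]
runs = []

def expensive_search() -> list[str]:
    runs.append(fake_now[0])
    return ["row1", "row2"]

cache = SearchResultCache(ttl=600, clock=lambda: fake_now[0])
query = "index=web | stats count by host"  # hypothetical search string
first = cache.get_or_run(query, expensive_search)
second = cache.get_or_run(query, expensive_search)  # served from the cache
fake_now[0] = 601.0
third = cache.get_or_run(query, expensive_search)  # retention expired, reruns
```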

Finally, you could look into summary indexing: run the original search, write the result set to a separate index, and then perform all your operations on that index. http://docs.splunk.com/Documentation/Splunk/5.0/Knowledge/Usesummaryindexing
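As a rough analogy for summary indexing, here is a sketch that uses an in-memory SQLite database in place of Splunk. The aggregation over the raw table runs once, its output is written to a separate summary table, and later questions query only that smaller table. The table and column names are made up for illustration; in Splunk the summary would live in its own index rather than a SQL table.

```python
import sqlite3

# Stand-in for the raw index; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (host TEXT, status INTEGER, bytes INTEGER)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [("a", 200, 100), ("a", 500, 50), ("b", 200, 300), ("b", 200, 20)],
)

# Run the expensive aggregation once and write its output to a separate table.
conn.execute(
    """CREATE TABLE summary AS
       SELECT host, status, COUNT(*) AS events, SUM(bytes) AS total_bytes
       FROM raw_events
       GROUP BY host, status"""
)

# Follow-up questions read only the small summary table.
errors_by_host = conn.execute(
    "SELECT host, SUM(events) FROM summary WHERE status >= 500 GROUP BY host"
).fetchall()
bytes_by_host = conn.execute(
    "SELECT host, SUM(total_bytes) FROM summary GROUP BY host ORDER BY host"
).fetchall()
```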


smileyge · 09-26-2013

This is great info, thanks Ayn, but it doesn't quite solve it 100%: post-processing has a limit of 10k rows (not configurable), and saving the search etc. is also more work. Close, so other ideas are welcome!

kristian_kolb · 09-26-2013

RAM disk?

/K

dwaddle · 09-26-2013

Splunk counts on the operating system's filesystem cache for this purpose. Unlike an RDBMS, it does not have a dedicated "buffer pool", "SGA", or similar. Splunk's data storage is all "just plain files", and these files are not opened with O_DIRECT or anything else that would bypass the cache. The more memory your indexers and search heads have, the more caching the OS can do on your behalf. On Linux this can be tuned somewhat using the 'swappiness' kernel setting, but the default swappiness is usually reasonable.
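Since Splunk relies on the OS page cache, one workaround on a dedicated box is to pre-read the index files after loading data, so their pages are already cached when the searches start. The sketch below does that in plain Python. It is not a Splunk feature, the index path in the comment is a hypothetical example, and the kernel remains free to evict the pages later, so treat it as a best-effort warm-up rather than a guarantee.

```python
import os
import tempfile

def warm_page_cache(root: str, chunk_size: int = 1 << 20) -> int:
    """Read every regular file under root once so the OS page cache holds it.

    Returns the number of bytes read. Nothing is pinned: the kernel can still
    evict these pages later if other processes need the memory.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, etc.
            try:
                with open(path, "rb") as f:
                    # Read and discard; the side effect we want is the cached pages.
                    while chunk := f.read(chunk_size):
                        total += len(chunk)
            except OSError:
                continue  # unreadable file: skip it rather than abort the warm-up
    return total

# Demo on a throwaway directory; for real use, point root at an index directory,
# e.g. a hypothetical "/opt/splunk/var/lib/splunk/myindex/db".
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "bucket1"))
    with open(os.path.join(demo_root, "small.dat"), "wb") as f:
        f.write(b"x" * 10)
    with open(os.path.join(demo_root, "bucket1", "big.dat"), "wb") as f:
        f.write(b"y" * 2_500_000)
    demo_bytes = warm_page_cache(demo_root)
```

Reading in fixed-size chunks keeps the script's own memory use flat; the memory that matters here is the OS cache, which grows as the files are read.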