Cache indexes in memory?

smileyge
Path Finder

I have an odd one that I imagine most folks here would not want to do. I am using Splunk as an analysis tool, not as an alerting tool, dashboard, etc. I load data, run a bunch of queries, and then I'm generally done with it and might even delete the index. My question: every time I run a query, Splunk appears to load data into memory, give me the result, and then release the memory. Repeat.

What I'm doing involves lots of searches in series, one right after another. Is there a way to tell Splunk to keep more (all?) of an index and/or lookup table in memory, or to be much more aggressive with its caching strategy, so that search #2 doesn't take as long as search #1? I'm not at all worried about system resources and would be quite happy if Splunk consumed everything there is. I have a dedicated box for this.

Ayn
Legend

I don't know of anything that can make Splunk behave like this, other than that hot buckets are partially kept in memory for performance purposes. One thing to consider, though: have you looked into post processing? The idea is to run one query that retrieves and aggregates results; then, depending on how you want to slice and analyze them, you feed those results into post-processing queries. http://docs.splunk.com/Documentation/Splunk/5.0/AdvancedDev/PostProcess

You could possibly also use cached results from saved searches. You can specify how long the results of a saved search are retained, and then grab those results instead of issuing the original search all over again.

Finally, you could look into summary indexing: run the original search and write the result set to a separate index, then perform all your operations on that index. http://docs.splunk.com/Documentation/Splunk/5.0/Knowledge/Usesummaryindexing

smileyge
Path Finder

This is great info, thanks Ayn, but it doesn't quite solve it 100%: as far as I can see, post processing has a limit of 10k rows (unconfigurable). It's also more work to save the search, etc. Close; other ideas welcome!

kristian_kolb
Ultra Champion

RAM disk?

/K

dwaddle
SplunkTrust

Splunk relies on the operating system's filesystem cache for this purpose. Unlike an RDBMS, it does not have a dedicated "buffer pool", "SGA", or similar. Splunk's data storage is all "just plain files", and these files are not opened with O_DIRECT or anything like that to impede caching. The more memory your indexers and search heads have, the more caching the OS does on your behalf. On Linux this can be tuned somewhat using the 'swappiness' kernel setting, but the default swappiness is usually reasonable.
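
Since the caching is left to the OS, one thing you could try on a dedicated box is reading the index files once before a batch of searches, so the kernel has a chance to pull them into the page cache. Below is a minimal Python sketch of that idea. It is not a Splunk feature: the function name warm_page_cache is made up for illustration, and the index path in the usage comment is only an assumption about a default-style install. Whether the pages stay cached depends on how much free memory the machine has.

```python
import os


def warm_page_cache(root, chunk_size=1 << 20):
    """Read every regular file under `root` once and discard the data.

    Reading a file through normal buffered I/O lets the OS keep its pages
    in the filesystem cache, if there is enough free memory.
    Returns a tuple (files_read, bytes_read).
    """
    files_read = 0
    bytes_read = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, broken symlinks, etc.
            try:
                with open(path, "rb") as fh:
                    while True:
                        chunk = fh.read(chunk_size)
                        if not chunk:
                            break
                        bytes_read += len(chunk)
            except OSError:
                continue  # unreadable file: ignore and move on
            files_read += 1
    return files_read, bytes_read


# Hypothetical usage; the path is an assumption, adjust it to your index:
# files, size = warm_page_cache("/opt/splunk/var/lib/splunk/myindex/db")
# print(f"read {files} files, {size} bytes")
```

If the index is larger than the available RAM, this will not help much, because the kernel will evict earlier pages as it reads later ones.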
