Reporting

Big Data Analysis with limited resources

fu8R5juiNP64AKI
Explorer

Hi,

we have multiple dashboards with about 25 searches each, and each search scans about 600 GB of raw data.
The dashboards should always (and only) display the previous day's data between 1pm and 9pm.

Since those dashboards unsurprisingly took forever to load, I was tasked with speeding them up.

At first I thought this was exactly the kind of task Splunk was made for, but I ran into some issues.

Each dashboard accesses similar data, so it should not be a problem to build a summary index from it. Unfortunately, some of the searches need a value in milliseconds, so summarizing via

| (si)stats count by url, cache_hit, decision, req_runtime

still produces about 50% of the data. I guess a summary index built from this data would still be about 200 GB, which is way too much for fast searches, not to mention the additional load it would put on the indexers.
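
To illustrate, a summary-populating search along those lines could look like the sketch below (the index name web_proxy and the summary index name summary_web are placeholders, not our real names):

index=web_proxy
| stats count by url, cache_hit, decision, req_runtime
| collect index=summary_web

Because req_runtime is a millisecond value, almost every event ends up in its own group of by-fields, which is why the summarized output is still roughly half the size of the raw data.
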
I could additionally create a separate summary index for each search, but that seems insane to me because it would mean running about 75 scheduled searches every minute.
The final dashboards could then probably be accelerated without much effort.

What is the best solution for such demanding use cases? Do I really need to create 75 scheduled searches?


martin_mueller
SplunkTrust

The best approach for your use cases depends on too many details to give a proper answer here, so the best option would be on-site professional services... depending on your location, many people here, including me, or even Splunk itself, can offer that.

That being said, here's a thought based on the search snippet you posted. You're counting events split by four fields, so you could create a summary index that contains pre-summarized counts per slice of time.
For example, if the smallest resolution you need is ten minutes, create a summarizing search scheduled every ten minutes that searches from -15m@m to -5m@m and saves the pre-computed counts, split by those four fields, into the summary index. The searches in the dashboard then only read those counts and calculate fast sums.
Each event in the raw data is read only once, by the scheduled search, giving you the lowest possible load on the raw data. Any analysis on the pre-computed counts will be blazingly quick.
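
As a rough sketch of that pattern, using placeholder index names web_proxy and summary_web (not real names from your environment) and the stats plus collect approach (enabling summary indexing on a scheduled saved search that uses sistats works just as well), the summarizing search scheduled every ten minutes could be:

index=web_proxy earliest=-15m@m latest=-5m@m
| stats count by url, cache_hit, decision, req_runtime
| collect index=summary_web

and the dashboard panels would then only query the summary, for example for yesterday's 1pm to 9pm window:

index=summary_web earliest=-1d@d+13h latest=-1d@d+21h
| stats sum(count) as count by url, cache_hit, decision

That second search never touches the raw data, which is what makes the dashboard panels fast.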
