Summary index time grouping for performance

joebensimo
Path Finder

Is there a significant performance difference when searching summary-index aggregates (the output of a stats command) grouped by hour, depending on whether all of an hour's summary events are stamped with the first second of that hour or are spread out throughout the hour?

In other words, is there likely to be a significant performance difference between searching a summary index created with

| stats
sum(a) as a
sum(b) as b
by date_hour x y z

(which puts all summary index rows/events at the start of each hour) or with

| stats
first(_time) as _time
sum(a) as a
sum(b) as b
by date_hour x y z

(which spreads the summary index rows/events out across each hour)?

To be clear, I am concerned about the performance of searching the summary index, not of generating it.

emotz
Splunk Employee

For summary indexing, the search that populates the summary should generally use sistats rather than stats. The summary index handles the time, so you don't need to group by date_hour, and you don't need first(_time) either.
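A minimal sketch of the sistats pattern described above, assuming the first search is saved under the hypothetical name `hourly_ab_rollup`, scheduled to run every hour over the previous hour, and has summary indexing enabled so its output lands in the default `summary` index. The source index `main`, the sourcetype `my_sourcetype`, and the fields `a b x y z` are placeholders. The first search populates the summary; the second reads it back with the same stats functions, which is how sistats output is meant to be finalized.

```spl
index=main sourcetype=my_sourcetype earliest=-1h@h latest=@h
| sistats sum(a) sum(b) by x y z

index=summary search_name="hourly_ab_rollup"
| stats sum(a) sum(b) by x y z
```

Because each run covers exactly one hour, the hourly granularity comes from the schedule rather than from a date_hour or _time grouping in the search itself.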

Alternatively, you could use report acceleration in Splunk 5.x, which makes this much simpler. Create your search, run it, save it, schedule it, and click the Accelerate button; everything is then built for you in the background. You can then run the same search as usual over much longer time ranges and get the answer quickly.

joebensimo
Path Finder

Summary indexing does not handle time the way I want. It aggregates by whatever time I tell it to, or else over the entire time range of the summary-index-generating search.

I group by time because I often need to group results by time periods shorter than the period the summary-index search runs over. For example, I have summary-index-generating searches that run daily and produce data aggregated by hour.
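One common way to get hourly rows out of a search that runs once a day is to bucket `_time` explicitly with `bin` before `stats`. Each summary row is then stamped with the start of its hour (the same placement as the first variant in the question), and the grouping no longer depends on date_hour, which only holds the hour of day. This is a sketch, assuming results are written with `collect` to an index named `summary`; the source name `hourly_ab_rollup`, the index `main`, the sourcetype, and the fields `a b x y z` are hypothetical placeholders.

```spl
index=main sourcetype=my_sourcetype earliest=-1d@d latest=@d
| bin _time span=1h
| stats sum(a) as a sum(b) as b by _time x y z
| collect index=summary source="hourly_ab_rollup"
```

Because `_time` is a real grouping field here, a later search such as `index=summary source="hourly_ab_rollup" | timechart span=1d sum(a) as a sum(b) as b` can roll the hourly rows up to any coarser period.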

Report acceleration doesn't work for my searches because of the calculations involved in building the summary indexes.

joebensimo
Path Finder

sistats limits what I can do with the summary index data when I query it and doesn't give me the flexibility I need, so I use stats instead.

The main difference between sistats and stats is that sistats saves only the minimum intermediate data needed to compute the specific statistics requested, while stats saves the final result of each statistic.

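When summarizing with plain stats, the stored values need to be ones that can be re-aggregated correctly later. Sums and counts can simply be summed again, but averaging stored hourly averages generally gives the wrong overall average. One approach is to store the components and derive the statistic at search time. A sketch; the fields `a_sum`, `a_count`, and `avg_a`, the source name `hourly_a_rollup`, and the index and sourcetype are hypothetical.

```spl
index=main sourcetype=my_sourcetype earliest=-1d@d latest=@d
| bin _time span=1h
| stats sum(a) as a_sum count as a_count by _time x y z
| collect index=summary source="hourly_a_rollup"

index=summary source="hourly_a_rollup" earliest=-7d@d latest=@d
| stats sum(a_sum) as a_sum sum(a_count) as a_count by x y z
| eval avg_a = a_sum / a_count
```

This keeps the flexibility of stats while doing by hand what sistats would otherwise store automatically.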
joebensimo
Path Finder

Yes, I do use the Job Inspector to learn about my searches' performance.

The real test will be to try these two variations with my own data and environment and see whether there is a difference in performance.

I was hoping someone else had already tried this, or could offer a theoretical explanation of why one variant might be faster than the other (or why it will make no difference).
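A sketch of one way to compare the two variants side by side: run each summary search over the same time range, then list the finished jobs' execution statistics with the `rest` command. The field names below (`dispatchState`, `runDuration`, `scanCount`, `eventCount`, `resultCount`) are ones the search jobs endpoint typically reports, and matching on `title` assumes it holds the search string; verify both against your Splunk version. The Job Inspector shows the same figures for a single job.

```spl
| rest /services/search/jobs splunk_server=local
| search dispatchState="DONE" title="*index=summary*"
| table sid title runDuration scanCount eventCount resultCount
| sort runDuration
```

Running each variant several times and comparing typical values is more informative than a single run, since caching can skew the first execution.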

rturk
Builder

This isn't an answer per se, but have you tried using the Job Inspector to determine the efficiency of your searches?
