Splunk Search

Summary indexing question

sranga
Path Finder

Hi

We have a scheduled search that does summary indexing. For some reason, it doesn't capture all of the data that matches the specified criteria. I am not sure if it's hitting a limit. Is there a way to check?

If I run the associated aggregate command in a regular search, I can see the desired results.

The scheduled search runs every 30 minutes and is of the form:

index=blah | ... | sistats count, avg(field5) by field1, field2, field3  

Each of the fields could have upwards of 300 values. Is there a better way to do this? Can the bucket command be used to summarize the results, given that the fields have non-numeric values?

Any help is appreciated.

Ranga

0 Karma
1 Solution

Lowell
Super Champion

First, what's the time range of your summary indexing saved search? You mentioned that it's running every 30 minutes, but are you searching from -30m to now? If so, I think the simple fix is to do something like this:

earliest=-35m@m
latest=-5m@m

This gives your logs a 5-minute window to get picked up by Splunk and indexed. You may be able to get away with a shorter window; play around with that. Generally, 5 minutes is enough to cover even a Splunk restart and possibly a quick reboot.
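For example, applying that window to the search from the question (keeping the elided middle of the pipeline as-is) would look something like this; you can set the range in the saved search's time-range settings, or inline as shown:

index=blah earliest=-35m@m latest=-5m@m | ... | sistats count, avg(field5) by field1, field2, field3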

In terms of how many events: yes, there is currently a limit of 10,000 events, per this post. Take a look at how many results you get after the sistats command. Some stats functions (like distinct_count) can generate a massive number of results if you have a field with a large number of distinct values. Sometimes the number of summarized events is greater than the number of input events; if you find this to be the case, then you are not gaining any performance improvement from summary indexing and you should review your approach.
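One quick way to check (a sketch; it assumes the rest of your pipeline stays unchanged) is to run the search interactively over the same window and count the rows that sistats produces:

index=blah earliest=-35m@m latest=-5m@m | ... | sistats count, avg(field5) by field1, field2, field3 | stats count AS summary_rows

If summary_rows comes back at or near 10,000, you are almost certainly hitting the limit.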


Lowell
Super Champion

Out of curiosity, how many events are you summarizing? If you are adding over 10,000 summary events every 30 minutes, you must be dealing with a pretty high volume of events. Have you confirmed that you are actually generating more than 10,000 events in your search, and that it's not all fluffed up because of using sistats?

0 Karma

Lowell
Super Champion

Hmm. You could try:

... | collect addtime=true index=my_summary name="name of your saved search"

I don't know if that will make a difference or not. To be honest, the summary indexing commands are confusing. There's collect, stash, pycollect, pystash, summaryindex, sumindex, .... Some of them are Python implementations and others are not.

0 Karma

sranga
Path Finder

I tried using the collect command, but the data doesn't get stored in the summary index. This is how I have it:

index=blah | ... | sistats count, avg(field5) by field1, field2, field3 | collect index=my_summary
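I am wondering if plain stats would behave better with a manual collect, since sistats output seems to be intended for the scheduler's summary-indexing action rather than for collect. Something like this (just a guess on my part):

index=blah | ... | stats count, avg(field5) by field1, field2, field3 | collect index=my_summary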

0 Karma

sranga
Path Finder

Thanks, Lowell. Yes, the earliest time I have is -30m and the latest is +0s. But I believe we are hitting the 10K limit. I shall try using the collect command as mentioned in the other post.

0 Karma