Splunk Search

How to improve Splunk search performance?

guillecasco
Path Finder

I have a search like this:

index= foo  earliest=-3d |rex field=summary "(?{.*)" | spath input=json_data |stats count by Version  | search Version < 30401942 |sort -Version. 

It reads approximately 2.5 million events and takes about 25 seconds to finish. Is this a normal response time for that volume of logs? Is there any configuration in Splunk that I should check to improve performance? I'm quite new to Splunk.

thank you!

1 Solution

gcusello
SplunkTrust

Hi guillecasco,
the first check you should do is to measure the throughput of your disks with a tool like Bonnie++ and verify whether it is higher or lower than the required 800 IOPS.
Bye.
Giuseppe


preactivity
Path Finder

You can improve performance tenfold by using Splunk metadata fields. I can help you with that; please contact me on Fiverr or by email (hurdlej1@gmail.com).

https://www.fiverr.com/s2/affc9b7a8a
https://www.fiverr.com/s2/608e8ed73f?utm_source=CopyLink_Mobile


diogofgm
SplunkTrust

If you want to help, just post an answer here so that every Splunk user looking for something similar can find it.

------------
Hope I was able to help you. If so, some karma would be appreciated.

DalJeanis
SplunkTrust

In Splunk, get rid of everything you don't need as early as possible. In this case, I believe the only field you need out of the events is summary.

Therefore, you can add "| fields summary" as your first command after the initial search, and the search will speed up quite a bit.

Also, richgalloway's suggestion to eliminate version values you don't care about before the stats command is a good one.

Here is a recode you can try, followed by a description of my assumptions.

 index=foo earliest=-3d summary=*
| fields summary
| rex field=summary "(?<json_data>{.*)"
| spath input=json_data
| fields Version
| search Version < 30401942
| stats count by Version
| sort 0 -Version

The above recode should be significantly faster. It is based on this interpretation of your original code:

  • You search the last three days of your foo index.
  • You then extract the value of json_data from the field "summary" with the regex. (I assume the answers site web interface has deleted the field name in angle brackets from the regular expression. Figuring that out was what took the longest part of my time on this response.)
  • You then extract the Version field from the json_data.

NOTE - You can also speed it up a bit more if you know the exact path to the version data you are looking at, rather than having spath extract all information from the json when you only need the version.
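
As a rough sketch of that note, and assuming Version is a top-level key in the JSON (an assumption; a nested key would need its full dotted path in the path argument), spath can be pointed at that one value instead of extracting every field:

```spl
index=foo earliest=-3d summary=*
| fields summary
| rex field=summary "(?<json_data>{.*)"
| spath input=json_data path=Version output=Version
| search Version < 30401942
| stats count by Version
| sort 0 -Version
```

With path and output given, spath writes only the Version field, so the separate "| fields Version" step is no longer needed.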


Edited to use sort 0 rather than sort, in case more Version values were returned than the default sort limit of 10,000 results.
Updated version to Version in the fields command.

richgalloway
SplunkTrust

I can offer some generic suggestions for improving performance.

Make your base search as specific as possible. Include everything you know about the events you want.
Narrow the time window as much as you can. Do you really need 3 days of data?
Consider moving the "search Version < 30401942" command to the left. This should reduce the number of events that are read/processed.
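
To illustrate the three points above, here is one possible shape for the search. It assumes one day of data is enough (the -24h window is only an example, not something the original poster stated) and that the regex capture group is named json_data, as the later spath command implies:

```spl
index=foo earliest=-24h summary=*
| rex field=summary "(?<json_data>{.*)"
| spath input=json_data
| search Version < 30401942
| stats count by Version
| sort -Version
```

The Version filter cannot go into the base search itself because Version only exists after spath runs, so placing it directly after spath is as far left as it can move.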

Examine the job inspector for insight into where time is being spent on your search.
Consider spreading your data across more indexers.

---
If this reply helps you, Karma would be appreciated.

rroberts
Splunk Employee

Also, perhaps we can improve on the field extraction regex? Instead of using .* what exactly are you trying to extract? Word characters only? Digits?
