Re: SPL optimization for timechart

santosh_sshanbh · 11-06-2020

I have a dashboard that shows disk read/write data for a server on an area chart. I wrote the SPL below for it:

host="Server1" index="performance_data" instance=*: source="PerfmonMk:LogicalDisk" sourcetype="PerfmonMk:LogicalDisk"
| eval instance = substr(instance, 1, len(instance)-1)
| eval Host_Instance = 'host'."-".'instance'
| timechart eval(round(avg('Avg._Disk_Queue_Length'),2)) AS "Avg. Disk Queue Length" BY Host_Instance limit=0
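The two eval lines drop the trailing colon from the drive letter and join it to the host name, so timechart draws one series per host and drive. The small Python mirror below shows the resulting series names. It is illustration only and assumes instance values look like "C:", as in the drive list used later in the thread; the helper names are hypothetical.

```python
# Illustrative mirror of the two eval steps (helper names are hypothetical).
# SPL's substr() is 1-based: substr(instance, 1, len(instance)-1) keeps
# every character except the last one, i.e. it drops the trailing ":".

def strip_last_char(instance: str) -> str:
    """Equivalent of: eval instance = substr(instance, 1, len(instance)-1)"""
    return instance[:-1]

def host_instance(host: str, instance: str) -> str:
    """Equivalent of: eval Host_Instance = 'host'."-".'instance'"""
    return host + "-" + strip_last_char(instance)

if __name__ == "__main__":
    for drive in ["C:", "D:", "E:"]:
        print(host_instance("Server1", drive))  # Server1-C, Server1-D, Server1-E
```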

When I run this search over one week of data, with disk data collected every 30 seconds, the dashboard takes 10-15 minutes to load.
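A back-of-envelope count of the samples involved helps judge whether that load time is reasonable. The sketch below assumes one event per drive per 30-second sample and six drives (the C: to H: list from later in the thread); both are assumptions about how the data is stored, not facts from the job itself.

```python
# Rough estimate of how many samples a one-week timechart has to aggregate.
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds
INTERVAL_S = 30                      # collection interval from the post

def samples_per_series(range_s: int, interval_s: int) -> int:
    """Samples produced by one Host_Instance series over range_s seconds."""
    return range_s // interval_s

per_series = samples_per_series(SECONDS_PER_WEEK, INTERVAL_S)
print(per_series)      # 20160 samples per drive per week

# Assumption: six drives, one event per drive per sample.
print(per_series * 6)  # 120960 events for the whole chart
```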

My Splunk instance is in Splunk-managed Cloud, yet it still loads very slowly. Is there an issue with the SPL, or do I need an optimization technique to improve performance?

alonsocaio · 11-06-2020

Hi @santosh_sshanbh,

Looking at your initial search, I would suggest the following to improve performance:

1 - Using a wildcard as a prefix (instance=*:) is not efficient. As the docs explain, "the search must look at every string to determine if the end of the string matches what you specify after the asterisk" (https://docs.splunk.com/Documentation/SCS/current/Search/Wildcards#Avoid_using_wildcards_as_prefixes)
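The quoted docs sentence can be illustrated with a toy model: a sorted list of terms as a simplified stand-in for an index lexicon. This is an assumption for illustration, not Splunk's actual index implementation. A prefix pattern such as instance=C* can binary-search straight to its matches, while a leading wildcard such as instance=*: has to check every term. The sample instance values are hypothetical.

```python
import bisect
from typing import List, Tuple

def prefix_search(terms: List[str], prefix: str) -> Tuple[List[str], int]:
    """Match terms starting with prefix (like instance=C*).
    Binary search jumps to the first candidate; after the jump only the
    matching run plus one stopping term is examined."""
    i = bisect.bisect_left(terms, prefix)
    matches, examined = [], 0
    while i < len(terms):
        examined += 1
        if not terms[i].startswith(prefix):
            break
        matches.append(terms[i])
        i += 1
    return matches, examined

def suffix_search(terms: List[str], suffix: str) -> Tuple[List[str], int]:
    """Match terms ending with suffix (like instance=*:).
    Sort order is by leading characters, so it cannot narrow the range:
    every term is examined."""
    matches = [t for t in terms if t.endswith(suffix)]
    return matches, len(terms)

if __name__ == "__main__":
    # Hypothetical LogicalDisk instance values, kept in sorted order.
    terms = sorted(["C:", "D:", "E:", "HarddiskVolume1", "_Total"])
    print(prefix_search(terms, "C"))  # (['C:'], 2)
    print(suffix_search(terms, ":"))  # (['C:', 'D:', 'E:'], 5)
```

With a handful of terms the difference is trivial, but the suffix case grows with the size of the whole term list, which is the cost the docs warn about.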

2 - I suggest using the "fields" command, so your search keeps only the fields you need in the results:

| fields _time, host, instance, Avg._Disk_Queue_Length

santosh_sshanbh · 11-06-2020

Thanks, alonsocaio, for your response.

I changed the SPL based on your comments, as shown below:

host="Server1" index="performance_data" instance IN ("C:","D:","E:","F:","G:","H:") source="PerfmonMk:LogicalDisk" sourcetype="PerfmonMk:LogicalDisk"
| fields _time, host, instance, Avg._Disk_Queue_Length
| eval instance = substr(instance, 1, len(instance)-1)
| eval Host_Instance = 'host'."-".'instance'
| timechart eval(round(avg('Avg._Disk_Queue_Length'),2)) AS "Avg. Disk Queue Length" BY Host_Instance limit=0

But the search still took around 10 minutes to load, and the chart keeps redrawing while it loads. Even if I run only the base search, with no further commands after the first pipe, it still takes a significant amount of time.

The job reports: "This search has completed and has returned 7 results by scanning 9,184,627 events."
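Turning the job numbers into a scan rate gives a figure that can be compared against other searches on the same deployment. The sketch assumes the roughly 10-minute runtime mentioned above applies to this job, and that work is spread evenly across the 4 indexers; both are assumptions, not measured values.

```python
def events_per_second(events_scanned: int, runtime_s: float) -> float:
    """Average scan throughput over the whole job."""
    return events_scanned / runtime_s

# Assumption: ~10 minutes of runtime for the 9,184,627 scanned events.
rate = events_per_second(9_184_627, 10 * 60)
print(f"{rate:,.0f} events/s overall")  # 15,308 events/s overall

# Assumption: load split evenly across the 4 indexers.
print(f"{rate / 4:,.0f} events/s per indexer")  # 3,827 events/s per indexer
```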

Is this normal for this many events? The index holds data for various sourcetypes and is 700+ GB in size. Could there be a problem with the index, or do we need additional indexers in the cluster? Currently there are 4 indexers in the cloud deployment, which is managed by Splunk.
