Splunk Search

How do I improve my dashboard load times for a large amount of data?

tkwaller
Builder

I have a form that uses a searchTemplate:

index=java earliest=$timerange.earliest$ latest=$timerange.latest$ app_name=API (location=bunchOfLocations) (operation="bunchOfOperations" ) NOT acceptLanguage="*q=*" | table _time, location, priority, status, respTime, operation, NumFound, applicationName, geoExpansion, q

The dashboard uses that search as its base, and individual panels apply different variations on top of the base search, for example:

search NumFound=0 geoExpansion=true q=* | rex "q=(?<q>.+),radius" | eval q=lower(q) | timechart count as "Count of Null Results" by q

There are about 10 panels on this dashboard. The searchTemplate search returns about 12 million results over a 24-hour span, so it takes a very long time to run.
I am trying to avoid using acceleration as it uses resources I am trying not to use.
How can I speed up the dashboard while being as minimally impactful as possible?


aljohnson_splun
Splunk Employee

Since almost all your examples use

search NumFound=0 q=*

you could use a post-process search: run one main search that retrieves that data, and have the panels feed off it. Right now you are retrieving very similar information multiple times. I'm not sure how much it would speed up the whole thing, but your first search (in the comments) already pulls back all the data the other ones need, so there should be at least some speedup. There are a few limitations, though:

  • If the base search is a non-transforming search, Splunk retains only the first 500,000 events returned. In this case, events in excess of this 500,000 limit are not processed by the post process search, resulting in incomplete data. Splunk recommends that you use a transforming search for the base search to avoid this problem.
  • If the post-processing operation takes too long, it can exceed Splunk Web client’s non-configurable timeout value of 30 seconds. This can result in a timeout due to an unresponsive splunkd daemon/service. This scenario typically happens when you use a non-transforming search as the base search. Splunk recommends that you use a transforming search for the base search to avoid this problem.
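
As a rough illustration of how a transforming base search and its post-process panels could be wired together in Simple XML, here is a minimal sketch. The search id `null_results_base`, the labels, the panel titles, and the default time range are hypothetical names chosen for the example, and the base query is trimmed down from the searches in this thread. It also assumes that timechart names its split-by columns with the aggregation name as a prefix (for example `ios: <q value>`), which is what lets the `count*` and `ios*` wildcards pick out the right columns; check the column names in your own results before relying on that.

```xml
<form>
  <label>Null results (sketch)</label>
  <fieldset submitButton="false">
    <!-- Time picker that supplies $timerange.earliest$ and $timerange.latest$ -->
    <input type="time" token="timerange">
      <label>Time range</label>
      <default>
        <earliest>-24h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
  </fieldset>

  <!-- Base search: runs once and ends in a transforming command (timechart),
       so the panels share a small aggregated result set, not raw events. -->
  <search id="null_results_base">
    <query><![CDATA[
      index=java app_name=API NumFound=0 q=*
      | rex "q=(?<q>.+),radius"
      | eval q=lower(q)
      | timechart count count(eval(match(applicationName, "ios"))) as ios by q
    ]]></query>
    <earliest>$timerange.earliest$</earliest>
    <latest>$timerange.latest$</latest>
  </search>

  <row>
    <panel>
      <title>All null results</title>
      <chart>
        <!-- Post-process search: only filters columns from the base results -->
        <search base="null_results_base">
          <query>| fields _time count*</query>
        </search>
      </chart>
    </panel>
    <panel>
      <title>iOS null results</title>
      <chart>
        <search base="null_results_base">
          <query>| fields _time ios*</query>
        </search>
      </chart>
    </panel>
  </row>
</form>
```

The CDATA wrapper is there because the rex pattern contains `<` and `>`, which would otherwise have to be escaped inside the XML.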

You could probably pack those rex commands into the search before the timechart, then do the search for applicationName on the main, giant timechart, since you're pulling that data in anyway with the first search.

Here is what I would do:

search NumFound=0 q=*
| rex "q=(?<q>.+),radius"
| eval q=lower(q)
| timechart
    count
    count(eval(geoExpansion="true")) as geoexpansion
    count(eval(match(applicationName, "ios"))) as ios
    count(eval(match(applicationName, "device"))) as device
    count(eval(match(applicationName, "android"))) as android
    by q

timechart is a transforming command, which (hopefully) lets us get around the limitations above. If the search above is your main global search, your post-process searches can be really simple.

The first one,

search NumFound=0 q=* | eval q=lower(q) | timechart count as "Count of Null Results" by q

would simply be

| fields _time count*

The second one,

search NumFound=0 geoExpansion=true q=* applicationName="*ios*" | rex "q=(?<q>.+),radius" | eval q=lower(q) | timechart count as "Count of Null Results" by q

would simply be

| fields _time ios*

and so on. @ mention me (@aljohnson_splunk) here in the comments if you need more help.

somesoni2
Revered Legend

It depends on the metrics that you're showing in the panels. You need to provide the queries for at least 4-5 panels so we can see whether some pre-aggregation can be done in the search template itself.


tkwaller
Builder

Here are some searches for some of the other panels:

 1. search NumFound=0 q=* | eval q=lower(q) | timechart count as "Count of Null Results" by q
 2. search NumFound=0 geoExpansion=true q=* applicationName="*ios*" | rex "q=(?<q>.+),radius" | eval q=lower(q) | timechart count as "Count of Null Results" by q
 3. search NumFound=0 geoExpansion=true q=* applicationName="*device*" | rex "q=(?<q>.+),radius" | eval q=lower(q) | timechart count as "Count of Null Results" by q
 4. search NumFound=0 geoExpansion=true q=* applicationName="*android*" | rex "q=(?<q>.+),radius" | eval q=lower(q) | timechart count as "Count of Null Results" by q