Optimization of a 6minute+ search while using inli...

wrussell12 · ‎08-08-2019

I have a search that I have almost optimized, the current search is 6+ minutes, and I have it down to about 16 seconds.
Inspecting the job is showing that most of the job is due to the join, which is needed for the spark line.

The idea is that we can monitor bandwidth for 2 days, and then flag if any outliers are found. If an outlier is found, I need to show a sparkline within a field for the last 4 hours.

This search takes about 16 seconds, I wish I could make it faster, but I cannot figure out how to make the sparkline work inside the tstats.

index="cables" source="bandwidth" earliest=-4h latest=now() | stats sparkline(avg(bandwidthMbps)) as "4HR" by cableId
| join cableId [| tstats count from datamodel=prod.cables by cables.bandwidth, cables.name,  cable.Id, _time span=1m 
| rename cable.id as cableId, cable.bandwidth as bandwidth, cable.name as name

| lookup allcables.csv cableId output zipcode,active | where active="true" | search zipcode!=*90210* 
| eventstats avg(bandwidth) as avg, stdev(bandwidth) as stdev by cableId | eval lowerBound=(avg-stdev*exact(2)), upperBound=(avg+stdev*exact(2)) | eval isOutlier=if(bandwidth < lowerBound OR bandwidth > upperBound, 1, 0) | sort - _time | dedup cableId | where isOutlier=1 | where _time > (now() - 360) ] 
| lookup network.csv ipName  | eval IP_lookup=coalesce(ip_name, alt_ip_name) | lookup Regions.csv range as IP_lookup output address | table cableId, avg, "4HR"

This is the original search

 index="cables" source="bandwidth" earliest=-4h latest=now() | stats sparkline(avg(bandwidthMbps)) as "4HR" by cableId
| join cableId [| tstats latest(cables.bandwidth) as bandwidth, cables.name as name from datamodel=prod.cables by cables.id,  _time span=1m 
    | rename cable.id as cableId
    | lookup allcables.csv cableId output zipcode,active | where active="true" | search zipcode!=*90210* 
    | eventstats avg(bandwidth) as avg, stdev(bandwidth) as stdev by cableId | eval lowerBound=(avg-stdev*exact(2)), upperBound=(avg+stdev*exact(2)) | eval isOutlier=if(bandwidth < lowerBound OR bandwidth > upperBound, 1, 0) | sort - _time | dedup cableId | where isOutlier=1 | where _time > (now() - 360) ] 
    | lookup network.csv ipName  | eval IP_lookup=coalesce(ip_name, alt_ip_name) | lookup Regions.csv range as IP_lookup output address | table cableId, avg, "4HR"

Differences:
1. the faster search uses count, instead of latest.
2. I tried using avg(bandwidth) within the tstats, but I was getting different results from the original search.

My two questions:
1. why is tstats count so much faster when I list everything I want within the by clause.
2. how can I get correct sparkline within the tstats without needing to do a double search. The tstats search is a range of 48hours, but the sparkline is the 4 hours.

I have a lot of searches like this, that take 6 minutes, and would love to optimize it all.

Kawtar · ‎08-15-2019

Hello,

You can replace join by append, but you should use stats by .

martin_mueller · ‎08-15-2019

In order to see differences in search speeds and their reasons, do run both variants and post the performance graph from the top section of the job inspector.

Pay attention to field naming in the tstats searches, I see cables.id and cable.Id - it's likely that only one of those two is correct.

As for sparklines in tstats, these two searches do the same:

index=_internal | stats sparkline(count)

| tstats count where index=_internal by _time span=1m | stats sparkline(sum(count))

The tstats loads the underlying data, a count by minute. The stats formats the data as a sparkline.

Optimization of a 6minute+ search while using inline sparkline with tstats

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer at Splunk .conf24 ...

Share Your Ideas & Meet the Lantern team at .Conf! Plus All of This Month’s New ...