
How can I make my search more efficient? Help removing joins

katzr
Path Finder

My search is running pretty slow, and I am looking to edit or remove the joins to make it run faster. It looks pretty messy; the odd handling of my location information is there because, for any location that does not match my lookup, I want location, Country, and Region filled in as Unknown. Unfortunately, I was getting a weird error when I tried setting a default value on the lookup table, so I had to do it manually.

I am using two joins because I am looking across three different time periods. Are there any ways to improve the efficiency of this search and make it run more quickly?

index=example date_month=August date_year=2017 (assignment_group="*") 
| dedup number 
| fillnull value="UNKNOWN" location 
| eval regionblank="UNKNOWN" 
| eval countryblank="UNKNOWN" 
| eval locationblank="UNKNOWN" 
| lookup CurrentSiteInfo.csv location
| eval site=coalesce(location2,locationblank) 
| eval Region=coalesce(Region,regionblank) 
| eval Country=coalesce(Country,countryblank) 
| search ((Region="*") (Country="*") (site="*")) 
| stats count as Tickets by contact_type 
| join overwrite=false contact_type 
    [search index=example earliest="6/01/2017:00:00:00" latest="12/31/2017:24:00:00" (assignment_group="*") 
    | dedup number 
    | fillnull value="UNKNOWN" location 
    | eval regionblank="UNKNOWN" 
    | eval countryblank="UNKNOWN" 
    | eval locationblank="UNKNOWN" 
    | lookup CurrentSiteInfo.csv location
    | eval site=coalesce(location2,locationblank) 
    | eval Region=coalesce(Region,regionblank) 
    | eval Country=coalesce(Country,countryblank) 
    | search ((Region="*") (Country="*") (site="*")) 
    | bucket _time span=1mon 
    | stats count as Tickets by contact_type _time 
    | stats avg(Tickets) as Baseline by contact_type 
    | eval Baseline=round(Baseline,0)] 
| eval "Baseline Variance" = Tickets - Baseline 
| join overwrite=false contact_type 
    [search index=example earliest=-3mon@mon (assignment_group="*") 
    | dedup number 
    | fillnull value="UNKNOWN" location 
    | eval regionblank="UNKNOWN" 
    | eval countryblank="UNKNOWN" 
    | eval locationblank="UNKNOWN" 
    | lookup CurrentSiteInfo.csv location
    | eval site=coalesce(location2,locationblank) 
    | eval Region=coalesce(Region,regionblank) 
    | eval Country=coalesce(Country,countryblank) 
    | search ((Region="*") (Country="*") (site="*")) 
    | bucket _time span=1mon 
    | stats count as Tickets by contact_type _time 
    | stats avg(Tickets) as Average by contact_type 
    | eval Average=round(Average,0)] 
| eval "Average Variance" = Tickets - Average 
| table contact_type Tickets Baseline "Baseline Variance" Average "Average Variance" 
| addcoltotals 
| sort 0 Tickets
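
(Side note on the manual defaults: the three eval/coalesce pairs can likely be replaced by a single fillnull after the lookup, since fillnull with an explicit field list also creates fields that are missing entirely. A minimal sketch, assuming CurrentSiteInfo.csv outputs Region, Country, and location2:)

| lookup CurrentSiteInfo.csv location
| rename location2 as site
| fillnull value="UNKNOWN" site Region Country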

DalJeanis
Legend

On further review, I believe that defaulting the location values to "UNKNOWN" has rendered much of this code irrelevant. The search filter that follows that section originally served to eliminate locations that were not in the lookup file. Overriding the nulls with the value "UNKNOWN" prevents those records from being eliminated, and the fields are not otherwise used in the code, so the whole lookup is redundant.

This should achieve the same effect more efficiently.

| join overwrite=false contact_type 
    [search index=example earliest=-6mon@mon latest=now (assignment_group="*") 
    | fields contact_type    ... whatever else you absolutely need...
    | eval _time = relative_time(_time,"@mon") 
    | eval BaselineFlag = case(...test the date for if this event is in baseline...., 1)
    | eval AverageFlag = case(...test the date for if this event is in average...., 1)
    | rename COMMENT as "The above commands are streaming and distributable, so should be above the dedup unless you have LOTS of dups." 

    | rename COMMENT as "By using dc instead of count, this stats eliminates the need for dedup." 
    | stats dc(eval(case(BaselineFlag=1,number))) as BaselineTickets 
            dc(eval(case(AverageFlag=1,number))) as AverageTickets 
            by contact_type _time 

    | stats avg(BaselineTickets) as Baseline avg(AverageTickets) as Average by contact_type 
    | eval Baseline = round(Baseline,0)  
    | eval Average = round(Average,0)]

At the very least, you can collapse the two joins into a single one that looks something like this...

| join overwrite=false contact_type 
    [search index=example earliest=-6mon@mon latest=now (assignment_group="*") 
    | dedup number 
    | fillnull value="UNKNOWN" location 
    | eval regionblank="UNKNOWN" 
    | eval countryblank="UNKNOWN" 
    | eval locationblank="UNKNOWN" 
    | lookup CurrentSiteInfo.csv location
    | eval site=coalesce(location2,locationblank) 
    | eval Region=coalesce(Region,regionblank)
    | eval Country=coalesce(Country,countryblank) 
    | bucket _time span=1mon 

    | eval BaselineFlag = case(test date for if this event is in baseline, 1)
    | eval AverageFlag = case(test date for if this event is in average, 1)
    | stats count(BaselineFlag) as BaselineTickets count(AverageFlag) as AverageTickets by contact_type _time 

    | stats avg(BaselineTickets) as Baseline avg(AverageTickets) as Average by contact_type 
    | eval Baseline = round(Baseline,0)  
    | eval Average = round(Average,0)]
| eval "Baseline Variance" = Tickets - Baseline
| eval "Average Variance" = Tickets - Average 

katzr
Path Finder

| eval BaselineFlag = case(...test the date for if this event is in baseline...., 1)
| eval AverageFlag = case(...test the date for if this event is in average...., 1)

I can't figure out the syntax for testing the date. For my baseline, it is a static window from 6/01/2017:00:00:00 to 12/31/2017:24:00:00. My average covers the past 3 months.


DalJeanis
Legend

@katzr - try something like this. Note that relative_time(maxtime,"-2mon@mon") backs up two months from the latest event and snaps to the start of that month, so the test covers three calendar months (the latest month plus the two before it):

 | eventstats max(_time) as maxtime
 | eval BaselineFlag=case(_time>=strptime("2017-06-01","%Y-%m-%d") AND 
        _time<strptime("2018-01-01","%Y-%m-%d"),1)
 | eval AverageFlag = case(_time>=relative_time(maxtime,"-2mon@mon"),1)

katzr
Path Finder

Hi @DalJeanis, thank you for all of your help. However, with this search my totals do not equal the ones from my original search. Here is what my original average search looks like, which is the one returning larger values.

index=example earliest=-3mon@mon 
| dedup number 
| bucket _time span=1mon 
| stats count as Tickets by contact_type _time 
| stats avg(Tickets) as Average by contact_type 
| eval Average=round(Average,0)

I tried changing your search to include -3mon@mon, and also tried other solutions that people answered, but my original search is still returning larger numbers.


katzr
Path Finder

Hi @DalJeanis, I figured out the problem: the AverageTickets in your example is averaged over the whole time period. The months outside the window come out of the dc as 0s, which drags down an average that should be computed only over the past 3 months, not the whole time period.

Example below:

    _time      AverageTickets
    2017-05         0
    2017-06         0
    2017-07       100
    2017-08       120
    2017-09       130
    2017-10       140

So the 0s are bringing down the average.
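
(One possible fix, building on the flag approach above: carry a per-month marker for whether that month falls inside each window, and null out the months that don't, so that avg() skips them, since stats avg ignores null values. A sketch; InBaselineWindow and InAverageWindow are illustrative names, not fields from the original search:)

| stats dc(eval(case(BaselineFlag=1,number))) as BaselineTickets 
        dc(eval(case(AverageFlag=1,number))) as AverageTickets 
        max(BaselineFlag) as InBaselineWindow 
        max(AverageFlag) as InAverageWindow 
        by contact_type _time 
| eval BaselineTickets=if(InBaselineWindow=1,BaselineTickets,null()) 
| eval AverageTickets=if(InAverageWindow=1,AverageTickets,null()) 
| stats avg(BaselineTickets) as Baseline avg(AverageTickets) as Average by contact_type 
| eval Baseline=round(Baseline,0) 
| eval Average=round(Average,0)

The same nulling is applied to the baseline months, since any month outside the 6/2017 to 12/2017 window would otherwise contribute a 0 to the baseline average as well.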
