
How can I make my search more efficient? Help removing joins

katzr
Path Finder

My search is running pretty slow, and I am looking to edit or remove the joins to make it run faster. It looks pretty messy; the odd handling of my location information is there because, for any location that does not match my lookup, I want location, Country, and Region filled in as Unknown. Unfortunately, I was getting a weird error when I tried setting a default value on the lookup table, so I had to do it manually.

I am using two joins because I am looking across three different time periods. Are there any ways to improve the efficiency of this search and make it run more quickly?

index=example date_month=August date_year=2017 (assignment_group="*") 
| dedup number 
| fillnull value="UNKNOWN" location 
| eval regionblank="UNKNOWN" 
| eval countryblank="UNKNOWN" 
| eval locationblank="UNKNOWN" 
| lookup CurrentSiteInfo.csv location
| eval site=coalesce(location2,locationblank) 
| eval Region=coalesce(Region,regionblank) 
| eval Country=coalesce(Country,countryblank) 
| search ((Region="*") (Country="*") (site="*")) 
| stats count as Tickets by contact_type 
| join overwrite=false contact_type 
    [search index=example earliest="6/01/2017:00:00:00" latest="12/31/2017:24:00:00" (assignment_group="*") 
    | dedup number 
    | fillnull value="UNKNOWN" location 
    | eval regionblank="UNKNOWN" 
    | eval countryblank="UNKNOWN" 
    | eval locationblank="UNKNOWN" 
    | lookup CurrentSiteInfo.csv location
    | eval site=coalesce(location2,locationblank) 
    | eval Region=coalesce(Region,regionblank) 
    | eval Country=coalesce(Country,countryblank) 
    | search ((Region="*") (Country="*") (site="*")) 
    | bucket _time span=1mon 
    | stats count as Tickets by contact_type _time 
    | stats avg(Tickets) as Baseline by contact_type 
    | eval Baseline=round(Baseline,0)] 
| eval "Baseline Variance" = Tickets - Baseline 
| join overwrite=false contact_type 
    [search index=example earliest=-3mon@mon (assignment_group="*") 
    | dedup number 
    | fillnull value="UNKNOWN" location 
    | eval regionblank="UNKNOWN" 
    | eval countryblank="UNKNOWN" 
    | eval locationblank="UNKNOWN" 
    | lookup CurrentSiteInfo.csv location
    | eval site=coalesce(location2,locationblank) 
    | eval Region=coalesce(Region,regionblank) 
    | eval Country=coalesce(Country,countryblank) 
    | search ((Region="*") (Country="*") (site="*")) 
    | bucket _time span=1mon 
    | stats count as Tickets by contact_type _time 
    | stats avg(Tickets) as Average by contact_type 
    | eval Average=round(Average,0)] 
| eval "Average Variance" = Tickets - Average 
| table contact_type Tickets Baseline "Baseline Variance" Average "Average Variance" 
| addcoltotals 
| sort 0 Tickets
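
(Side note on the manual defaults: the three eval/coalesce pairs can likely be replaced by a single fillnull after the lookup, since fillnull with an explicit field list also creates fields that are missing entirely. A minimal sketch, assuming CurrentSiteInfo.csv outputs Region, Country, and location2:)

| lookup CurrentSiteInfo.csv location
| rename location2 as site
| fillnull value="UNKNOWN" site Region Country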

DalJeanis
Legend

On further review, I believe that defaulting the location values to "UNKNOWN" has rendered much of this code irrelevant. The search filter that follows that section originally served to eliminate locations that were not in the lookup file. Overriding the nulls with the value "UNKNOWN" prevents those records from being eliminated, and the fields are not otherwise used in the code, so the whole lookup is redundant.

This should achieve the same effect more efficiently.

| join overwrite=false contact_type 
    [search index=example earliest=-6mon@mon latest=now (assignment_group="*") 
    | fields contact_type    ... whatever else you absolutely need...
    | eval _time = relative_time(_time,"@mon") 
    | eval BaselineFlag = case(...test the date for if this event is in baseline...., 1)
    | eval AverageFlag = case(...test the date for if this event is in average...., 1)
    | rename COMMENT as "The above commands are streaming and distributable, so should be above the dedup unless you have LOTS of dups." 

    | rename COMMENT as "By using dc instead of count, this stats eliminates the need for dedup." 
    | stats dc(eval(case(BaselineFlag=1,number))) as BaselineTickets 
            dc(eval(case(AverageFlag=1,number))) as AverageTickets 
            by contact_type _time 

    | stats avg(BaselineTickets) as Baseline avg(AverageTickets) as Average by contact_type 
    | eval Baseline = round(Baseline,0)  
    | eval Average = round(Average,0)]

At the very least, you can collapse the two joins into a single one that looks something like this...

| join overwrite=false contact_type 
    [search index=example earliest=-6mon@mon latest=now (assignment_group="*") 
    | dedup number 
    | fillnull value="UNKNOWN" location 
    | eval regionblank="UNKNOWN" 
    | eval countryblank="UNKNOWN" 
    | eval locationblank="UNKNOWN" 
    | lookup CurrentSiteInfo.csv location
    | eval site=coalesce(location2,locationblank) 
    | eval Region=coalesce(Region,regionblank)
    | eval Country=coalesce(Country,countryblank) 
    | bucket _time span=1mon 

    | eval BaselineFlag = case(test date for if this event is in baseline, 1)
    | eval AverageFlag = case(test date for if this event is in average, 1)
    | stats count(BaselineFlag) as BaselineTickets count(AverageFlag) as AverageTickets by contact_type _time 

    | stats avg(BaselineTickets) as Baseline avg(AverageTickets) as Average by contact_type 
    | eval Baseline = round(Baseline,0)  
    | eval Average = round(Average,0)]
| eval "Baseline Variance" = Tickets - Baseline
| eval "Average Variance" = Tickets - Average 

katzr
Path Finder

| eval BaselineFlag = case(...test the date for if this event is in baseline...., 1)
| eval AverageFlag = case(...test the date for if this event is in average...., 1)

I can't figure out the syntax for testing the date. For my baseline, it is a static window from 6/01/2017:00:00:00 to 12/31/2017:24:00:00. My average covers the past 3 months.


DalJeanis
Legend

@katzr - try something like this. Note that relative_time(maxtime,"-2mon@mon") backs up two months from the latest event and snaps to the start of that month, so the test covers three calendar months (the latest month plus the two before it):

 | eventstats max(_time) as maxtime
 | eval BaselineFlag=case(_time>=strptime("2017-06-01","%Y-%m-%d") AND 
        _time<strptime("2018-01-01","%Y-%m-%d"),1)
 | eval AverageFlag = case(_time>=relative_time(maxtime,"-2mon@mon"),1)

katzr
Path Finder

Hi @DalJeanis, thank you for all of your help. However, with this search my totals do not equal the ones from my original search. Here is what my original average search looks like, which is the one returning larger values.

index=example earliest=-3mon@mon 
| dedup number 
| bucket _time span=1mon 
| stats count as Tickets by contact_type _time 
| stats avg(Tickets) as Average by contact_type 
| eval Average=round(Average,0)

I tried changing your search to include -3mon@mon, and also tried other solutions that people answered, but my original search is still returning larger numbers.


katzr
Path Finder

Hi @DalJeanis, I figured out the problem: the AverageTickets in your example is averaged over the whole time period. The months outside the window come out of the dc as 0s, which drags down an average that should be computed only over the past 3 months, not the whole time period.

Example below:

    _time      AverageTickets
    2017-05         0
    2017-06         0
    2017-07       100
    2017-08       120
    2017-09       130
    2017-10       140

So the 0s are bringing down the average.
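
(One possible fix, building on the flag approach above: carry a per-month marker for whether that month falls inside each window, and null out the months that don't, so that avg() skips them, since stats avg ignores null values. A sketch; InBaselineWindow and InAverageWindow are illustrative names, not fields from the original search:)

| stats dc(eval(case(BaselineFlag=1,number))) as BaselineTickets 
        dc(eval(case(AverageFlag=1,number))) as AverageTickets 
        max(BaselineFlag) as InBaselineWindow 
        max(AverageFlag) as InAverageWindow 
        by contact_type _time 
| eval BaselineTickets=if(InBaselineWindow=1,BaselineTickets,null()) 
| eval AverageTickets=if(InAverageWindow=1,AverageTickets,null()) 
| stats avg(BaselineTickets) as Baseline avg(AverageTickets) as Average by contact_type 
| eval Baseline=round(Baseline,0) 
| eval Average=round(Average,0)

The same nulling is applied to the baseline months, since any month outside the 6/2017 to 12/2017 window would otherwise contribute a 0 to the baseline average as well.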
