Splunk Search

How to build a line graph to compare the results of a search with its inverse results?

posava
Explorer

I must apologize as I have found partial examples of what I am looking for, but I'm not well-versed enough to merge them together to get what I need. I have a search:

index="msexchange" sourcetype="MSExchange:2010:MessageTracking" sender="*reply*" (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3")

And I need to take that data and overlay it on a line graph (based on # of events over the span of a month) with its inverse. In other words, I need to compare it to the following search:

index="msexchange" sourcetype="MSExchange:2010:MessageTracking" sender!="*reply*" (sender_domain!="domain1" AND sender_domain!="domain2" AND sender_domain!="domain3")

(not even really sure if I inverted that correctly)
Thanks in advance. Any help is very much appreciated.

0 Karma
1 Solution

gokadroid
Motivator

Since there are two fields on which you are trying to search data so it depends on what do u want to call an inversion out of following combinations:

  1. sender!="*reply*" AND (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3")
  2. sender="*reply*" AND NOT (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3")
  3. sender!="*reply*" AND NOT (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3")

I will choose option 2) as the inversion and use it in query, however you can choose the one you feel is suited for your case. Also I am assuming both the data actualData and invertedData need to be shown over last one month (30 days), hence try as follows:

index="msexchange" sourcetype="MSExchange:2010:MessageTracking" sender="*reply*" (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3") earliest=-30d@d
 | stats count as Total by _time
 | eval ReportKey="actualData" 
 | append [ search index="msexchange" sourcetype="MSExchange:2010:MessageTracking" sender="*reply*" NOT (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3") earliest=-30d@d 
 | stats count as Total by _time
 | eval ReportKey="invertedData" ]
 | timechart max(Total) as Total_Count by ReportKey

In the last line of query | timechart max(Total) as Total_Count by ReportKey choose the appropriate function max(Total), avg(Total), min(Total) etc whatever suits your case. Let me know if it helped.

Updating as per comments to show the data over same time line for two sets of data.

index="msexchange" sourcetype="MSExchange:2010:MessageTracking" sender="*reply*" (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3") earliest=@d 
 | stats sum(count) as Total by _time
 | eval ReportKey="actualDataForToday" 
 | append [ search index="msexchange" sourcetype="MSExchange:2010:MessageTracking" sender="*reply*" NOT (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3") earliest=-30d@d latest=-29d@d
 | stats sum(count) as Total by _time
 | eval ReportKey="invertedDataForAMonthBack" 
 | eval _time=_time+(30*24*60*60) ]
 | timechart Total as Total_Count by ReportKey

Take a note of changed lines as:
earliest=@d rather than earliest=-30d@d for today's data
earliest=-30d@d latest=-29d@d for a single day, a month back's data
| eval _time=_time+(30*24*60*60) shift of time from a month back single day to match to today's time in one month back's query.
See here for more insights.

View solution in original post

gokadroid
Motivator

Since there are two fields on which you are trying to search data so it depends on what do u want to call an inversion out of following combinations:

  1. sender!="*reply*" AND (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3")
  2. sender="*reply*" AND NOT (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3")
  3. sender!="*reply*" AND NOT (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3")

I will choose option 2) as the inversion and use it in query, however you can choose the one you feel is suited for your case. Also I am assuming both the data actualData and invertedData need to be shown over last one month (30 days), hence try as follows:

index="msexchange" sourcetype="MSExchange:2010:MessageTracking" sender="*reply*" (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3") earliest=-30d@d
 | stats count as Total by _time
 | eval ReportKey="actualData" 
 | append [ search index="msexchange" sourcetype="MSExchange:2010:MessageTracking" sender="*reply*" NOT (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3") earliest=-30d@d 
 | stats count as Total by _time
 | eval ReportKey="invertedData" ]
 | timechart max(Total) as Total_Count by ReportKey

In the last line of query | timechart max(Total) as Total_Count by ReportKey choose the appropriate function max(Total), avg(Total), min(Total) etc whatever suits your case. Let me know if it helped.

Updating as per comments to show the data over same time line for two sets of data.

index="msexchange" sourcetype="MSExchange:2010:MessageTracking" sender="*reply*" (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3") earliest=@d 
 | stats sum(count) as Total by _time
 | eval ReportKey="actualDataForToday" 
 | append [ search index="msexchange" sourcetype="MSExchange:2010:MessageTracking" sender="*reply*" NOT (sender_domain="domain1" OR sender_domain="domain2" OR sender_domain="domain3") earliest=-30d@d latest=-29d@d
 | stats sum(count) as Total by _time
 | eval ReportKey="invertedDataForAMonthBack" 
 | eval _time=_time+(30*24*60*60) ]
 | timechart Total as Total_Count by ReportKey

Take a note of changed lines as:
earliest=@d rather than earliest=-30d@d for today's data
earliest=-30d@d latest=-29d@d for a single day, a month back's data
| eval _time=_time+(30*24*60*60) shift of time from a month back single day to match to today's time in one month back's query.
See here for more insights.

gokadroid
Motivator

I would suggest that you run both the queries, one before append and one after append, separately and see whether both the "inversion" and "non inversion" queries return you the data in same timeline for which you are plotting. For example the way your graphs show up tell me that one of the query's data is for a time earlier than 15th Nov. In your case the blue graph which has data only for Nov 15th is trying to plot the data of 30 days ago as well.

In such cases you need to adjust the previous data's timeline to match to current data. So let's say if you wanted to compare one month's back data (not of entire month data but of a day 30 days ago) with today's data then you will have to shift the one month back data's timeline (yellow graph) to match it to current blue graph's timeline which is of today's timeline. I am adding the logic of how to shift this timeline in answer as well so you have both the graphs on same timeline.

Also since timechart will tend to bucket the numbers on its own based on the time range chosen so if you were looking for the total number of emails, yes it will be sum(count), whereas in cases where you want to deal with times, like response times charting, then maybe max, min or avg might be suitable. Updating the missing pipe as suggested in the answer.

See here for more insights on how to shift timeline to show on same chart.

0 Karma

posava
Explorer

Ok I might have spoken too soon - something's not quite right. I've tried a couple variations of the search and in every case, the graph is missing data after the first couple points. I've changed time ranges and even switched the searches around and get the same result every time. It looks something like this:
alt text

0 Karma

posava
Explorer

That was exactly what I needed, thank you! I had to make a couple adjustments. I added a pipe before the append command to make it work. Also, when I first ran it I was getting some unusually low numbers so I modified the timechart to sum(Total) and that gave me exactly what I needed. Thanks so much again for the help!

Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Splunk is officially part of Cisco

Revolutionizing how our customers build resilience across their entire digital footprint.   Splunk ...

Splunk APM & RUM | Planned Maintenance March 26 - March 28, 2024

There will be planned maintenance for Splunk APM and RUM between March 26, 2024 and March 28, 2024 as ...