Splunk Search

Alternative to join command to do a summation on date_hour

byu168
Path Finder

I need to sum up the time differential for two events on a date_hour, date_wday, and date_month basis. Originally I used the join command to accomplish this until I realized events were being dropped.

I changed my query to remove the command and am currently working with this

  index=default source=mysql-pipe sourcetype=pipeline_logs ((message="bank** processing" OR message="bank** decompression complete") AND (msg_type="BEGIN" OR msg_type="INFO")) 
    | eval temp1=split(message, " ")
    | eval bank=mvindex(temp1, 0)
    | eval run_id_bank=run_name.".".tag_value.".".bank 
    | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
    | eval temp=date_hour."##".run_id_bank | chart values(stime) over temp by msg_type 
    | eval TimeDiff=(INFO-BEGIN)/3600 
    | eval date_month=mvindex(split(temp,"##"),0)
    | stats sum(TimeDiff) AS Sum by date_hour

Each run has 16 different banks which need to be differentiated. The issue I'm running into is when the decompression and processing occur in a different hour (or wday/month) then the differential won't be taken. This can be solved by charting over run_id_bank but then I lose the date_hour timestamp. Is there a way to do the calculation but retain the date_hour from the BEGIN (only occurs for processing) to do a summation?

0 Karma
1 Solution

somesoni2
Revered Legend

Your comments did make some things clear to me. This is my suggested query. I've made some changes to reduce the lines of code and corrected a type in the 2nd last line (date_month should actually be date_hour).

index=default source=mysql-pipe sourcetype=pipeline_logs ((message="bank** processing" OR message="bank** decompression complete") AND (msg_type="BEGIN" OR msg_type="INFO")) 
     | eval run_id_bank=run_name.".".tag_value.".".mvindex(split(message, " "), 0) 
     | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
     | chart values(stime) over run_id_bank by msg_type 
     | eval TimeDiff=(INFO-BEGIN)/3600 
     | eval date_hour=strftime(BEGIN,"%H")
     | table run_id_bank BEGIN INFO TimeDiff date_hour

YOu can now validate the results first and then add following to get total TimeDiff by date_hour of the BEGIN events.

index=default source=mysql-pipe sourcetype=pipeline_logs ((message="bank** processing" OR message="bank** decompression complete") AND (msg_type="BEGIN" OR msg_type="INFO")) 
     | eval run_id_bank=run_name.".".tag_value.".".mvindex(split(message, " "), 0) 
     | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
     | chart values(stime) over run_id_bank by msg_type 
     | eval TimeDiff=(INFO-BEGIN)/3600 
     | eval date_hour=strftime(BEGIN,"%H")
     | stats sum(TimeDiff) AS Sum by date_hour

View solution in original post

0 Karma

somesoni2
Revered Legend

Your comments did make some things clear to me. This is my suggested query. I've made some changes to reduce the lines of code and corrected a type in the 2nd last line (date_month should actually be date_hour).

index=default source=mysql-pipe sourcetype=pipeline_logs ((message="bank** processing" OR message="bank** decompression complete") AND (msg_type="BEGIN" OR msg_type="INFO")) 
     | eval run_id_bank=run_name.".".tag_value.".".mvindex(split(message, " "), 0) 
     | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
     | chart values(stime) over run_id_bank by msg_type 
     | eval TimeDiff=(INFO-BEGIN)/3600 
     | eval date_hour=strftime(BEGIN,"%H")
     | table run_id_bank BEGIN INFO TimeDiff date_hour

YOu can now validate the results first and then add following to get total TimeDiff by date_hour of the BEGIN events.

index=default source=mysql-pipe sourcetype=pipeline_logs ((message="bank** processing" OR message="bank** decompression complete") AND (msg_type="BEGIN" OR msg_type="INFO")) 
     | eval run_id_bank=run_name.".".tag_value.".".mvindex(split(message, " "), 0) 
     | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
     | chart values(stime) over run_id_bank by msg_type 
     | eval TimeDiff=(INFO-BEGIN)/3600 
     | eval date_hour=strftime(BEGIN,"%H")
     | stats sum(TimeDiff) AS Sum by date_hour
0 Karma

somesoni2
Revered Legend

Could you provide the mock table of what is your final expected output? (or what's wrong with output of current search that you have?)

0 Karma

byu168
Path Finder

Let's disregard the summation aspect for now and explain what's wrong prior to that

Take the following situation

Run1: Begins processing at 2:30 and finished decompression at 3:33, giving a TimeDiff of 63 minutes
Run2: Begins processing at 1:03 and finished decompression at 1:23, giving a TimeDiff of 20minutes

Below is the current output that I am getting. The TimeDiff isn't taken for run1 because the different date_hour stamps separate the two messages even though it's the same run_id_bank.

temp               BEGIN        INFO       TimeDiff date_hour
2##run1_id_bank 1481079017.00                            2
3##run1_id_bank                1481079161.00              3
1##run2_id_bank 1481079017.00  1481079173.00  20          1

Below is the kind of output I would be looking for

temp                     BEGIN            INFO          TimeDiff          date_hour
2##run1_id_bank 1481079017.00   1481079161.00           63                  2
1#run2_id_bank     1481079017.00    1481079173.00           20                    1

In this case I'd be able to sum up the TimeDiff for all the runs by the date_hour returned from the BEGIN log. If I chart over just run_id_bank instead of temp I get the above table, with an empty date_hour column. The temp column isn't necessary in that format as long as the other columns are accurate

Then I'd be able to run a summation on top of that query to get

date_hour               Sum
1                       20
2                       63
...
24
0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...