Splunk Search

How to extract the start and end date/time from my sample data, then calculate the duration of a job?

athorat
Communicator

Hi

I have to extract start date, end date, and the duration of a job based on the following two events:

Started:
2016-01-06 07:00:10,314  INFO ActionStartXCommand:543 - SERVER[redacted.redacted.redacted.com] USER[hdfs] GROUP[-] TOKEN[] APP[BOB_CAS_WF_IM_DataStore] JOB[0018443-150810162217214-oozie-oozi-W] ACTION[0018443-150810162217214-oozie-oozi-W@:start:] [***0018443-150810162217214-oozie-oozi-W@:start:***]Action updated in DB!


End:
2016-01-06 08:10:50,301  INFO ActionStartXCommand:543 - SERVER[redacted.redacted.redacted.com] USER[hdfs] GROUP[-] TOKEN[] APP[BOB_CAS_WF_IM_DataStore] JOB[0018443-150810162217214-oozie-oozi-W] ACTION[0018443-150810162217214-oozie-oozi-W@end] [***0018443-150810162217214-oozie-oozi-W@end***]Action updated in DB!

How do I extract the date and time and then calculate the difference.

0 Karma
1 Solution

sundareshr
Legend

Try something like this (untested code, assuming there are only two events per JOB)

... | rex "JOB\[?(?<jobid>[\d-]+)-" | streamstats window=2 current=f range(_time) as diff by jobid | table jobid diff | search diff=*

View solution in original post

jkat54
SplunkTrust
SplunkTrust

In your props.conf:

[sourcetypeName]
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE = \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\,\d{3}
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 24
EXTRACT-myfields = (?<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3})\s(?<log_level>\w+)\s(?<unknownfield>\w+):(?<number>\d+)\s-\sSERVER\[(?<servername>\S+)] USER\[(?<username>\S+)]\sGROUP\[(?<groupname>\S+)]\sTOKEN\[]\sAPP\[(?<appname>\S+)]\sJOB\[(?<jobname>\S+)]\sACTION\[\S+@(?<action>.*)]\s\[

Your search:

 index="foo" servername="server.acme.com" action=:start: 
| stats first(_time) AS startTime BY servername 
| append [
   search index="foo" sourcetype="testing" servername="server.acme.com" action=end 
   | stats first(_time) AS endTime BY servername 
] | chart eval(sum(endTime)-sum(startTime)) AS diffTimeInSecs by servername
0 Karma

athorat
Communicator

thanks for the reply @jka154 @sundareshr

I am using the following query to extract job name and Action=:start: or Action=@end

index=aap_prod sourcetype="HDP:PROD:OOZIE"  ("@:start:" OR "@end")  | rex "TOKEN\[\] APP\[(?[^\]]*)"    | rex "ACTION\[[^\@]*(?[^\d\]]*)"

If I append rex "JOB\[?(?[\d-]+)-" | streamstats window=2 current=f range(_time) as diff by jobid | table jobid diff

I get the table but there are jobid's with the same id with value = 0 and then there is one value with the actual duration.
How do we filter only the values with actual duration.
tried using | stats values(jobid) which does not work.

Thanks for looking into this.

Anil.

0 Karma

jkat54
SplunkTrust
SplunkTrust
index=aap_prod sourcetype="HDP:PROD:OOZIE" ("@:start:" OR "@end") 
 | rex field=_raw "(?<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3})\s(?<log_level>\w+)\s(?<unknownfield>\w+):(?<number>\d+)\s-\sSERVER\[(?<servername>\S+)] USER\[(?<username>\S+)]\sGROUP\[(?<groupname>\S+)]\sTOKEN\[]\sAPP\[(?<appname>\S+)]\sJOB\[(?<jobname>\S+)]\sACTION\[\S+@(?<action>.*)]\s\["
  | search action=:start:
  | stats first(_time) AS startTime by servername
  | append [ 
   search index=aap_prod sourcetype="HDP:PROD:OOZIE" ("@:start:" OR "@end") 
   | rex field=_raw "(?<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3})\s(?<log_level>\w+)\s(?<unknownfield>\w+):(?<number>\d+)\s-\sSERVER\[(?<servername>\S+)] USER\[(?<username>\S+)]\sGROUP\[(?<groupname>\S+)]\sTOKEN\[]\sAPP\[(?<appname>\S+)]\sJOB\[(?<jobname>\S+)]\sACTION\[\S+@(?<action>.*)]\s\["
   | stats first(_time) AS endTime by servername
  ]
  | chart eval(sum(endTime)-sum(startTime)) AS diffTimeInSecs by servername
0 Karma

sundareshr
Legend

Updated my answer above to show how you can filter out where diff is not calculated.

0 Karma

athorat
Communicator

@sundareshr

I tired | search diff!=0.0 and it worked.
How can I add the Start and the end dates, can you throw some light on it ?

Thanks,

0 Karma

sundareshr
Legend

Try something like this (untested code, assuming there are only two events per JOB)

... | rex "JOB\[?(?<jobid>[\d-]+)-" | streamstats window=2 current=f range(_time) as diff by jobid | table jobid diff | search diff=*

jkat54
SplunkTrust
SplunkTrust

What happens if there are more than 2 events per job?

I like how you simplified by jobid and used streamstats... pretty cool.

0 Karma

athorat
Communicator

@sundareshr

I tired | search diff!=0.0 and it worked.
How can I add the Start and the end dates, can you throw some light on it ?

Thanks,

0 Karma

jkat54
SplunkTrust
SplunkTrust

My search works fine with the sample data you provided. It's not as efficient as sundareshr's search but i believe it handles more scenarios / possibilities. If you want to still see start/end, and diff, and servername...

Edit my chart command at the end of my search on the bottom of my reply

...| chart startTime endTime eval(sum(endTime)-sum(startTime)) AS diffTimeInSecs by servername

0 Karma

jkat54
SplunkTrust
SplunkTrust

only issue is the time's will be in epoch...

0 Karma

sundareshr
Legend

Use this

.... | streamstats current=f window=2 range(_time) as diff latest(_time) as end earliest(_time) as start| table jobid, diff, start, end | eval start=strftime(start, "%c") | search diff=*
0 Karma

athorat
Communicator

@sundareshr is there a way we can we convert the range-> diff to seconds or minutes?

0 Karma

sundareshr
Legend
| eval diff=tostring(diff, "duration")
0 Karma
Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...