Splunk Search

How to search and table a list of jobs that are currently running in our system, not finished?

RecoMark0
Path Finder

Hello,
I'm trying to create a table that lists jobs that are currently still running in our system(meaning a FINISHED has not been logged yet).
Example:

2014-09-30 20:00:29,037 [I] Setting status of job 43452 to RUNNING. <123 (5555)> [] ProjectA
...
2014-09-30 20:05:12,136 [I] Setting status of job 43452 to FINISHED. <123 (5555)> [] ProjectA

The lines are connected by both the job#(43452) and the processID(123 (5555))

I have tried the following searches without luck:

sourcetype="System Logs" "Setting status of job"  (JobStatus="RUNNING" OR JobStatus="FINISHED") | transaction JobID [search NOT JobStatus="FINISHED"]

,

sourcetype="System Logs" "Setting status of job"  (JobStatus="RUNNING" OR JobStatus="FINISHED") | transaction JobID | join [search NOT JobStatus="FINISHED"]

Is there a search that will allow me to have the following table?
||Time Started || Project || Job Number || Status ||
|2014-09-30 20:00:29,037 | ProjectA | 43452 | RUNNING |
| 2014-09-30 20:12:14,044 | ProjectB | 48794 | RUNNING |

These would be removed once the FINISHED line for the job is logged. I may be asking too much from splunk though.

Thank you.

Tags (3)
1 Solution

the_wolverine
Champion

Transactions are relatively expensive so I would try something that doesn't require it. In this case:

sourcetype="System Logs" Setting status of job RUNNING  | search  NOT  [ search sourcetype="System Logs" Setting status of job FINISHED | fields Job_Number | format ]   | table _time,Project,Job_Number,Status

View solution in original post

sideview
SplunkTrust
SplunkTrust

Here's another solution that avoids using a subsearch.

sourcetype="System Logs" "Setting status of job"  (JobStatus="RUNNING" OR JobStatus="FINISHED") 
| stats last(_time) as _time last(Project) as Project values(JobStatus) as JobStatus by Job_Number
| search JobStatus!="Finished"

How it works is that the stats clause rolls up the results so each row represents one Job_Number. For each row the values() clause makes a Multivalue field called JobStatus. Then the search clause at the end filters out any rows that had a second value of JobStatus="Finished". This will leave only the rows that had started but not finished.

For smaller use cases, like if you know there are always less than 50,000 Jobs, or where the subsearch for just the finished events will always takes less than a minute, the solution posted by the_wolverine that uses a subsearch will be better and slightly faster. For cases where you might have more than 50,000 rows, you'll hit subsearch limits whereas this search will remain accurate.

RecoMark0
Path Finder

Nice, this worked as well. Thank you!

0 Karma

the_wolverine
Champion

Transactions are relatively expensive so I would try something that doesn't require it. In this case:

sourcetype="System Logs" Setting status of job RUNNING  | search  NOT  [ search sourcetype="System Logs" Setting status of job FINISHED | fields Job_Number | format ]   | table _time,Project,Job_Number,Status

the_wolverine
Champion

Format allows the search to bypass some limits due to larger result sets. I use it as a habit just in case. You could try the search without it to see if it still works.

RecoMark0
Path Finder

Quick question: What does the 'format' do in the second search? It seems to be the biggest missing piece of my puzzle.

0 Karma

RecoMark0
Path Finder

Awesome, that worked! Thank you very much!

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...