Splunk Search

List files in search then table the duration between beginning and end of processing

jamieivanov
Explorer

I'm trying to limit the scope of a search to a 24 hour window and list all of the files analyzed within that period but then display a table of how long it took to finish processing each file.

For example, I have:

2015-07-31 15:56:13.7599|DEBUG|FileAnalyzer|Finished file.zip
2015-07-31 15:56:13.0099|DEBUG|FileAnalyzer|Starting file.zip

I can get something as simple as:

host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*.zip)" | stats count by inputfile

But that's a simple count. I'm looking for a table that will resemble something like:

INPUTFILE | DURATION
file.zip | 3.067

I've tried a combination of map, transaction, and table with and without subsearches without being successful. Any help would be most appreciated.

Tags (3)
0 Karma
1 Solution

somesoni2
Revered Legend

How about this

host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*.zip)" | transaction maxevents=2 inputfile | table inputfile duration

View solution in original post

somesoni2
Revered Legend

How about this

host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*.zip)" | transaction maxevents=2 inputfile | table inputfile duration

jamieivanov
Explorer

I appreciate your suggestion and it looks like that is working exactly as I had expected. Thank you!

0 Karma

woodcock
Esteemed Legend

Assuming that a file can be processed more than once, then this should work:

  | rex "\|(?<Stage>\w+)\s+(?<inputfile>\w*.zip)" | eval StartTime=if(Stage="Starting", _time, null()) | eval EndTime=if(Stage="Finished", _time, null()) | streamstats last(EndTime) AS EndTime by inputfile | where Stage="Starting" | eval Duration = EndTime - _time| table inputfile Duration
0 Karma

jamieivanov
Explorer

Streamstats still has the same issue but also introduced multiple instances of the input file name without a duration.

0 Karma

woodcock
Esteemed Legend

Run this search for "last 24 hours":

 | rex "\|(?<Stage>\w+)\s+(?<inputfile>\w*.zip)" | eval StartTime=if(Stage="Starting", _time, null()) | eval EndTime=if(Stage="Finished", _time, null()) | stats earliest(StartTime) AS StartTime latest(EndTime) AS EndTime by inputfile | eval Duration = EndTime - StartTime | table inputfile Duration

jamieivanov
Explorer

That looks to be a brilliant solution! However, the main search brings up 14 results of starting/finished log entry pairs which would be 7 files but I'm getting 6 results with one of them >1,000 seconds (for duration). The one file that is >1,000 seconds shows up twice in the search results but both instances are ❤️ seconds. I think what's happening is that the start time if the first instance is being matched with the end time of the last instance; would I be able to call upon your perspicacity for thoughts on this?

0 Karma

woodcock
Esteemed Legend

My solution presumes that you are timestamping your events correctly and that each event's _time value is the same as the first PSV value in each row. Is this the case? It also presumes that each file is only processed once and I think that is where the breakdown is. Is it possible for a file to be processed more than once?

0 Karma

jamieivanov
Explorer

The timestamps should be handled correctly but I'm afraid I don't understand the second part of that sentence. Under ideal circumstances, the assumption that a file would be used only once would be accurate but during testing it is possible that a file may be processed more than once.

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...