Solved: List files in search then table the duration betwe...

jamieivanov · ‎08-04-2015

I'm trying to limit the scope of a search to a 24 hour window and list all of the files analyzed within that period but then display a table of how long it took to finish processing each file.

For example, I have:

I can get something as simple as:

host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*.zip)" | stats count by inputfile

But that's a simple count. I'm looking for a table that will resemble something like:

INPUTFILE | DURATION
file.zip | 3.067

I've tried a combination of map, transaction, and table with and without subsearches without being successful. Any help would be most appreciated.

somesoni2 · ‎08-05-2015

How about this

host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*.zip)" | transaction maxevents=2 inputfile | table inputfile duration

View solution in original post

somesoni2 · ‎08-05-2015

How about this

host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*.zip)" | transaction maxevents=2 inputfile | table inputfile duration

jamieivanov · ‎08-05-2015

I appreciate your suggestion and it looks like that is working exactly as I had expected. Thank you!

woodcock · ‎08-05-2015

Assuming that a file can be processed more than once, then this should work:

  | rex "\|(?<Stage>\w+)\s+(?<inputfile>\w*.zip)" | eval StartTime=if(Stage="Starting", _time, null()) | eval EndTime=if(Stage="Finished", _time, null()) | streamstats last(EndTime) AS EndTime by inputfile | where Stage="Starting" | eval Duration = EndTime - _time| table inputfile Duration

jamieivanov · ‎08-05-2015

Streamstats still has the same issue but also introduced multiple instances of the input file name without a duration.

woodcock · ‎08-04-2015

Run this search for "last 24 hours":

 | rex "\|(?<Stage>\w+)\s+(?<inputfile>\w*.zip)" | eval StartTime=if(Stage="Starting", _time, null()) | eval EndTime=if(Stage="Finished", _time, null()) | stats earliest(StartTime) AS StartTime latest(EndTime) AS EndTime by inputfile | eval Duration = EndTime - StartTime | table inputfile Duration

jamieivanov · ‎08-05-2015

That looks to be a brilliant solution! However, the main search brings up 14 results of starting/finished log entry pairs which would be 7 files but I'm getting 6 results with one of them >1,000 seconds (for duration). The one file that is >1,000 seconds shows up twice in the search results but both instances are ❤️ seconds. I think what's happening is that the start time if the first instance is being matched with the end time of the last instance; would I be able to call upon your perspicacity for thoughts on this?

woodcock · ‎08-05-2015

My solution presumes that you are timestamping your events correctly and that each event's _time value is the same as the first PSV value in each row. Is this the case? It also presumes that each file is only processed once and I think that is where the breakdown is. Is it possible for a file to be processed more than once?

jamieivanov · ‎08-05-2015

The timestamps should be handled correctly but I'm afraid I don't understand the second part of that sentence. Under ideal circumstances, the assumption that a file would be used only once would be accurate but during testing it is possible that a file may be processed more than once.

List files in search then table the duration between beginning and end of processing

Introducing Splunk Enterprise 9.2

Adoption of RUM and APM at Splunk

Routing logs with Splunk OTel Collector for Kubernetes