I'm looking for a way to filter search results by calculating time deltas between pairs of rows (the goal is to extract contiguous events, i.e. those with a time delta of 0 or greater from the end of one event to the start of the next). The problem is that each row pair needs to be evaluated and filtered before the next pair is considered, and then the process repeated over the entire data series. The only way I have been able to achieve this so far is to use streamstats, filter out one row at a time, and rerun the streamstats until the result set no longer reduces. Is there a better way?
Example data set (in reverse chronological 'end time' order):
Event Start_Time End_Time Delta
Event_1 13:10:00 13:20:00 -
Event_2 13:07:00 13:15:00 -5:00
Event_3 13:06:00 13:14:00 -7:00
Event_4 13:00:00 13:10:00 -4:00
Event_5 12:50:00 13:00:00 0
Desired Output
Event Start_Time End_Time Delta
Event_1 13:10:00 13:20:00 -
Event_4 13:00:00 13:10:00 0
Event_5 12:50:00 13:00:00 0
Actual Output
Event Start_Time End_Time Delta
Event_1 13:10:00 13:20:00 -
Event_5 12:50:00 13:00:00 0
Original Query
<query> | streamstats current=f window=1 global=f last(Start_Time) as Next_Start | where (Next_Start - End_Time) >= 0
-- For each pair of adjacent rows, Next_Start is the Start_Time carried down from the first (later) row, and End_Time belongs to the second row
The problem is that this filters out Event_4 as well: streamstats computes every delta against the raw adjacent row before the filter runs, so Event_4's delta is taken against Event_3 (which is itself about to be filtered out) rather than against Event_1.
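To make the sequential dependency concrete, here is a minimal Python sketch of the intended logic (Python used purely for illustration, with the sample rows above encoded as minutes since midnight):

```python
# Sample rows as (name, start, end), times in minutes since midnight,
# in reverse chronological order as in the example data.
ROWS = [
    ("Event_1", 13*60 + 10, 13*60 + 20),
    ("Event_2", 13*60 + 7,  13*60 + 15),
    ("Event_3", 13*60 + 6,  13*60 + 14),
    ("Event_4", 13*60 + 0,  13*60 + 10),
    ("Event_5", 12*60 + 50, 13*60 + 0),
]

def contiguous_chain(rows):
    """Single pass keeping a row only when the last *kept* row starts no
    earlier than this row ends (delta >= 0). Crucially, the delta is
    recomputed against the last survivor, not the raw adjacent row,
    which is the part a single streamstats pass cannot express."""
    kept = [rows[0]]
    for name, start, end in rows[1:]:
        if kept[-1][1] - end >= 0:          # last kept Start_Time - End_Time
            kept.append((name, start, end))
    return [r[0] for r in kept]

print(contiguous_chain(ROWS))  # ['Event_1', 'Event_4', 'Event_5']
```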
Current working (but inefficient) query. I don't really know how many times to repeat the filter, although strangely enough even 30+ repetitions don't seem to cause a significant time impact:
<query> | streamstats current=f window=1 global=f last(Start_Time) as Next_Start | eval diff = Next_Start - End_Time | streamstats current=f window=1 global=f last(diff) as prev_diff | eval diff = if(diff<0 AND prev_diff<0,0,diff) | search diff >= 0
| streamstats current=f window=1 global=f last(Start_Time) as Next_Start | eval diff = Next_Start - End_Time | streamstats current=f window=1 global=f last(diff) as prev_diff | eval diff = if(diff<0 AND prev_diff<0,0,diff) | search diff >= 0
| streamstats current=f window=1 global=f last(Start_Time) as Next_Start | eval diff = Next_Start - End_Time | streamstats current=f window=1 global=f last(diff) as prev_diff | eval diff = if(diff<0 AND prev_diff<0,0,diff) | search diff >= 0
... <repeat until data set doesn't reduce any further>
-- Only drop the first of any run of rows with a negative result. If there are contiguous rows with negative results, zero out all but the first so the next streamstats call can filter them one by one.
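As a sanity check, the repeat-until-stable behaviour of that pipeline can be simulated in Python (illustration only, using the thread's sample rows as minutes since midnight):

```python
# Sample rows as (name, start, end), times in minutes since midnight,
# in reverse chronological order.
ROWS = [
    ("Event_1", 13*60 + 10, 13*60 + 20),
    ("Event_2", 13*60 + 7,  13*60 + 15),
    ("Event_3", 13*60 + 6,  13*60 + 14),
    ("Event_4", 13*60 + 0,  13*60 + 10),
    ("Event_5", 12*60 + 50, 13*60 + 0),
]

def one_pass(rows):
    """One pipeline pass: diff is computed against the raw adjacent row,
    the first negative diff of each run is dropped, and later negatives
    in the run are zeroed so they survive to the next pass."""
    out = [rows[0]]
    prev_diff = None
    for i in range(1, len(rows)):
        diff = rows[i - 1][1] - rows[i][2]   # Next_Start - End_Time
        first_negative = diff < 0 and not (prev_diff is not None and prev_diff < 0)
        prev_diff = diff
        if not first_negative:
            out.append(rows[i])              # diff >= 0, or zeroed out
    return out

def fixed_point(rows):
    """Repeat the pass until the result set stops shrinking."""
    while True:
        nxt = one_pass(rows)
        if len(nxt) == len(rows):
            return rows
        rows = nxt

print([r[0] for r in fixed_point(ROWS)])  # ['Event_1', 'Event_4', 'Event_5']
```

On this sample the fixed point is reached after three passes; extra passes are no-ops, which may explain why 30+ repeated calls cost so little.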
Any suggestions would be greatly appreciated. Thanks
Give this a try (run-anywhere search with your sample data; everything before the convert command just generates the sample data, so replace it with your actual search):
| gentimes start=-1 | eval temp="Event_1 13:10:00 13:20:00#Event_2 13:07:00 13:15:00#Event_3 13:06:00 13:14:00#Event_4 13:00:00 13:10:00#Event_5 12:50:00 13:00:00" | table temp | makemv temp delim="#" | mvexpand temp | rex field=temp "(?<Event>\S+)\s+(?<Start_Time>\S+)\s+(?<End_Time>\S+)" | fields - temp
| convert dur2sec(End_Time) as s_End dur2sec(Start_Time) as s_Start | streamstats current=f window=1 values(s_Start) as next_start | eval delta1=next_start-s_End | reverse | streamstats current=f window=1 values(s_End) as prev_end | eval delta2=s_Start-prev_end | where coalesce(delta1,0)=0 OR coalesce(delta2,0)=0 | reverse | table Event Start_Time End_Time
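For clarity, here is a rough Python rendering of that search's delta1/delta2 test (illustration only, on the sample rows; missing neighbours coalesce to 0 as in the SPL):

```python
# Sample rows as (name, start, end), times in minutes since midnight,
# in reverse chronological order.
ROWS = [
    ("Event_1", 13*60 + 10, 13*60 + 20),
    ("Event_2", 13*60 + 7,  13*60 + 15),
    ("Event_3", 13*60 + 6,  13*60 + 14),
    ("Event_4", 13*60 + 0,  13*60 + 10),
    ("Event_5", 12*60 + 50, 13*60 + 0),
]

def neighbor_contiguous(rows):
    """Keep a row when it is exactly contiguous with either raw
    neighbour: delta1 (the later event's start minus this row's end)
    or delta2 (this row's start minus the earlier event's end) is 0.
    Rows with no neighbour on one side coalesce that delta to 0."""
    kept = []
    for i, (name, start, end) in enumerate(rows):
        delta1 = rows[i - 1][1] - end if i > 0 else 0
        delta2 = start - rows[i + 1][2] if i < len(rows) - 1 else 0
        if delta1 == 0 or delta2 == 0:
            kept.append(name)
    return kept

print(neighbor_contiguous(ROWS))  # ['Event_1', 'Event_4', 'Event_5']
```

Note this only inspects the immediately adjacent raw rows, which is the one-element-lookahead limitation discussed below.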
Thanks for this. While it didn't solve my problem exactly (it only has a one-element lookahead, before and after), it did put me on a better path. I've ended up with the following, which does a windowed lookahead, before and after, to determine whether there are contiguous events. If there aren't (and it isn't the start or end event) it eliminates the row.
| streamstats current=f window=30 values(LAST_START) as next_start | eval next_start = if(isnull(next_start),LAST_END,next_start) | reverse | streamstats current=f window=30 values(LAST_END) as prev_end | eval prev_end = if(isnull(prev_end),LAST_START,prev_end) | mvexpand next_start | eval delta1=next_start-LAST_END | mvexpand prev_end | eval delta2=LAST_START-prev_end| where coalesce(delta1,0)=0 AND coalesce(delta2,0)=0 | reverse
Hi @brunton2 - Did the search you wrote above provide a working solution to your question?
Yes and no unfortunately. Yes from the perspective that it did what I wanted. No from the perspective that the performance of mvexpand makes the solution unbearably slow in the large data set scenario I'm trying to solve.
What I'd really like to do is perform an mvfind with a numeric comparison against each element in the mvlist. Unfortunately the comparison needs to reference the LAST_START and LAST_END fields, which mvfind appears unable to support.
What I've resorted to doing is extracting each of the mvlist elements as eval variables and comparing against each in sequence. The performance of the query is orders of magnitude faster than the mvexpand approach but it makes the query horrific to look at!
Unless someone can suggest either a way to optimize mvexpand over large data sets or a way to use mvfind with a field reference in the comparison expression, I think I'm stuck with this approach.
No problem. We'll leave this open just in case another user would like to take a stab at it.