Splunk Search

How to Determine total time duration (deduplicated) for overlapping events?

dw_jcro
Loves-to-Learn Lots

To start - I was suggested this solution, but despite the fact that the question is very similar the answer marked as a solution doesn't seem to actually provide the quantitative total that I am looking for.

I have a series of events where there is a Start and Stop time, in epoch time. These events can be grouped by a common field, `host`, and I am trying to determine the total amount of deduplicated time that these events span.

For example:

  • Host_1, Event_1: starts at 13:00, ends at 13:15
  • Host_1, Event_2: starts at 13:10, ends at 13:20
  • Host_1, Event_3: starts at 13:30, ends at 14:00

The total time for Host_1 would therefore be 50 minutes:

  • Event_1: 15 minutes
  • Event_2: 5 minutes (10 minutes - 5 minutes of overlap with Event_1)
  • Event_3: 30 minutes (no overlap with any other events)
  • Total: 50 minutes

 

I had tried to leverage streamstats to get information about previous events, but couldn't work out how to get it to properly reset when the events didn't overlap.

Not even sure streamstats is the best method for solving this type of problem.

 

EDIT: some test data may be helpful.

0,"hostname","start_time","end_time"
1,"host_1","1654130041.626307","1654130566.626307"
2,"host_1","1654131696.975800","1654133451.975800"
3,"host_1","1654132454.687189","1654134263.687189"
4,"host_1","1654132747.975800","1654133451.975800"
5,"host_1","1654133805.740912","1654134236.740912"
6,"host_1","1654136688.170093","1654136722.170093"
7,"host_1","1654136782.300892","1654136818.300892"
8,"host_1","1654136885.031861","1654137288.031861"
9,"host_1","1654137388.801936","1654139394.801936"

 

Doing the math, rows numbered 3 and 4 both have `start_time` values that are earlier than row 1's `end_time` value - indicating that there would be a duration overlap occurring in several rows.

Labels (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
| eval event=mvrange(1,3)
| mvexpand event
| eval time=if(event=1,start,end)
| sort 0 time
| eval event=if(event=1,event,-1)
| streamstats sum(event) as concurrent by host
| eval start=if(concurrent=1 AND event=1,start,null())
| eval end=if(concurrent=0 AND event=-1,end,null())
| streamstats last(start) as start by host
| eval duration=if(concurrent=0,end-start,null())
| stats sum(duration) as duration by host
| eval duration=tostring(duration,"duration")
0 Karma

PickleRick
SplunkTrust
SplunkTrust

But if you have start and end at each event what more you need?

0 Karma

dw_jcro
Loves-to-Learn Lots

I'm not sure I follow. Are you suggesting to simply export the data from Splunk to perform the calculations on total time elsewhere?

"What more I need" -  I want to be able to calculate the sum of all deduplicated/non-overlapping time by host. If you'd like, I can try to provide some example data to work with.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

No, I mean that I was simply surprised at the simplicity since you have all the data needed for a simple

<your search>
| eval duration=end_time-start_time
| stats sum(duration)

But it seems I missed (or rather misinterpreted) the overlapping part. (I thought that you didn't care about the overlap so it really seemed too simple). Hence my reaction.

But to the point, I'd try something similar to what @ITWhisperer showed but from a slightly different angle.

Sorry, don't have my splunk installation at hand so can't provide exact solution, just a rough outline.

Firstly split the event into separate start/end events using untable.

Then you sort the events by time.

Set a field (let's call it optype) to 1 if it's a start event or -1 if it's a end event.

Streamstats sum(optype) as filter

Now you have a filter field - you're only interested in "balanced" values for this field so filter using where to get only those events which are starts with filter=1 or ends with filter=0. Other - "unbalanced" - values are the "internal" starts/ends.

Now you only have to combine them back into start/end pairs. For example by streamstats last(time) - this will give you start/end of a "same state" period. Now you only have to do "where" to only get the "end" events (with corresponding starts of course.

And you have "deduplicated" periods for duration calculation and stats(sum).

I hope it makes sense. I might be able to turn this into spl later.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

untable (as @PickleRick suggested) works well for doubling up the events too

| makeresults
| eval _raw="0,hostname,start_time,end_time
1,host_1,1654130041.626307,1654130566.626307
2,host_1,1654131696.975800,1654133451.975800
3,host_1,1654132454.687189,1654134263.687189
4,host_1,1654132747.975800,1654133451.975800
5,host_1,1654133805.740912,1654134236.740912
6,host_1,1654136688.170093,1654136722.170093
7,host_1,1654136782.300892,1654136818.300892
8,host_1,1654136885.031861,1654137288.031861
9,host_1,1654137388.801936,1654139394.801936"
| multikv forceheader=1
| table hostname start_time end_time
| untable hostname event time
| sort 0 time
| eval event=if(event="start_time",1,-1)
| streamstats sum(event) as concurrent by hostname
| eval start_time=if(concurrent=1 AND event=1,time,null())
| eval end_time=if(concurrent=0 AND event=-1,time,null())
| streamstats last(start_time) as start_time by hostname
| eval duration=if(concurrent=0,end_time-start_time,null())
| stats sum(duration) as duration by hostname
| eval duration=tostring(duration,"duration")
0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...