I need to perform a search that spans a day's worth of logs, looking for 5 identical events in a one-hour window. Breaking the day up into hours will not work, since the 5 events could cross an hour boundary (e.g. 11:50 - 12:10). I imagine that I need to evaluate the difference in timestamps between events to determine whether they fall within a one-hour window, but I can't see how to do it.
In a script I would put all events, sorted by time, into an array and look for any grouping of 5 events where the difference in time between event[x] and event[x+4] is <= 3600 (working in epoch time). How would you do this, or something like this, in Splunk?
I'd rather use streamstats like this:
<yourbasesearch> | streamstats count range(_time) as range window=5 | where range <= 3600 AND count = 5
This will discard the first four results, because fewer than five events have been seen at that point, and after that keep every row whose four preceding events all fall within an hour of it.
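For anyone who wants to see the logic outside Splunk, here is a rough Python sketch of the same sliding-window check, which is essentially the script described in the question. The function name and the sample timestamps are made up for illustration, and it assumes the timestamps are epoch seconds; it is not how streamstats is implemented internally.

```python
def windows_of_five(timestamps, size=5, max_span=3600):
    """Return the indices (into the sorted list) of events that close a
    window of `size` events spanning at most `max_span` seconds."""
    ts = sorted(timestamps)
    hits = []
    # The first size-1 events can never close a full window, so start at size-1.
    for i in range(size - 1, len(ts)):
        # In a sorted list the range of the window is last minus first.
        if ts[i] - ts[i - size + 1] <= max_span:
            hits.append(i)
    return hits


# Hypothetical sample: five events 10 minutes apart, then one much later.
sample = [0, 600, 1200, 1800, 2400, 9000]
print(windows_of_five(sample))
```

Only the fifth event closes a qualifying window here; the late sixth event pushes the range of its own window past an hour.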
The reason for not using transaction, apart from speed, is that it may actually fail. Consider this:
...long silence...
event n: 12:00
event n+1: 12:45
event n+2: 12:55
event n+3: 13:05
event n+4: 13:15
event n+5: 13:25
Events n+1 to n+5 would satisfy the condition and should be found. However, I'd think that transaction would group events n to n+2 into one transaction, then start a new transaction at n+3 because adding it would exceed the one-hour span. You would end up with two transactions of three events each, and neither would reach five.
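To make the difference concrete, here is a small Python sketch that runs both approaches over the example timestamps above. The greedy grouping is only a simplified model of the behaviour I'm guessing at for transaction (open a group at the first event, close it once the span would exceed an hour); it is an assumption for illustration, not Splunk's actual algorithm, and all function names are hypothetical.

```python
def hhmm(s):
    """Convert 'HH:MM' to seconds since midnight."""
    h, m = s.split(":")
    return int(h) * 3600 + int(m) * 60


def greedy_groups(ts, max_span=3600):
    """Simplified model: keep adding events to the current group while
    they are within max_span of the group's FIRST event."""
    groups = []
    for t in sorted(ts):
        if groups and t - groups[-1][0] <= max_span:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


def sliding_hits(ts, size=5, max_span=3600):
    """Indices of events that close a window of `size` events within max_span."""
    ts = sorted(ts)
    return [i for i in range(size - 1, len(ts))
            if ts[i] - ts[i - size + 1] <= max_span]


events = [hhmm(t) for t in
          ["12:00", "12:45", "12:55", "13:05", "13:15", "13:25"]]

group_sizes = [len(g) for g in greedy_groups(events)]
hits = sliding_hits(events)
print(group_sizes)  # sizes of the greedy groups
print(hits)         # indices that close a five-event window
```

Under that model the greedy grouping yields two groups of three, so a filter on five or more events finds nothing, while the sliding window flags the last event (n+5) as closing a five-event window.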
Use transaction
(http://docs.splunk.com/Documentation/Splunk/5.0/SearchReference/Transaction)
<yourbasesearch> | transaction maxspan=1h | search eventcount>=5