Hi,
I am trying to calculate service downtime based on the following rule: if the service has status='error' for 3 or more consecutive events within an hour, measure the duration between the first status='error' and the last status='error' (before the next 'ok').
I have pasted a sample log further down; these are the results I expect:
Snippet 1: The downtime should be 1815 seconds
Snippet 2: 600 seconds
Snippet 3, part 1: No downtime, because there are only 2 consecutive status='error' events
Snippet 3, part 2: 3 seconds
Snippet 4: 3000 seconds
I've tried the transaction and streamstats commands to do this, but so far without success.
The following transaction commands nearly work for grouping the events together:
... | eval down=if(status=="error",1,0) | transaction down maxspan=1h maxpause=5m
or
... | eval down=if(status=="error",1,0) | transaction startswith=down=1 endswith=down=0
The first transaction example includes the maxpause parameter, which I don't really want; but leaving it out doesn't work either.
The second transaction example includes the first ok event after the last error, which is also something I don't want.
It would be great if the transaction command accepted a parameter like 'ends_before=down=0' instead of endswith...
FYI, I have also experimented with the streamstats command to filter out consecutive events with the same status, like so:
| streamstats window=1 current=f last(down) as previous_down by service | where NOT down=previous_down
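For comparison, here is a rough plain-Python model of what that streamstats/where line does, assuming events are processed oldest first and that the first event for a service (which has no previous value) is kept. The function name `status_changes` and the service name `svc` are hypothetical. In this model only the first event of each run of identical statuses survives, which illustrates why this filter on its own discards the timestamp of the last error that the duration needs.

```python
def status_changes(events):
    """Keep each event whose down flag differs from the previous event's flag
    for the same service (rough model of the streamstats | where line).

    events: (timestamp, service, status) tuples, oldest first (assumption).
    The first event per service has no previous flag and is kept (assumption).
    """
    previous = {}  # service -> down flag of the previous event
    kept = []
    for ts, service, status in events:
        down = 1 if status == "error" else 0
        if previous.get(service) != down:
            kept.append((ts, service, status))
        previous[service] = down
    return kept


# Snippet 1 from the log, with a hypothetical service name "svc".
events = [
    ("2014-09-08T00:14:00", "svc", "ok"),
    ("2014-09-08T00:30:00", "svc", "error"),
    ("2014-09-08T01:01:00", "svc", "error"),
    ("2014-09-08T01:01:05", "svc", "error"),
    ("2014-09-08T01:01:10", "svc", "error"),
    ("2014-09-08T01:01:15", "svc", "error"),
    ("2014-09-08T01:10:00", "svc", "ok"),
]

for event in status_changes(events):
    print(event)
```

For snippet 1, only the 00:14:00 ok, the 00:30:00 error and the 01:10:00 ok are kept; the 01:01:15 error that ends the downtime is dropped.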
If I could get the transaction command to work, though, I could use the linecount field to exclude runs of fewer than 3 errors.
Does anybody have a solution or pointers in the right direction?
Thanks,
Hans J.
==== 8< ==== Log example =====
... Snippet 1
2014-09-08T00:14:00,ok
2014-09-08T00:30:00,error
2014-09-08T01:01:00,error
2014-09-08T01:01:05,error
2014-09-08T01:01:10,error
2014-09-08T01:01:15,error
2014-09-08T01:10:00,ok
... Snippet 2
2014-09-08T02:00:35,ok
2014-09-08T02:55:00,error
2014-09-08T03:02:00,error
2014-09-08T03:05:00,error
2014-09-08T03:10:01,ok
... Snippet 3
2014-09-08T03:10:05,ok
2014-09-08T03:10:06,error
2014-09-08T03:10:07,error
2014-09-08T03:20:00,ok
2014-09-08T03:20:01,error
2014-09-08T03:20:02,error
2014-09-08T03:20:03,error
2014-09-08T03:20:04,ok
... Snippet 4
2014-09-08T03:20:09,ok
2014-09-08T04:00:00,error
2014-09-08T04:00:00,error
2014-09-08T04:05:00,error
2014-09-08T04:06:00,error
2014-09-08T04:07:00,error
2014-09-08T04:08:00,error
2014-09-08T04:09:00,error
2014-09-08T04:30:00,error
2014-09-08T04:35:00,error
2014-09-08T04:45:00,error
2014-09-08T04:50:00,error
2014-09-08T05:10:00,ok
...