I have logs like this:
10:40:00 AM: id=1,status=SUCCESS
10:45:17 AM: id=2,status=SUCCESS
11:00:23 AM: id=34,status=SUCCESS
11:15:49 AM: id=1,status=SUCCESS
11:20:59 AM: id=2,status=SUCCESS
I want to write a query that returns only those records where logs for the same identifier appear within a short span of time.
Look at this one:
10:40:00 AM: id=1,status=SUCCESS
10:40:02 AM: id=1,status=SUCCESS
10:40:15 AM: id=1,status=SUCCESS
10:45:17 AM: id=2,status=SUCCESS
10:45:23 AM: id=2,status=SUCCESS
11:00:23 AM: id=34,status=SUCCESS
11:15:49 AM: id=1,status=SUCCESS
11:20:59 AM: id=2,status=SUCCESS
In the sample above there are 3 success states for id=1 (at 10:40:00, 10:40:02 and 10:40:15) and 2 success states for id=2 (at 10:45:17 and 10:45:23 AM).
This is what I'm interested in: I want to display repeated logs that happened within a short span of time.
When I run a query the output has to be just the following:
10:40:00 AM: id=1,status=SUCCESS
10:40:02 AM: id=1,status=SUCCESS
10:40:15 AM: id=1,status=SUCCESS
10:45:17 AM: id=2,status=SUCCESS
10:45:23 AM: id=2,status=SUCCESS
That is because for id=1 and id=2 I see several records within seconds of each other (the length of that time span is something I want to specify in the query as well).
I would probably use streamstats for this.
| sort 0 id _time
| streamstats current=f time_window=30s count as idcount by id
| eval newgroup=case(isnull(idcount),1,idcount=0,1,true(),0)
| streamstats sum(newgroup) as groupno by id
| eventstats count as groupcount by id groupno
The above treats records for an id as part of the same group as long as each one is within 30s of the prior one. As soon as a record has no prior record for the same id within the previous 30s, it starts a new group, so a group might have only one record in it. The groupcount field then holds the number of records in each group.
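To get down to just the repeated records the question asks for, one possible last step is to drop the groups that contain a single record. This is only a sketch: it assumes the field names created by the search above (idcount, newgroup, groupno, groupcount) and is meant to be appended to the end of that search. The second line is optional and just removes the helper fields from the output.

```
| where groupcount > 1
| fields - idcount newgroup groupno groupcount
```

With the sample data, this should keep the three id=1 records around 10:40 and the two id=2 records around 10:45, and drop the rest. To change what counts as a "short span of time", adjust the 30s value in the first streamstats.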