Hi, I have a query that is meant to compare longitudinal count of an event of a given day (e.g. today) with historical longitudinal percentiles.
query 1:
index=interfaces sourcesession="MICHART_SIU_HL7_INBOUND" [`define_relative_week(1)`] date_wday!=saturday date_wday!=sunday | bin _time span=20m | stats count as count by _time | eval timeofday=tonumber(strftime(_time, "%H"))*3600 + tonumber(strftime(_time,"%M"))*60 | stats perc5(count) as perc5 perc95(count) as perc95 by timeofday | eval timeofday= tostring(timeofday,"duration") | join type=outer timeofday [search index=interfaces sourcesession="MICHART_SIU_HL7_INBOUND" | bin _time span=20m | eval timeofday=tonumber(strftime(_time, "%H"))*3600 + tonumber(strftime(_time,"%M"))*60| eval timeofday= tostring(timeofday,"duration")| stats count by timeofday] |sort timeofday
where define_relative_week
is a macro:
stats count | addinfo | eval earliest=(info_min_time-604800*$n$) | eval earliest=strftime(earliest,"%m/%d/%Y:%H:%M:%S") | eval latest=(info_max_time-86400) | eval latest=strftime(latest,"%m/%d/%Y:%H:%M:%S") | return earliest,latest
Basically, this search grabs historical data (previous week) and bins the time, and then compute percentile by binned time.
Some of the things that I'm grappling with are:
query 2:
| gentimes end=-1 increment=24h [stats count | addinfo | eval start=(info_min_time-604800*2) | eval start=strftime(start,"%m/%d/%y:%H:%M:00")| return start]
| rename starttime as earliest
| eval latest=earliest+20*60
| sort - earliest
| map maxsearches=99999
search="search earliest=$earliest$ latest=$latest$ index=interfaces sourcesession=MICHART_SIU_HL7_INBOUND " | ...
So here is the query that is being tested and it needs to be tweaked to further reduce false positives.
index=interfaces earliest=-10d latest=@d | bucket _time span=10m | stats count by _time sourcesession | eval timeOfDay=strftime(_time,"%H:%M") | stats p5(count) as perc5 p95(count) as perc95 by timeOfDay sourcesession | append [ search index=interfaces earliest=-40m latest=-10m | bucket _time span=10m | eval timeOfDay=strftime(_time,"%H:%M") | stats count by timeOfDay sourcesession |fillnull ]
| stats first(perc5) as perc5 first(perc95) as perc95 first(count) as count by timeOfDay sourcesession | eval etime=strptime(timeOfDay,"%H:%M") | where etime>=relative_time(now(),"-40m") AND etime<relative_time(now(),"-10m") | fillnull | where (count<perc5) | fields - etime | sort sourcesession timeOfDay | stats count by sourcesession |search count>=3
Wow - this is complicated. I think you can simplify a lot by using Splunk's built-in time math...
index=interfaces earliest=-7d latest=@d sourcesession="MICHART_SIU_HL7_INBOUND"
| bucket _time span=20m | eval timeOfDay=strftime(_time,"%H:%M")
| stats count by timeOfDay
| start p5(count) as perc5 p95(count) as perc95 by timeOfDay
| append [ search index=interfaces earliest=@d
| bucket _time span=20m | eval timeOfDay=strftime(_time,"%H:%M")
| stats count by timeOfDay ]
| chart first(perc5) as perc5 first(perc95) as perc95 first(count) as count by timeOfDay
This makes a nice chart. If you want to build an alert - try this
index=interfaces earliest=-7d latest=@d
| bucket _time span=20m | eval timeOfDay=strftime(_time,"%H:%M")
| stats count by timeOfDay sourcesession
| start p5(count) as perc5 p95(count) as perc95 by timeOfDay sourcesession
| append [ search index=interfaces earliest=@d
| bucket _time span=20m | eval timeOfDay=strftime(_time,"%H:%M")
| stats count by timeOfDay sourcesession ]
| stats first(perc5) as perc5 first(perc95) as perc95 first(count) as count by timeOfDay sourcesession
| where count < perc5 OR count > perc95
Set the alert condition for "# results > 0". Note that the alert checks across all source sessions at once. If you leave off the last line, it will make a report as well, but it won't look as nice as the first one.
Oh, I didn't use the macro at all.
doesn't quite work yet. Besides some errors:
*"stats count by _time sourcesession" should be added before eval.
*typo "start"
There are two intertwining problems in result table. Suppose current time is 10:20, and suppose between 9:00-10:00, there is NO data at all for a sourcesession.
1. The abnormal row will be filtered out, because it's count==NULL. However, if I fill NULL count with 0,
2. then I will include extra rows such as rows for timeofday=11:00, because its count is also NULL.
How do I deal with this?