I have a summary-indexing (SI) search writing to index=sec with marker=01:
sourcetype=cisco_firewall | bin _time span=5m | sistats count by log_level, hostname
When I query the results with
index=sec marker=01 | stats count by log_level, hostname
I get results... but:
index=sec marker=01 | stats count(eval(log_level="7")) AS Debugging, count(eval(log_level="6")) AS Information, count(eval(log_level="5")) AS Notification, count(eval(log_level="4")) AS Warning, count(eval(log_level="3")) AS Error, count(eval(log_level="2")) AS Critical, count(eval(log_level="1")) AS Alert, count(eval(log_level="0")) AS Emergency by hostname
doesn't give results. Am I missing something? If this is not possible, how can I "SI" this search?
An SQL-like table pivot should serve as a reasonable workaround for this. It's not pretty, but it does work.
index=sec marker=01 | stats count by log_level, hostname | eval Debug_Count=if(log_level==7,count,0) | eval Info_Count=if(log_level==6,count,0) | eval Notify_Count=if(log_level==5,count,0) | eval Warning_Count=if(log_level==4,count,0) | eval Error_Count=if(log_level==3,count,0) | eval Critical_Count=if(log_level==2,count,0) | eval Alert_Count=if(log_level==1,count,0) | eval Emergency_Count=if(log_level==0,count,0) | stats max(Debug_Count) as Debugging, max(Info_Count) as Informational, max(Notify_Count) as Notification, max(Warning_Count) as Warning, max(Error_Count) as Error, max(Critical_Count) as Critical, max(Alert_Count) as Alert, max(Emergency_Count) as Emergency by hostname
The general idea is to take the "rows" containing data for each log_level and pivot them up into "columns". Coming out of the first stats is a series of counts, one per log_level/hostname combination. The series of eval operations creates new fields ("columns") that hold the actual count from a "row" if (and only if) that "row" contains data of interest for that "column"; otherwise, the value in that row is 0. Then, the second stats keeps only the non-zero value in each new field for each hostname.
This is a fairly common trick done in SQL with a combination of CASE and GROUP BY. I've just adapted it a bit to Splunk's syntax.
Oddly enough, if you change the 'sistats' to stats it'll work fine.
In general you want to be careful using the SI commands. The rule is that if you do sistats count by log_level, hostname in your summary search, then the first reporting clause run over the summary events should be an exact copy of that "si" search but without the "si" — i.e.
stats count by log_level, hostname.
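To illustrate the rule with the searches from the question: the scheduled populating search and the reporting search over the summary should mirror each other exactly, differing only in the "si" prefix --
sourcetype=cisco_firewall | bin _time span=5m | sistats count by log_level, hostname
index=sec marker=01 | stats count by log_level, hostname
Any additional reporting (like the count(eval(...)) breakdown) then has to happen after, or instead of, that matching stats clause.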
As to a full explanation of what can go off the rails when you cross the streams and why, it's a little beyond me. In this case it's so weird that it may just be a bug.
The good news is that there isn't really any reason to use sistats in such a simple use case. The fancy statistics that it preserves don't add anything here.
UPDATE: One thing you'll have to watch out for if you take this approach and use stats instead of sistats is that stats count on the search side will simply count the number of summary rows, paying no attention to the fact that those rows themselves carry 'count' fields. As such, you often have to do things like stats sum(count) as count by host when you use stats manually in summary indexing. In this particular case with the eval syntax, I'm not sure what the equivalent would be, but there is one. Watch out that you don't end up with the count of summary rows rather than the count of the original events.
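One likely equivalent — a sketch I haven't verified against a real summary index, using the same field names as the question — is to wrap the condition in sum(eval(if(...))) so each matching summary row contributes its stored count rather than 1:
index=sec marker=01 | stats sum(eval(if(log_level="7",count,0))) AS Debugging, sum(eval(if(log_level="0",count,0))) AS Emergency by hostname
The remaining severity levels would follow the same pattern.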
I know it's rather late in the piece, but I just hit the same problem. The problem seems to be that the name of the field saved in the summary index is different from the one produced by sistats, and consequently from the one expected by stats running against the summary index.
In my case the name generated by sistats was psrsvd_ct_eval(MedianDurationAboveObjective > 0), but the name in the index is psrsvd_ct_eval_MedianDurationAboveObjective___0_
So a simple workaround (for what must surely be a bug) for me was:
index=summary | rename "psrsvd_ct_eval_MedianDurationAboveObjective___0_" as "psrsvd_ct_eval(MedianDurationAboveObjective > 0)" | stats ...