Solved: Availability Panel in Status History Dashboard

klaxdal · ‎12-21-2017

I am trying to adjust the Availability panel within the Status History dashboard to show month-over-month availability percentages. It currently shows only the aggregate, i.e. it displays a single value instead of breaking out availability month by month.

Here is the current search:

sourcetype="web_ping" title="" NOT (date_hour>=1 date_hour<120) NOT (date_hour>=4 date_hour<420) NOT (date_hour>=21 date_hour<2120) | fillnull value=1000 response_code | eval success=case(response_code>=400, 0, timed_out == "True", 0) | fillnull value=1 success | chart count as total, sum(success) as successes | eval availability=round(100(successes/total),2) | fields availability

Any help would be greatly appreciated!

niketn · ‎12-21-2017

@cmerriman, I think you missed the multiplication for the percentage, i.e. | eval availability=round(100*(successes/total),2). However, I don't think the complete query in the question is an ideal way of writing SPL. To write correct SPL, we would still need more information about the data and the intent of the SPL in the question.

@klaxdal, please post code on Splunk Answers using the code button (101010) so that special characters are preserved. The following things are not clear in your query:
1) date_hour should be between 0 and 23. However, you have used 120, 420 and 2120, which would never exist. Please explain the intent of the date_hour filter. Also, inclusion is better than exclusion: see if you can add a positive match rather than a negative match using NOT. You should also know that != and NOT behave differently: field!=value matches only events where the field exists with a different value, while NOT field=value also matches events where the field is missing.

2) If we were to use your query, the success eval case() should ideally be rewritten as follows, in which case | fillnull value=1000 response_code and | fillnull value=1 success would NOT be required.

| eval success=case(response_code>=400 OR isnull(response_code) OR timed_out == "True", 0,true(),1)

3) However, there is an even better way of writing the query: use count with eval inside the timechart command, since evals should ideally be performed after transforming commands (statistical commands that generate tabular data).

Try the following search after correcting date_hour in your base search:

 <Your_Base_Search_With_Valid_date_hour_Filter> 
| timechart span=1mon count as total, count(eval((response_code<400 AND isnotnull(response_code)) OR timed_out!="True")) as successes
| eval availability=round(100*(successes/total),2) 
| fields _time availability

Also, OR timed_out!="True" can be changed to AND timed_out!="True", depending on whether the timed_out field is available on all events, i.e. whether every event has either timed_out=="True" or timed_out=="False". Based on your query, the timed_out=="True" condition would be evaluated only if response_code<400. Please confirm whether this is the expected use case. Also confirm what the value of timed_out is when it is not True.
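
For reference, the AND variant mentioned above would look roughly like this. It is only a sketch and assumes that timed_out is present on every event; the base-search placeholder is the same one used in the search above and still needs a valid date_hour filter.

```spl
<Your_Base_Search_With_Valid_date_hour_Filter>
| timechart span=1mon count as total, count(eval(response_code<400 AND isnotnull(response_code) AND timed_out!="True")) as successes
| eval availability=round(100*(successes/total),2)
| fields _time availability
```

If timed_out is missing on some events, this version would likely not count those events as successes, which is one reason to prefer the OR form in that situation.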

PS: Most of these tips are already listed in Splunk Documentation: http://docs.splunk.com/Documentation/Splunk/latest/Search/Quicktipsforoptimization
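
To see the difference between != and NOT from point 1 without touching indexed data, here is a small sketch. It uses makeresults to generate three hypothetical events: one with timed_out="True", one with timed_out="False", and one where the field is missing. The field n and the sample values are illustrative assumptions, not taken from the web_ping data.

```spl
| makeresults count=3
| streamstats count as n
| eval timed_out=case(n==1, "True", n==2, "False")
| search NOT timed_out="True"
| table n timed_out
```

This should keep the events with n=2 and n=3, because NOT also matches events where timed_out does not exist. Replacing the search line with | search timed_out!="True" should keep only n=2, since != requires the field to be present.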

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

klaxdal · ‎12-21-2017

Thanks so much for your help!

cmerriman · ‎12-21-2017

If you want this month by month, try:

sourcetype="web_ping" title="" NOT (date_hour>=1 date_hour<120) NOT (date_hour>=4 date_hour<420) NOT (date_hour>=21 date_hour<2120) | fillnull value=1000 response_code | eval success=case(response_code>=400, 0, timed_out == "True", 0) | fillnull value=1 success | timechart span=1mon count as total, sum(success) as successes | eval availability=round(100(successes/total),2) | fields _time availability

You need to add a time field: use timechart with a span instead of chart, and keep _time in the fields command.
