Splunk Search

How can I calculate application outages based on time?

nabeel652
Builder

Hi Guys

I am having a problem calculating application outages. I am polling for application state, and the data looks like this:

_time              Application    Status
1/12/17 1:20pm     App1           Up
1/12/17 1:20pm     App2           Up
1/12/17 1:25pm     App1           Down
1/12/17 1:25pm     App2           Up
1/12/17 1:30pm     App1           Down
1/12/17 1:30pm     App2           Up
1/12/17 1:35pm     App1           Up
1/12/17 1:35pm     App2           Up
1/12/17 1:40pm     App1           Down
1/12/17 1:40pm     App2           Up
1/12/17 1:45pm     App1           Up     ...and so on

Now the problem is I want to calculate:
1. For which intervals App1 was Up and Down
2. For how long overall App1 was Up and Down

I've tried | stats range(_time) but it doesn't work: as far as I can tell it calculates the range from the earliest to the latest time the app was Down, so it counts an interval as "Down" even when the app was Up between two Down periods. I have around 60 apps and two statuses (Up/Down).
Note: The data is not coming in at regular intervals, so I cannot just count events and treat each one as 5 minutes; the even spacing above was only to make the example more readable.
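
My attempt was along these lines (simplified; Status=Down filters to the outage events):

<YourBaseSearch> Status=Down
| stats range(_time) as downSeconds by Application

range() only looks at the earliest and latest Down events per app, so any Up period in between gets absorbed into the "outage".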

1 Solution

niketn
Legend

@nabeel652, it seems like you are getting App Status events every 5 minutes (300 seconds). Based on that, please try the following search: with streamstats you should be able to get the duration between the previous and the current event for each app, then filter to keep only the event where an app's status flips in the subsequent event, i.e. the event carrying the total duration of a run of the same status.

 <YourBaseSearch>
| sort Application _time
| streamstats current=f window=1 last(_time) as lastTime by Application
| eval duration=round(_time-lastTime,0)
| fillnull value=300 duration
| fieldformat lastTime=strftime(lastTime,"%d/%m/%y %I:%M%p")
| streamstats reset_on_change=true sum(duration) as StatusDuration by Application Status
| reverse
| streamstats reset_on_change=true values(StatusDuration) as StatusDurationMV by Application Status
| eval keepFlg=mvcount(StatusDurationMV)
| search keepFlg=1
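
For your second question (total Up/Down time per app overall), the same duration calculation can feed a plain stats instead of the run detection; a minimal sketch (it attributes the gap since the previous poll to the current event's status):

 <YourBaseSearch>
| sort Application _time
| streamstats current=f window=1 last(_time) as lastTime by Application
| eval duration=round(_time-lastTime,0)
| fillnull value=300 duration
| stats sum(duration) as totalSeconds by Application Status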

Following is a run-anywhere search:

| makeresults
| fields - _time
| eval data="1/12/17 1:20pm-----------App1--------------------Up;1/12/17 1:20pm-----------App2--------------------Up;1/12/17 1:25pm-----------App1--------------------Down;1/12/17 1:25pm-----------App2--------------------Up;1/12/17 1:30pm-----------App1--------------------Down;1/12/17 1:30pm-----------App2--------------------Up;1/12/17 1:35pm-----------App1--------------------Up;1/12/17 1:35pm-----------App2--------------------Up;1/12/17 1:40pm-----------App1--------------------Down;1/12/17 1:40pm-----------App2--------------------Up;1/12/17 1:45pm-----------App1--------------------Up"
| makemv delim=";" data 
| mvexpand data
| rename data as _raw
| rex "(?<_time>\d{1,2}\/\d{1,2}\/\d{2,4}\s\d{1,2}:\d{1,2}(a|p)m)-+(?<Application>[^-]+)\-+(?<Status>\w+)"
| fields - _raw
| eval _time=strptime(_time,"%d/%m/%y %I:%M%p")
| sort Application _time
| streamstats current=f window=1 last(_time) as lastTime by Application
| eval duration=round(_time-lastTime,0)
| fillnull value=300 duration
| fieldformat lastTime=strftime(lastTime,"%d/%m/%y %I:%M%p")
| streamstats reset_on_change=true sum(duration) as StatusDuration by Application Status
| reverse
| streamstats reset_on_change=true values(StatusDuration) as StatusDurationMV by Application Status
| eval keepFlg=mvcount(StatusDurationMV)
| search keepFlg=1
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"


nabeel652
Builder

Awesome mate. Almost there but a small thing:

The time ranges don't match the outages. For instance:

Application    Status    StatusDuration    StatusDurationMV    _time                  duration    keepFlg    lastTime
App1           Down      600               600                 2017-12-01 01:30:00    300         1          01/12/17 01:25AM

_time and lastTime show a 5-minute span, but the outage is 600 seconds. Is there a way to keep both ends of the time range intact, as in this row?

Application    Status    StatusDuration    StatusDurationMV    _time                  duration    keepFlg    lastTime
App1           Down      600               600                 2017-12-01 01:35:00    300         1          01/12/17 01:25AM

There are 5 minutes extra in both app results.

Clever solution though (y).


niketn
Legend

@nabeel652, the lastTime field has no significance after the keepFlg filter is applied, as it holds the _time of the previous event for the same app.

There could be two approaches to get startTime:
1) Use stats to retain the startTime of events and filter later. However, this complicates the search and will definitely be more expensive (a rough sketch follows below, after the PS).

2) In the final result you have each app's Status, _time, and duration, from which you can capture the start time. Just pipe the following eval as the final command in your SPL.

| eval startTime=case(isnull(lastTime), strftime(_time + 300,"%d/%m/%y %I:%M%p"), true(), strftime(_time - StatusDuration,"%d/%m/%y %I:%M%p"))

PS: The 1st status for each app is the only event that will have lastTime as NULL (since no event precedes it), so only for that status is startTime computed as _time + 300 seconds. Hope it makes sense. You can always remove the filter | search keepFlg=1 to retain all events and see how the commands work.
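
For completeness, here is a rough sketch of approach 1 using stats (field names assumed to match the search above); it tags each contiguous run of the same status with a counter so both startTime and endTime survive the aggregation:

 <YourBaseSearch>
| sort 0 Application _time
| streamstats current=f window=1 last(Status) as prevStatus by Application
| eval runChanged=if(isnull(prevStatus) OR Status!=prevStatus,1,0)
| streamstats sum(runChanged) as runId by Application
| stats min(_time) as startTime max(_time) as endTime by Application Status runId
| eval StatusDuration=endTime-startTime
| fieldformat startTime=strftime(startTime,"%d/%m/%y %I:%M%p")
| fieldformat endTime=strftime(endTime,"%d/%m/%y %I:%M%p")

Note this measures each run from its first to its last sample, so the numbers differ slightly from the streamstats version, which also counts the gap before the first sample of a run.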

Thanks for the kind words. Your issue seemed challenging: the first streamstats was easy, but the requirement became trickier because the same status continued across several events. There are experts here for whom this would be a piece of cake, and who knows, maybe there's a better answer 🙂

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

nabeel652
Builder

Thanks for your help mate 🙂
