Splunk Search

How to search if a host is down and when it came back up again for a particular event?

splunker9999
Path Finder

Hi Splunkers,

We are looking for a search that shows, for each host, when it went down and when it came back up after that particular down event.

For example, if a host went down at 9:31:23 and came back up around 9:31:40, we need both of these times.

Our search and a sample of the events look like this:

    index=linux sourcetype=restartsplunk host="server1" | rex field=_raw "splunk UF is (?<action>.*)"

Time    Event
5/10/16 
10:04:01.000 AM 
Tue May 10 10:04:01 EDT 2016  splunk UF is running
host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1
5/10/16 
10:03:23.000 AM 
Tue May 10 10:03:23 EDT 2016  splunk UF is running
host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1
5/10/16 
10:03:01.000 AM 
Tue May 10 10:03:01 EDT 2016  splunk UF is not running
host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1

As the events show, the host went down around 10:03:01 AM, came back up around 10:03:23 AM, and its current status is running.

Can someone please help?

sundareshr
Legend

See if this gives you what you're looking for

 index=linux sourcetype=restartsplunk | rex "splunk UF is (?<action>.*)" | streamstats window=1 current=f last(action) as nextstate latest(_time) as nextstatetime by host | eval state=case(action=="running" AND nextstate=="not running", "Down", action=="not running" AND nextstate=="running", "Recovered", 1=1, "No Change") | where state=="Down" OR state=="Recovered" | table *

splunker9999
Path Finder

I tried this, and I think there is a problem with streamstats: it is not working as expected.

twinspop
Influencer
index=linux sourcetype=restartsplunk splunk is running |
eval status=if(match(_raw,"is running"),"up","down") |
table _time host status |
sort _time |
streamstats global=false current=false window=1 last(status) as previous_status by host |
where status!=previous_status

This search creates a table with the times of all up and down records. After a sort to get it in ascending time order, it adds a column that shows the previously seen state for that host. Finally, it shows only the times when the state changed, either up to down or down to up. You could change the where clause to status="up" and previous_status="down" to see only the transitions from down to up.
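
The same transition idea might be extended to pair each down time with the next up time for that host. The following is a minimal sketch, not tested against the poster's data. The field names state, previous_status, next_change_time, went_down and came_up are illustrative, and it assumes every event contains either "splunk UF is running" or "splunk UF is not running" (an event matching neither would be treated as down here).

```spl
index=linux sourcetype=restartsplunk
| rex "splunk UF is (?<state>not running|running)"
| eval status=if(state=="running", "up", "down")
| sort 0 host _time
| streamstats global=f current=f window=1 last(status) AS previous_status BY host
| where isnull(previous_status) OR status!=previous_status
| sort 0 host -_time
| streamstats global=f current=f window=1 last(_time) AS next_change_time BY host
| where status=="down"
| eval went_down=strftime(_time, "%m/%d/%Y %H:%M:%S")
| eval came_up=if(isnull(next_change_time), "still down", strftime(next_change_time, "%m/%d/%Y %H:%M:%S"))
| table host went_down came_up
```

The first streamstats keeps only the first event per host plus every state change, so the remaining rows alternate between up and down. Sorting those rows newest first lets the second streamstats pick up the time of the following change, which for a down row is the moment the host came back up. sort 0 is used so that the default sort limit does not truncate large result sets.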

splunker9999
Path Finder

This query is not returning any results.

Note: It works as expected up to the sort by time. I also tried changing the where clause to status="up" and previous_state=" ".

woodcock
Esteemed Legend

Like this:

index=linux sourcetype=restartsplunk | rex field=_raw "splunk UF is (?<currentState>.*)"
| streamstats count(eval(currentState="not running")) AS sessionID BY host
| eventstats earliest(_time) AS startTime latest(_time) AS endTime count AS sessionEventCount BY sessionID host
| streamstats current=t count AS sessionEventIndex BY sessionID host
| where sessionEventIndex=1 OR sessionEventIndex=sessionEventCount
| eval tempTime = if ((sessionEventIndex=sessionEventCount), startTime, null())
| streamstats current=f last(tempTime) AS startTimeNextSession BY host
| fields - tempTime
| eval sessionUpTime= tostring((endTime - startTime), "duration")
| eval sessionDownTime= tostring((startTimeNextSession - endTime), "duration")
| table host sessionID startTime endTime sessionUpTime sessionDownTime currentState

Note: sessionUpTime=0 means it is still up. This works for all hosts at the same time.

splunker9999
Path Finder

Hi, thanks for this.

This query returns results like the following:

host     sessionID  startTime             endTime               sessionUpTime  sessionDownTime
server1  1          May 09 11:21:01 2016  May 09 21:43:02 2016  10:22:01       00:01:00
server2  0          May 09 11:21:01 2016  May 10 11:20:02 2016  23:59:01

Here, endTime is the server's last down time (May 09 21:43:02 2016). I am also expecting the time the server came back up (around May 09 21:43:23 2016).

What is sessionUpTime here?

Thanks

woodcock
Esteemed Legend

The sessionUpTime is the duration between the first "is running" event (after the previous "is not running" event) and the next "is not running" event for each host. This group of events constitutes a session. The sessionDownTime is the span between this session's "is not running" event and the next session's first "is running" event.
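
To make these definitions concrete, here is a small stand-alone sketch that redoes the arithmetic for the server1 row posted earlier. It uses makeresults with hard-coded times instead of real events. The value of startTimeNextSession is an assumption derived from the posted sessionDownTime of 00:01:00, and nextSessionStart is an illustrative field name.

```spl
| makeresults
| eval startTime=strptime("05/09/2016 11:21:01", "%m/%d/%Y %H:%M:%S")
| eval endTime=strptime("05/09/2016 21:43:02", "%m/%d/%Y %H:%M:%S")
| eval startTimeNextSession=endTime + 60
| eval sessionUpTime=tostring(round(endTime - startTime), "duration")
| eval sessionDownTime=tostring(round(startTimeNextSession - endTime), "duration")
| eval nextSessionStart=strftime(startTimeNextSession, "%b %d %H:%M:%S %Y")
| table sessionUpTime sessionDownTime nextSessionStart
```

Under these assumptions, sessionUpTime and sessionDownTime should come out as 10:22:01 and 00:01:00, in line with the server1 row. Since startTimeNextSession is the first "is running" event after the down event, formatting it with strftime is one way to show when the host came back up.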

splunker9999
Path Finder

Got you, we are very close.

Can we get another field, such as "Server came up", that gives the time the server came back up after its last down time?

For example, the "End Time" field gives the last time the server went down (May 9 21:43:02 2016). The "Server up time" field should give a result like May 9 21:44:01 2016 if it came back up within a minute.

Thanks

woodcock
Esteemed Legend

I have modified my answer so that it keeps the first and last event for each session. You should be able to modify it to fit from there.

jkat54
SplunkTrust

See if this works please:

index=linux sourcetype=restartsplunk
 | rex "splunk UF is (?<action1>.*)" 
 | eval actionTime=_time
 | convert ctime(actionTime)
 | stats latest(actionTime) AS servercameuptime by action1, host
 | where action1="running"
 | table host servercameuptime 
 | appendcols [search index=linux sourcetype=restartsplunk  
   | rex "splunk UF is (?<action2>.*)" 
   | eval actionTime2=_time
   | convert ctime(actionTime2)
   | stats latest(actionTime2) AS lastserverwentdowntime by action2, host
   | where action2="not running"
   | table lastserverwentdowntime]
 | eval currentstatus=if(servercameuptime>lastserverwentdowntime,"running","not running")
 | table host lastserverwentdowntime servercameuptime currentstatus

splunker9999
Path Finder

Hi,

Thanks, this works partially. It gives the last down time and the current status.

But we are looking for the time of each down status and the matching up time (when the host came back up after that particular down time). Our script writes a log entry every minute, so the search returns many events, each either up or down.

Thanks

jkat54
SplunkTrust

Hey, I edited the answer around the same time you were looking at it. The original answer was incorrect. The new answer works with the data you provided:

host,lastRunningTime,lastNotRunningTime
server1,"05/10/2016 10:04:01","05/10/2016 10:03:01"

jkat54
SplunkTrust

Do you want all downtimes for the same host?

splunker9999
Path Finder

Correct. We need all downtimes for each host (we have many hosts) and the time it came back up after each downtime.

For example, in the events below, the server went down at 10:03:01 and came back up at 10:03:23, but all the remaining events show the status as up (a log entry is generated every minute).

We need a table like this:
host, lastserverwentdown,servercameuptime,currentstatus

Time    Event
 5/10/16 
 10:05:02.000 AM    
 Tue May 10 10:04:01 EDT 2016  splunk UF is running
 host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1
 5/10/16 
 10:04:01.000 AM    
 Tue May 10 10:04:01 EDT 2016  splunk UF is running
 host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1
 5/10/16 
 10:03:23.000 AM    
 Tue May 10 10:03:23 EDT 2016  splunk UF is running
 host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1
 5/10/16 
 10:03:01.000 AM    
 Tue May 10 10:03:01 EDT 2016  splunk UF is not running
 host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1

jkat54
SplunkTrust

Thanks, I edited my original answer to give the exact results you're looking for.

splunker9999
Path Finder

Hi,

This is also not giving me accurate results; the current status it reports is incorrect.
Thanks

jkat54
SplunkTrust

Change the greater than to a less than in the if statement to reverse the logic.
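
One thing that may be worth checking before flipping the operator: convert ctime turns the timestamps into text, so the if statement compares strings rather than times, and appendcols pairs rows by position rather than by host. The following is a hedged sketch of an alternative that compares epoch values in a single stats pass and formats them only at the end. It has not been run against the poster's data, and the field names runningTime, notRunningTime, lastRunningEpoch and lastNotRunningEpoch are illustrative.

```spl
index=linux sourcetype=restartsplunk
| rex "splunk UF is (?<action>not running|running)"
| eval runningTime=if(action=="running", _time, null())
| eval notRunningTime=if(action=="not running", _time, null())
| stats max(runningTime) AS lastRunningEpoch max(notRunningTime) AS lastNotRunningEpoch BY host
| eval currentstatus=if(isnull(lastNotRunningEpoch) OR lastRunningEpoch>lastNotRunningEpoch, "running", "not running")
| eval lastRunningTime=strftime(lastRunningEpoch, "%m/%d/%Y %H:%M:%S")
| eval lastNotRunningTime=strftime(lastNotRunningEpoch, "%m/%d/%Y %H:%M:%S")
| table host lastNotRunningTime lastRunningTime currentstatus
```

Because both times are computed per host in the same stats command, a host that has never been down cannot shift the rows of the other hosts. Like the original, this reports only the most recent down and up times for each host, not every downtime.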
