Hi Splunkers,
We are looking for a search that shows, for each host, when it went down and when it came back up after that particular event.
For example, if a host went down at 9:31:23 and came back up around 9:31:40, we need both of these values.
A sample search and its events look like this:
index=linux sourcetype=restartsplunk host="server1" | rex field=_raw "splunk UF is (?<action>.*)"
Time Event
5/10/16
10:04:01.000 AM
Tue May 10 10:04:01 EDT 2016 splunk UF is running
host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1
5/10/16
10:03:23.000 AM
Tue May 10 10:03:23 EDT 2016 splunk UF is running
host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1
5/10/16
10:03:01.000 AM
Tue May 10 10:03:01 EDT 2016 splunk UF is not running
host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1
As the events show, the host went down around 10:03:01 and came back up around 10:03:23 AM, and its current status is running.
Can someone please help?
See if this gives you what you're looking for:
index=linux sourcetype=restartsplunk | rex "splunk UF is (?<action>.*)" | streamstats window=1 current=f global=f last(action) as nextstate latest(_time) as nextstatetime by host | eval state=case(action=="running" AND nextstate=="not running", "Down", action=="not running" AND nextstate=="running", "Recovered", 1=1, "No Change") | where state=="Down" OR state=="Recovered" | table *
I tried exploring this. I guess there is a problem with streamstats, as it is not working as expected.
index=linux sourcetype=restartsplunk splunk is running |
eval status=if(match(_raw,"is running"),"up","down") |
table _time host status |
sort _time |
streamstats global=false current=false window=1 last(status) as previous_status by host |
where status!=previous_status
This search creates a table with the times of all up and down records. After a sort into ascending time order, it adds a column that shows the previously seen state for that host. Finally, it shows only the times when the state changed, either up to down or down to up. You could change the where clause to status="up" AND previous_status="down" to see only the transitions from down to up.
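As a hedged illustration of that variation, the sketch below keeps only the down-to-up transitions. It assumes the same index, sourcetype, and event text as the sample events in the question, and it tests for "is not running" first so that any event without that phrase counts as up. The sort 0 form is used so that the sort does not truncate large result sets.

```spl
index=linux sourcetype=restartsplunk
| eval status=if(match(_raw, "is not running"), "down", "up")
| table _time host status
| sort 0 _time
| streamstats global=false current=false window=1 last(status) AS previous_status BY host
| where status="up" AND previous_status="down"
```

Each remaining row should then carry the time at which a host was first seen up again after being down.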
This query is not returning any results.
Note: It works as expected up to the sort on time. I tried changing the where clause to status="up" and previous_state=" ".
Like this:
index=linux sourcetype=restartsplunk | rex field=_raw "splunk UF is (?<currentState>.*)"
| streamstats count(eval(currentState="not running")) AS sessionID BY host
| eventstats earliest(_time) AS startTime latest(_time) AS endTime count AS sessionEventCount BY sessionID host
| streamstats current=t count AS sessionEventIndex BY sessionID host
| where sessionEventIndex=1 OR sessionEventIndex=sessionEventCount
| eval tempTime = if ((sessionEventIndex=sessionEventCount), startTime, null())
| streamstats current=f last(tempTime) AS startTimeNextSession BY host
| fields - tempTime
| eval sessionUpTime= tostring((endTime - startTime), "duration")
| eval sessionDownTime= tostring((startTimeNextSession - endTime), "duration")
| table host sessionID startTime endTime sessionUpTime sessionDownTime currentState
Note: sessionUpTime=0 means it is still up. This works for all hosts at the same time.
Hi, thanks for this.
This query returns results like the following:
host     sessionID  startTime             endTime               sessionUpTime  sessionDownTime
server1  1          May 09 11:21:01 2016  May 09 21:43:02 2016  10:22:01       00:01:00
server2  0          May 09 11:21:01 2016  May 10 11:20:02 2016  23:59:01
Here, the end time is the server's last downtime (May 09 21:43:02 2016). What I am expecting now is the time the server came back up (around May 09 21:43:23 2016).
What is sessionUpTime here?
Thanks
The sessionUpTime is the duration between the first "is running" event (after the previous "is not running" event) and the next "is not running" event for each host. This group of events constitutes a session. The sessionDownTime is the span between this session's "is not running" and the next session's "is running".
Got you, we are very close.
Can we get another field like "Server came up"? This field should give us the time the server came up after the last downtime.
For example, the "End Time" field gives us the last time the server went down (May 9 21:43:02 2016).
The "Server up time" field should give us a result like May 9 21:44:01 2016, if it came up within 1 minute.
Thanks
I have modified my answer so that it keeps the first and last event for each session. You should be able to modify it to fit from there.
See if this works please:
index=linux sourcetype=restartsplunk
| rex "splunk UF is (?<action1>.*)"
| eval actionTime=_time
| convert ctime(actionTime)
| stats latest(actionTime) AS servercameuptime by action1, host
| where action1="running"
| table host servercameuptime
| appendcols [search index=linux sourcetype=restartsplunk
| rex "splunk UF is (?<action2>.*)"
| eval actionTime2=_time
| convert ctime(actionTime2)
| stats latest(actionTime2) AS lastserverwentdowntime by action2, host
| where action2="not running"
| table lastserverwentdowntime]
| eval currentstatus=if(servercameuptime>lastserverwentdowntime,"running","not running")
| table host lastserverwentdowntime servercameuptime currentstatus
Hi,
Thanks, this works partially. It gives the last down time and the current status.
But we are looking for the time of the last down status and the corresponding up time (when the host came up after that particular downtime). Because our script writes a log entry every minute, there are many events (each up or down, depending on the result).
Thanks
Hey, I edited the answer around the same time you were looking at it. The original answer was incorrect. The new answer works with the data you provided:
host,lastRunningTime,lastNotRunningTime
server1,"05/10/2016 10:04:01","05/10/2016 10:03:01"
Do you want all downtimes for the same host?
Correct. We need all downtimes for a host (we have many hosts) and the time it came up after each downtime.
For example, in the events below, the server went down at 10:03:01 and came up at 10:03:23, but all the remaining events have the status up (as logs are generated every minute).
Now we need a table like this:
host, lastserverwentdown, servercameuptime, currentstatus
Time Event
5/10/16
10:05:02.000 AM
Tue May 10 10:04:01 EDT 2016 splunk UF is running
host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1
5/10/16
10:04:01.000 AM
Tue May 10 10:04:01 EDT 2016 splunk UF is running
host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1
5/10/16
10:03:23.000 AM
Tue May 10 10:03:23 EDT 2016 splunk UF is running
host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1
5/10/16
10:03:01.000 AM
Tue May 10 10:03:01 EDT 2016 splunk UF is not running
host = server1 index = linux source = /home/splunk/log sourcetype = restartsplunk splunk_server = index1
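One way the requested table might be built is sketched below. It is only a sketch under assumptions: the field names state, prev_state, and next_change are made up for illustration, and the output columns reuse the names from the request above. The search keeps only the events where a host's state changes, then reverses the order so that each "not running" event can pick up the time of the state change that follows it.

```spl
index=linux sourcetype=restartsplunk
| rex "splunk UF is (?<state>not running|running)"
| eventstats latest(state) AS currentstatus BY host
| sort 0 host _time
| streamstats current=f window=1 global=f last(state) AS prev_state BY host
| where isnull(prev_state) OR state!=prev_state
| reverse
| streamstats current=f window=1 global=f last(_time) AS next_change BY host
| where state="not running"
| eval lastserverwentdown=strftime(_time, "%m/%d/%Y %H:%M:%S")
| eval servercameuptime=if(isnull(next_change), "still down", strftime(next_change, "%m/%d/%Y %H:%M:%S"))
| table host lastserverwentdown servercameuptime currentstatus
```

Because only state changes survive the first where clause, the change that follows a "not running" event for a host is its next "running" event. This is why next_change can be read as the time the host came back up, and why a missing next_change is reported as "still down".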
Thanks. I edited my original answer to give the exact results you're looking for.
Hi,
This is also not giving me accurate results; the current status it shows is incorrect.
Thanks
Change the greater than to a less than in the if statement to reverse the logic.
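A possible alternative, offered only as a sketch: the search above compares the strings produced by convert ctime, and text in month/day/year form does not always sort chronologically (for example across a year boundary). Comparing the raw epoch values and formatting them afterwards avoids that, and a single stats call avoids relying on appendcols to line rows up by position. The field names state, up_epoch, and down_epoch are assumptions introduced for this illustration.

```spl
index=linux sourcetype=restartsplunk
| rex "splunk UF is (?<state>not running|running)"
| stats max(eval(if(state="running", _time, null()))) AS up_epoch max(eval(if(state="not running", _time, null()))) AS down_epoch BY host
| eval currentstatus=if(isnull(down_epoch) OR up_epoch>down_epoch, "running", "not running")
| eval servercameuptime=strftime(up_epoch, "%m/%d/%Y %H:%M:%S")
| eval lastserverwentdowntime=strftime(down_epoch, "%m/%d/%Y %H:%M:%S")
| table host lastserverwentdowntime servercameuptime currentstatus
```

Note that up_epoch here is the most recent "running" event for the host, not the first one after the outage, so this variant answers the current-status question rather than listing every downtime.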