
uptime / downtime pct over 30 days

tmarlette
Motivator

I'm trying to figure out how to show the uptime of a device as a percentage over 30 days, in a way that works for both Linux and Windows data.

I am currently using

index=os sourcetype=Unix:Uptime 

as my data set; it's a default data set that ships with the Linux TA.

For Windows I am using this search:

index=wineventlog LogName=System EventCode=6013 
|rex field=Message "uptime is (?<uptime>\d+) seconds" 
| eval Uptime_Minutes=uptime/60 
| eval LastBoot=_time-uptime 
| convert  ctime(LastBoot) 
| eval uptime=tostring(uptime, "duration")
| stats latest(_time) as time by host, Message, uptime, LastBoot

 

Currently, I can't figure out how to account for a reboot that occurs during the month. The Linux data doesn't have a 'LastBoot' field like the Windows data, and I'm not sure how to create one.

The closest I've gotten is something like this for either Linux or Windows, simply renaming / creating the 'uptime' field in seconds.

index=nix sourcetype=Unix:Uptime 
| rename SystemUpTime as uptime
| streamstats sum(uptime) as total by host
| eval tot_up=(total/157697280)*100
| eval host_uptime=floor(tot_up)
| stats max(host_uptime) as pctUp by host



This is obviously crude, and I'm trying to refine it, so I'm looking for any help. I'm obviously missing something, and I'm sure I'm not the first person to ask a question like this, though I couldn't find anything specific to this on Answers.

I have a search that shows me total uptime as a duration for either Windows or Linux, and that's great! I'm just looking for the total uptime as a percentage over a 30-day span that accounts for reboots or legitimate hard-down incidents.


tscroggins
Influencer

@tmarlette 

If you're using Splunk Add-on for Unix and Linux and Splunk Add-on for Windows, you can use the uptime tag:

tag=uptime

Both add-ons have uptime inputs with default intervals of 86400 seconds (one sample per day). Both source types have a field named uptime with a value in seconds.

With that understanding in hand, we can treat each sample as covering at most one day: any value greater than or equal to 86400 counts as 86400 seconds of uptime, and any value less than 86400 seconds counts as that value:

tag=uptime earliest=-30d@d latest=@d
| stats sum(eval(min(uptime, 86400))) as uptime by host
| eval uptime_percent=round(100*uptime/2592000, 2) ```86400 seconds * 30 days = 2592000 seconds```
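
As a sanity check of the capping logic, here is a minimal sketch that runs the same stats against synthetic data instead of real events. The host name example-host and the day field are made up for illustration. It assumes one sample per day and a single reboot one hour before the day-15 sample, with the result expressed as a percentage:

```
| makeresults count=30
| streamstats count as day
| eval host="example-host"
| eval uptime=if(day<15, day*86400, 3600+(day-15)*86400)
| stats sum(eval(min(uptime, 86400))) as uptime by host
| eval uptime_percent=round(100*uptime/2592000, 2)
```

Twenty-nine of the samples are capped at 86400 and the day-15 sample contributes 3600, so the sum is 29 * 86400 + 3600 = 2509200 seconds, or about 96.81 percent of the 30-day window. Note that this charges the host with 23 hours of downtime on day 15 even if the reboot itself took minutes, because only the time since the last boot is visible in that sample.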

You may want to include an error measurement to allow for variation in uptime polling schedules, downtime following the last available uptime measurement, etc.
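
One rough way to approach that, as a sketch only: count how many uptime samples each host actually reported in the window and surface the shortfall next to the percentage. The samples and missing_samples field names are hypothetical, and the 30 assumes the default one-sample-per-day interval described above:

```
tag=uptime earliest=-30d@d latest=@d
| stats sum(eval(min(uptime, 86400))) as uptime count as samples by host
| eval uptime_percent=round(100*uptime/2592000, 2)
| eval missing_samples=max(30-samples, 0)
```

A host with missing_samples greater than 0 has days that contributed nothing to the sum, so its uptime_percent is better read as a lower bound than as a measured value. If a host reports more than 30 samples (for example, because the input interval was changed), the sum can exceed the window and the capping assumption no longer holds.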
