Re: uptime / downtime pct over 30 days

tmarlette · ‎12-28-2021

I'm trying to figure out how to show a device's uptime as a percentage over 30 days, in a way that works for both Linux and Windows data.

I am currently using

index=os sourcetype=Unix:Uptime

as my data set, and it's a default data set that ships with the Linux TA.

For Windows I am using this search:

index=wineventlog LogName=System EventCode=6013 
|rex field=Message "uptime is (?<uptime>\d+) seconds" 
| eval Uptime_Minutes=uptime/60 
| eval LastBoot=_time-uptime 
| convert  ctime(LastBoot) 
| eval uptime=tostring(uptime, "duration")
| stats latest(_time) as time by host, Message, uptime, LastBoot

Currently, I can't figure out how to account for a reboot that occurs during the month. The Linux data doesn't have a 'LastBoot' field like the Windows data, and I'm not sure how to create one.

The closest I've gotten is something like this, which works for either Linux or Windows as long as I rename / create the 'uptime' field in seconds.

index=nix sourcetype=Unix:Uptime 
| rename SystemUpTime as uptime
| streamstats sum(uptime) as total by host
| eval tot_up=(total/157697280)*100
| eval host_uptime=floor(tot_up)
| stats max(host_uptime) as pctUp by host

This is obviously crude, and I'm trying to refine it, so I'm looking for any help. I'm obviously missing something, and I'm sure I'm not the first person to ask a question like this, but I couldn't find anything specific to this on Answers.

I have a search that shows me total uptime as a duration for either Windows or Linux, and that's great! I'm just looking for the total uptime as a percentage over a 30-day span that accounts for reboots and legitimate hard-down incidents.

tscroggins · ‎01-02-2022

@tmarlette

If you're using Splunk Add-on for Unix and Linux and Splunk Add-on for Windows, you can use the uptime tag:

tag=uptime

Both add-ons have uptime inputs with default intervals of 86400 seconds. Both source types have a field named uptime with a value in seconds.

With that understanding in hand, we can assume any value greater than or equal to 86400 represents 86400 seconds of uptime, and any value less than 86400 seconds is that value:

tag=uptime earliest=-30d@d latest=@d
| stats sum(eval(min(uptime, 86400))) as uptime by host
| eval uptime_percent=round(100*uptime/2592000, 2) ```2592000 = 86400 seconds * 30 days```
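
To make the arithmetic concrete, here is a minimal Python sketch of the same capped sum. It is only an illustration of the calculation, not part of either add-on: the function name and the sample host are hypothetical, and it assumes exactly one uptime reading per day. It multiplies by 100 so the result reads as a percentage.

```python
SECONDS_PER_DAY = 86400
WINDOW_DAYS = 30

def uptime_percent(daily_uptime_samples):
    """Credit each daily uptime reading, capped at one day, over a 30-day window."""
    credited = sum(min(sample, SECONDS_PER_DAY) for sample in daily_uptime_samples)
    return 100 * credited / (SECONDS_PER_DAY * WINDOW_DAYS)

# Hypothetical host: already up 40 days (3456000 s) at the first reading,
# rebooted on day 10 and back up 6 hours (21600 s) before that day's reading.
samples = [3456000 + d * SECONDS_PER_DAY for d in range(9)]      # days 1-9, each capped to 86400
samples += [21600]                                               # day 10, first reading after the reboot
samples += [21600 + d * SECONDS_PER_DAY for d in range(1, 21)]   # days 11-30, each capped to 86400

print(round(uptime_percent(samples), 2))  # 97.5
```

Note that on the reboot day only the 6 hours since boot are credited; the part of that day before the reboot is counted as downtime, so the figure is a conservative approximation.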

You may want to include an error measurement to allow for variation in uptime polling schedules, downtime following the last available uptime measurement, etc.
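
One possible way to express that error measurement, sketched in Python under the assumption of one expected reading per day: treat days with no reading as unknown rather than down, and report a lower and upper bound. The function name and the example data are hypothetical.

```python
SECONDS_PER_DAY = 86400

def uptime_bounds(samples, window_days=30):
    """Return (lower %, upper %, missing days) for a window of daily readings."""
    window = SECONDS_PER_DAY * window_days
    credited = sum(min(s, SECONDS_PER_DAY) for s in samples)
    # Days with no reading could have been fully up or fully down.
    missing_days = max(window_days - len(samples), 0)
    lower = 100 * credited / window
    upper = 100 * min(credited + missing_days * SECONDS_PER_DAY, window) / window
    return lower, upper, missing_days

# Hypothetical host with 28 readings, all showing at least a full day of uptime.
lower, upper, missing = uptime_bounds([90000] * 28)
print(round(lower, 2), round(upper, 2), missing)  # 93.33 100.0 2
```

In a search, the count of readings per host over the window would play the role of `len(samples)` here.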
