I'm trying to figure out how to show a device's uptime as a percentage over 30 days, in a way that works for both Linux and Windows data.
I am currently using
index=os sourcetype=Unix:Uptime
as my data set, and it's a default data set that ships with the Linux TA.
For Windows I am using this search:
index=wineventlog LogName=System EventCode=6013
|rex field=Message "uptime is (?<uptime>\d+) seconds"
| eval Uptime_Minutes=uptime/60
| eval LastBoot=_time-uptime
| convert ctime(LastBoot)
| eval uptime=tostring(uptime, "duration")
| stats latest(_time) as time by host, Message, uptime, LastBoot
Currently, I can't figure out how to account for a reboot that occurs during the month. The Linux data doesn't have a 'LastBoot' field like the Windows data, and I'm not sure how to create one.
The closest I've gotten is something like this for either Linux or Windows, simply renaming / creating the 'uptime' field in seconds.
index=nix sourcetype=Unix:Uptime
| rename SystemUpTime as uptime
| streamstats sum(uptime) as total by host
| eval tot_up=(total/157697280)*100
| eval host_uptime=floor(tot_up)
| stats max(host_uptime) as pctUp by host
This is obviously crude, and I'm trying to refine it, so I'm looking for any help. I'm clearly missing something, and I'm sure I'm not the first person to ask a question like this, but I couldn't find anything specific to it on Answers.
I have a search that shows me total uptime as a duration for either Windows or Linux, and that's great! I'm just looking for total uptime as a percentage over a 30-day span that accounts for reboots or legitimate hard-down incidents.
If you're using Splunk Add-on for Unix and Linux and Splunk Add-on for Windows, you can use the uptime tag:
tag=uptime
Both add-ons have uptime inputs with default intervals of 86400 seconds (one day). Both source types have a field named uptime with a value in seconds.
With that understanding in hand, we can assume any value greater than or equal to 86400 represents 86400 seconds of uptime, and any value less than 86400 seconds is that value:
tag=uptime earliest=-30d@d latest=@d
| stats sum(eval(min(uptime, 86400))) as uptime by host
| eval uptime_percent=100*uptime/2592000 ```2592000 = 86400 seconds * 30 days```
You may want to include an error measurement to allow for variation in uptime polling schedules, downtime following the last available uptime measurement, etc.
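To make the capping rule concrete, here is a small Python sketch of the same arithmetic outside Splunk. It assumes exactly one uptime reading per host per day; the function name and sample data are made up for illustration and are not part of either add-on.

```python
SECONDS_PER_DAY = 86400


def uptime_percent(samples, days=30):
    """Percent uptime from daily uptime readings (seconds), capped at one day each."""
    # Each reading can vouch for at most the 24 hours since the previous poll.
    credited = sum(min(s, SECONDS_PER_DAY) for s in samples)
    return 100.0 * credited / (days * SECONDS_PER_DAY)


# Hypothetical host that never rebooted: every reading is at least one day.
always_up = [SECONDS_PER_DAY * (d + 1) for d in range(30)]

# Hypothetical host that rebooted once: the 11th poll sees only 6 hours of uptime.
rebooted = (
    [SECONDS_PER_DAY * (d + 1) for d in range(10)]
    + [6 * 3600]
    + [6 * 3600 + SECONDS_PER_DAY * d for d in range(1, 20)]
)

print(uptime_percent(always_up))  # 100.0
print(uptime_percent(rebooted))   # 97.5
```

Note that the rebooted host is charged the full 18 hours before its 6-hour reading, even if it was actually up until shortly before the reboot. That overstatement is one of the error sources mentioned above.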