Splunk Search

How to calculate uptime percentage based on my data?

rakes568
Explorer

Let's say my data looks like this:

8/27/12 10:30:00.000 AM server=test1 and status=Down
8/27/12 10:29:00.000 AM server=test2 and status=Up
8/27/12 10:28:00.000 AM server=test3 and status=Down
8/27/12 10:27:00.000 AM server=test4 and status=Up
8/27/12 10:26:00.000 AM server=test1 and status=Up
8/27/12 10:25:00.000 AM server=test2 and status=Down
8/27/12 10:24:00.000 AM server=test3 and status=Up
8/27/12 10:23:00.000 AM server=test4 and status=Down

I want to calculate the total uptime % for each server: total uptime (the sum of all time differences between an Up status and the next Down status) divided by the total time since Splunk received the first status message for that server.

johnward4
Communicator

What did you end up doing for this? I'm trying to do the same calculation, but I'm trying to use the following search:

index=_index source=*splunkd.log (event_message="*Splunkd starting*" OR event_message="*Shutting down splunkd")

cmerriman
Super Champion

This might be a good starting point:

| sort 0 server _time | streamstats current=f window=1 values(status) as prevStatus values(_time) as prevTime by server | eval diff=_time-prevTime

I'm not sure if your values always alternate between Up and Down. If you only want the time from an Up to the following Down, you might need to add an eval along the lines of | eval UpToDown=if(prevStatus="Up" AND status="Down",diff,null()).
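Building on that, here is a rough sketch of how the whole calculation might look. It assumes the fields are extracted as server and status, that a server stays in its last reported state until the next event arrives, and that the most recent state is still in effect at search time (now()). The names nextTime, duration, upSeconds, totalSeconds and uptimePct are only illustrative, and <your base search> is a placeholder. Sorting newest-first within each server lets streamstats pick up the time of the following event rather than the previous one:

```spl
<your base search>
| sort 0 server -_time
| streamstats current=f window=1 last(_time) as nextTime by server
| eval nextTime=coalesce(nextTime, now())
| eval duration=nextTime-_time
| stats sum(eval(if(status="Up", duration, 0))) as upSeconds sum(duration) as totalSeconds by server
| eval uptimePct=round(100*upSeconds/totalSeconds, 2)
```

Each interval is counted under the status that starts it, so two consecutive Up (or Down) events simply extend the same period. The search time range has to reach back to the first event for each server, otherwise totalSeconds only starts at the first event inside the window.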

cpetterborg
SplunkTrust

Are the server statuses reported on a regular basis, or only when the status changes? For example, if they come in regularly, then I would expect to see an event every 5 minutes (or whatever the reporting interval is). If they only come in on a status change, then you might go the entire period of the search without a single entry for a server. This difference completely changes the approach to solving your problem.

rakes568
Explorer

It's not on a regular basis. It gets reported only when the status changes. But there can also be cases where two consecutive statuses received are both Up (or both Down).

jberwick_splunk
Splunk Employee

You could try using transaction. It combines the events and creates a duration field, which is the time between the two events: | transaction server startswith="status=Up" endswith="status=Down"

You would then need to work out the total time for the period you care about (the last 24 hours, for example) and then calculate the percentage from that.
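As a rough, untested sketch of that last step (assuming the server and status fields are extracted automatically), you could capture the first event time per server with eventstats before the transaction, then divide the summed duration by the time elapsed since then. firstSeen, upSeconds and uptimePct are names made up for the example, and <your base search> is a placeholder:

```spl
<your base search>
| eventstats min(_time) as firstSeen by server
| transaction server startswith="status=Up" endswith="status=Down"
| stats sum(duration) as upSeconds min(firstSeen) as firstSeen by server
| eval uptimePct=round(100*upSeconds/(now()-firstSeen), 2)
```

Keep in mind that an Up which has not yet been followed by a Down does not form a complete transaction, so by default that time is not counted, and a server with no completed Up-to-Down pair will not show up in the results at all.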
