Splunk Search

How to calculate uptime percentage based on my data?

rakes568
Explorer

Let's say my data is like this:

8/27/12 10:30:00.000 AM server=test1 and status=Down
8/27/12 10:29:00.000 AM server=test2 and status=Up
8/27/12 10:28:00.000 AM server=test3 and status=Down
8/27/12 10:27:00.000 AM server=test4 and status=Up
8/27/12 10:26:00.000 AM server=test1 and status=Up
8/27/12 10:25:00.000 AM server=test2 and status=Down
8/27/12 10:24:00.000 AM server=test3 and status=Up
8/27/12 10:23:00.000 AM server=test4 and status=Down

I want to calculate the total uptime % for each server: the total uptime (the sum of all time differences between an Up status and the next Down status) divided by the total time, starting from when Splunk receives the first status message for that server.

johnward4
Communicator

What did you end up doing for this? I'm trying to do the same calculation, but using this search:

index=_index source=*splunkd.log (event_message="*Splunkd starting*" OR event_message="*Shutting down splunkd")

cmerriman
Super Champion

This might be a good starting point:

| sort 0 server _time | streamstats current=f window=1 values(status) as prevStatus values(_time) as prevTime by server | eval diff=_time-prevTime

I'm not sure if your values always alternate between Up and Down. If you only want the time from Up to Down, you might need to add an eval in there such as |eval UpToDown=if(prevStatus="Up" AND status="Down",diff,null()) or something along those lines.
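
To sanity-check the numbers outside Splunk, the same previous-event idea can be sketched in Python. This is only a rough illustration, not the search itself: the function names and the `end` cutoff are assumptions of mine, and it counts every interval that starts with an Up event as uptime, so two Ups in a row are handled as well.

```python
from collections import defaultdict
from datetime import datetime

# Sample events from the question, as (timestamp, server, status).
EVENTS = [
    ("8/27/12 10:30:00.000 AM", "test1", "Down"),
    ("8/27/12 10:29:00.000 AM", "test2", "Up"),
    ("8/27/12 10:28:00.000 AM", "test3", "Down"),
    ("8/27/12 10:27:00.000 AM", "test4", "Up"),
    ("8/27/12 10:26:00.000 AM", "test1", "Up"),
    ("8/27/12 10:25:00.000 AM", "test2", "Down"),
    ("8/27/12 10:24:00.000 AM", "test3", "Up"),
    ("8/27/12 10:23:00.000 AM", "test4", "Down"),
]


def parse(ts):
    """Parse the timestamp format used in the sample data."""
    return datetime.strptime(ts, "%m/%d/%y %I:%M:%S.%f %p")


def uptime_percent(events, end):
    """Return {server: uptime %} measured from each server's first event to `end`.

    `end` is an assumed cutoff (for example, the time the report is run).
    """
    by_server = defaultdict(list)
    for ts, server, status in events:
        by_server[server].append((parse(ts), status))

    result = {}
    for server, rows in by_server.items():
        rows.sort()  # oldest first, like "sort 0 server _time"
        up_seconds = 0.0
        # Pair each event with the one after it; the last event runs until `end`.
        for (t, status), (t_next, _) in zip(rows, rows[1:] + [(end, None)]):
            if status == "Up":
                up_seconds += (t_next - t).total_seconds()
        total = (end - rows[0][0]).total_seconds()
        result[server] = round(100 * up_seconds / total, 2) if total > 0 else None
    return result


if __name__ == "__main__":
    # Assumed cutoff one minute after the newest sample event.
    print(uptime_percent(EVENTS, parse("8/27/12 10:31:00.000 AM")))
```

With the assumed cutoff of 10:31, test1 is up for 240 of 300 seconds (80%) and test4 for 240 of 480 seconds (50%).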

cpetterborg
SplunkTrust

Are the server statuses reported on a regular basis, or only when the status changes? If they come in regularly, I would expect to see an event every 5 minutes (or whatever the reporting interval is). If they only come in at a status change, you might go the entire search period without a single event for a server. This difference completely changes the approach to solving your problem.

rakes568
Explorer

It's not on a regular basis. It gets reported only when the status changes. But there can also be cases where two consecutive statuses are both Up (or both Down).

jberwick_splunk
Splunk Employee

You could try using transaction. This will combine the events and create a duration field, which will be the time between the two events: "| transaction server startswith=status=Up endswith=status=Down"

You would then need to sum the durations over a fixed period (the last 24 hours, for example) and work out the percentage from that.
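
As a rough illustration of that arithmetic (a Python sketch, not a reproduction of how transaction behaves internally), the code below pairs each Up with the next Down for a single server, sums the resulting durations, and divides by a fixed window. The function names and the 24-hour default are assumptions. An Up with no closing Down is ignored here, so a server that is currently up would be undercounted.

```python
def closed_up_durations(rows):
    """rows: (epoch_seconds, status) tuples for one server, in any order.

    Returns the length in seconds of each Up -> Down interval.
    """
    durations = []
    up_start = None
    for t, status in sorted(rows):
        if status == "Up" and up_start is None:
            up_start = t  # the first Up opens an interval; repeated Ups are ignored
        elif status == "Down" and up_start is not None:
            durations.append(t - up_start)
            up_start = None  # repeated Downs are ignored until the next Up
    return durations


def uptime_percent_in_window(rows, window_seconds=24 * 3600):
    """Summed Up -> Down time as a percentage of the window.

    Assumes every event in `rows` falls inside the window.
    """
    return round(100 * sum(closed_up_durations(rows)) / window_seconds, 2)


if __name__ == "__main__":
    # Hypothetical events for one server, as seconds from the start of the window.
    rows = [(0, "Up"), (3600, "Up"), (7200, "Down"),
            (10000, "Down"), (20000, "Up"), (21600, "Down")]
    print(closed_up_durations(rows))
    print(uptime_percent_in_window(rows))
```

For these made-up events, the two closed intervals are 7200 and 1600 seconds, which is 8800 of 86400 seconds in a 24-hour window.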
