Splunk Search

Number of seconds that have events by host

AssafLowenstein
Explorer

Hello experts!
My system potentially produces several events per second, sometimes even several events in the same millisecond.
When there are events within a second, no matter how many, I want to count that second. At the end of, say, a 24-hour scan, I have the number of seconds that had events, and from that I can produce a downtime figure.
Problem is that I can't figure out how to make that calculation by host.
So, this is my current search string -

sourcetype= ""
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(_time) AS c_time
| stats count as individual_event by c_time
| stats sum(individual_event) as total_sec by c_time
| stats count AS num_of_seconds_with_errors

I hope my explanation suffices.
Thanks in advance.
Assaf

1 Solution

Richfez
SplunkTrust

Try this, which will get you a count by second:

sourcetype="whatever" index="whatever"... 
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(_time) AS c_time
| stats count AS Events_Per_Second by c_time, host

In my case, for my home firewall, that shows essentially 1 all the way down (with an occasional 2 or 3), and many seconds "missing" because they had no data.

If I add this to the end of that:

... | stats count AS Seconds_With_Data

That gives the number of seconds in my time frame (in this case I used last 24 hours) where there was data. My answer was 8212.

What else did you need?


woodcock
Esteemed Legend

Try this:

| tstats count where sourcetype=YourSourcetypeHere BY host _time span=1s 
| stats count(eval(count>0)) AS OK_seconds BY host
| addinfo | eval span_in_seconds = info_max_time - info_min_time + 1 | fields - info_*
| eval down_seconds = span_in_seconds - OK_seconds

AssafLowenstein
Explorer

Thanks.
I'm getting Error in 'TsidxStats': WHERE clause is not an exact query
My WHERE clause has sourcetype=<some source type> "<string that I'm looking for>"


woodcock
Esteemed Legend

There is no mention in your question of any <string that I'm looking for>. Because of this additional requirement, the tstats option cannot be used and you should go with the answer from @rich7177 which can accommodate this new detail.
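For reference, with a plain (non-tstats) search the string filter simply goes in the base search. A sketch along the lines of that answer, keeping your placeholders for the sourcetype and search string:

```spl
sourcetype=<some source type> "<string that I'm looking for>"
| bin _time span=1s
| stats count AS Events_Per_Second BY _time, host
| stats count AS Seconds_With_Data BY host
```

`bin _time span=1s` buckets each event's timestamp down to the second, so the second stats counts, per host, how many distinct seconds had at least one matching event.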


AssafLowenstein
Explorer

Thanks.
I'm kinda new to Splunk so didn't realize that having that search string would make tstats irrelevant. Sorry about that.
Thanks 🙂

niketn
Legend

How about the following:

| tstats count where sourcetype=<YourSourceType> by host _time span=1s

tstats should work better than stats for the scenario described above, where you are interested only in events aggregated by indexed/metadata fields (https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Tstats).

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"


AssafLowenstein
Explorer

I'm not looking for events per second. Let me reiterate: if some event happened, then I need to count that second. For example, if for host X I had events at the following milliseconds -
19:00:45.000
19:00:45.030
19:00:45.150
19:00:46.180
19:00:46.240

then for 19:00:45 I'll get 1 (more than 1 event happened in that second, so I'm counting that second) and for 19:00:46 I'll get 1 (again, 2 events happened in that second, so I'm counting that second).
Eventually, for host X in that time frame the final result will be 2, because we found 2 seconds that had events.
The same should be done for the rest of the hosts.

The end goal is to calculate overall system uptime. Uptime is defined as time without any events, so we wish to calculate uptime for each host and then average across all hosts.

Hope that clears things up now.
Thanks again.


Richfez
SplunkTrust

Great. In that case, I think you already have most of the answer.

The second piece, adding | stats count AS Seconds_With_Data to the end, isn't a sum of events per second; it's literally a count of individual seconds that had any data at all in whatever time period you chose. So, if you did "last 24 hours", it will be a count of the seconds where there was 1 event or more, ignoring all the seconds without any events.

Since there are 86400 seconds per day...

your base search here
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(_time) AS c_time 
| stats count AS Events_Per_Second by c_time, host
| stats count AS Seconds_With_Data
| eval PercentDowntime = Seconds_With_Data / 86400
| eval PercentUptime = 1 - PercentDowntime

I may have the math backwards for your uptime/downtime, but that should be easy to fix.


Richfez
SplunkTrust

Also please see woodcock's answer below, which should be much faster.


AssafLowenstein
Explorer

Thanks rich. The answer makes a lot of sense.
I'm still missing the "average" part of the calculation, though. I need to calculate uptime for each host and then average across all hosts.

Thanks.


Richfez
SplunkTrust

Great, making progress!

So, check the below:

your_base_search_here earliest=-86400s
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(_time) AS c_time 
| stats count AS Events_Per_Second by c_time, host 
| stats count AS Seconds_With_Data by host 
| eventstats avg(Seconds_With_Data) AS Overall_Seconds_With_Data 
| eval PercentDowntime = Seconds_With_Data / 86400 
| eval OverallPercentDowntime = Overall_Seconds_With_Data / 86400

I hard-coded an "earliest" in there; you can remove it if you don't want it there.

The first stats stays the same, but to the second we add by host at the end, so now, host by host, we have a list of how many seconds each one sent in data on. We then run eventstats to add the cross-host average, which feeds the overall downtime calculation.
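If you want the uptime figure itself per host and overall, the same evals from the earlier search can be bolted on. A sketch, reusing the field names from the search above:

```spl
... | eval PercentUptime = 1 - (Seconds_With_Data / 86400)
| eval OverallPercentUptime = 1 - (Overall_Seconds_With_Data / 86400)
```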

Now, this isn't perfect. If run over long enough periods it should be pretty good, but it fails to account for any host that hasn't sent in any events at all during the search period. (By that I mean that if system X hasn't sent anything over the past 24 hours, it will not show up as "zero uptime" - it just won't show up at all.) Hopefully this isn't a problem, but it is fixable if it is - might be worth a new question though, because it's a bit of a topic on its own.
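One common way to patch that, if you keep a list of expected hosts in a lookup (the all_hosts.csv name here is hypothetical), is to append every known host with zero and take the max per host:

```spl
... | stats count AS Seconds_With_Data by host
| append [| inputlookup all_hosts.csv | fields host | eval Seconds_With_Data=0]
| stats max(Seconds_With_Data) AS Seconds_With_Data by host
```

A host that sent nothing then shows up with Seconds_With_Data=0 instead of disappearing from the results.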


AssafLowenstein
Explorer

Thanks rich for the elaborate explanation.
About your last remark about it not being perfect: that shouldn't be a problem; the plan is to run this search over a minimum 24-hour period, which should be long enough to eliminate any such discrepancies.

Last question, might be a bit off topic but I'll try it anyway 🙂 .. If I wish to use visualization in my dashboard, how do I configure the single value formatting to show the uptime column and not the host name column?

Thanks again. I appreciate your time and patience.
Assaf


AssafLowenstein
Explorer

Figured out how to workaround the single value issue.
One just needs to place the value in the first column of the table, and the single value visualization will pick it up.
I used table to rearrange the columns, and that's it.
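For example, using the field names from rich's last search, the rearranging is just a final table with the value you want first:

```spl
... | table OverallPercentDowntime host Seconds_With_Data
```

The single value visualization reads the first column of the first result row.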

Thanks again!!


woodcock
Esteemed Legend

Did you try my answer? That is exactly what it does.
