
Using Timeline visualization to show process runtimes - filling in gaps between events

bensec01

Hey folks,

I'm trying to visualize the run times of backup processes within our Exchange environment. I monitor for the CommVault backup process via Nagios, so I have data at approximately 5 minute intervals on whether the backup process is running or not.

Standard disclaimer - I'm not real great with searches and stats, so please speak up if there's a better way to do this. I'm positive there is. 🙂

My current search is:

index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| eval backups_running=case(service_status=="OK", 0, 1==1, 1)
| transaction startswith=eval(service_status=="CRITICAL") endswith=eval(service_status=="OK") by host_name

That gives me events like:

[SERVICEPERFDATA] 1540988387    dc2-p-xmail-03  commvault backup process    1.570   0.000   CRITICAL: exTiDbBackup.exe: started (critical) 'exTiDbBackup.exe'=1;1;2
[SERVICEPERFDATA] 1540988685    dc2-p-xmail-03  commvault backup process    0.315   0.000   OK: exTiDbBackup.exe: 0 'exTiDbBackup.exe'=0;1;2

[SERVICEPERFDATA] 1540988085    dc2-p-xmail-03  commvault backup process    0.324   0.000   CRITICAL: exTiDbBackup.exe: started (critical) 'exTiDbBackup.exe'=1;1;2
[SERVICEPERFDATA] 1540988986    dc2-p-xmail-03  commvault backup process    0.332   0.000   OK: exTiDbBackup.exe: 0 'exTiDbBackup.exe'=0;1;2

From these transactions, I can create a table that the Timeline visualization can deal with:

index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| eval backups_running=case(service_status=="OK", 0, 1==1, 1)
| transaction startswith=eval(service_status=="CRITICAL") endswith=eval(service_status=="OK") by host_name
| table _time host_name duration
| sort host_name

The resulting Timeline viz looks like:

[Screenshot: Timeline visualization over the last 7 days]

Yay, look at me go! 🙂 That is literally exactly what I'd like to see - when the backup processes were running on each Exchange host, plotted by time. It makes it much easier for our team's Exchange folks to see when backups were running across all the hosts. However, that viz was done with a "last 7 days" time period. When I zoom into the last 24 hours, the 5 minute polling interval of Nagios makes things ugly:

[Screenshot: Timeline visualization zoomed in to the last 24 hours]

Sad trombone. I bet I can use streamstats to "smooth in" the events between Nagios polls, but I haven't struck upon a useful method yet. I've been through the streamstats docs a number of times, but I struggle sometimes with written documentation (I learn much better by example), so I don't think I'm "getting it".

Can someone give me a hand with this? I'm also not really convinced that 'transaction' is a good way to go, but I'm definitely a newbie when it comes to that. All assistance is greatly appreciated.

Thanks folks!

Chris


rapmancz

If the backup process has an ID and you are able to log it, you can build the transaction on that field: transaction ProcessID startswith....
Each backup run should then group into a single transaction.
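
If logging a process ID isn't an option, the streamstats idea from the question can be sketched roughly as below. This is a minimal sketch, not a tested solution: it reuses the field names from the original search (service_status, host_name, backups_running), while prev_running, state_change, run_id and last_seen are names made up for illustration. It sorts each host's polls by time, flags every poll where the running state flips, numbers the runs with a cumulative sum, and collapses each run of "running" polls into one row. The 300 second pad is an assumption based on the roughly 5 minute Nagios interval, and the multiplication by 1000 assumes the Timeline viz reads duration in milliseconds, so check both against your environment and the viz docs.

```
index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| eval backups_running=if(service_status=="OK", 0, 1)
| sort 0 host_name _time
| streamstats current=f window=1 global=f last(backups_running) as prev_running by host_name
| eval state_change=if(isnull(prev_running) OR prev_running!=backups_running, 1, 0)
| streamstats sum(state_change) as run_id by host_name
| where backups_running==1
| stats min(_time) as _time max(_time) as last_seen by host_name run_id
| eval duration=(last_seen - _time + 300) * 1000
| table _time host_name duration
| sort host_name
```

Each output row is one continuous backup run per host, so consecutive CRITICAL polls no longer show up as separate short bars when you zoom in. Drop the + 300 if you would rather underestimate the run length than assume the backup kept going until the next poll.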
