Using Timeline visualization to show process runtimes - filling in gaps between events

bensec01
Explorer

Hey folks,

I'm trying to visualize the run times of backup processes within our Exchange environment. I monitor the CommVault backup process via Nagios, so I have data at roughly 5-minute intervals on whether the backup process is running or not.

Standard disclaimer - I'm not real great with searches and stats, so please speak up if there's a better way to do this. I'm positive there is. 🙂

My current search is:

index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| eval backups_running=case(service_status=="OK", 0, 1==1, 1)
| transaction host_name startswith=eval(service_status=="CRITICAL") endswith=eval(service_status=="OK")

That gives me events like:

[SERVICEPERFDATA] 1540988387    dc2-p-xmail-03  commvault backup process    1.570   0.000   CRITICAL: exTiDbBackup.exe: started (critical) 'exTiDbBackup.exe'=1;1;2
[SERVICEPERFDATA] 1540988685    dc2-p-xmail-03  commvault backup process    0.315   0.000   OK: exTiDbBackup.exe: 0 'exTiDbBackup.exe'=0;1;2

[SERVICEPERFDATA] 1540988085    dc2-p-xmail-03  commvault backup process    0.324   0.000   CRITICAL: exTiDbBackup.exe: started (critical) 'exTiDbBackup.exe'=1;1;2
[SERVICEPERFDATA] 1540988986    dc2-p-xmail-03  commvault backup process    0.332   0.000   OK: exTiDbBackup.exe: 0 'exTiDbBackup.exe'=0;1;2

From these transactions, I can create a table that the Timeline visualization can deal with:

index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| eval backups_running=case(service_status=="OK", 0, 1==1, 1)
| transaction host_name startswith=eval(service_status=="CRITICAL") endswith=eval(service_status=="OK")
| table _time host_name duration
| sort host_name

The resulting Timeline viz looks like:

[Screenshot: Timeline visualization over the last 7 days]

Yay, look at me go! 🙂 That is exactly what I'd like to see: when the backup processes were running on each Exchange host, plotted by time. It makes it much easier for our team's Exchange folks to see when backups were running across all the hosts. However, that viz used a "last 7 days" time range. When I zoom in to the last 24 hours, the 5-minute Nagios polling interval makes things ugly:

[Screenshot: the same Timeline visualization zoomed in to the last 24 hours]

Sad trombone. I bet I can use streamstats to "smooth in" the events between Nagios polls, but I haven't struck upon a useful method yet. I've been through the streamstats docs a number of times, but I struggle sometimes with written documentation (I learn much better by example), so I don't think I'm "getting it".

Can someone give me a hand with this? I'm also not really convinced that 'transaction' is a good way to go, but I'm definitely a newbie when it comes to that. All assistance is greatly appreciated.

Thanks folks!

Chris

rapmancz
Explorer

If the backup process has an ID and you are able to log it, you can build the transaction on that field instead: transaction ProcessID startswith....
That should keep each backup run in its own transaction.
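
A rough sketch of what that could look like, reusing the start and end conditions from the original search. It assumes a field named ProcessID is extracted on both the CRITICAL and the OK events; the sample events in the question don't show one, so treat that field as hypothetical. The multiplication by 1000 reflects my understanding that the Timeline viz reads duration in milliseconds while transaction reports it in seconds, which is worth checking against the viz documentation for your version.

```spl
index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| transaction host_name ProcessID startswith=eval(service_status=="CRITICAL") endswith=eval(service_status=="OK")
| eval duration=duration*1000
| table _time host_name duration
| sort host_name
```

I haven't run this against your data, so the grouping may need tuning (for example with maxspan) once you see how the ID shows up in the events.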

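If no ID is available, here is a sketch of the streamstats route the question asks about, as an alternative to transaction. The idea is to flag every poll where the state flips, number the resulting runs per host, and then stretch each "running" run up to the first OK poll that follows it. The field names prev_running, state_change, run_id, start, last_seen and next_start are made up for this example. The only fields taken from the original search are service_status and host_name. It is untested against real Nagios data, and the factor of 1000 again assumes the Timeline viz wants milliseconds.

```spl
index=nagios host_name="dc2-p-xmail-0*" service_name="commvault backup process" performance_type=SERVICEPERFDATA
| eval backups_running=if(service_status=="OK", 0, 1)
| sort 0 host_name _time
| streamstats current=f last(backups_running) as prev_running by host_name
| eval state_change=if(coalesce(prev_running, -1)!=backups_running, 1, 0)
| streamstats sum(state_change) as run_id by host_name
| stats min(_time) as start max(_time) as last_seen max(backups_running) as backups_running by host_name run_id
| sort 0 host_name -run_id
| streamstats current=f last(start) as next_start by host_name
| where backups_running==1
| eval _time=start, duration=(coalesce(next_start, last_seen)-start)*1000
| table _time host_name duration
| sort 0 host_name _time
```

Each output row is one continuous backup run per host, so zooming in to 24 hours should show solid bars rather than one mark per poll. A run that is still in progress at the end of the search window has no following OK poll, so it falls back to the last CRITICAL poll seen.
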