Knowledge Management

Data Field Entries Across Different Time Spans per Entry

mmedal
Explorer

I have a bunch of SAN usage data that I am inputting into Splunk that looks as follows, with each line representing an entry in Splunk:

Group: diskdg1 Disks: 21 Disk in use: data04 Capacity: 1%  
Group: diskdg2 Disks: 21 Disk in use: data05 Capacity: 1%  
Group: diskdg3 Disks: 5 Disk in use: data01 Capacity: 33%  
Group: diskdg4 Disks: 34 Disk in use: data08 Capacity: 1%  
Group: diskdg5 Disks: 30 Disk in use: data07 Capacity: 1%  
Group: diskdg6 Disks: 38 Disk in use: data09 Capacity: 25%

What I would like to do is display a table with these fields, plus a new field displaying a "change in capacity" since 7 days ago. In other words, I would like to evaluate the difference between the capacity field now and the capacity field for that entry 7 days ago.

Can anyone assist me with a search?

Thanks so much, Matt

1 Solution

dwaddle
SplunkTrust

At first glance, the difference should be pretty easy - you can use the delta search command. However, delta lacks a by clause, so you could only do one Group at a time - a bit of a limitation. I think you can use streamstats instead to roughly create a per-Group delta.

Assuming that your data above has field extractions for Group and Capacity, a search like this should get you close:

sourcetype=my_san_data 
| streamstats window=7 global=f last(Capacity) as high first(Capacity) as low by Group 
| eval delta=high-low
| table _time, Group, Capacity, delta
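
If Group and Capacity are not already extracted, you could do the extraction inline. Here is a minimal sketch, assuming the raw events look exactly like the sample above; the regex and the extra Disks and DiskInUse fields are my own additions, and capturing Capacity without the trailing % keeps it numeric so the subtraction works:

sourcetype=my_san_data 
| rex "Group:\s+(?<Group>\S+)\s+Disks:\s+(?<Disks>\d+)\s+Disk in use:\s+(?<DiskInUse>\S+)\s+Capacity:\s+(?<Capacity>\d+)%"
| streamstats window=7 global=f last(Capacity) as high first(Capacity) as low by Group 
| eval delta=high-low
| table _time, Group, Disks, DiskInUse, Capacity, delta

The rex anchors on the exact labels in your sample, so it would need adjusting if the real events differ.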

You may need to swap high and low around to get the sign of the difference right. There is an assumption here that you are collecting this data once per day. The way this "should" work is that streamstats keeps a sliding window of 7 events per Group and uses the first and last values of Capacity within each window to calculate a delta.

Obviously, a sliding window of 7 events is not necessarily strictly 7 days - it depends on collecting exactly once per day, every day, without missing one. If you are collecting once per hour, adjust window to 168 instead.

There are more complicated ways of dealing with this, such as maintaining state in lookups or using time-oriented subsearches, if you need higher precision than a sliding window. But unless your accuracy requirements are very high, this should be "close enough".
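
For what it's worth, here is one rough sketch of the time-oriented idea (not from the original answer): pull the last 8 days, tag each event as "current" or "week_ago", keep the latest Capacity per Group for each period, and subtract. It assumes Capacity is extracted as a plain number (as in the rex sketch above), and the earliest/latest boundaries are assumptions you would tune:

sourcetype=my_san_data earliest=-8d@d latest=now
| eval period=case(_time >= relative_time(now(), "-1d"), "current", _time < relative_time(now(), "-7d"), "week_ago")
| where isnotnull(period)
| chart latest(Capacity) over Group by period
| eval change=current - week_ago

This trades the simplicity of streamstats for an explicit "now versus 7 days ago" comparison, at the cost of scanning 8 days of data.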


mmedal
Explorer

Thanks for the feedback. Great answer to my question - it certainly is "close enough", haha.
