
Parallel stats - most efficient structure

JeToJedno
Explorer

I frequently have to create stats reports where some parts are, essentially, executable in parallel with others. An example would be:

search <something>
  | appendpipe [ eval _time=FLOOR(_time/60)*60 | stats DC(id) AS unique_devices_minute BY device_type, _time ]
  | appendpipe [ eval _time=FLOOR(_time/3600)*3600 | stats DC(id) AS unique_devices_hour BY device_type, _time ]
  | appendpipe [ eval _time=FLOOR(_time/86400)*86400 | stats DC(id) AS unique_devices_day BY device_type, _time ]
  | eval _time=FLOOR(_time/60)*60
  | stats <<some summaries>> SUM(unique_devices_minute) AS unique_devices_minute, SUM(unique_devices_hour) AS unique_devices_hour, SUM(unique_devices_day) AS unique_devices_day BY device_type, _time
  | eval period="minute"
  | appendpipe [ eval _time=FLOOR(_time/3600)*3600
    | stats <<sum per-minute to per-hour>> SUM(unique_devices_hour) AS unique_devices_hour, SUM(unique_devices_day) AS unique_devices_day BY device_type, _time
    | eval period="hour" ]
  | appendpipe [ where period="hour" | eval _time=FLOOR(_time/86400)*86400
    | stats <<sum per-hour to per-day>> SUM(unique_devices_day) AS unique_devices_day BY device_type, _time
    | eval period="day" ]
  | eval unique_devices=CASE(period="minute",unique_devices_minute, period="hour",unique_devices_hour, period="day",unique_devices_day)
  | fields - unique_devices_minute, unique_devices_hour, unique_devices_day

This gives the results I want in a single report, but is it the most efficient way to structure this?


cmerriman
Super Champion

You could do something like this:

| multireport
    [ | bucket _time span=1min | stats DC(id) as unique_devices_minute BY device_type _time | eval period="minute" ]
    [ | bucket _time span=1h | stats DC(id) as unique_devices_hour BY device_type _time | eval period="hour" ]
    [ | bucket _time span=1d | stats DC(id) as unique_devices_day BY device_type _time | eval period="day" ]
| eval unique_devices=CASE(period="minute",unique_devices_minute, period="hour",unique_devices_hour, period="day",unique_devices_day)
| fields - unique_devices_minute, unique_devices_hour, unique_devices_day

I see that you're doing some sort of summing by minute/hour/day before you eval unique_devices in the separate appendpipes, but I'm not entirely sure what that achieves in the end. Some sample data and desired output would be helpful.
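If the other summaries just need computing at each granularity as well, they could probably be folded into each multireport branch instead of being summed afterwards. A sketch, assuming a per-event rate field already exists (called rate here, purely illustrative):

| multireport
    [ | bucket _time span=1min | stats DC(id) as unique_devices avg(rate) as avg_rate max(rate) as max_rate BY device_type _time | eval period="minute" ]
    [ | bucket _time span=1h | stats DC(id) as unique_devices avg(rate) as avg_rate max(rate) as max_rate BY device_type _time | eval period="hour" ]
    [ | bucket _time span=1d | stats DC(id) as unique_devices avg(rate) as avg_rate max(rate) as max_rate BY device_type _time | eval period="day" ]

Naming the distinct count the same in every branch also removes the need for the CASE at the end.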


JeToJedno
Explorer

The other stats calculated are always average, peak, and maximum request rates, and sometimes a first-seen-time (within the day) unique-device count. Peak rate = the maximum of a 2-second moving-average request rate, or a close approximation (e.g. the 98th percentile of the per-second request rate within each minute, then the max of the per-minute 98th percentiles for the hour and day).
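For the record, the approximation route looks roughly like this (field names illustrative):

| bucket _time span=1s
| stats count AS requests_per_sec BY device_type, _time
| bucket _time span=1min
| stats avg(requests_per_sec) AS avg_rate, perc98(requests_per_sec) AS peak_rate, max(requests_per_sec) AS max_rate BY device_type, _time

with max(peak_rate) used when rolling the minutes up to hours and days.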

I was hoping to avoid making multiple independent passes through the whole dataset, but I can see now that's probably not going to be possible, because a distinct count over a different time period will always require a separate pass through the base data. 😞
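(A quick demonstration that distinct counts don't roll up, using makeresults to put the same device into two different minutes of one hour:

| makeresults count=2
| streamstats count AS n
| eval _time=relative_time(now(),"@h") + (n-1)*60, id="deviceA"
| bucket _time span=1min
| stats DC(id) AS unique_per_minute BY _time
| stats SUM(unique_per_minute) AS summed_minute_dc

This returns summed_minute_dc=2, whereas DC(id) over the whole hour would be 1.)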


JeToJedno
Explorer

Before anyone says anything: yes, the first appendpipe is unnecessary; its per-minute DC(id) can be folded straight into the main stats command. 😞
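i.e. the opening becomes something like this (rest of the search unchanged):

search <something>
  | appendpipe [ eval _time=FLOOR(_time/3600)*3600 | stats DC(id) AS unique_devices_hour BY device_type, _time ]
  | appendpipe [ eval _time=FLOOR(_time/86400)*86400 | stats DC(id) AS unique_devices_day BY device_type, _time ]
  | eval _time=FLOOR(_time/60)*60
  | stats <<some summaries>> DC(id) AS unique_devices_minute, SUM(unique_devices_hour) AS unique_devices_hour, SUM(unique_devices_day) AS unique_devices_day BY device_type, _time
  | eval period="minute"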
