To perform a fast Fourier transform (FFT), I need data at equal time intervals. Here is my first attempt:
| bucket span=5m _time | stats count as COUNT by _time
The idea is to use bins for sampling. However, this doesn't work when no events occur in some bins: Splunk outputs no row for any bin in which COUNT = 0, because _raw contains no such data. Is there some way to force Splunk to produce empty time buckets?
FFT (in the R app) is the most promising tool for spectrum analysis. Before I explore interpolation in an external tool, I'd like to see whether there is a direct Splunk method. I understand that there may not be a meaningful way to fill empty buckets in general. But calendar time is defined in the system, so there should be an easy way to force it.
I looked at zero-count discussions. One post suggests a subsearch over dense data, i.e., a data set that has a non-zero count in each of the desired time buckets. While this is possible in my range, it appears to be a really expensive way to implement a simple calendaring function.
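Outside Splunk, the requirement is easy to state: reindex the sparse per-bucket counts onto a uniform grid, with 0 for empty buckets, before handing the series to an FFT. A minimal Python sketch of that step (hypothetical helper, epoch seconds, 5-minute buckets):

```python
# Illustration only (not Splunk): zero-fill sparse per-bucket counts onto a
# uniform 5-minute grid, since an FFT requires equally spaced samples.
BUCKET = 300  # 5 minutes in seconds

def zero_fill(counts, t_min, t_max, span=BUCKET):
    """counts: dict mapping bucket start (epoch seconds) -> count (sparse).
    Returns the count for every bucket in [t_min, t_max], 0 where missing."""
    grid = range(t_min, t_max + span, span)
    return [counts.get(t, 0) for t in grid]

# Buckets at 0, 600, and 1500 have events; the rest are empty.
sparse = {0: 3, 600: 1, 1500: 2}
print(zero_fill(sparse, 0, 1500))  # [3, 0, 1, 0, 0, 2]
```

The Splunk answers below do the same thing in SPL by manufacturing zero-count rows so that timechart's bucketing covers the whole range.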
Try this workaround
your base search | timechart span=1h count | appendpipe [|stats count |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp | eval count=0 | eval _time=temp | table _time count] | timechart span=1h sum(count) as count | fillnull
To create zero values in each time bucket, you need append and stats/eventstats. I will use a slightly more complicated search to demonstrate the full power of this approach: the sum of events for each value of "field1" every 5 minutes (even if up to all but one of the values of "field1" have no events in a bucket):
yoursearch | bucket span=5m _time | stats sum(events) as events by _time field1
Add the append, but be careful that the number of events in the subsearch does not exceed the subsearch limit:
yoursearch | bucket span=5m _time | stats sum(events) as events by _time field1 | append [yoursearch | bucket span=5m _time | table _time field1 | eventstats values(field1) AS field1 values(_time) AS times | stats count by field1 times | eval events=0 | rename times AS _time] | stats max(events) AS events by _time field1
This approach assumes that every 5-minute _time span contains data from at least one value of field1. If that is not the case, add another data source that has values in all time spans, rename its field to field1 at the start of the append subsearch, and remove that value from field1 at the end of the subsearch.
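The core of the append trick is a cross product: every observed bucket paired with every observed field1 value gets a zero row, which the final stats merges with the real sums. A Python analogy of that merge (hypothetical helper, field names from the search above):

```python
# Sketch of the append + stats max trick: build the cross product of every
# observed bucket and every observed field1 value with events=0, then keep
# the larger of the zero row and the real sum (like stats max(events)).
def fill_buckets(rows):
    """rows: list of (bucket, field1_value, events) from the sparse stats output."""
    times = {t for t, _, _ in rows}
    values = {v for _, v, _ in rows}
    real = {(t, v): e for t, v, e in rows}
    # every (bucket, value) pair exists; missing pairs default to 0
    return {(t, v): max(real.get((t, v), 0), 0) for t in times for v in values}

rows = [(0, "a", 5), (300, "b", 2)]
filled = fill_buckets(rows)
print(filled[(0, "b")], filled[(300, "a")])  # 0 0
```

As in the SPL version, a (bucket, value) pair can only appear if its bucket was seen at least once, which is exactly the assumption stated above.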
Hi,
_time,URI,Bytes
2021-05-18 02:01:00,a,1
2021-05-18 02:01:00,a,1
2021-05-18 02:02:00,a,1
2021-05-18 02:03:00,b,1
2021-05-18 02:03:00,b,1
2021-05-18 02:04:00,a,1
source="ds1.csv" host="vgspl11hr" index="sfp" sourcetype="csv"
| table _time,URI,Bytes
| timechart span=1m avg(Bytes) AS avg_bytes, stdev(Bytes) AS std_bytes by URI limit=0
| fillnull value=""
| untable _time Measure Value
| eval Metric=mvindex(split(Measure,": "),0),uri=mvindex(split(Measure,": "),1)
| fields - Measure
| eval time_uri=_time."__".uri
| fields - uri - _time
| xyseries time_uri Metric Value
| eval _time=mvindex(split(time_uri,"__"),0),uri=mvindex(split(time_uri,"__"),1)
| fields - time_uri
_time uri avg_bytes std_bytes
2021-05-18 02:01:00 a 1 0
2021-05-18 02:02:00 a 1 0
_time uri avg_bytes std_bytes
2021-05-18 02:01:00 a 1 0
2021-05-18 02:01:00 b
2021-05-18 02:02:00 a 1 0
2021-05-18 02:02:00 b
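The untable/xyseries round trip above works around timechart's single split field by joining time and URI into one key, pivoting on it, and splitting it back. A Python analogy of the final pivot (hypothetical helper, simplified keys):

```python
# Sketch of | xyseries time_uri Metric Value followed by the split eval:
# long rows (time_uri, metric, value) -> wide rows keyed by (time, uri).
def xyseries(long_rows, sep="__"):
    wide = {}
    for key, metric, value in long_rows:
        t, uri = key.split(sep, 1)
        wide.setdefault((t, uri), {})[metric] = value
    return wide

long_rows = [
    ("02:01__a", "avg_bytes", 1.0),
    ("02:01__a", "std_bytes", 0.0),
    ("02:01__b", "avg_bytes", ""),  # fillnull value="" preserved b's bucket
]
wide = xyseries(long_rows)
print(wide[("02:01", "a")])  # {'avg_bytes': 1.0, 'std_bytes': 0.0}
```

The earlier fillnull value="" matters: without it, URI b's empty bucket would never make it into the long rows, and the round trip could not restore it.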
Thanks for sharing.
Why is the last | fillnull needed though?
I removed it and the _time bins are still filled with 0 values and interpreted correctly in the dashboard.
This was pretty helpful to get me going, so I thought I'd share my finished product in case others find it useful. I needed data for each one-minute time bucket in a stats call, but I was pivoting on multiple fields, so the solution had to be expanded a bit. Here's what I did:
base search |bucket _time span=1m| stats sum(field1) as field1 avg(field2) as field2 by field3 field4 _time| appendpipe [|inputlookup field3_lookup |table field3 | dedup field3 |stats values(field3) |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp | eval _time=temp | timechart span=1m values(field3) as field3 | eval field3=mvjoin(field3,";") | streamstats last(field3) as field3 | makemv delim=";" field3 | mvexpand field3 | eval field1=0 | eval field2=0 | eval field4="possibility1;possibility2;possiblity3" | makemv field4 delim=";"| mvexpand field4]
The lookup contains all possible values for field3, so basically filling out all possible values that could show up in the base stats call, then afterward further down the line not pictured I do an sistats call that gets me everything I want and now has "0" values for time buckets that otherwise wouldn't show up for certain variants of field3 and field4
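The expansion described here is again a cross product, this time seeded from a lookup rather than from observed data: every lookup value of field3, every hard-coded possibility of field4, and every time bucket gets a zero row. A Python sketch of that expansion (hypothetical helper, field names from the search above):

```python
import itertools

# Sketch of the lookup-based fill: every combination of lookup value
# (field3), hard-coded possibility (field4), and time bucket gets a zero
# row, mirroring the makemv/mvexpand chain in the appendpipe above.
def zero_rows(field3_lookup, field4_possibilities, buckets):
    return [
        {"_time": t, "field3": f3, "field4": f4, "field1": 0, "field2": 0}
        for t, f3, f4 in itertools.product(buckets, field3_lookup,
                                           field4_possibilities)
    ]

rows = zero_rows(["x", "y"], ["possibility1", "possibility2"], [0, 60])
print(len(rows))  # 8 = 2 buckets x 2 field3 values x 2 field4 values
```

Because the combinations come from the lookup and the hard-coded list, this variant does not depend on any bucket actually containing events.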
This solution worked for me, thank you briancronrath. One thing I had to do was add an additional AS to the portion that says
|stats values(field3) AS field3
[|inputlookup field3_lookup |table field3 | dedup field3 |stats values(field3) AS field3 |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp | eval _time=temp | timechart span=1m values(field3) as field3 | eval field3=mvjoin(field3,";") | streamstats last(field3) as field3 | makemv delim=";" field3 | mvexpand field3 | eval field1=0 | eval field2=0 | eval field4="possibility1;possibility2;possiblity3" | makemv field4 delim=";"| mvexpand field4]
The subsearch only uses two rows produced by the calendar, so it is super efficient. Great lesson about makemv.
Not that it matters, but in the interest of minimalism, this version takes a couple fewer evals:
| timechart span=1h count
| appendpipe [
| stats count
| addinfo
| eval temp=info_min_time."##".info_max_time
| fields temp count
| makemv temp delim="##"
| mvexpand temp
| rename temp as _time
] | timechart span=1h max(count) as count
| fillnull
Is there a way to do this when I want to split? When I add "by host" to my timechart this answer stops working. 😞
try my solution below.
I don't understand how to try your solution at all. What is field1? Is that arbitrary? I'm not looking at fields in my search, only event counts. Additionally, you state that it only works if there is at least one event from a value in a field; that statement doesn't make sense to me :(. Events from a value? My search is basically (60-minute window):
index=blah eventtype=error | timechart span=1m count by host
timechart is far from ideal for this case. Try my solution below with append, stats, and eventstats.
timechart has a very undesirable feature for this purpose: it insists on one extra data point beyond the specified timespan.
Subsearch using calendar. This fills all, even missing start and end buckets. Brilliant!
Yes, that's what I need.
When the raw data itself has no events for some buckets, you need those buckets in the output as well. Is that the right understanding?
Hello,
Could you try to use "timechart" instead of "bucket"?
(your search) | timechart span=5m count AS COUNT
You may also get COUNT=0 events. I hope it helps you.
Thank you.
timechart works, as long as start and end buckets have data. Thanks.