To perform a fast Fourier transform (FFT), I need data at equal time intervals. Here is my first attempt:
| bucket span=5m _time | stats count as COUNT by _time
The idea is to use bins for sampling. However, this doesn't work when no events occur in some bins: Splunk outputs no row for any bin in which COUNT = 0, because _raw contains no such data. Is there some way to force Splunk to produce empty time buckets?
FFT (in the R app) is the most promising tool for spectrum analysis. Before I explore interpolation in an external tool, I'd like to see whether there is a direct Splunk method. I understand that there may not be a meaningful way to fill empty buckets in general. But calendar time is defined in the system, so there should be an easy way to force it.
I looked at zero-count discussions. One post suggests a subsearch over dense data, i.e., a data set that has a non-zero count in each of the desired time buckets. While this is possible in my range, it appears to be a really expensive way to implement a simple calendaring function.
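Outside Splunk, the requirement is easy to state: reindex the sparse per-bucket counts onto a uniform grid, with 0 for empty buckets, before handing the series to an FFT. A minimal Python sketch of that step (hypothetical helper, epoch seconds, 5-minute buckets):

```python
# Illustration only (not Splunk): zero-fill sparse per-bucket counts onto a
# uniform 5-minute grid, since an FFT requires equally spaced samples.
BUCKET = 300  # 5 minutes in seconds

def zero_fill(counts, t_min, t_max, span=BUCKET):
    """counts: dict mapping bucket start (epoch seconds) -> count (sparse).
    Returns the count for every bucket in [t_min, t_max], 0 where missing."""
    grid = range(t_min, t_max + span, span)
    return [counts.get(t, 0) for t in grid]

# Buckets at 0, 600, and 1500 have events; the rest are empty.
sparse = {0: 3, 600: 1, 1500: 2}
print(zero_fill(sparse, 0, 1500))  # [3, 0, 1, 0, 0, 2]
```

The Splunk answers below do the same thing in SPL by manufacturing zero-count rows so that timechart's bucketing covers the whole range.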
Try this workaround
your base search | timechart span=1h count | appendpipe [|stats count |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp | eval count=0 | eval _time=temp | table _time count] | timechart span=1h sum(count) as count | fillnull
To create zero values in each time bucket, you need append and stats/eventstats. I will use a slightly more complicated search to demonstrate the full power of this approach: the sum of events for each value of "field1" every 5 minutes (even if up to all but one of the values of "field1" have no events in a bucket):
yoursearch | bucket span=5m _time | stats sum(events) as events by _time field1
Add the append, but be careful that the number of events in the subsearch does not exceed the subsearch limit:
yoursearch | bucket span=5m _time | stats sum(events) as events by _time field1 | append [yoursearch | bucket span=5m _time | table _time field1 | eventstats values(field1) AS field1 values(_time) AS times | stats count by field1 times | eval events=0 | rename times AS _time] | stats max(events) AS events by _time field1
This approach assumes that every 5-minute _time span contains data from at least one value of field1. If that is not the case, add another data source that has values in all time spans, rename its field to field1 at the start of the append subsearch, and remove that value from field1 at the end of the subsearch.
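The core of the append trick is a cross product: every observed bucket paired with every observed field1 value gets a zero row, which the final stats merges with the real sums. A Python analogy of that merge (hypothetical helper, field names from the search above):

```python
# Sketch of the append + stats max trick: build the cross product of every
# observed bucket and every observed field1 value with events=0, then keep
# the larger of the zero row and the real sum (like stats max(events)).
def fill_buckets(rows):
    """rows: list of (bucket, field1_value, events) from the sparse stats output."""
    times = {t for t, _, _ in rows}
    values = {v for _, v, _ in rows}
    real = {(t, v): e for t, v, e in rows}
    # every (bucket, value) pair exists; missing pairs default to 0
    return {(t, v): max(real.get((t, v), 0), 0) for t in times for v in values}

rows = [(0, "a", 5), (300, "b", 2)]
filled = fill_buckets(rows)
print(filled[(0, "b")], filled[(300, "a")])  # 0 0
```

As in the SPL version, a (bucket, value) pair can only appear if its bucket was seen at least once, which is exactly the assumption stated above.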
Hi,
_time,URI,Bytes
2021-05-18 02:01:00,a,1
2021-05-18 02:01:00,a,1
2021-05-18 02:02:00,a,1
2021-05-18 02:03:00,b,1
2021-05-18 02:03:00,b,1
2021-05-18 02:04:00,a,1
source="ds1.csv" host="vgspl11hr" index="sfp" sourcetype="csv"
| table _time,URI,Bytes
| timechart span=1m avg(Bytes) AS avg_bytes, stdev(Bytes) AS std_bytes by URI limit=0
| fillnull value=""
| untable _time Measure Value
| eval Metric=mvindex(split(Measure,": "),0),uri=mvindex(split(Measure,": "),1)
| fields - Measure
| eval time_uri=_time."__".uri
| fields - uri - _time
| xyseries time_uri Metric Value
| eval _time=mvindex(split(time_uri,"__"),0),uri=mvindex(split(time_uri,"__"),1)
| fields - time_uri
_time uri avg_bytes std_bytes
2021-05-18 02:01:00 a 1 0
2021-05-18 02:02:00 a 1 0
_time uri avg_bytes std_bytes
2021-05-18 02:01:00 a 1 0
2021-05-18 02:01:00 b
2021-05-18 02:02:00 a 1 0
2021-05-18 02:02:00 b
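The untable/xyseries round trip above works around timechart's single split field by joining time and URI into one key, pivoting on it, and splitting it back. A Python analogy of the final pivot (hypothetical helper, simplified keys):

```python
# Sketch of | xyseries time_uri Metric Value followed by the split eval:
# long rows (time_uri, metric, value) -> wide rows keyed by (time, uri).
def xyseries(long_rows, sep="__"):
    wide = {}
    for key, metric, value in long_rows:
        t, uri = key.split(sep, 1)
        wide.setdefault((t, uri), {})[metric] = value
    return wide

long_rows = [
    ("02:01__a", "avg_bytes", 1.0),
    ("02:01__a", "std_bytes", 0.0),
    ("02:01__b", "avg_bytes", ""),  # fillnull value="" preserved b's bucket
]
wide = xyseries(long_rows)
print(wide[("02:01", "a")])  # {'avg_bytes': 1.0, 'std_bytes': 0.0}
```

The earlier fillnull value="" matters: without it, URI b's empty bucket would never make it into the long rows, and the round trip could not restore it.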
Thanks for sharing.
Why is the last | fillnull needed though?
I removed it and the _time bins are still filled with 0 values and interpreted correctly in the dashboard.
This was pretty helpful to get me going, so I thought I'd share my finished product in case others find it useful. I needed data for each one-minute time bucket in a stats call, but I was pivoting on multiple fields, so the solution had to be expanded a bit. Here's what I did:
base search |bucket _time span=1m| stats sum(field1) as field1 avg(field2) as field2 by field3 field4 _time| appendpipe [|inputlookup field3_lookup |table field3 | dedup field3 |stats values(field3) |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp | eval _time=temp | timechart span=1m values(field3) as field3 | eval field3=mvjoin(field3,";") | streamstats last(field3) as field3 | makemv delim=";" field3 | mvexpand field3 | eval field1=0 | eval field2=0 | eval field4="possibility1;possibility2;possiblity3" | makemv field4 delim=";"| mvexpand field4]
The lookup contains all possible values for field3, so basically filling out all possible values that could show up in the base stats call, then afterward further down the line not pictured I do an sistats call that gets me everything I want and now has "0" values for time buckets that otherwise wouldn't show up for certain variants of field3 and field4
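The expansion described here is again a cross product, this time seeded from a lookup rather than from observed data: every lookup value of field3, every hard-coded possibility of field4, and every time bucket gets a zero row. A Python sketch of that expansion (hypothetical helper, field names from the search above):

```python
import itertools

# Sketch of the lookup-based fill: every combination of lookup value
# (field3), hard-coded possibility (field4), and time bucket gets a zero
# row, mirroring the makemv/mvexpand chain in the appendpipe above.
def zero_rows(field3_lookup, field4_possibilities, buckets):
    return [
        {"_time": t, "field3": f3, "field4": f4, "field1": 0, "field2": 0}
        for t, f3, f4 in itertools.product(buckets, field3_lookup,
                                           field4_possibilities)
    ]

rows = zero_rows(["x", "y"], ["possibility1", "possibility2"], [0, 60])
print(len(rows))  # 8 = 2 buckets x 2 field3 values x 2 field4 values
```

Because the combinations come from the lookup and the hard-coded list, this variant does not depend on any bucket actually containing events.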
This solution worked for me, thank you briancronrath. One thing I had to do was add an additional AS to the portion that says
|stats values(field3) AS field3
[|inputlookup field3_lookup |table field3 | dedup field3 |stats values(field3) AS field3 |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp | eval _time=temp | timechart span=1m values(field3) as field3 | eval field3=mvjoin(field3,";") | streamstats last(field3) as field3 | makemv delim=";" field3 | mvexpand field3 | eval field1=0 | eval field2=0 | eval field4="possibility1;possibility2;possiblity3" | makemv field4 delim=";"| mvexpand field4]
The subsearch only uses two rows produced by the calendar, so it is super efficient. Great lesson about makemv.
Not that it matters, but in the interest of minimalism, this version takes a couple fewer evals:
| timechart span=1h count
| appendpipe [
| stats count
| addinfo
| eval temp=info_min_time."##".info_max_time
| fields temp count
| makemv temp delim="##"
| mvexpand temp
| rename temp as _time
] | timechart span=1h max(count) as count
| fillnull
Is there a way to do this when I want to split? When I add "by host" to my timechart this answer stops working. 😞
try my solution below.
I don't understand how to try your solution at all. What is field1? Is that arbitrary? I'm not looking at fields in my search, only event counts. Additionally, you state that it only works if there is at least one event from a value in a field; that statement doesn't make sense to me :(. Events from a value? My search is basically (60-minute window):
index=blah eventtype=error | timechart span=1m count by host
timechart is far from ideal for this case. Try my solution below with append, stats, and eventstats.
timechart has a very undesirable feature for this purpose: it insists on one extra data point beyond the specified timespan.
Subsearch using calendar. This fills all, even missing start and end buckets. Brilliant!
Yes, that's what I need.
When the raw data itself has no events for some buckets, you need those buckets in the output as well. Is that the right understanding?
Hello,
Could you try to use "timechart" instead of "bucket"?
(your search) | timechart span=5m count AS COUNT
You may also get COUNT=0 events. I hope it helps you.
Thank you.
timechart works, as long as start and end buckets have data. Thanks.