Comments and answers for "How to perform spectrum analysis?"
https://answers.splunk.com/answers/147907/how-to-perform-spectrum-analysis.html
The latest comments and answers for the question "How to perform spectrum analysis?"
Comment by yuanliu on yuanliu's answer
https://answers.splunk.com/comments/150746/view.html
Finally figured out how to handle multiple Splunk data series. R also has this concept of "multivalue", hence `mvfft()`.
`| r "
D=length(input)-1
N=length(input[[1]])
N_span=N*input$X_span
output=data.frame(Freq=((1:N)-1)/(N_span),Mod(mvfft(as.matrix(input[2:D]))))
"`
Here, `X_span` comes from Splunk's `_span`. (You can also access Splunk's `_time` as `X_time`.) The R app adds "X" to input series names. For example, if you do `timechart count as COUNT by host`, it will output `Freq` and `Xhost1`, `Xhost2`, etc.
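For readers who want to check the arithmetic outside Splunk, here is a minimal Python sketch of what the snippet above computes: one frequency axis `k/(N*span)` plus the DFT modulus of each series. It is not the R app's code; the function names `dft_magnitudes` and `spectrum_by_series` and the sample counts are made up for illustration, and the DFT is the slow textbook definition rather than a fast transform.

```python
import cmath

def dft_magnitudes(samples):
    """Return |X_k| for k = 0..N-1 using the direct O(N^2) DFT definition."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n)
    ]

def spectrum_by_series(series, span):
    """series: dict of name -> equally spaced samples; span: bucket width in seconds."""
    lengths = {len(v) for v in series.values()}
    if len(lengths) != 1:
        raise ValueError("all series must have the same number of samples")
    n = lengths.pop()
    # Same frequency axis as Freq=((1:N)-1)/(N*span) in the R snippet.
    result = {"Freq": [k / (n * span) for k in range(n)]}
    for name, samples in series.items():
        result[name] = dft_magnitudes(samples)
    return result

# Hypothetical data: two hosts, 8 buckets of 10 seconds each.
counts = {
    "host1": [1, 0, 1, 0, 1, 0, 1, 0],  # repeats every 2 buckets (20 s)
    "host2": [2, 2, 2, 2, 2, 2, 2, 2],  # constant, so only the DC term is non-zero
}
spectrum = spectrum_by_series(counts, span=10)
```

In this toy input, `host1` repeats every 20 s, so its non-DC energy sits at 0.05 Hz, which is index 4 of `Freq`.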
Filling gaps with 0 in `timechart` is not the best interpolation for FFT. Better to use R's own capabilities.
Thu, 14 Aug 2014 00:24:59 GMT, yuanliu
Comment by yuanliu on yuanliu's answer
https://answers.splunk.com/comments/149936/view.html
That's a really interesting bug. It doesn't show in preview mode.
Fri, 08 Aug 2014 05:16:04 GMT, yuanliu
Comment by martin_mueller on martin_mueller's answer
https://answers.splunk.com/comments/149891/view.html
Lovely writeup... however, you're suffering from a Splunk Answers bug that doesn't let you use more than a certain number of backtick-enclosed code segments `like this`, see those eventstats0 eventstats1 etc. bits near the end.
Thu, 07 Aug 2014 21:17:19 GMT, martin_mueller
Answer by yuanliu
https://answers.splunk.com/answering/149675/view.html
Following @martin_mueller's R-rated suggestion, with help from R-rated app author @rfujara_splunk ;-) and a frantic search for cheap interpolation, the following is a recipe to analyse event counts.
| timechart count
| appendpipe [
| stats count
| addinfo
| eval temp=info_min_time."##".info_max_time
| fields temp count
| makemv temp delim="##"
| mvexpand temp
| rename temp as _time
] | timechart max(count) as COUNT
| fillnull
| eventstats count as TOTAL
| r "output=transform(input,FFT=Mod(fft(COUNT)),Freq=((1:TOTAL)-1)/(TOTAL*X_span))"
Application notes
1. You need to install the **R app**. See @martin_mueller's answer above.
2. For event counts, gaps should be interpreted as 0. The largest part of the above search is to do just that, thanks to @somesoni2's [answer to my question][1].
3. The `eventstats` to obtain `TOTAL` is superfluous and a waste of computation. There should be a better way to do this within R.
4. The above only outputs the modulus of the transformation because counts are all real numbers. You can output the complex numbers by removing `Mod()` from the above. (Interestingly, although Splunk lacks complex-number arithmetic, its stats functions accept complex numbers. Maybe it takes the real part and discards the imaginary part as NaN.)
5. `Freq` is a dummy sequence for interpretation, expressed in hertz. You can chart over `Freq`, for example.
6. The maximum frequency you can analyse is 0.5/`span` (the Nyquist frequency). `span` in both `timechart` calls must be equal.
7. Beware of an undesirable side effect of the `timechart` used to fill gaps: it forces an extra interval.
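Notes 2 and 6 can be illustrated with a minimal Python sketch, assuming integer epoch-second timestamps and a fixed bucket width. It counts events per bucket so that empty buckets become 0, then reports the frequency resolution `1/(N*span)` and the upper limit `0.5/span`. The helper names `bucket_counts` and `frequency_limits` and the sample timestamps are hypothetical, and the extra interval that `timechart` may add is not modelled here.

```python
def bucket_counts(event_times, start, end, span):
    """Count events per fixed-width bucket over [start, end); empty buckets stay 0."""
    n = -(-(end - start) // span)  # ceiling division: number of buckets
    counts = [0] * n
    for t in event_times:
        if start <= t < end:
            counts[(t - start) // span] += 1
    return counts

def frequency_limits(n, span):
    """Return (resolution, maximum) frequency in Hz for n buckets of span seconds."""
    return 1 / (n * span), 0.5 / span

# Hypothetical event timestamps (seconds) in a one-minute search window.
events = [3, 5, 27, 41, 42, 44]
counts = bucket_counts(events, start=0, end=60, span=10)
resolution, max_freq = frequency_limits(len(counts), span=10)
```

With a 10-second span the highest frequency that can be analysed is 0.05 Hz, whatever the length of the search window; a longer window only refines the resolution.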
A few F(FT)-words
1. As with any discrete Fourier transform of all-real input, you only need to look at the first half of the output sequence (the non-negative frequencies); the second half mirrors it.
2. When analyzing (all-positive) event counts, the output at frequency 0 is not informative: this component is just the strong DC bias (the sum of all counts).
3. `fft()` applies no taper, which amounts to a rectangular (square) sampling window. Spectral leakage could blur your analysis, especially when dealing with black-and-white data such as event counts.
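The three points above can be sketched in Python as follows. This is an illustration only, not part of the Splunk recipe: it subtracts the mean to suppress the DC term, applies a Hann window (my choice of taper; the thread does not name one) to reduce leakage, and keeps only the non-negative frequencies. The function name `one_sided_spectrum` and the sample counts are hypothetical.

```python
import cmath
import math

def one_sided_spectrum(samples, span):
    """Return (freqs, magnitudes) for k = 0..N//2 after mean removal and a Hann window."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    # Symmetric Hann window applied to the mean-removed samples.
    windowed = [
        (x - mean) * 0.5 * (1 - math.cos(2 * math.pi * t / (n - 1)))
        for t, x in enumerate(samples)
    ]
    half = n // 2 + 1  # bins 0..N/2 cover all non-negative frequencies
    freqs = [k / (n * span) for k in range(half)]
    mags = [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(windowed)))
        for k in range(half)
    ]
    return freqs, mags

# Hypothetical counts: bursts that repeat every 4 buckets of 10 s (a 40 s period).
counts = [4, 4, 0, 0] * 8
freqs, mags = one_sided_spectrum(counts, span=10)
peak = max(range(len(mags)), key=mags.__getitem__)
peak_freq = freqs[peak]
```

The window trades a wider peak for lower side lobes, so neighbouring bins around `peak` also carry some energy.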
R-rated notes
1. Object `input` from Splunk is in the "data frame" class. You need to "transform" it into arrays that most R functions deal with. The `transform()` function in the above has nothing to do with Fourier *transformation*. The latter is performed by the `fft()` function.
2. In addition to fields you pass to R, `input` also passes certain Splunk internal fields as X-rated objects. In the above, X\_span is `span` in the last stats function (`timechart`); you also have access to X\_time which corresponds to \_time in Splunk. (This is perhaps not limited to R app.)
The above doesn’t address how to separate data series into R arrays then output transformed objects. That will be my end goal. But it’s a good start.
[1]: http://answers.splunk.com/answers/149425/how-to-produce-empty-time-buckets
Thu, 07 Aug 2014 00:18:32 GMT, yuanliu
Comment by martin_mueller on martin_mueller's answer
https://answers.splunk.com/comments/149627/view.html
Streamstats isn't expensive in and of itself, it runs over the data once... however, there's two streamstatses and two reverses in there, so for large data sets it's going to add up.
Wed, 06 Aug 2014 18:33:09 GMT, martin_mueller
Comment by yuanliu on yuanliu's answer
https://answers.splunk.com/comments/149622/view.html
Not familiar with cost of streamstats, but excellent work on a straight-Splunk interpolation. You may want to give an answer in http://answers.splunk.com/answers/79513/. I made a nuanced analysis there.
For my use case, I need to make sure missing data are treated as 0. @somesoni2 offered an inexpensive way to do this in http://answers.splunk.com/answer_link/149598/.
Wed, 06 Aug 2014 18:19:55 GMT, yuanliu
Comment by martin_mueller on martin_mueller's answer
https://answers.splunk.com/comments/148172/view.html
First line grabs data and builds a `timechart` with data gaps in it.
Second line prepares lots of data to fill in the gaps: previous value, next value, time of previous value, time of next value
Last line calculates the naïve linearly interpolated value.
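The same three steps can be sketched in plain Python, assuming a list of bucket times and a list of values with `None` for gaps. A forward pass and a backward pass play the role of the two `streamstats` calls, and the final loop applies the linear formula. The function name `interpolate_gaps` is hypothetical, the relative times are made up, and the values are taken from the sample output in this comment. Unlike the Splunk search, this sketch passes known values through unchanged and leaves leading or trailing gaps as `None`.

```python
def interpolate_gaps(times, values):
    """Fill None entries by linear interpolation between the nearest known neighbours."""
    n = len(values)
    # Forward pass: remember the last known (time, value) at each position.
    prev_known, last = [None] * n, None
    for i in range(n):
        if values[i] is not None:
            last = (times[i], values[i])
        prev_known[i] = last
    # Backward pass: remember the next known (time, value) at each position.
    next_known, nxt = [None] * n, None
    for i in range(n - 1, -1, -1):
        if values[i] is not None:
            nxt = (times[i], values[i])
        next_known[i] = nxt
    filled = []
    for i in range(n):
        if values[i] is not None:
            filled.append(float(values[i]))
        elif prev_known[i] is None or next_known[i] is None:
            filled.append(None)  # no neighbour on one side: cannot interpolate
        else:
            (t0, v0), (t1, v1) = prev_known[i], next_known[i]
            filled.append(v0 + (times[i] - t0) / (t1 - t0) * (v1 - v0))
    return filled

# Readings every 30 s on a 10 s grid; times are seconds from the first bucket.
times = [0, 10, 20, 30, 40, 50, 60]
values = [99, None, None, 96, None, None, 111]
filled = interpolate_gaps(times, values)
```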
Some results:
_time                ev   interpolated_ev
2014-07-30 00:55:00  99
2014-07-30 00:55:10       98.000000
2014-07-30 00:55:20       97.000000
2014-07-30 00:55:30  96
2014-07-30 00:55:40       101.000000
2014-07-30 00:55:50       106.000000
2014-07-30 00:56:00  111
Wed, 30 Jul 2014 00:28:51 GMT, martin_mueller
Comment by martin_mueller on martin_mueller's answer
https://answers.splunk.com/comments/148171/view.html
Here's a run-anywhere example using `_internal` data coming in every 30s, interpolated to 10s:
index=_internal eps="*" group=per_host_thruput | head 10 | timechart fixedrange=f span=10s avg(ev) as ev
| eval value_time = case(isnotnull(ev), _time) | streamstats last(ev) as last_ev last(value_time) as last_time | reverse | streamstats last(ev) as next_ev last(value_time) as next_time | reverse
| eval interpolated_ev = last_ev + ((_time - last_time) / (next_time - last_time)) * (next_ev - last_ev)
Wed, 30 Jul 2014 00:24:42 GMT, martin_mueller
Comment by martin_mueller on martin_mueller's answer
https://answers.splunk.com/comments/148169/view.html
If you have more data points than you need you can make them equally paced using `timechart`.
If you have too few data points you can do the same and throw some `streamstats` shenanigans in the mix... won't be fast for a large data set though.
Wed, 30 Jul 2014 00:22:13 GMT, martin_mueller
Comment by yuanliu on yuanliu's answer
https://answers.splunk.com/comments/148133/view.html
Another note: FFT operates only on equally paced samples, i.e., data of constant sampling rate. The majority of Splunk data are not constant-rate. I have yet to find an easy way for interpolation.
Tue, 29 Jul 2014 20:25:44 GMT, yuanliu
Comment by yuanliu on yuanliu's answer
https://answers.splunk.com/comments/148079/view.html
:-) Or I can just use PDF; in fact, R (thoughtfully) provides an EPUB version, too. I'm just extremely uncomfortable reading serious documents on screen. (But of course, I'm not going to convert 3K pages into dead trees, either.)
Tue, 29 Jul 2014 18:03:28 GMT, yuanliu
Comment by martin_mueller on martin_mueller's answer
https://answers.splunk.com/comments/148078/view.html
You could probably buy a dedicated R-manual-Kindle for the price of printing that :D
Tue, 29 Jul 2014 17:56:39 GMT, martin_mueller
Comment by yuanliu on yuanliu's answer
https://answers.splunk.com/comments/148071/view.html
Thank you! With 3,397 pages of reference manual and a 155-page intro, I still have a lot of trees to kill. But yes, FFT is expressed in one function! And the R app makes it all integral within Splunk. Brilliant.
Tue, 29 Jul 2014 17:50:06 GMT, yuanliu
Answer by martin_mueller
https://answers.splunk.com/answering/147929/view.html
I believe R is capable of FFT; take a look at http://apps.splunk.com/app/1735/ for using R within Splunk.
Tue, 29 Jul 2014 07:14:47 GMT, martin_mueller