Splunk Machine Learning Toolkit - How do I use forecast time series?

kteng2024 · 09-22-2017

Hi there,

I started using the Splunk Machine Learning Toolkit and am trying to understand how to use Forecast Time Series. Can someone please explain what holdback and future timespan are? When I read the documentation and try to set the values, it throws an error: "External search command 'predict' returned error code 1." I am also trying to understand the methods LLP5, LL, LLP and LLT. It would be great if someone could help me understand forecasting time series in the ML Toolkit.

Sukisen1981 · 09-23-2017

To answer briefly:
Holdback - Specifies the number of data points at the end of the series that the predict command does not use.
For example, if you have 100 data points in total and specify holdback=10, the prediction algorithm uses only the first 90 data points. This is useful if you want to see how accurately your model predicts against actuals: the predictions for the last 10 points can be compared with the 10 actual values you ALREADY have, to see how well the prediction fits.
Future timespan - How many time spans into the future to predict. It is often used together with holdback: with future_timespan=10 holdback=20, the model does not use your last 20 data points, AND you also get predictions for 10 additional future time spans.
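To make the holdback/future_timespan arithmetic concrete, here is a minimal Python sketch of the split described above. `split_for_backtest` is a hypothetical helper, not Splunk code, and it assumes (following the description above) that the model is asked for holdback + future_timespan predictions after its training window. In SPL the two values are simply options on `predict`, e.g. `| predict count algorithm=LLT holdback=20 future_timespan=10`, where `count` stands for whatever field you are forecasting.

```python
def split_for_backtest(values, holdback, future_timespan):
    """Hypothetical helper illustrating holdback and future_timespan.

    The model only sees values[:-holdback]. It is then asked for
    holdback + future_timespan predictions: the first `holdback` of them
    line up with actual values you already have, the rest are truly new.
    """
    if holdback < 0 or future_timespan < 0:
        raise ValueError("holdback and future_timespan must be non-negative")
    if holdback >= len(values):
        raise ValueError("holdback must leave at least one point for training")
    cut = len(values) - holdback
    train = values[:cut]              # points the model is allowed to use
    held_back_actuals = values[cut:]  # known actuals to compare predictions against
    steps_to_forecast = holdback + future_timespan
    return train, held_back_actuals, steps_to_forecast


series = list(range(100))

# 100 data points, holdback=10: the model trains on the first 90 points.
train, actuals, steps = split_for_backtest(series, holdback=10, future_timespan=0)
print(len(train), len(actuals), steps)  # 90 10 10

# holdback=20, future_timespan=10: train on 80, predict 20 known + 10 new points.
train, actuals, steps = split_for_backtest(series, holdback=20, future_timespan=10)
print(len(train), len(actuals), steps)  # 80 20 30
```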

Now, coming to the second part of your question.
It is difficult to explain all the models in one go, but here is a close approximation:
LLT (local level trend) - A trend but no seasonality. Say you are analyzing greeting card sales and they show no seasonality, such as sales spiking around Christmas or Mother's Day. LLT follows the continuous trend in the data; any seasonality that is present is treated as random variation.
LLP (seasonal local level) - Just the reverse: it deals with seasonal data. The Splunk documentation says it "requires the minimum number of data points to be twice the period". Why? If, for example, sales are seasonally high in December and July, the model needs at least two years of data (two Julys and two Decembers) to confirm that the spikes really recur in those months.
LL (local level) - The simplest of the models. It assumes neither seasonality nor trend and needs at least 2 data points. Its forecast is essentially a flat line at the current estimated level of the series, a weighted average of past values in which recent values count most, rather than a projection of any trend.
LLP5 - Combines the LLT and LLP models for its prediction. What does this mean? Some parts of the data may be seasonal without falling in a predictable month (unlike, say, December sales). For example, sales might spike whenever a big company makes an announcement, which could happen in ANY month, but the spike recurs with every announcement. The remaining months are not seasonal, so you have to combine LLT and LLP to make accurate predictions for this kind of data.
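The difference between the models can be illustrated with textbook stand-ins. This Python sketch is an assumption, not Splunk's implementation: it uses simple exponential smoothing as an analogue of LL and Holt's linear method as an analogue of LLT, with fixed, hypothetical smoothing weights (`alpha`, `beta`) instead of the ones `predict` would estimate, plus a check of the documented LLP rule that you need at least twice the period in data points.

```python
def local_level_forecast(values, steps, alpha=0.3):
    """LL-style stand-in: simple exponential smoothing (no trend, no season).

    The level is a weighted average of past values with recent points weighted
    most; the forecast repeats that level (a flat line). alpha is hypothetical.
    """
    if len(values) < 2:
        raise ValueError("need at least 2 data points")
    level = values[0]
    for y in values[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * steps


def local_trend_forecast(values, steps, alpha=0.3, beta=0.1):
    """LLT-style stand-in: Holt's linear method (level + trend, no season)."""
    if len(values) < 2:
        raise ValueError("need at least 2 data points to initialise the trend")
    level, trend = values[0], values[1] - values[0]
    for y in values[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # h-step-ahead forecast: current level plus h times the current trend.
    return [level + (h + 1) * trend for h in range(steps)]


def llp_has_enough_data(n_points, period):
    """LLP rule from the Splunk docs: at least twice the period in data points."""
    return n_points >= 2 * period


sales = [10, 12, 14, 16, 18, 20]  # steadily rising, no seasonality
print([round(x, 2) for x in local_level_forecast(sales, 3)])  # [16.12, 16.12, 16.12]
print([round(x, 2) for x in local_trend_forecast(sales, 3)])  # [22.0, 24.0, 26.0]

# Monthly data with a yearly season (period=12) needs at least 24 months for LLP.
print(llp_has_enough_data(18, 12), llp_has_enough_data(24, 12))  # False True
```

On steadily rising data the LL-style forecast stays flat and lags behind, while the LLT-style forecast continues the trend, which is why it pays to match the model to how your data behaves.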

Tip - Too often we get carried away by the maths. It is important to think about how your data actually behaves. Is it seasonal, is it not, or is it a combination of both? Figure this out first (a best guesstimate is fine) and then try applying multiple models. Use holdback and future_timespan to see how accurate the predictions are against the actuals you already have. It will take some time and some retrofitting, and of course the predictions tend to improve as more and more data is ingested and the chosen model has more history to learn from.
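The holdback workflow in the tip can be sketched as a small backtest: train on all but the last `holdback` points, forecast those points, and score each candidate by root-mean-square error. The three models here (`mean_model`, `last_value_model`, `drift_model`) are hypothetical stand-ins chosen for illustration, not Splunk's LL/LLT/LLP algorithms; in Splunk you would instead run `predict` with different `algorithm=` values and compare their predictions against the held-back actuals.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length, non-empty lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical stand-in models: each takes training data and returns `steps` predictions.
def mean_model(train, steps):
    avg = sum(train) / len(train)
    return [avg] * steps


def last_value_model(train, steps):
    return [train[-1]] * steps


def drift_model(train, steps):
    # Extend the straight line through the first and last training points.
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (h + 1) for h in range(steps)]


def backtest(values, holdback, models):
    """Fit each model on all but the last `holdback` points and score it on them."""
    if holdback <= 0 or len(values) - holdback < 2:
        raise ValueError("need holdback > 0 and at least 2 training points")
    train, actual = values[:-holdback], values[-holdback:]
    scores = {name: rmse(model(train, holdback), actual) for name, model in models.items()}
    best = min(scores, key=scores.get)
    return best, scores


sales = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
candidates = {"mean": mean_model, "last value": last_value_model, "drift": drift_model}
best, scores = backtest(sales, holdback=3, models=candidates)
print(best)  # drift
```

Because the example series rises steadily, the trend-following stand-in matches the held-back points exactly, while the flat models miss them; on real data the scores will be closer, which is exactly why comparing several models against the held-back actuals is worth the effort.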
Hope it helps 🙂
