The search below throws this error whenever more than two hosts are searched for: **command="predict", Too few data points: -5. Need at least 1 (too many holdbacks (5) maybe?)**
If searching for just one host, the data is perfect. I have 700+ hosts that I need to run this against. Any ideas?
Here is the search that returns the error:
| inputlookup test_diskusage.csv | search host=splunk-indexer-1 | eval _time=strptime(date, "%Y-%m-%d") | timechart span=1d values("/opt/splunk") as "/opt/splunk", values(cold0) AS cold0, values(cold1) AS cold1, values(hot0) AS hot0, values(hot1) AS hot1, values(hot2) AS hot2 | predict "/opt/splunk" "cold0" "cold1" "hot0" "hot1" "hot2" algorithm=LLP5 holdback=5 future_timespan=25 upper95=upper95 lower95=lower95
Hi All,
I'm trying to write a search that triggers an alert when there is a significant spike in HTTP POST requests.
I am interested in using the predict command and alerting when the total `count(http_request)` (where `http_request=POST`) by `source_ip` breaches the predicted `upper95`.
In theory, it would look something like:
index=web_proxy
| search http_request=POST
| stats count(http_request) AS POST_Count by source_ip
| predict POST_Count by source_ip
| where POST_Count >= upper95
I know that predict becomes more accurate when you feed it more data, but I don't want to query two months' worth of data in a dashboard that would take about 2 minutes to load. Is there a way to get a more accurate prediction without actively querying the past 2 months, or a way to do this with a different function? FYI, I do not have the authority to download the MLTK.
I know this is a tough question but would like to hear some ideas.
index=summary source="summary_events_2"
orig_source=/var/log/pnr*
ms_region=us-west-1
ms_level=E*
| timechart span=15m sum(count) as count
| predict count as count_prediction period=7 algorithm=LLP5 future_timespan=10 holdback=0 upper50=high_prediction lower5=low_prediction
| rename high_prediction(count_prediction) as high_prediction
| eval deviation=count-round(count_prediction,0)
| streamstats window=300 current=true median(deviation) as median_of_residual
| eval abs_dev=(abs(deviation - median_of_residual))
| streamstats window=300 current=true median(abs_dev) as median_abs_dev
| eval upper_bound=if(median_of_residual + median_abs_dev * 5 < 0,abs(median_of_residual + median_abs_dev), median_of_residual + median_abs_dev * 5)
| eval anomaly=if(deviation > upper_bound,1,0)
| predict deviation as deviation_prediction period=7 algorithm=LLP5 future_timespan=0 holdback=0 upper20=high_prediction lower20=low_prediction
| fields - median_of_residual, median_abs_dev, abs_dev, high_prediction, bounds, count, count_prediction
so when I use the predict command my fields become null
index=summary source="summary_events_2"
orig_source=*pnr*
ms_level=ERROR OR ms_level=error
NOT event=no-event
| timechart span=5m sum(count) as count
| predict count as prediction algorithm=LLP future_timespan=200 holdback=0
| eval residual=count-round(prediction,0)
| streamstats window=200 current=true median(residual) as median_of_residual
| eval abs_dev=(abs(residual - median_of_residual))
| streamstats window=200 current=true median(abs_dev) as median_abs_dev
| eval upper_bound=(median_of_residual + median_abs_dev * 20)
| eval anomaly=if(residual > upper_bound,1,0)
This is my query, and I want to add:
|table event, anomaly, count
But for some reason the "event" field is null. Can anyone explain why?
I have this query that detects anomalies in the errors from a specific source, based on the median absolute deviation of the residual around the median. My question is: do the predict command and the algorithm I used account for the fluctuation of event counts over weekends? Fewer users use the product over the weekend, so there are fewer errors.
index=summary source=some index
orig_source=some source
ms_level=ERROR OR ms_level=error
NOT event=no-event
| sort -_time
| predict count as prediction algorithm=LLP future_timespan=200 holdback=0
| eval prediction=round(prediction,0)
| eval residual= count - prediction
| streamstats window=400 current=true median(residual) as median
| eval abs_dev=(abs(residual- median ))
| streamstats window=400 current=true median(abs_dev) as median_abs_dev
| eval upper_bound=(median + median_abs_dev * 30)
| eval lower_bound=(median - median_abs_dev * 30)
| eval outlier=if(residual > upper_bound OR residual < lower_bound,1,0)
| table _time, event, residual, upper_bound
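For anyone who wants to sanity-check the outlier rule outside Splunk, the median and median-absolute-deviation logic in the query above can be sketched in Python. This is only a rough sketch under assumptions: the function name `flag_outliers` and its arguments are hypothetical, it assumes `streamstats window=N current=true` means a trailing window that includes the current row, and it does not reproduce `predict` itself (the residuals are taken as given).

```python
from statistics import median


def flag_outliers(residuals, window=400, k=30):
    """Flag residuals outside median +/- k * MAD over a trailing window.

    Hypothetical helper: mirrors the eval/streamstats steps of the query,
    not the predict command.
    """
    flags = []
    abs_devs = []
    for i, r in enumerate(residuals):
        start = max(0, i - window + 1)
        # Trailing median of the residuals, current row included.
        med = median(residuals[start:i + 1])
        # Absolute deviation of this row from its own trailing median.
        abs_devs.append(abs(r - med))
        # Trailing median of those absolute deviations (the MAD).
        mad = median(abs_devs[start:i + 1])
        upper = med + k * mad
        lower = med - k * mad
        flags.append(1 if (r > upper or r < lower) else 0)
    return flags


if __name__ == "__main__":
    # Small made-up series with one obvious spike at the end.
    print(flag_outliers([1, -1, 2, 0, 1, -2, 50], k=5))
```

Note that nothing in this rule knows about weekdays or weekends; it only compares each residual with the recent residuals in the window.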
From my output, I need to apply the predict function only to the most varying field. For example:
`index=_internal sourcetype=splunkd* | timechart count as Count by sourcetype | predict splunkd_access`
I can use the above query if I write it manually, because I know splunkd_access is the most varying field. In my case, though, I have to find the most varying field in the query itself (this can be done with the `stdev` function and `eventstats`), and then apply the predict function to the field with the highest stdev. I also want to remove the other fields, such as splunkd and splunkd_ui_access, so they don't show up in the chart.
The two ways I know to solve this problem are a custom command and JavaScript, but if possible I want to solve it with a query only.
Tags: timechart, javascript, predict, groupby, customcommands
Tue, 29 May 2018 05:43:23 GMT, VatsalJagani
Predict with wildcard
How can I use the predict command with a wildcard? I have a timechart with a group-by field. See the example query below.
Query: `index=_internal sourcetype=splunkd* | timechart count as Count by sourcetype | predict splunkd*`
The above query gives the following error: `command="predict", Unknown field: splunkd*`.
One way to solve this is a custom command, but if possible I don't want to introduce a custom command in my app. Does anyone have a solution that uses only a query?
Note: The field name should be displayed on the panel. Otherwise I could rename all fields to generic names like col1, col2, ..., but I also want the user to know which field each prediction is for.
Hi
I want to predict values of a field over time.
the result table of my search:
![alt text][1]
In the end of the search I use:
| timechart span=24h sum(sloc) as SLOC
| eval _time = strftime(_time, "%Y-%m-%d")
| fillnull value=0
| predict SLOC
the error I get:
External search command 'predict' returned error code 1.
I am using Splunk 6.5.7.
The result I would like to see is additional future days with the predicted 'SLOC' value.
I want to predict the start time and end time of some jobs, and I am currently using the "Linear Regression" algorithm under Predict Numeric Fields in the Machine Learning Toolkit. This is for a personal prediction project, but I am not sure whether this approach is right or wrong.
I need more guidance on this, and I don't have additional details to share.
One field that can be used for prediction is Job Group (the group the job runs under), but I don't have many fields for predicting start time and end time.
If anybody knows about a different approach, please let me know.
Thanks in Advance.
How do I make a predict function more aggressive?
Below is an example of my predict search and graph:
`... | predict Total as predict future_timespan=12 holdback=0 | fields - upper* lower*`
![pic of graph with predict function used][1]
It is something I probably need to understand better, and I may be entering the territory of polynomial or exponential growth. If that is the case, maybe the answer lies outside of `predict`.
Appreciate any advice/pointers to further reading/explanations on this.
----------
Some useful questions I have been reading on this:
[how-to-create-a-search-to-predict-license-violation][2]
[prediction-function-algorithms-questions][3]
[predict-95-confidence-interval][4] - good at explaining some basics
[Predict Documentation][5]
Note: I could use the upperX values, which would be more aggressive (giving me higher future values), but again I don't think this will be aggressive enough. Maybe I need to look at the [forecast option][6]?
Thanks
[1]: /storage/temp/229817-predict-make-more-aggressive-question.png
[2]: https://answers.splunk.com/answers/187080/how-to-create-a-search-to-predict-license-violatio.html?utm_source=typeahead&utm_medium=newquestion&utm_campaign=no_votes_sort_relev
[3]: https://answers.splunk.com/answers/95610/prediction-function-algorithms-questions.html
[4]: https://answers.splunk.com/answers/514892/predict-95-confidence-interval.html
[5]: https://docs.splunk.com/Documentation/Splunk/7.0.2/SearchReference/Predict
I need the parameters that affect CPU, so that the prediction can be more effective.
Tags: cpu, predict
Sun, 25 Feb 2018 13:57:32 GMT, kaushik1218
How to calculate a linear regression for each field and predict the next possible number in my search query?
Suppose I have this data (the real data has thousands of rows; this is just an example):
ID class qty
1 cup 5
2 cup 6
3 cup 2
4 cup 7
5 mug 1
6 mug 3
7 mug 4
I want to calculate a linear regression on "qty" (or just use the predict command) for each "class", so that I get one predicted next value for "cup" and one for "mug".
Considering that there are not just 2 different classes but thousands of them, how can I do this?
PS: I have tried the "map" command, but it limits my results to only 10 iterations, and modifying the "maxsearches" parameter didn't help.
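To make the goal concrete, here is a minimal Python sketch of "one least-squares line per class, then read off the next value". It is an illustration of the arithmetic only, not a Splunk solution: the function name `predict_next_per_class` is hypothetical, and it assumes rows arrive in order and that the position of a row within its class (0, 1, 2, ...) is used as the x value.

```python
from collections import defaultdict


def predict_next_per_class(rows):
    """Fit y = intercept + slope * x per class and return the next y.

    rows: iterable of (class_name, qty) pairs in order.
    x is assumed to be the row's position within its class.
    """
    series = defaultdict(list)
    for cls, qty in rows:
        series[cls].append(float(qty))

    forecasts = {}
    for cls, ys in series.items():
        n = len(ys)
        if n < 2:
            # A line needs two points; fall back to the last value.
            forecasts[cls] = ys[-1]
            continue
        mean_x = (n - 1) / 2
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in range(n))
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        # The next point sits at x = n.
        forecasts[cls] = intercept + slope * n
    return forecasts


if __name__ == "__main__":
    data = [("cup", 5), ("cup", 6), ("cup", 2), ("cup", 7),
            ("mug", 1), ("mug", 3), ("mug", 4)]
    print(predict_next_per_class(data))
```

With the sample table above this gives 5.5 for "cup" and roughly 5.67 for "mug". Because everything is grouped in a single pass, the number of classes does not require one sub-search per class.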
I have to forecast data for next 15 days, based on the last 30 days data. I have used the following query:
sourcetype=mylogs (message=1234*)
| timechart count as msgs span=1m
| timechart avg(msgs) as msgs_daily_avg span=1d
| predict msgs_daily_avg algorithm=LLP period=30 future_timespan=15
The search returns results, but performance takes a beating. The time range used to collect the data is "Last 30 days".
It takes almost 20-25 minutes to fetch the full results along with the predicted values. Is that because of the predict command I used, or because of the time range I set?
Can I write the results to a file by running the above query as a report, and then use the data in that file to populate a dashboard panel?
How can this be done to avoid the performance issues?
How good is Splunk at forecasting, and how easy is it to separate a training dataset and a test dataset from the same source data? Do we also get R² and p-value information to validate the models under consideration?
I have seen some videos on YouTube and pages on the Splunk docs site, but they are very high level. I am exploring predict in our project, but I find a lot of limitations with predict in combination with timechart.
Can someone please share their experience or point me to a URL where I can read further? If there are any ebooks I can buy, please let me know. I wonder if the Splunk ML Toolkit is more appropriate than using SPL directly?
many thanks in advance.
I tried various combinations but failed:
1. index="flowintegrator" src_port=21
| eval thisUser=src_ip + "=" + dest_ip
| timechart avg(bytes) as volume by thisUser
| predict thisUser
2. index="flowintegrator" src_port=21
| eval thisUser=src_ip + "=" + dest_ip
| timechart avg(bytes) as avg_bytes by thisUser
| predict avg_bytes
This works, but I can't predict with it:
index="flowintegrator" src_port=21 |eval thisUser=src_ip + "="+ dest_ip | timechart avg(bytes) as avg_bytes by thisUser
Help
I have a trend graph that shows some data and then predicts that data a couple of days forward. However, the prediction starts where the normal data starts, when I would rather have it start on the graph where there is no previous data, basically attaching itself to the end of the existing trendline and continuing with its prediction. Is there a way to do this?
I've asked about this before and now I've re-loaded the **raw** data without any modifications. It looks like this (without an actual timestamp):
Month,Billing,MsgType,BillSize,Direction
2013-04,BI70276,ORDHDR,5,SENT
2013-04,BI70276,INVFIL,8,RECV
2013-04,BI70276,ORDHDR,5,SENT
2013-04,BI70276,INVFIL,34,RECV
2013-04,BI70276,ORDHDR,20,SENT
2013-04,BI70276,INVFIL,13,RECV
2013-04,BI70276,ORDHDR,7,SENT
2013-04,BI70276,INVFIL,1,RECV
2013-04,BI70276,ORDHDR,1,SENT
2013-04,BI70276,ORDHDR,5,SENT
2013-04,BI70276,INVFIL,4,RECV
2013-04,BI70276,ORDHDR,6,SENT
2013-04,BI70276,INVFIL,9,RECV
2013-04,BI70276,ORDHDR,12,SENT
2013-04,BI70276,INVFIL,178,RECV
... etc.
I have this data for every CCYY-MM for the last 53 months, about 200k events. So there is no **actual** timestamp for each event.
If I use this:
index=IX Billing=BI70400 MsgType=ORDHDR Direction=SENT | stats sum(BillSize) as MonthSize by Month
...I get the column chart that I expect/want.
How can I use this to create a prediction for the future? We've tried a few variations, based on this, but without success.
Thank you.
So, I have a graph that shows the total user logins per day for an application, and I thought it would be cool to predict the total number of logins for the next month. The current graph shows the previous month of total user logins per day, and when I use predict:
| predict Users period=30 future_timespan=30
It basically just mirrors the previous month into the future month, since it is only looking at the past 30 days. Is there a way to grab more "before" data than what I am displaying, so that predict doesn't just mirror the previous 30 days?
Tags: splunk-enterprise, search, predict
Tue, 12 Sep 2017 11:45:40 GMT, kdimaria
Splunk predict command period vs future_timespan?
I am wondering if anyone can explain exactly what `period` and `future_timespan` are. I have already read the documentation at http://docs.splunk.com/Documentation/SplunkCloud/6.6.0/SearchReference/Predict, which covers both parameters, but I am still confused about what exactly they do and would like someone to explain them in their own words. Thank you!
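As a rough way to picture the two parameters, here is a toy Python sketch. It is not the algorithm `predict` uses; it is a plain "repeat the last cycle" forecast, and the function name `seasonal_naive_forecast` is hypothetical. It assumes `period` is the length of the recurring cycle in the data and `future_timespan` is the number of future points to produce.

```python
def seasonal_naive_forecast(values, period, future_timespan):
    """Repeat the last full cycle of length `period` for
    `future_timespan` future points (toy stand-in, not Splunk's predict)."""
    if period <= 0 or len(values) < period:
        raise ValueError("need at least one full period of history")
    last_cycle = values[-period:]
    return [last_cycle[i % period] for i in range(future_timespan)]


if __name__ == "__main__":
    history = [10, 20, 30, 10, 20, 30]  # made-up data with a cycle of 3
    # period = cycle length, future_timespan = how many points ahead
    print(seasonal_naive_forecast(history, period=3, future_timespan=5))
```

In this toy version, setting `period` equal to the whole length of the history means the "cycle" is the entire history, so the forecast is just a copy of it, which looks a lot like the mirroring described in the logins question above.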