Solved: duplicate in dates for stats when using predict

mjm295 · ‎08-15-2017

I have this query to predict CPU usage, looking at real data for the last 90 days and predicting 60 days ahead.

index="linux_capacity"  source=cpu CPU=all  host=ip-10-134*  | eval PctUsed = 100 - pctIdle 
| timechart avg(PctUsed) as PercentUsed 
| predict "PercentUsed" as futures algorithm=LLP future_timespan=60
| eval upper95(futures)=if(_time<=now(), Null, 'upper95(futures)' )
| eval lower95(futures)=if(_time<=now(), Null, 'lower95(futures)' )

Looking at the stats (results), the 10 days from today backwards get duplicated. Today is 16th August. Here is a snippet of the stats:

_time       PercentUsed futures         lower95(futures) upper95(futures)
2017-08-02  10.345810   11.2606080643
2017-08-03  8.371493    11.6832498048        
2017-08-04  8.287087    10.2299365809        
2017-08-05  12.312134   12.2315872649        
2017-08-06  11.367797   10.9899627817        
2017-08-07  17.745977   14.2295366964        
2017-08-08  10.109057   10.1245616922        
2017-08-09  17.496496   14.0287175836        
2017-08-10  8.339878    11.2479039882        
2017-08-11  8.737030    10.0940590718        
2017-08-12  8.032037    9.39042740568        
2017-08-13  7.555324    9.33242169748        
2017-08-14  9.514418    11.8174795236        
2017-08-15  8.862755    8.98957755123        
2017-08-16  8.136355    11.4131114138        
2017-08-06              11.2479039882        
2017-08-07              10.0940590718        
2017-08-08              9.39042740568        
2017-08-09              9.33242169748        
2017-08-10              11.8174795236        
2017-08-11              8.98957755123        
2017-08-12              11.4131114138        
2017-08-13              11.2479039882        
2017-08-14              10.0940590718        
2017-08-15              9.39042740568        
2017-08-16              9.33242169748        
2017-08-17              11.8174795236   4.01416734251   19.6207917047
2017-08-18              8.98957755123   -0.453621346862 18.4327764493
2017-08-19              11.4131114138   1.74019160299   21.0860312246
2017-08-20              11.2479039882   0.874637979426  21.6211699969
2017-08-21              10.0940590718   4.39114905157   15.796969092
2017-08-22              9.39042740568   -4.25403965674  23.0348944681

So the real data stops on 2017-08-16,
but then the predicted data starts again from 2017-08-06,
before the 95th percentile bounds kick in at 2017-08-17, the second time we pass today's date.

What could be causing this? It makes the graph I am creating look messy.

Thanks
Mark

DalJeanis · ‎08-16-2017

No such issue on 6.4.7, by my test. Although, I have seen timechart add extra crud on the end sometimes.

Here's a workaround - add this to the end of the search.

| streamstats current=f max(_time) as priorbesttime
| where _time > priorbesttime
| fields - priorbesttime
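
To see what this filter keeps, here is a minimal, self-contained sketch using makeresults with made-up epoch values (100 to 400; these are illustrative and not from the thread). streamstats current=f max(_time) gives each row the largest _time seen on earlier rows, so any row whose time repeats or steps backwards is dropped. One assumption worth checking on your own version: on the very first row priorbesttime is null, so a bare "_time > priorbesttime" should drop that row as well; the isnull() guard below is an addition to keep it.

```
| makeresults count=6
| streamstats count as n
| eval _time=case(n=1, 100, n=2, 200, n=3, 300, n=4, 200, n=5, 300, n=6, 400)
| streamstats current=f max(_time) as priorbesttime
| where isnull(priorbesttime) OR _time > priorbesttime
| fields - priorbesttime
```

Under those assumptions, rows n=4 and n=5 (the repeated times 200 and 300) are removed and rows 1, 2, 3 and 6 remain. Over a 90-day timechart, losing the first row without the guard is barely visible, so either form should tidy the chart.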

Also, please note that THIS code is not doing what you think it is.

 | eval upper95(futures)=if(_time<=now(), Null, 'upper95(futures)' )

That code is equivalent to...

 | eval upper95(futures)=if(_time<=now(), SomeFieldNamedNullThatDoesntExistAndThereforeHappensToHaveANullValue, 'upper95(futures)' )

... as opposed to this, which specifies to return a null value.

 | eval upper95(futures)=if(_time<=now(), null(), 'upper95(futures)' )
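
A quick way to check the difference on your own instance, as a sketch: create a field that is literally named Null (the value "surprise" is made up for illustration) and compare the two forms side by side. The field names bare and func are hypothetical.

```
| makeresults
| eval Null="surprise"
| eval bare=if(1==1, Null, "x")
| eval func=if(1==1, null(), "x")
| table Null bare func
```

If the reasoning above holds, bare picks up the value of the field named Null, while func stays empty because null() returns a real null. The bare form only appears to work in the original query because no field called Null exists there.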

mjm295 · ‎08-16-2017

Thanks for the "Null" clarification.

mjm295 · ‎08-16-2017

Thanks Dal, looking much tidier now. Just for completeness my final query is:

index="linux_capacity" source=cpu CPU=all host=ip-10-134*
| eval PctUsed = 100 - pctIdle
| timechart avg(PctUsed) as PercentUsed span=1h
| eval PercentUsed=round(PercentUsed,2)
| predict "PercentUsed" as futures algorithm=LLP future_timespan=960 lower90=low upper90=high
| eval futures=round(futures,2)
| eval high(futures)=if(_time<=now(), null(), 'high(futures)')
| eval low(futures)=if(_time<=now(), null(), 'low(futures)')
| eval low(futures)=if('low(futures)' < 0, 0, 'low(futures)')
| streamstats current=f max(_time) as priorbesttime
| where _time > priorbesttime
| fields - priorbesttime

DalJeanis · ‎08-17-2017

@mjm295 - Thanks for posting that. It can help other people when they can see the solution that worked.

cmerriman · ‎08-16-2017

What version of Splunk are you using? I just ran your query with some of my own data and it worked fine. I'm on 6.6.2.

mjm295 · ‎08-16-2017

It's 6.5.1 to be exact.

mjm295 · ‎08-16-2017

Version 6.5 here.
