Splunk Search

Duplicate dates in stats when using predict

mjm295
Path Finder

I have this query to predict CPU usage, looking at real data for the last 90 days and predicting 60 days ahead.

index="linux_capacity"  source=cpu CPU=all  host=ip-10-134*  | eval PctUsed = 100 - pctIdle 
| timechart avg(PctUsed) as PercentUsed 
| predict "PercentUsed" as futures algorithm=LLP future_timespan=60
| eval upper95(futures)=if(_time<=now(), Null, 'upper95(futures)' )
| eval lower95(futures)=if(_time<=now(), Null, 'lower95(futures)' )

Looking at the stats (results), the 10 days from today backwards get duplicated. Today is 16th August. Here is a snippet of the stats:

_time       PercentUsed futures         lower95(futures) upper95(futures)
2017-08-02  10.345810   11.2606080643
2017-08-03  8.371493    11.6832498048        
2017-08-04  8.287087    10.2299365809        
2017-08-05  12.312134   12.2315872649        
2017-08-06  11.367797   10.9899627817        
2017-08-07  17.745977   14.2295366964        
2017-08-08  10.109057   10.1245616922        
2017-08-09  17.496496   14.0287175836        
2017-08-10  8.339878    11.2479039882        
2017-08-11  8.737030    10.0940590718        
2017-08-12  8.032037    9.39042740568        
2017-08-13  7.555324    9.33242169748        
2017-08-14  9.514418    11.8174795236        
2017-08-15  8.862755    8.98957755123        
2017-08-16  8.136355    11.4131114138        
2017-08-06              11.2479039882        
2017-08-07              10.0940590718        
2017-08-08              9.39042740568        
2017-08-09              9.33242169748        
2017-08-10              11.8174795236        
2017-08-11              8.98957755123        
2017-08-12              11.4131114138        
2017-08-13              11.2479039882        
2017-08-14              10.0940590718        
2017-08-15              9.39042740568        
2017-08-16              9.33242169748        
2017-08-17              11.8174795236   4.01416734251   19.6207917047
2017-08-18              8.98957755123   -0.453621346862 18.4327764493
2017-08-19              11.4131114138   1.74019160299   21.0860312246
2017-08-20              11.2479039882   0.874637979426  21.6211699969
2017-08-21              10.0940590718   4.39114905157   15.796969092
2017-08-22              9.39042740568   -4.25403965674  23.0348944681

So the real data stops on 2017-08-16 (today), BUT then the predicted data starts again from 2017-08-06.
The upper95/lower95 values only kick in the second time we reach 2017-08-17.

What could be causing this? It makes the graph I am creating look messy.

Thanks
Mark

1 Solution

DalJeanis
Legend

No such issue on 6.4.7, by my test. Although, I have seen timechart add extra crud on the end sometimes.

Here's a workaround - add this to the end of the search.

| streamstats current=f max(_time) as priorbesttime
| where _time > priorbesttime
| fields - priorbesttime
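
For anyone who wants to see why this removes the repeated dates, here is a minimal Python sketch of the same idea outside Splunk. It is not SPL and not a definitive model of streamstats: it keeps a row only when its _time is later than every _time seen before it. The function name and the sample list are made up for illustration; the dates come from the table in the question. It also assumes that a comparison against a missing priorbesttime is not true, which would mean the very first row gets dropped as well. That is worth checking on your own version.

```python
def keep_rows_after_prior_max(rows):
    """Rough model of:
         | streamstats current=f max(_time) as priorbesttime
         | where _time > priorbesttime
    """
    kept = []
    prior_best = None  # max(_time) over all earlier rows; nothing seen yet
    for row in rows:
        t = row["_time"]
        # Assumption: with no prior rows the comparison is not true,
        # so the first row is filtered out along with the repeats.
        if prior_best is not None and t > prior_best:
            kept.append(row)
        # Update the running maximum after the test, like current=f.
        prior_best = t if prior_best is None else max(prior_best, t)
    return kept


# ISO dates compare correctly as strings. The run goes up to 08-16,
# jumps back to 08-06, then climbs past 08-16 again.
sample = [{"_time": d} for d in [
    "2017-08-14", "2017-08-15", "2017-08-16",
    "2017-08-06", "2017-08-07", "2017-08-16",
    "2017-08-17", "2017-08-18",
]]
kept_times = [r["_time"] for r in keep_rows_after_prior_max(sample)]
print(kept_times)
```

The rows that go back in time never beat the running maximum, so they are filtered out, and the forecast rows from 2017-08-17 onwards survive because they are later than anything before them.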

Also, please note that THIS code is not doing what you think it is.

 | eval upper95(futures)=if(_time<=now(), Null, 'upper95(futures)' )

That code is equivalent to...

 | eval upper95(futures)=if(_time<=now(), SomeFieldNamedNullThatDoesntExistAndThereforeHappensToHaveANullValue, 'upper95(futures)' )

... as opposed to this, which specifies to return a null value.

 | eval upper95(futures)=if(_time<=now(), null(), 'upper95(futures)' )
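
The difference can be sketched in Python, as a rough model of the explanation above rather than of how eval is really implemented. A bare word is treated as a field lookup, and a missing field comes back as null (None here). The helper names, the numeric _time values, the 19.6 bound and the extra "Null" field are all hypothetical and only there for illustration.

```python
def field(row, name):
    # A bare identifier is read as a field reference;
    # a field that does not exist yields null (None in this model).
    return row.get(name)


def null():
    # Explicit null, independent of whatever fields the row has.
    return None


def mask_with_bare_null(row, now):
    # Models: if(_time<=now(), Null, 'upper95(futures)')
    if row["_time"] <= now:
        return field(row, "Null")
    return field(row, "upper95(futures)")


def mask_with_null_fn(row, now):
    # Models: if(_time<=now(), null(), 'upper95(futures)')
    if row["_time"] <= now:
        return null()
    return field(row, "upper95(futures)")


now = 200  # hypothetical "current time"
past_row = {"_time": 100, "upper95(futures)": 19.6}
future_row = {"_time": 300, "upper95(futures)": 19.6}
# Same past row, but the data happens to contain a field called "Null".
past_row_with_null_field = dict(past_row, Null="oops")

print(mask_with_bare_null(past_row, now))                  # None, by accident
print(mask_with_bare_null(past_row_with_null_field, now))  # "oops"
print(mask_with_null_fn(past_row_with_null_field, now))    # None, by design
print(mask_with_null_fn(future_row, now))                  # 19.6
```

In this model the bare Null version only works as long as no field with that name exists, while null() gives the same result either way.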

mjm295
Path Finder

Thanks for the "Null" clarification.

mjm295
Path Finder

Thanks Dal, looking much tidier now. Just for completeness my final query is:

index="linux_capacity" source=cpu CPU=all host=ip-10-134*
| eval PctUsed = 100 - pctIdle
| timechart avg(PctUsed) as PercentUsed span=1h
| eval PercentUsed=round(PercentUsed,2)
| predict "PercentUsed" as futures algorithm=LLP future_timespan=960 lower90=low upper90=high
| eval futures=round(futures,2)
| eval high(futures)=if(_time<=now(), null(), 'high(futures)')
| eval low(futures)=if(_time<=now(), null(), 'low(futures)')
| eval low(futures)=if('low(futures)' < 0, 0, 'low(futures)')
| streamstats current=f max(_time) as priorbesttime
| where _time > priorbesttime
| fields - priorbesttime

DalJeanis
Legend

@mjm295 - Thanks for posting that. It can help other people when they can see the solution that worked.

cmerriman
Super Champion

What version of Splunk are you using? I just ran your query with some of my own data and it worked fine. I'm on 6.6.2.

mjm295
Path Finder

It's 6.5.1, to be exact.

mjm295
Path Finder

Version 6.5 here.
