Dashboards & Visualizations

Need help visualizing the query results

pench2k19
Explorer

Hi Guys,

I have the following query and query result; I am struggling to show it in a graph:

index=infra_apps sourcetype=ca:atsys:edemon:txt
| search Job=*
| rename hostname as host 
| eval time_epoch=strftime(_time,"%Y-%m-%d %H:%M:%S") 
| fields Job host Autosysjob_time Status _time time_epoch 
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| table Job Status host cluster _time time_epoch
| search cluster=* AND host=*
| sort + time_epoch
| stats count  by _time Job Status host cluster time_epoch
| bin span=2m time_epoch
| makecontinuous  _time span=2m
| filldown _time Job Status host cluster count time_epoch

Query result:

_time          Job                                        Status    host   cluster      time_epoch   count
3/3/2020 8:00  1CDH_ING_NBC_ACCT_MSTR_DY_CURR_HG          STARTING  XXXX   edl-prd-m01  43893.33337  1
3/3/2020 8:00  1CDH_ING_NBC_ACCT_OB_PRIM_CK_DY_TMPRL_BMG  STARTING  XXXX   edl-prd-m01  43893.33338  1
3/3/2020 8:00  1CDH_ING_NBC_EVNT_CUST_ID_CHG_HY_HIST_CIS  RUNNING   XXXXX  edl-prd-m01  43893.33372  1
3/3/2020 8:00  1CDH_ING_NBC_EVNT_CUST_PH_CHG_HY_HIST_CIS  RUNNING   XXXX   edl-prd-m01  43893.33372  1
3/3/2020 9:00  1CDH_ING_NBC_EVNT_CUST_PH_CHG_HY_HIST_CIS  RUNNING   XXXX   edl-prd-m01  43893.33372  1

Now I am struggling to show how many jobs are running or starting at each minute. Can you please help?


woodcock
Esteemed Legend

You should take a look at the concurrency command and the answers.splunk.com articles about it.
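
For reference, concurrency needs each event to carry a start time (_time) and a duration, and it emits a concurrency field counting overlapping events. A minimal sketch of one way to apply it here, with the status values guessed from elsewhere in this thread (STARTING as the start marker, SUCCESS/FAILURE as end markers; adjust to your actual values), using transaction to derive each run's duration field:

index=infra_apps sourcetype=ca:atsys:edemon:txt Job=*
| transaction Job startswith=eval(Status=="STARTING") endswith=eval(Status=="SUCCESS" OR Status=="FAILURE")
| concurrency duration=duration
| timechart span=1m max(concurrency) AS concurrent_jobs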


sideview
SplunkTrust

I think the core problem, or at the very least a major source of confusion, is that you're taking _time, which is already an integer value (the number of seconds since Big Ben rang out the new year in 1970), and turning it into a string-formatted time, then naming that field "time_epoch", which is SUPER confusing because its values are string-formatted times, not epoch-time integers.

Then you're sorting by that string-formatted time. This is a little odd but so far not really a problem, just weird, since the original _time value can do all of that perfectly well too.

Wild guess: you're getting confused by the behavior of the Splunk UI. Whenever it sees any field called "_time", it sneakily converts the display of those values to string-formatted times on the fly. However, the values underneath are epoch-time integers. You can always verify that I'm not talking crazy by doing | eval srsly_wat=_time

Then anyway, back to our story, the problem arises with this:
| bin span=2m time_epoch

That would make sense if time_epoch were actually an epoch-time-valued field, i.e. if it held integer values. It makes no sense in this case, where the "time_epoch" field holds strings. bin has no idea what to do with that combination of field and value, so what it does is nothing at all. It throws no error, but it does nothing.
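
A minimal sketch of the two workable variants: bin _time directly, or first convert the string back into a real epoch number with strptime so bin has integers to bucket:

| bin span=2m _time

| eval time_epoch=strptime(time_epoch,"%Y-%m-%d %H:%M:%S")
| bin span=2m time_epoch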

I think if you fix that core issue, you'll be able to revisit what you're trying to do with the bin + makecontinuous + filldown. Re-examining that fresh, with some fresh caffeine, is a good idea.

 index=infra_apps sourcetype=ca:atsys:edemon:txt
 | search Job=*
 | rename hostname as host 
 | fields Job host Autosysjob_time Status _time 
 | lookup datalakenodeslist.csv host OUTPUT cluster
 | mvexpand cluster
 | table Job Status host cluster _time 
 | search cluster=* AND host=*
 | sort + _time
 | stats count  by _time Job Status host cluster 

This is a bit of an instinct, but I have seen many cases where folks fiddling with bin + makecontinuous + filldown are basically trying to recreate one of timechart's core use cases, because they've come to believe that timechart can't do it.
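
For example, the counting-jobs-per-status-per-interval case in this thread could plausibly collapse into one timechart, which buckets on _time and fills empty buckets on its own (a sketch assembled from the fields used above, not tested against the real data):

index=infra_apps sourcetype=ca:atsys:edemon:txt Job=*
| rename hostname AS host
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| search cluster=* host=*
| timechart span=2m count BY Status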

I hope this helps. I don't think it's your whole answer, but I think it clears some of the mess up and puts you in a better place, and I'll update this answer if you post back.


pench2k19
Explorer

I just derived a query for the number of running jobs at each point in time, as follows:

index=XXX sourcetype=YYYY EventCode=40 AND (Status=Failure OR Status=Terminated OR Status=Running OR Status=Success) AppID=$appid$ Machine=$host$ Job=$job$
| dedup _raw
| lookup datalakenodeslist.csv host as Machine OUTPUT cluster
| search cluster=$clustername$
| eval running=if(Status="RUNNING","1","0"),status=if(Status="RUNNING","start","stop"), time=_time+status
| bin span=2m _time
| stats max(running) as running, earliest(time) as first, latest(time) as last by Job,_time
| xyseries _time Job running first last
| makecontinuous span=2m _time
| streamstats window=2 global=f earliest(last*) as last*
| reverse
| streamstats window=2 global=f earliest(first*) as first*
| reverse
| foreach running*
    [ eval <<FIELD>>=if(isnull('<<FIELD>>') AND like('first<<MATCHSTR>>',"%start"),"0",
                     if(isnull('<<FIELD>>') AND like('first<<MATCHSTR>>',"%stop"),"1",
                     if(isnull('<<FIELD>>') AND like('last<<MATCHSTR>>',"%start"),"1",
                     if(isnull('<<FIELD>>') AND like('last<<MATCHSTR>>',"%stop"),"0",'<<FIELD>>'))))]
| fields - first*, last*
| filldown *
| reverse
| filldown *
| reverse
| addtotals fieldname=Running
| fields _time,Running

But I find it too difficult to even understand the query :(, and I'm not able to edit it into a single query for both the RUNNING and STARTING statuses.

Can you please help me build a single query for both statuses?
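
A hedged guess at the first edit (untested against the real data): widen the flag so that STARTING also counts as the job being "on", leaving the rest of the pipeline unchanged. Note that the dot here is eval's explicit string-concatenation operator:

| eval running=if(Status="RUNNING" OR Status="STARTING","1","0")
| eval status=if(Status="RUNNING" OR Status="STARTING","start","stop")
| eval time=_time.status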


pench2k19
Explorer

@sideview thanks for such a detailed explanation.

The following is the latest query I am using:
index=infra_apps sourcetype=ca:atsys:edemon:txt
| search Job=*
| rename hostname as host
| eval t1=strftime(_time,"%Y-%m-%d %H:%M:%S")
| eval time_epoch=strptime(t1,"%Y-%m-%d %H:%M:%S")
| fields Job host Autosysjob_time Status _time time_epoch t1
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| table Job Status host cluster _time time_epoch
| search cluster=EDL-PRD-M01 AND host=*
| sort + time_epoch
| stats count by _time Job Status host cluster time_epoch
| bin span=2m _time
| makecontinuous span=2m _time
| filldown _time Job Status host cluster count time_epoch

Sorting the results by _time before the stats command was not working for me; that's why I had to use time_epoch.

And I tried using | bin span=2m time_epoch and | bin span=2m _time, but these give me different results at times; I am still confused about which to use on this line.

PS: An example of how _time is extracted in this case: 2020-03-09 10:00:03.863
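
One possible explanation for the differing results (inferred from the pipeline shape, not verified): when you bin time_epoch, the later makecontinuous span=2m _time still operates on the never-binned _time, which keeps sub-second precision like the 10:00:03.863 above, so the generated rows land on different boundaries. Binning the same field you make continuous avoids the mismatch:

| bin span=2m _time
| makecontinuous span=2m _time
| filldown Job Status host cluster count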


pench2k19
Explorer

@richgalloway this is not helping....

I just want to give you some background about this: I have a single event for each job status in Splunk.

But to make it continuous, I am inserting 2-minute intervals using the bin and makecontinuous commands and filling those intervals with the last status the job reported.

For example, if a job is in RUNNING status at 9 AM and SUCCESS status at 10 AM, I am creating 2-minute intervals across that one-hour gap and filling them with the RUNNING status.

I suspect that is one of the reasons I am not able to get the graph I am expecting.
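
A self-contained toy of that gap-filling mechanic, using two fabricated events via makeresults in place of real job events:

| makeresults
| eval _time=strptime("2020-03-03 09:00:00","%Y-%m-%d %H:%M:%S"), Status="RUNNING"
| append [| makeresults | eval _time=strptime("2020-03-03 10:00:00","%Y-%m-%d %H:%M:%S"), Status="SUCCESS"]
| bin span=2m _time
| makecontinuous span=2m _time
| filldown Status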

PS: I am getting the correct result up to the filldown command; after that I am struggling.


pench2k19
Explorer
index=infra_apps sourcetype=ca:atsys:edemon:txt
| search Job=*
| rename hostname as host 
| eval time_epoch=strftime(_time,"%Y-%m-%d %H:%M:%S")
| fields Job host Autosysjob_time Status _time time_epoch
| lookup datalakenodeslist.csv host OUTPUT cluster 
| mvexpand cluster
| table Job Status host cluster _time time_epoch
| search cluster=EDL-PRD-M01 AND host=*
| sort + time_epoch
| stats count  by _time Job Status host cluster time_epoch
| bin span=2m _time
| makecontinuous span=2m _time
| filldown _time Job Status host cluster count time_epoch
| stats count by _time Status

This is giving me the correct result, but I am not able to add a filter like search Status=RUNNING to the query; when I add the status filter, it gives me a different result.

@woodcock @jka can you please help?
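
A hedged thought on why the filter changes things (reasoning from the pipeline shape, not tested): a Status=RUNNING filter placed early removes the non-RUNNING events whose timestamps anchor the gap-filling, so filldown propagates different values. Applying the filter after the filldown, just before the final stats, keeps the continuity intact:

| filldown _time Job Status host cluster count time_epoch
| search Status=RUNNING
| stats count by _time Status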


pench2k19
Explorer

@woodcock @jkat54 @somesoni2 @sideview can you please help?


richgalloway
SplunkTrust

Try adding | timechart span=1m count by Status to the end of your query.
