Hi Team,
I have the following details.
One of our Autosys jobs ran for 20 hours. The logs record only one event for it during that time, i.e. when its status changed from STARTING to RUNNING.
I would like to show that same status for the job across the whole 20-hour duration in the timechart.
I am using the following query, but I am not able to get the solution. Can you please help?
index=infra_apps sourcetype=ca:atsys:edemon:txt
| rename hostname as host
| fields Job host Autosysjob_time Status
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| search Status=STARTING AND cluster=* AND host="*" AND Job=*
| dedup Job Autosysjob_time host
| timechart span=5m count(Job) by cluster
Your help is much appreciated.
Finally, I solved my own puzzle. Thanks to everyone for spending their time here.
Correct Answer:
index=infra_apps sourcetype=ca:atsys:edemon:txt
| stats count by _time Job Status hostname
| bin span=5m _time
| makecontinuous span=5m _time
| filldown _time Job Status
| chart count(Job) OVER _time BY cluster
You need the concurrency command. Google for examples and you will find what you need (I have provided some answers before, so that may help narrow it down):
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Concurrency
@woodcock The concurrency command doesn't look like it suits my use case. For example, I want to hold the job in RUNNING status and keep counting it until it changes to its next status, SUCCESS, when I audit it with my timechart every 5 minutes.
Any time you use concurrency, you need to do some extra work, but it is the only command that can easily do what you need. Look for answers by @woodcock and by @sideview.
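To make the "extra work" concrete: concurrency needs a start time and a duration for each event, and the raw status-change events carry no duration. Below is a minimal sketch in Python (not SPL) of the kind of overlap count the command produces, assuming each job has already been reduced to a hypothetical (start, duration) pair in seconds. The exact boundary handling in Splunk may differ.

```python
def concurrency_at_start(intervals):
    """For each (start, duration) pair, count how many intervals
    (including itself) are open at that interval's start time."""
    result = []
    for start, _ in intervals:
        # An interval is open at `start` if it began at or before `start`
        # and has not yet ended.
        open_count = sum(1 for s, d in intervals if s <= start < s + d)
        result.append(open_count)
    return result


# Hypothetical jobs: (start_seconds, duration_seconds)
jobs = [(0, 100), (50, 100), (120, 10)]
overlaps = concurrency_at_start(jobs)
```

The duration would have to be derived first, for example from the gap between a job's RUNNING event and its next status change.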
UPDATE:
index=infra_apps sourcetype=ca:atsys:edemon:txt
| rename hostname as host
| fields Job host Autosysjob_time Status
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| dedup Job Autosysjob_time host
| table Autosysjob_time Job Status cluster host
| sort Autosysjob_time
| streamstats values(Status) as p_Status window=1 current=F by cluster
| eval flag=if(Status!=p_Status,"change",null())
| streamstats count(eval(flag="change")) as session by cluster
| rename Autosysjob_time as _time
| bin _time span=5m
| stats count(Job) as job_count count(eval(Status="STARTING")) as STARTING count(eval(Status="RUNNING")) as RUNNING count(eval(Status="SUCCESS")) as SUCCESS by _time cluster session
Here are the sample events, FYR:
[02/18/2020 10:37:27.6386] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: SUCCESS JOB: Job1
Status = STARTING host = XXXX source = /opt/CA/r1 sourcetype = ca:atsys:edemon:txt
[02/18/2020 10:39:27.6386] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: SUCCESS JOB: Job1
Status = RUNNING host = XXXX source = /opt/CA/r1 sourcetype = ca:atsys:edemon:txt
[02/18/2020 12:37:27.6386] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: SUCCESS JOB: Job1
Status = RUNNING host = XXXX source = /opt/CA/r1 sourcetype = ca:atsys:edemon:txt
The latest query I have defined is as follows:
eventtype="sourcetype-atsys_edemon"
| rename hostname as host
| transaction Job startswith=(Status=STARTING) endswith=(Status=RUNNING)
| eval zipped= mvzip(Autosysjob_time,Status,"!!!Status=")
| fields Job host zipped
| mvexpand zipped
| rex field=zipped "^(?<Autosysjob_time>.*)!!!Status=(?<Status>.*)"
| fields Job Autosysjob_time host Status
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| fields Autosysjob_time Job host Status cluster
| eval _time=Autosysjob_time
| search cluster=$clustername$ AND host="$host$" AND Job=$job$ AND Status=$jobstatus$
| dedup Job Autosysjob_time host
| chart count(Job) OVER _time BY cluster
I am not sure if this is the correct query to use, as it runs very slowly.
A small correction to the sample events:
[02/18/2020 10:47:15.1318] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: STARTING JOB: CFDW_ADHOC_C_AIMSAS_D_INV_LNITEM_BILLING_CHGS_M MACHINE: XXXX
[02/18/2020 10:48:15.1318] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RUNNING JOB: CFDW_ADHOC_C_AIMSAS_D_INV_LNITEM_BILLING_CHGS_M MACHINE: XXXX
[02/18/2020 11:57:15.1318] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: SUCCESS JOB: CFDW_ADHOC_C_AIMSAS_D_INV_LNITEM_BILLING_CHGS_M MACHINE: XXXX
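For reference, here is a small Python sketch (not SPL) of how the fields in these corrected sample lines could be pulled apart. The regular expression and the dictionary keys are assumptions based only on the three lines above, and the timestamp is read as month/day/year.

```python
import re
from datetime import datetime

# Pattern inferred from the sample lines only; other event types are ignored.
EVENT_RE = re.compile(
    r"^\[(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\.\d+\]\s+\S+\s+"
    r"EVENT: CHANGE_STATUS\s+STATUS: (?P<status>\S+)\s+"
    r"JOB: (?P<job>\S+)(?:\s+MACHINE: (?P<machine>\S+))?"
)


def parse_event(line):
    """Return a dict with time, status, job and machine, or None if no match."""
    match = EVENT_RE.match(line)
    if match is None:
        return None
    return {
        # Fractional seconds are dropped; this is the Autosys event time.
        "time": datetime.strptime(match.group("ts"), "%m/%d/%Y %H:%M:%S"),
        "status": match.group("status"),
        "job": match.group("job"),
        "machine": match.group("machine"),
    }


sample = (
    "[02/18/2020 10:48:15.1318] CAUAJM_I_40245 EVENT: CHANGE_STATUS "
    "STATUS: RUNNING JOB: CFDW_ADHOC_C_AIMSAS_D_INV_LNITEM_BILLING_CHGS_M "
    "MACHINE: XXXX"
)
event = parse_event(sample)
```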
I am not sure what Autosysjob_time is, but my query seems OK. What's wrong?
@to4kawa Autosysjob_time is the timestamp that appears in the sample event. I see a difference between the indexing time and Autosysjob_time, so I am confused about what to put on the x-axis.
Your query seems to be OK, but with it I am not able to get how many jobs are in the RUNNING or STARTING state at each minute for each cluster.
"How many jobs are in running or Starting state at each minute by each cluster" does not show the status of the job as it is until it changes its status.
Which do you want?
ans1. Count state=Running and state=Starting separately.
ans2. Count the changes, like my query.
@to4kawa I would like to combine both of them.
I have an event with status=running at 10:48, as shown in the sample event. When I do the auditing every 5 minutes, it should be counted in the number of running jobs until its status changes to success at 11:57.
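The carry-forward being asked for can be sketched in Python (not SPL) to make the bucketing explicit. This assumes each event is a (time, job, status, cluster) tuple, that a status holds until the same job's next status change, and that only STARTING and RUNNING are counted. The cluster name "c1", the end_of_window value, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

ACTIVE = {"STARTING", "RUNNING"}


def floor_to_span(ts, span):
    """Round ts down to the start of its span-sized bucket."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + (ts - midnight) // span * span


def active_counts(events, end_of_window, span=timedelta(minutes=5)):
    """Distinct active jobs per (bucket_start, cluster, status)."""
    by_job = defaultdict(list)
    for ev in sorted(events, key=lambda e: e[0]):
        by_job[ev[1]].append(ev)

    active = defaultdict(set)
    for changes in by_job.values():
        for i, (start, job, status, cluster) in enumerate(changes):
            if status not in ACTIVE:
                continue
            # The status holds until the job's next change, or the window end.
            end = changes[i + 1][0] if i + 1 < len(changes) else end_of_window
            bucket = floor_to_span(start, span)
            while bucket < end:
                active[(bucket, cluster, status)].add(job)
                bucket += span
    return {key: len(jobs) for key, jobs in active.items()}


job = "CFDW_ADHOC_C_AIMSAS_D_INV_LNITEM_BILLING_CHGS_M"
events = [
    (datetime(2020, 2, 18, 10, 47, 15), job, "STARTING", "c1"),
    (datetime(2020, 2, 18, 10, 48, 15), job, "RUNNING", "c1"),
    (datetime(2020, 2, 18, 11, 57, 15), job, "SUCCESS", "c1"),
]
counts = active_counts(events, end_of_window=datetime(2020, 2, 18, 12, 0))
```

With these three events the job appears as RUNNING in every 5-minute bucket from 10:45 through 11:55. Note that the 10:45 bucket counts it under both STARTING and RUNNING, because both statuses occurred inside that bucket.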
I'm not sure how you want to count the status.
My answer is updated. Please confirm.
| search Status=STARTING AND cluster=* AND host="*" AND Job=*
This query can't find a status change. It displays only Status=STARTING.
@to4kawa Thanks for the response. The following is my updated query:
index=infra_apps sourcetype=ca:atsys:edemon:txt
| rename hostname as host
| fields Job host Autosysjob_time Status
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| dedup Job Autosysjob_time host
| timechart span=5m count(Status) by cluster
I don't mind if you edit the query the way you want, but I need to see the number of jobs running, and with what status, by cluster at any given point in time. And if a job's duration is, say, 3 hours, the status of the job has to stay the same, RUNNING, throughout those 3 hours.
P.S.: I have to deal with only two statuses in this query, i.e. RUNNING and STARTING.
index=infra_apps sourcetype=ca:atsys:edemon:txt
| rename hostname as host
| fields Job host Autosysjob_time Status
| lookup datalakenodeslist.csv host OUTPUT cluster
| mvexpand cluster
| dedup Job Autosysjob_time host
Can you provide the result at this point?
@to4kawa It gives me results with the following fields when I add table * at the end of the above query:
Autosysjob_time Job Status cluster host _raw _time
What's Autosysjob_time? timechart uses _time. Why does a useless field remain?
That's when Autosys updates the status in the records; we can use that as _time.