Splunk Search

how to mesure the duration of unfinished and uncompleted transactions ?

mataharry
Communicator

Hi

I am collecting events of states of devices, a device is supposed to be go from the state "false"=beginning of transaction to "true"=end of transactions. I would like to retrieve the duration of my transactions, to track the long/unfinished ones.

My problem is that my logs doesn't always contains the end of transactions event.

Here is an example of my logs. I added comments at the end of the events. you can see that the same device/store pair may starts several transactions without finishing them all.

Thu Mar 31 10:56:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 1 OK start
Thu Mar 31 11:56:26 PDT 2011 Device=1 Store=1 Status=true detail=transaction 1 OK end
Thu Mar 31 12:00:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 2 broken start
Thu Mar 31 12:30:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 3 OK start
Thu Mar 31 12:40:26 PDT 2011 Device=1 Store=1 Status=true detail=transaction 3 OK end
Thu Mar 31 13:56:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 4 broken start
Thu Mar 31 14:56:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 5 broken start
Thu Mar 31 15:56:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 6 OK start
Thu Mar 31 16:56:26 PDT 2011 Device=1 Store=1 Status=true detail=transaction 6 OK end
Thu Mar 31 17:04:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 7 broken start
Thu Mar 31 17:10:26 PDT 2011 Device=1 Store=1 Status=false detail=transaction 8 start - unfinished yet

I would like to retrieve :

  • all the correct transactions with duration
  • every time a new transaction restart, I want to consider the previous one as finished
  • I also would like to have the duration of the last unfinished transactions till now.
1 Solution

yannK
Splunk Employee
Splunk Employee

This is hard but doable :

The difficult points are :

  • You don't have any unique transaction id
  • You are missing the end of transaction events
  • The transaction function is not enough to deal with all the invalid transactions (the startswith and the endswith cannot be to crazy)

The solution is to run 2 searches and append the result :

  • first search is a transaction keeping only the complete valid transactions (false -> true)
  • the second search is for the invalid transactions using a streamstats calculating the time interval between 2 identical device/store pair switching from false->false. (this no unique field is tempid_txn)
  • the result of the 2 searches is regrouped by append. don't forget to tune the limits of the sub-search.
  • for each group of event a unique field is created (id_txn) including the store, device and first timestamp. This field is used to dedup the transactions events existing in the 2 searches. (some false->false are simply wrong, because they may be a valid false->true in the first search)
  • then the duration of unfinished transactions is calculated to the current time.

At the end the useful fields to display are : Store, Device, duration, valid_txn, id_txn

source="*transaction.log" true OR false | eval valid_txn="valid" | eval id_txn=Device+"-"+Store+"-"+_time 
| transaction Device,Store startswith=false endswith=true keepevicted=false maxevents=2 mvlist=true 
| append maxout=100000 timeout=6000 maxtime=6000 
[ 
  search source="*transaction.log" false 
  | eval valid_txn="invalid" 
  | eval id_txn=Device+"-"+Store+"-"+_time 
  | eval tempid_txn=Device+"-"+Store 
  | streamstats range(_time) as duration window=2 by tempid_txn 
] 
| eval id_txn=mvindex(id_txn,0) 
| dedup id_txn 
| eval duration=if(duration=0,now()-_time,duration) 
| eval raw=_raw | table id_txn, Device, Store, duration, Status, closed_txn, valid_txn, raw 
| sort duration

voila !

View solution in original post

yannK
Splunk Employee
Splunk Employee

To provide another simplified example to count the duration of unfinished transactions

events :


2012-06-20 20:00:00.000 uniqueid=22 state=start
2012-06-20 21:00:00.000 uniqueid=22 state=end
2012-06-20 22:00:00.000 uniqueid=23 state=start


* | transaction uniqueid startswith="state=start" endswith="state=end" keeporphans=true
| eval duration_sec=if(isnull(duration),now()-_time,duration)
| eval status=if(closed_txn==1,"completed","still in progress")
| table _time duration_sec status _raw

We are keeping the orphean transactions and manually calculate the duration from first event to now().

Remarks :

  • duration is in seconds
  • closed_txt=1 means that the transaction has been closed, here we replace itby the text field status.
  • the duration will not work for realtime searches (the result of now() will not re-evaluated over the time, and be static)
0 Karma

yannK
Splunk Employee
Splunk Employee

This is hard but doable :

The difficult points are :

  • You don't have any unique transaction id
  • You are missing the end of transaction events
  • The transaction function is not enough to deal with all the invalid transactions (the startswith and the endswith cannot be to crazy)

The solution is to run 2 searches and append the result :

  • first search is a transaction keeping only the complete valid transactions (false -> true)
  • the second search is for the invalid transactions using a streamstats calculating the time interval between 2 identical device/store pair switching from false->false. (this no unique field is tempid_txn)
  • the result of the 2 searches is regrouped by append. don't forget to tune the limits of the sub-search.
  • for each group of event a unique field is created (id_txn) including the store, device and first timestamp. This field is used to dedup the transactions events existing in the 2 searches. (some false->false are simply wrong, because they may be a valid false->true in the first search)
  • then the duration of unfinished transactions is calculated to the current time.

At the end the useful fields to display are : Store, Device, duration, valid_txn, id_txn

source="*transaction.log" true OR false | eval valid_txn="valid" | eval id_txn=Device+"-"+Store+"-"+_time 
| transaction Device,Store startswith=false endswith=true keepevicted=false maxevents=2 mvlist=true 
| append maxout=100000 timeout=6000 maxtime=6000 
[ 
  search source="*transaction.log" false 
  | eval valid_txn="invalid" 
  | eval id_txn=Device+"-"+Store+"-"+_time 
  | eval tempid_txn=Device+"-"+Store 
  | streamstats range(_time) as duration window=2 by tempid_txn 
] 
| eval id_txn=mvindex(id_txn,0) 
| dedup id_txn 
| eval duration=if(duration=0,now()-_time,duration) 
| eval raw=_raw | table id_txn, Device, Store, duration, Status, closed_txn, valid_txn, raw 
| sort duration

voila !

yannK
Splunk Employee
Splunk Employee

the valid_txn column is important to inform that the duration is valid, or based on estimations.

Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...