Solved: Find the list of consecutive events by a field

langlv · ‎07-13-2017

Hello,

I am trying to find out the list of consecutive card transactions on same terminal in period of time, eg: more than 3 inquiry transactions in a ATM machine from the hold transaction history of the system (from multiple ATMs).

 terminal_id    tran_type
ATM1    INQ
ATM2    CWD
ATM1    INQ
ATM1    INQ
ATM2    CWD
ATM1    INQ
ATM2    INQ
ATM1    CWD
ATM3    INQ
ATM3    INQ
ATM1    CWD
ATM3    INQ
ATM3    CWD
ATM1    INQ

Given the parameter for listing consecutive INQ transactions on same terminal is 3, I want to list out the following:

terminal_id tran_type
ATM1    INQ
ATM1    INQ
ATM1    INQ
ATM1    INQ
ATM3    INQ
ATM3    INQ
ATM3    INQ

where both ATM1 & ATM3 need to be checked (ATM1 has 4 and ATM3 has 3 INQ consecutive transactions individually).

Thank you guys for your help.
Lang

vasanthmss · ‎07-13-2017

Updated,

your base search... |  eval raw=terminal_id+"-"+tran_type |sort 0 terminal_id
 | autoregress raw  | eval sameAsNext=if(raw=raw_p1,1,0) | streamstats current=t count(eval(sameAsNext=0)) AS sessionID | eventstats count AS inArowCount BY sessionID | where inArowCount>=3 | table _time, terminal_id, tran_type

Sample,

| makeresults | eval input="ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM2,INQ-ATM1,CWD-ATM3,INQ-ATM3,INQ-ATM1,CWD-ATM3,INQ" | makemv delim="-" input | mvexpand input | rex field=input "(?<terminal_id>[^,]+),(?<tran_type>.+)$" 
 |  table _time, terminal_id, tran_type  
|  eval raw=terminal_id+"-"+tran_type |sort 0 terminal_id
| autoregress raw  | eval sameAsNext=if(raw=raw_p1,1,0) | streamstats current=t count(eval(sameAsNext=0)) AS sessionID | eventstats count AS inArowCount BY sessionID | where inArowCount>=3 | table _time, terminal_id, tran_type

Explanation,

create an raw event with terminal_id and tran_type
use autoregress raw event you will get raw_p1. Prepares your events for calculating the autoregression, or the moving average, by copying one or more of the previous values for field into each event
field sameAsNext will indicate (with 0, 1 )same as next or not.
when the sameAsNext 0 comes create an session id and count the events in the session id you will get number of repeating items
simple where count>=3 will help you.

Old one,

try something like this,

your base search .... |  sort 0 _time, terminal_id, tran_type |  streamstats count by terminal_id, tran_type  |  eventstats max(count)  as count by terminal_id, tran_type  |  where count>=3 | fields - count

Sample search ,

| makeresults | eval input="ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM2,INQ-ATM1,CWD-ATM3,INQ-ATM3,INQ-ATM1,CWD-ATM3,INQ" | makemv delim="-" input | mvexpand input | rex field=input "(?<terminal_id>[^,]+),(?<tran_type>.+)$" 
|  table _time, terminal_id, tran_type 
|  sort 0 _time, terminal_id, tran_type 
|  streamstats count by terminal_id, tran_type 
|  eventstats max(count) as count by terminal_id, tran_type 
|  where count>=3
| fields - count

Hope this helps you!!!

V

View solution in original post

DalJeanis · ‎07-14-2017

Try this -

| makeresults 
| eval input="ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM2,INQ-ATM1,CWD-ATM3,INQ-ATM3,INQ-ATM1,CWD-ATM3,INQ" 
| makemv delim="-" input 
| mvexpand input 
| rex field=input "(?<terminal_id>[^,]+),(?<tran_type>.+)$" 
| table time, terminal_id, tran_type 
| streamstats count as recno
| eval _time = _time + recno
| sort 0 _time, terminal_id, tran_type 
| fields - recno
| rename COMMENT as "Above just enters your test data" 

| rename COMMENT as "Mark each new batch whenever tran_type changes, then mark each transaction with their batch number"
| streamstats current=f last(tran_type) as last_type by terminal_id
| eval newbatch=if(coalesce(last_type,"")=tran_type,0,1)
| streamstats sum(newbatch) as batchno by terminal_id

| rename COMMENT as "Count how many in each batch, then let pass only those with 3 or more"
| eventstats count as batchcount by terminal_id, batchno 
| where batchcount>=3
| fields _time, terminal_id, tran_type

vasanthmss · ‎07-13-2017

Updated,

your base search... |  eval raw=terminal_id+"-"+tran_type |sort 0 terminal_id
 | autoregress raw  | eval sameAsNext=if(raw=raw_p1,1,0) | streamstats current=t count(eval(sameAsNext=0)) AS sessionID | eventstats count AS inArowCount BY sessionID | where inArowCount>=3 | table _time, terminal_id, tran_type

Sample,

| makeresults | eval input="ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM2,INQ-ATM1,CWD-ATM3,INQ-ATM3,INQ-ATM1,CWD-ATM3,INQ" | makemv delim="-" input | mvexpand input | rex field=input "(?<terminal_id>[^,]+),(?<tran_type>.+)$" 
 |  table _time, terminal_id, tran_type  
|  eval raw=terminal_id+"-"+tran_type |sort 0 terminal_id
| autoregress raw  | eval sameAsNext=if(raw=raw_p1,1,0) | streamstats current=t count(eval(sameAsNext=0)) AS sessionID | eventstats count AS inArowCount BY sessionID | where inArowCount>=3 | table _time, terminal_id, tran_type

Explanation,

create an raw event with terminal_id and tran_type
use autoregress raw event you will get raw_p1. Prepares your events for calculating the autoregression, or the moving average, by copying one or more of the previous values for field into each event
field sameAsNext will indicate (with 0, 1 )same as next or not.
when the sameAsNext 0 comes create an session id and count the events in the session id you will get number of repeating items
simple where count>=3 will help you.

Old one,

try something like this,

your base search .... |  sort 0 _time, terminal_id, tran_type |  streamstats count by terminal_id, tran_type  |  eventstats max(count)  as count by terminal_id, tran_type  |  where count>=3 | fields - count

Sample search ,

| makeresults | eval input="ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM2,INQ-ATM1,CWD-ATM3,INQ-ATM3,INQ-ATM1,CWD-ATM3,INQ" | makemv delim="-" input | mvexpand input | rex field=input "(?<terminal_id>[^,]+),(?<tran_type>.+)$" 
|  table _time, terminal_id, tran_type 
|  sort 0 _time, terminal_id, tran_type 
|  streamstats count by terminal_id, tran_type 
|  eventstats max(count) as count by terminal_id, tran_type 
|  where count>=3
| fields - count

Hope this helps you!!!

V

langlv · ‎07-13-2017

Thanks vasanthmss for your help, but it still count all INQ by ATM1 while I need only consecutive INQ on ATM1. If another CWD on ATM1 occur the counting will be reset. Pls consider following sample events:

terminal_id tran_type
ATM1    INQ
ATM2    CWD
ATM1    INQ
ATM1    INQ
ATM2    CWD
ATM1    INQ
ATM2    INQ
ATM1    CWD
ATM3    INQ
ATM3    INQ
ATM1    CWD
ATM3    INQ
ATM3    CWD
ATM1    INQ

The last ATM1,INQ should be be counted because there is ATM1,CWD happen before.

Thanks,

vasanthmss · ‎07-14-2017

try something like this,

 | makeresults | eval input="ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM2,INQ-ATM1,CWD-ATM3,INQ-ATM3,INQ-ATM1,CWD-ATM3,INQ" | makemv delim="-" input | mvexpand input | rex field=input "(?<terminal_id>[^,]+),(?<tran_type>.+)$" 
 |  table _time, terminal_id, tran_type  
|  eval raw=terminal_id+"-"+tran_type |sort 0 terminal_id
| autoregress raw  | eval sameAsNext=if(raw=raw_p1,1,0) | streamstats current=t count(eval(sameAsNext=0)) AS sessionID | eventstats count AS inArowCount BY sessionID | where inArowCount>=3 | table _time, terminal_id, tran_type

Actual search will be,

your base search ...  |  eval raw=terminal_id+"-"+tran_type |sort 0 terminal_id
| autoregress raw  | eval sameAsNext=if(raw=raw_p1,1,0) | streamstats current=t count(eval(sameAsNext=0)) AS sessionID | eventstats count AS inArowCount BY sessionID | where inArowCount>=3 | table _time, terminal_id, tran_type

Updating my answer based on this,

V

langlv · ‎07-27-2017

Thanks vasanthmss, combining autoregress with streamstats and eventstats is what i am looking for. It works.

knielsen · ‎07-13-2017

Do you really need the events listed out, or do you want to know which combinations match your limit? The latter could easily be done with
| stats count by terminal_id, tran_type | where count>=3

which will give you a table with terminal_id, tran_type and the respective count.

langlv · ‎07-13-2017

Thanks for your comment, but | stats count... does not meet the business requirement as I need to count only consecutive INQ on terminal. If there is CWD on ATM1 then the INQ counting will be reset. I also really to list out the events for business cross check.

knielsen · ‎07-13-2017

actually, the solution for your exact expected result would be achieved with eventstats instead of stats. I tried it out like that:

| makeresults | eval input="ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM1,INQ-ATM2,CWD-ATM1,INQ-ATM2,INQ-ATM1,CWD-ATM3,INQ-ATM3,INQ-ATM1,CWD-ATM3,INQ" | makemv delim="-" input | mvexpand input | rex field=input "(?<terminal_id>[^,]+),(?<tran_type>.+)$" 
| eventstats count by terminal_id, tran_type | where count >= 3| table terminal_id, tran_type

langlv · ‎07-13-2017

Eventstats also does not help because it count all the INQ by ATM1 while I need only consecutive events. If I add 1 more ATM1,INQ to the event list like this:

Then the last INQ event should not be counted because there is another CWD on ATM1 before INQ happen.

Find the list of consecutive events by a field

Introducing Splunk Enterprise 9.2

Adoption of RUM and APM at Splunk

Routing logs with Splunk OTel Collector for Kubernetes