Splunk Search

How to improve my current search to count how many times EventB is preceded by EventA within the same transaction?

pm771
Communicator

I have a working query, but since this is the first time I used stats as a replacement for join / transaction so I would appreciate if you can review it.

The task is to count how many times EventB is preceded by EventA within the same transaction (defined by unique ID).

Normally there should not be more than 1 event of each kind.

index=... sourcetype=... 
EID=EventA OR EID=EventA 
| eval EID1Time=if(EID="EventA",  _time, null()) 
| eval EID2Time=if(EID="EventB",  _time, null()) 
| table EID ID _time EID1Time EID2Time
| stats earliest(EID1Time) as E1Time, latest(EID2Time) as E2Time by ID
| search F1Time < F2Time
| stats count as total 
0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi pm771,
there are only two updates that I'd test on your search:

 index=... sourcetype=... 
 EID=EventA OR EID=EventA 
 | eval EID1Time=if(EID="EventA", _time, null()), EID2Time=if(EID="EventB", _time, null()) 
 | stats earliest(EID1Time) as E1Time, latest(EID2Time) as E2Time by ID
 | where E1Time < E2Time
 | stats count as total 

In other words the table command gives no help to your search and slow down your search.
In addition I suggest to use where command instead search.

Bye.
Giuseppe

pm771
Communicator
  1. TABLE is just a leftover from debugging.
  2. Why is WHERE better than SEARCH?
0 Karma

gcusello
SplunkTrust
SplunkTrust

because, you haven't to execute a new search, you have only to filter your events.
see http://docs.splunk.com/Documentation/Splunk/6.5.2/SearchReference/Where
I usually use "where" instead "search" when possible (comparing numbers).

The where command uses the same expression syntax as the eval command. Also, both commands interpret quoted strings as literals. If the string is not quoted, it is treated as a field name. Because of this, you can use the where command to compare two different fields, which you cannot use the search command to do.

There's am interesting answer to this question: https://answers.splunk.com/answers/128739/difference-between-where-and-search-commands.html
Bye.
Giuseppe

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

There's a more official response regarding the difference here - https://answers.splunk.com/answers/306047/is-there-any-performance-impact-when-you-use-searc.html

Basically, "search" may be slightly more efficient, but "where" allows you to compare the values of two fields.

| search x>y

The above searches for values of X that are greater than "y", whereas the below searches for values of X that are greater than the value of Y.

| where x>y

Thus, when it can be used, "search" is slightly more efficient because it isn't analyzing the right side of that equation.

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

1) You are switching around your field names unnecessarily.

2) You didn't do anything with EID after putting it in the table, so get rid of it.
3) You may later want to present the data summed by the _time of the transaction, so retain it.

 index=... sourcetype=... 
 EID=EventA OR EID=EventB 
 | eval EventATime=if(EID="EventA", _time, null()) 
 | eval EventBTime=if(EID="EventB",  _time, null()) 
 | table _time RQID EventATime EventBTime
 | stats earliest(_time) as _time, earliest(EventATime) as EventATime, latest(EventBTime) as EventBTime by RQID
 | where EventATime < EventBTime 

In the above
4) I stuck to EventA and EventB in the names rather than switching to EID1 and EID2.
5) I assumed that the start time of the transaction was more useful than the end or the average, for the _time field.

now we take those transactions and go to this to get you a number

 | stats count as total 

or we go to THIS to get you a chart of how many happen over time.

| timestats count  span=1d

There's also a fun little shortcut using curly braces { }

 index=... sourcetype=... 
 EID=EventA OR EID=EventB 
 | eval whichtime = EID."Time"
 | eval {whichtime}= _time
 | table _time RQID EventATime EventBTime
 | stats earliest(_time) as _time, earliest(EventATime) as EventATime, latest(EventBTime) as EventBTime by RQID
 | where EventATime < EventBTime 

Of course, this assumes that EventA and EventB have usable EID names for this.


Edited two places to use "where" instead of search, based on a note by Steveyz [splunk] here - https://answers.splunk.com/answers/306047/is-there-any-performance-impact-when-you-use-searc.html

aljohnson_splun
Splunk Employee
Splunk Employee

One little simple one - replace table with fields - fields will filter whereas table won't.

Aside, you can put both evals on the same line if you want:
| eval EID1Time=if(EID="EventA", _time, null()) , EID2Time=if(EID="EventB", _time, null()), although I dont think it will change performance much.

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

1) No, you can't use "fields" to filter at that point, because there would be literally zero results. Only one of EID1Time and EID2Time will ever be populated by that code before the stats command rolls them together.

There's not really a reason to have the fields command there, for the computer anyway, because only the two times and the RQID will matter in that next command, and splunk is smart enough to know that and drop what it doesn't need to transmit.

2) There SHOULD be no performance difference between having the eval on one or two lines. There's no reason that the eventual machine code should not be identical with a comma or a pipe.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...