I have a working query, but since this is the first time I used stats
as a replacement for join
/ transaction
so I would appreciate if you can review it.
The task is to count how many times
EventB
is preceded byEventA
within the same transaction (defined by unique ID).
Normally there should not be more than 1 event of each kind.
index=... sourcetype=...
EID=EventA OR EID=EventA
| eval EID1Time=if(EID="EventA", _time, null())
| eval EID2Time=if(EID="EventB", _time, null())
| table EID ID _time EID1Time EID2Time
| stats earliest(EID1Time) as E1Time, latest(EID2Time) as E2Time by ID
| search F1Time < F2Time
| stats count as total
Hi pm771,
there are only two updates that I'd test on your search:
index=... sourcetype=...
EID=EventA OR EID=EventA
| eval EID1Time=if(EID="EventA", _time, null()), EID2Time=if(EID="EventB", _time, null())
| stats earliest(EID1Time) as E1Time, latest(EID2Time) as E2Time by ID
| where E1Time < E2Time
| stats count as total
In other words the table
command gives no help to your search and slow down your search.
In addition I suggest to use where
command instead search.
Bye.
Giuseppe
because, you haven't to execute a new search, you have only to filter your events.
see http://docs.splunk.com/Documentation/Splunk/6.5.2/SearchReference/Where
I usually use "where" instead "search" when possible (comparing numbers).
The where command uses the same expression syntax as the eval command. Also, both commands interpret quoted strings as literals. If the string is not quoted, it is treated as a field name. Because of this, you can use the where command to compare two different fields, which you cannot use the search command to do.
There's am interesting answer to this question: https://answers.splunk.com/answers/128739/difference-between-where-and-search-commands.html
Bye.
Giuseppe
There's a more official response regarding the difference here - https://answers.splunk.com/answers/306047/is-there-any-performance-impact-when-you-use-searc.html
Basically, "search" may be slightly more efficient, but "where" allows you to compare the values of two fields.
| search x>y
The above searches for values of X that are greater than "y", whereas the below searches for values of X that are greater than the value of Y.
| where x>y
Thus, when it can be used, "search" is slightly more efficient because it isn't analyzing the right side of that equation.
1) You are switching around your field names unnecessarily.
2) You didn't do anything with EID after putting it in the table, so get rid of it.
3) You may later want to present the data summed by the _time of the transaction, so retain it.
index=... sourcetype=...
EID=EventA OR EID=EventB
| eval EventATime=if(EID="EventA", _time, null())
| eval EventBTime=if(EID="EventB", _time, null())
| table _time RQID EventATime EventBTime
| stats earliest(_time) as _time, earliest(EventATime) as EventATime, latest(EventBTime) as EventBTime by RQID
| where EventATime < EventBTime
In the above
4) I stuck to EventA and EventB in the names rather than switching to EID1 and EID2.
5) I assumed that the start time of the transaction was more useful than the end or the average, for the _time field.
now we take those transactions and go to this to get you a number
| stats count as total
or we go to THIS to get you a chart of how many happen over time.
| timestats count span=1d
There's also a fun little shortcut using curly braces { }
index=... sourcetype=...
EID=EventA OR EID=EventB
| eval whichtime = EID."Time"
| eval {whichtime}= _time
| table _time RQID EventATime EventBTime
| stats earliest(_time) as _time, earliest(EventATime) as EventATime, latest(EventBTime) as EventBTime by RQID
| where EventATime < EventBTime
Of course, this assumes that EventA and EventB have usable EID names for this.
Edited two places to use "where" instead of search, based on a note by Steveyz [splunk] here - https://answers.splunk.com/answers/306047/is-there-any-performance-impact-when-you-use-searc.html
One little simple one - replace table
with fields
- fields will filter whereas table won't.
Aside, you can put both evals on the same line if you want:
| eval EID1Time=if(EID="EventA", _time, null()) , EID2Time=if(EID="EventB", _time, null())
, although I dont think it will change performance much.
1) No, you can't use "fields" to filter at that point, because there would be literally zero results. Only one of EID1Time and EID2Time will ever be populated by that code before the stats command rolls them together.
There's not really a reason to have the fields command there, for the computer anyway, because only the two times and the RQID will matter in that next command, and splunk is smart enough to know that and drop what it doesn't need to transmit.
2) There SHOULD be no performance difference between having the eval on one or two lines. There's no reason that the eventual machine code should not be identical with a comma or a pipe.