I am working with time-series data, and I want to group events based on the same values in three fields: field1, field2, and field3. All events are timestamped.
I want to group the events into transactions where field1, field2, and field3 are the same and all of the events are in chronological order. I don't want events that occur out of chronological order to be in the same transaction. Also, there is no way to know a maximum amount of time between any of the events.
For example, imagine the event sequence looks like the following if you put it into a table.
Time | field1 | field2 | field3
0 | a | b | c
1 | a | b | c
2 | a | b | c
3 | a | b | z
4 | a | b | y
5 | a | b | c
6 | a | b | c
The desired transaction behavior I am trying to achieve would turn the events above into the following transactions:
Transaction | Duration | field1 | field2 | field3
1 | 2 - 0 | a | b | c
2 | 3 - 3 | a | b | z
3 | 4 - 4 | a | b | y
4 | 6 - 5 | a | b | c
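To make the intended semantics concrete, here is a sketch of the grouping I am after in plain Python (illustrative only; the real data lives in Splunk). The event list mirrors the example table above:

```python
from itertools import groupby

# Events from the example table: (time, field1, field2, field3)
events = [
    (0, "a", "b", "c"),
    (1, "a", "b", "c"),
    (2, "a", "b", "c"),
    (3, "a", "b", "z"),
    (4, "a", "b", "y"),
    (5, "a", "b", "c"),
    (6, "a", "b", "c"),
]

# groupby only merges *consecutive* items with the same key, which is exactly
# the behavior I want: identical field values separated by a change in any
# field start a new transaction.
transactions = []
for key, grp in groupby(events, key=lambda e: e[1:]):
    grp = list(grp)
    # (start_time, end_time, field1, field2, field3)
    transactions.append((grp[0][0], grp[-1][0]) + key)

for t in transactions:
    print(t)
```

This yields four transactions, with the two runs of (a, b, c) kept separate because the (a, b, z) and (a, b, y) events fall between them.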
Well, I think I may have discovered my own answer by chance.
The key was making a new field that indicates the transition or change in the field I am most interested in. This is probably too brittle for a general solution, but it does work when you are interested in a specific field changing.
MY_SEARCH
| streamstats global=f window=2 dc(field3) as marker
| transaction field1 field2 field3 endswith="marker=2"
| sort + _time
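The streamstats trick above can be simulated in Python to show what the marker field does (illustrative only, and ignoring Splunk's default reverse-time event order): over a sliding window of the current and previous event, the distinct count of field3 is 2 exactly when the value just changed.

```python
# field3 values from the example, in chronological order
field3_values = ["c", "c", "c", "z", "y", "c", "c"]

# Emulate streamstats window=2 dc(field3): distinct count over the
# current event and the one before it.
markers = []
for i, v in enumerate(field3_values):
    window = field3_values[max(0, i - 1): i + 1]
    markers.append(len(set(window)))

print(markers)  # [1, 1, 1, 2, 2, 2, 1]
```

Events where the marker is 2 sit at a boundary, which is what endswith="marker=2" keys on.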
Brilliant. But I think this is better:
MY_SEARCH
| eval keyFields=field1 . "," . field2 . "," . field3
| streamstats current=f last(keyFields) AS keyFieldsPrev
| eval newTransaction = if((keyFields != keyFieldsPrev), "TRUE", "FALSE")
| transaction keepevicted=true field1 field2 field3 startswith="newTransaction=TRUE"
| sort + _time
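The eval/streamstats logic in that search can be sketched in Python (illustrative only, and again ignoring Splunk's default reverse-time event order): an event starts a new transaction whenever its composite key differs from the previous event's key.

```python
# Events from the example table: (time, field1, field2, field3)
events = [
    (0, "a", "b", "c"), (1, "a", "b", "c"), (2, "a", "b", "c"),
    (3, "a", "b", "z"), (4, "a", "b", "y"),
    (5, "a", "b", "c"), (6, "a", "b", "c"),
]

prev_key = None  # plays the role of last(keyFields) from streamstats
flags = []
for _, f1, f2, f3 in events:
    key = ",".join((f1, f2, f3))  # same idea as eval keyFields=...
    flags.append("TRUE" if key != prev_key else "FALSE")
    prev_key = key

print(flags)  # ['TRUE', 'FALSE', 'FALSE', 'TRUE', 'TRUE', 'TRUE', 'FALSE']
```

Each "TRUE" marks the first event of one of the four transactions in the example.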
Yes, that should work too and matches closer with what I put in my final version.
One thing to note is that the very last grouping of events is dropped unless you mark the last event (chronologically) with newTransaction=TRUE.
I knew that but figured losing 1 event at the edge wouldn't matter.
Unfortunately, you lose more than the trailing event. You lose the whole sequence of events that would make up the transaction for the last group. I get around this by manually marking the last occurring event with newTransaction=TRUE (as you have it). Doing this seems to make everything work well.
If events are not close together, you will not be happy with the performance of the transaction command over any substantial dataset. Try this instead (you may not need the reverse command; I am not sure how you would like them sorted, so try it both ways):
... | reverse | stats list(_raw) BY field1,field2,field3
I am sorry if I was unclear, but that does not keep things in sequence. That just groups all events in raw form for the combinations of field1, field2, and field3.
I need the groupings to be like buckets, where a bucket is made up of all events for a unique combination of field1, field2, and field3 that happened in succession. In other words, group all events for a unique combination of field1, field2, and field3, but close the current group whenever a new event is encountered and one of the fields changes. You will see in my example that transaction groups 1 and 4 have the same values for field1, field2, and field3, yet they are distinct groups because the events at Time 3 and Time 4 happened in between them chronologically.