
How to make transactions that keep groupings in chronological order

rjthibod
Champion

I am working with time-series data, and I want to group events based on the same values in three fields: field1, field2, and field3. All events are timestamped.

I want to group the events into transactions where field1, field2, and field3 are the same and all of the events are in chronological order. I don't want events that occur out of chronological order to be in the same transaction. Also, there is no way to know a maximum amount of time between any of the events.

For example, imagine the event sequence looks like the following if you put it into a table.

Time | field1 | field2 | field3
0 | a | b | c
1 | a | b | c
2 | a | b | c
3 | a | b | z
4 | a | b | y
5 | a | b | c
6 | a | b | c

The desired transaction behavior I am trying to achieve would turn the events above into the following transactions

Transaction | Duration (end time - start time) | field1 | field2 | field3
1 | 2 - 0 | a | b | c
2 | 3 - 3 | a | b | z
3 | 4 - 4 | a | b | y
4 | 6 - 5 | a | b | c

1 Solution

rjthibod
Champion

Well, I think I may have discovered my own answer by chance.

The key was making a new field that indicates the transition or change in the field I am most interested in. This is probably too brittle for a general solution, but it does work when you are interested in a specific field changing.

MY_SEARCH 
| streamstats global=f window=2 dc(field3) as marker 
| transaction field1 field2 field3 endswith="marker=2" 
| sort + _time
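
In case it helps anyone else: if you need the break to happen when any of the three fields changes (not just field3), the same sliding-window trick should extend to a combined key. This is only an untested sketch of that idea (the key field name is arbitrary):

MY_SEARCH 
| eval key=field1 . "," . field2 . "," . field3 
| streamstats global=f window=2 dc(key) as marker 
| transaction field1 field2 field3 endswith="marker=2" 
| sort + _time

Here marker only becomes 2 on an event whose combined key differs from the adjacent event in the stream, so a change in any of the three fields should end the current transaction.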

woodcock
Esteemed Legend

Brilliant. But I think this is better:

MY_SEARCH
| eval keyFields=field1 . "," . field2 . "," . field3
| streamstats current=f last(keyFields) AS keyFieldsPrev
| eval newTransaction = if((keyFields != keyFieldsPrev), "TRUE", "FALSE")
| transaction keepevicted=true field1 field2 field3 startswith="newTransaction = TRUE" 
| sort + _time

rjthibod
Champion

Yes, that should work too, and it matches more closely what I put in my final version.

One thing to note is that the very last grouping of events is dropped unless you mark the last event (chronologically) with newTransaction=TRUE.

woodcock
Esteemed Legend

I knew that but figured losing 1 event at the edge wouldn't matter.

rjthibod
Champion

Unfortunately, you lose more than the trailing event. You lose the whole sequence of events that would make up the transaction for the last group. I get around this by manually marking the last occurring event (chronologically) with newTransaction=TRUE, as you have it. Doing this seems to make everything work well.
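
For reference, one way to mark that last chronological event is an eventstats pass over max(_time). This is only a sketch of the workaround (lastEventTime is a helper field name I made up), not necessarily the exact search from my final version:

MY_SEARCH
| eval keyFields=field1 . "," . field2 . "," . field3
| streamstats current=f last(keyFields) AS keyFieldsPrev
| eventstats max(_time) AS lastEventTime
| eval newTransaction=if((keyFields != keyFieldsPrev) OR (_time == lastEventTime), "TRUE", "FALSE")
| transaction keepevicted=true field1 field2 field3 startswith="newTransaction=TRUE"
| sort + _time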

woodcock
Esteemed Legend

If events are not close together, you will not be happy with the performance of the transaction command over any substantial dataset. Try this instead (you may not need the reverse command; I am not sure how you would like them sorted, so try it both ways):

... | reverse | stats list(_raw) BY field1,field2,field3

rjthibod
Champion

I am sorry if I was unclear, but that does not keep things in sequence. That just groups all events in raw form for the combinations of field1, field2, and field3.

I need the groupings to be like buckets, where a bucket is made up of all the events that share the same combination of field1, field2, and field3 and that happened in succession. In other words, group events by the unique combination of field1, field2, and field3, but close the current group whenever a new event arrives in which one of those fields has changed. You will see in my example that transaction groups 1 and 4 have the same values for field1, field2, and field3, but they are distinct groups because the events at Time 3 and Time 4 happened between them in chronological order.
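
For what it's worth, a non-transaction way to get those consecutive buckets might be to detect the change with streamstats, turn it into a running group id, and then use stats. This is only a rough, unbenchmarked sketch of that idea (key, prevKey, groupStart, and groupId are field names I made up):

MY_SEARCH
| sort 0 + _time
| eval key=field1 . "," . field2 . "," . field3
| streamstats current=f last(key) AS prevKey
| eval groupStart=if(isnull(prevKey) OR key != prevKey, 1, 0)
| streamstats sum(groupStart) AS groupId
| stats min(_time) AS start max(_time) AS end first(field1) AS field1 first(field2) AS field2 first(field3) AS field3 count BY groupId
| eval duration=end - start

Each groupId bucket should line up with one transaction in my example table, with duration computed as end minus start.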
