Hi all, is there any known issue on the startswith clause, when using multiple fields to identify complex transactions?
In the following case using endswith alone does a good job, but using startswith or both of them will provide incorrect results.
Here's my case: it's a sample file, manually put together to explore the topic. Transactions are made by two types of events, those with tnsid "open" the transaction and those with altid "terminate" it. Those events are "chained" by an event having both fields in it.
2010-12-01 08:00:01 - init - tnsid=AAA
2010-12-01 08:00:03 - init - tnsid=BBB
2010-12-01 08:00:05 - mid1 - tnsid=AAA
2010-12-01 08:00:07 - mid1 - tnsid=BBB
2010-12-01 08:00:09 - chain - tnsid=AAA altid=111
2010-12-01 08:00:11 - mid2 - altid=111
2010-12-01 08:00:13 - chain - tnsid=BBB altid=222
2010-12-01 08:00:15 - term - altid=111
2010-12-01 08:00:17 - init - tnsid=CCC
2010-12-01 08:00:19 - mid2 - altid=222
2010-12-01 08:00:21 - mid1 - tnsid=CCC
2010-12-01 08:00:23 - chain - tnsid=CCC altid=555
2010-12-01 08:00:25 - term - altid=222
2010-12-01 08:00:27 - init - tnsid=AAA
2010-12-01 08:00:28 - mid2 - altid=555
2010-12-01 08:00:29 - mid1 - tnsid=AAA
2010-12-01 08:00:30 - chain - tnsid=AAA altid=666
2010-12-01 08:00:35 - term - altid=555
I can achieve the expected result through the simplest transam command:
| file /tmp/tnsexp.log | extract | sort - _time | transam tnsid altid
T1:
2010-12-01 08:00:01 - init - tnsid=AAA
2010-12-01 08:00:05 - mid1 - tnsid=AAA
2010-12-01 08:00:09 - chain - tnsid=AAA altid=111
2010-12-01 08:00:11 - mid2 - altid=111
2010-12-01 08:00:15 - term - altid=111
T2:
2010-12-01 08:00:03 - init - tnsid=BBB
2010-12-01 08:00:07 - mid1 - tnsid=BBB
2010-12-01 08:00:13 - chain - tnsid=BBB altid=222
2010-12-01 08:00:19 - mid2 - altid=222
2010-12-01 08:00:25 - term - altid=222
T3:
2010-12-01 08:00:17 - init - tnsid=CCC
2010-12-01 08:00:21 - mid1 - tnsid=CCC
2010-12-01 08:00:23 - chain - tnsid=CCC altid=555
2010-12-01 08:00:28 - mid2 - altid=555
2010-12-01 08:00:35 - term - altid=555
T4:
2010-12-01 08:00:27 - init - tnsid=AAA
2010-12-01 08:00:29 - mid1 - tnsid=AAA
2010-12-01 08:00:30 - chain - tnsid=AAA altid=666
The latest is an "open" transaction which should or should not be returned depending on the keepevicted setting, but having Splunk no knowledge about what's an evicted transaction the parameter has no effect. If I wish to code some logic into the command:
----- Here's its output:
2010-12-01 08:00:01 - init - tnsid=AAA
2010-12-01 08:00:05 - mid1 - tnsid=AAA
2010-12-01 08:00:09 - chain - tnsid=AAA altid=111
2010-12-01 08:00:11 - mid2 - altid=111
2010-12-01 08:00:15 - term - altid=111
2010-12-01 08:00:03 - init - tnsid=BBB
2010-12-01 08:00:07 - mid1 - tnsid=BBB
2010-12-01 08:00:13 - chain - tnsid=BBB altid=222
2010-12-01 08:00:17 - init - tnsid=CCC
2010-12-01 08:00:19 - mid2 - altid=222
2010-12-01 08:00:25 - term - altid=222
2010-12-01 08:00:27 - init - tnsid=AAA
2010-12-01 08:00:28 - mid2 - altid=555
2010-12-01 08:00:35 - term - altid=555
----- Here's its output:
2010-12-01 08:00:01 - init - tnsid=AAA
2010-12-01 08:00:05 - mid1 - tnsid=AAA
2010-12-01 08:00:09 - chain - tnsid=AAA altid=111
2010-12-01 08:00:11 - mid2 - altid=111
2010-12-01 08:00:15 - term - altid=111
2010-12-01 08:00:17 - init - tnsid=CCC
2010-12-01 08:00:19 - mid2 - altid=222
2010-12-01 08:00:21 - mid1 - tnsid=CCC
2010-12-01 08:00:25 - term - altid=222
2010-12-01 08:00:27 - init - tnsid=AAA
2010-12-01 08:00:28 - mid2 - altid=555
2010-12-01 08:00:29 - mid1 - tnsid=AAA
2010-12-01 08:00:35 - term - altid=555
I'd need some help in finding out the rationale behind this behaviour...
Thanks, Paolo
This issue has been fixed in 4.3.x to get the correct behavior you should add: unifyends=true to your transaction command. The idea here is to tell the transaction command that the opening and closing events of a transaction must also share at least one field value with the rest of the events.
So, is you issues with approach 1 simply that you can't determine if the transaction is still open or not? If so there may be a very simple solution.
You could determine if the transaction is complete by simply tacking a search
or eval
to the end.
| file /tmp/tnsexp.log | extract | sort - _time | transam tnsid altid | eval is_complete=searchmatch("term")
It is much better for performance if you can specify startswith
or endswith
because transaction can be evicted once they are complete (uses less memory); but it may be possible to get away without them using appropriate maxspan
and/or maxpause
settings.
I have some more complex approaches that might work here; but they involve subsearches and/or nested transactions, so there's no point going there if the above gets the job done.