Splunk Search

What is the difference between keeporphans and keepevicted in a transaction?

robmoser
Explorer

Hi Folks,

I'm fairly new to Splunk, and trying to build a transaction out of Cisco ASA data. My search looks like this:

index=rob_sandbox  sourcetype="cisco:asa" (message_id=305011 OR message_id=305012)   | transaction src_ip src_port dest_ip dest_port startswith="message_id=305011" endswith="message_id=305012"  keeporphans=1 unifyends=1 maxevents=2 maxspan=1d keepevicted=1 | eval is_orphan=if(_txn_orphan==1, 1, 0) 

(Each transaction should consist of a single event with message_id=305011 followed by a single event with matching source/destination ip/port and message_id 305012.)

The keeporphans and keepevicted settings are there because, at least at this early stage, I wanted to see if there were any holes in my data. But I don't really understand what the difference between those two settings is. When keepevicted is set I see a "closed_txn" field in the output, set to 1 if a matching 305012 event was found, and 0 if not. I also see an "evicted" field in the output, which is essentially 1 when closed_txn is 0, and not present otherwise. The _txn_orphan field, which I renamed to is_orphan, is never present while keepevicted is set, but turns up if I remove that parameter and leave keeporphans set. Since I can't get both to appear in the same output, it's difficult to say whether they correlate exactly or not.

Further analysis shows that every single orphan or evicted event is a start event, i.e. has message_id=305011, the condition of my startswith clause. (I can't confirm whether this is true in the raw data, but it seems unlikely.) What I'm trying to get is a list of any events - start or end events - which were not paired into transactions.

Thanks for any suggestions,

  • rob.
0 Karma
1 Solution

robmoser
Explorer

To answer my own question... sort of.

The strict definition of keepevicted from the transaction man page is true but a little incomplete. Essentially, the transaction command seems to build up potential transactions in reverse time order. If it encounters something that invalidates a potential transaction (e.g. it hits the maxevents limit without matching the startswith clause), it throws out the potential transaction and all events previously included in it, even if those events could have been part of a different, valid transaction. You need to filter your data and/or build your transaction such that an event can never be incorrectly paired (which may sound obvious, but is sometimes easier said than done). Likewise, since we work backwards in time, anything that matches the endswith clause seems to start another potential transaction to be evaluated, which can prevent two events that would otherwise match from being paired together.
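
To make those incomplete groups visible at all, keepevicted has to be set, and you can then filter on the closed_txn field the command adds. A minimal sketch along the lines of the search in my question (not necessarily the only or best way to do it):

index=rob_sandbox sourcetype="cisco:asa" (message_id=305011 OR message_id=305012)
| transaction src_ip src_port dest_ip dest_port startswith="message_id=305011" endswith="message_id=305012" keepevicted=1 maxevents=2 maxspan=1d
| search closed_txn=0

Here closed_txn=0 marks the groups that never satisfied the closing condition (or ran into a limit), which is exactly the set of events I was worried about silently losing.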

Since I've no doubt made that incomprehensible, let me try an example. I have some transaction data about NAT bridges between internal and external IP addresses. Each bridge should start with a Build event and end with a Teardown, and a given IP and port can't be part of more than one NAT at a time. But I don't trust my data, so I want to look for Builds without Teardowns and Teardowns without Builds. Here's a sample, including one unmatched Build and one unmatched Teardown:

2015-06-10T00:11:00-07:00 internet : %ASA-5-305011: Built dynamic TCP translation from inside:10.1.1.1/1 to outside:134.114.1.1/1
2015-06-10T00:14:00-07:00 internet : %ASA-5-305011: Built dynamic TCP translation from inside:10.1.1.1/1 to outside:134.114.1.1/1
2015-06-10T00:15:00-07:00 internet : %ASA-5-305012: Teardown dynamic TCP translation from inside:10.1.1.1/1 to outside:134.114.1.1/1 duration 0:01:00
2015-06-10T00:16:00-07:00 internet : %ASA-5-305012: Teardown dynamic TCP translation from inside:10.1.1.1/1 to outside:134.114.1.1/1 duration 0:01:00

If I build my search like this:

index=rob_sandbox  sourcetype="cisco:asa" | transaction src_ip src_port dest_ip dest_port startswith="message_id=305011" endswith="message_id=305012"  keeporphans=1 unifyends=1 maxevents=2 maxspan=1d maxopentxn=10000

Then both Teardown events will match the endswith clause, so instead of being linked in any way, each will be considered the start of a potential transaction. The second Build is (correctly) matched with the earlier Teardown. The first Build is then matched to the later Teardown - correctly, based on what we told Splunk to do, but incorrectly by the meaning of the data, since it makes no sense for two transactions to be open on the same IP and port at the same time. Neither keeporphans nor keepevicted does us any good here, since Splunk thinks that none of these events are orphans or evicted.

If instead I remove the endswith clause, the two Teardown messages will be paired as a single potential transaction. Since maxevents is 2 and we haven't yet matched our startswith clause, it's an invalid transaction; both events are thrown out. Now both of my Build events become separate one-event transactions, which is almost right... but we've lost the Teardown information for the non-errored transaction, and we don't know when the errored Teardown occurred.
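
For reference, that variant is just the search above with the endswith clause dropped, roughly:

index=rob_sandbox sourcetype="cisco:asa" | transaction src_ip src_port dest_ip dest_port startswith="message_id=305011" keeporphans=1 unifyends=1 maxevents=2 maxspan=1d maxopentxn=10000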

If I also remove the maxevents clause then the two Teardowns and the later Build will be paired as one transaction, and the lone unmatched Build will be its own transaction. Now I can look for "orphans" by my own definition of what an orphan should be (see the sketch after this list):

  • Any transaction with closed_txn=1 and linecount=1 is a Build at that time with no matching Teardown.
  • Any transaction with closed_txn=0 and linecount=1 is a Teardown at that time that hit maxspan without being paired with a Build.
  • Any transaction with linecount>2 contains one or more unmatched Teardowns, with the latest of those at max(_time).
  • Any transaction with closed_txn=1 and linecount>1 is (also) a complete valid transaction.
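
Putting that together, a sketch of how that definition might be turned into a search (the message_id filter, the case() labels, and the "problem" field name are just mine for illustration, not standard transaction output):

index=rob_sandbox sourcetype="cisco:asa" (message_id=305011 OR message_id=305012)
| transaction src_ip src_port dest_ip dest_port startswith="message_id=305011" keeporphans=1 unifyends=1 maxspan=1d maxopentxn=10000
| eval problem=case(closed_txn=1 AND linecount=1, "Build with no Teardown",
                    closed_txn=0 AND linecount=1, "Teardown that hit maxspan with no Build",
                    linecount>2, "transaction with one or more extra Teardowns")
| where isnotnull(problem)

Anything that doesn't match one of those three cases should be a normal, complete Build/Teardown pair.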

All of which is only built up from experimental evidence, so - on the odd chance that anyone reads this far at all - if anything I have said is incorrect, I would appreciate correction. Hopefully this will help someone someday, but at the very least writing it up has helped me to work it out...

EisenCL
New Member

I just want to let you know that your answer has been very useful and helped me a lot. The Splunk documentation is incomplete for almost every command, so thank you very much for taking the time to answer with such detail.

0 Karma

gjanders
SplunkTrust
SplunkTrust

If you are finding incomplete documentation, then please use the feedback button on the documentation pages.
You can send in relevant comments or links to Splunk Answers, and in most cases that will result in the documentation being updated.

Doing this contributes to the community as many people rely on the documentation!

0 Karma

EisenCL
New Member

You are right, I'll do that next time I step into something that can be improved in the documentation. Thank you.

0 Karma

robmoser
Explorer

Some additional investigation yields curious results.

I built a block of test data with deliberate holes in it:

2015-05-30T00:11:00-07:00 internet : %ASA-5-305011: Built dynamic TCP translation from inside:10.1.1.1/1 to outside:134.114.1.1/1
2015-05-30T00:12:00-07:00 internet : %ASA-5-305011: Built dynamic TCP translation from inside:10.1.1.1/1 to outside:134.114.1.1/1
2015-05-30T00:13:00-07:00 internet : %ASA-5-305012: Teardown dynamic TCP translation from inside:10.1.1.1/1 to outside:134.114.1.1/1 duration 0:01:00
2015-05-30T00:16:00-07:00 internet : %ASA-5-305012: Teardown dynamic TCP translation from inside:10.1.1.1/1 to outside:134.114.1.1/1 duration 0:01:00

A Build record with no matching Teardown, followed by a Build and a Teardown that match, followed by a gap where a Build should be, followed by a Teardown with no matching Build. The IPs on the initial Build and final Teardown do match (which may or may not happen in live data), but I had expected that not to matter because of the maxevents=2 parameter. What I didn't realise is that the transaction matching algorithm is perfectly happy to have multiple transactions with the same set of "keys" open at once. So in my test data, I get a transaction from 00:12 - 00:13, and another that spans that one, from 00:11 - 00:16. Clearly I'm going to have to change the way I group events into transactions in the first place.
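
One direction I may try instead of transaction (just a sketch of the idea, not something I've fully vetted): sort the raw Build/Teardown events by time, keep a running count of open translations per key with streamstats, and flag any point where more than one Build is open on a key, or where a Teardown arrives with nothing open:

index=rob_sandbox sourcetype="cisco:asa" (message_id=305011 OR message_id=305012)
| sort 0 _time
| eval delta=if(message_id=305011, 1, -1)
| streamstats sum(delta) as open_count by src_ip src_port dest_ip dest_port
| where (message_id=305011 AND open_count>1) OR (message_id=305012 AND open_count<0)

On the test data above this flags the 00:12 Build (a second Build while one is already open on the same key), though the trailing unmatched Teardown then gets absorbed by that earlier hole, so it isn't a complete answer either - but it does avoid the overlapping-transaction surprise.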

0 Karma