Splunk Search

Correlating a start and stop event with transaction

ericdp
Explorer

I'm trying to correlate start and stop events and having a much harder time than what the documentation implies in order to make sure that the correlations are accurate. Say I have 4 events, as follows:

[service timestamp]    |[event]|[item]|[consoleID]|[logonID]|[clientTimestamp]
2010/06/18 22:17:08.041|LS-XUBF|foo   |0635A84D13 |2        |1CB0F33E1376C4C
2010/06/18 22:17:08.041|LS-XUCF|foo   |0635A84D13 |2        |1CB0F33E1DEDC8F
2010/06/15 19:54:24.178|LS-XUBF|foo   |0635A84D13 |2        |1CB0CC475A3ECA0
2010/06/15 19:54:24.178|LS-XUCF|foo   |0635A84D13 |2        |1CB0CC475F2180D

LS-XUBF is the start event, and LS-XUCF is the end event. My goal is to use transaction to group them as follows:

2010/06/15 19:54:24.178|LS-XUBF|foo|0635A84D13|2|1CB0CC475A3ECA0
2010/06/15 19:54:24.178|LS-XUCF|foo|0635A84D13|2|1CB0CC475F2180D

2010/06/18 22:17:08.041|LS-XUBF|foo|0635A84D13|2|1CB0F33E1376C4C
2010/06/18 22:17:08.041|LS-XUCF|foo|0635A84D13|2|1CB0F33E1DEDC8F

I've tried several approaches, with limited success. I think it has something to do with the order that the logs appear in the file, rather than the timestamps on the events.

"transaction ConsoleId LogonId Item startswith=LS-XUBF endswith=LS-XUCF" sounds like exactly what I want, but it appears to not let the start event and end event have the same timestamp. As a result, I get one transaction instead of two, with the start from the first collection and the end from the second collection. Adding "maxevents=2" to this transaction yields the same results. It looks like this:

2010/06/15 19:54:24.178|LS-XUBF|foo|0635A84D13|2|1CB0CC475A3ECA0
2010/06/18 22:17:08.041|LS-XUCF|foo|0635A84D13|2|1CB0F33E1DEDC8F

"transaction ConsoleId LogonId Item maxevents=2" groups them exactly how I want, but is error prone. It's not guaranteed that we'll see an LS-XUCF event after an LS-XUBF event. The process could start, but never finish and never send the event (for example, if the machine doing the logging loses all power). This is unlikely, but still possible, and I don't want my data invalidated.

"transaction ConsoleId LogonId Item maxevents=2 startswith=LS-XUBF" gets me a little closer - I thought this was a perfect way to solve the above inconsistency problem, but instead, I get this, which isn't really any better:

2010/06/18 22:17:08.041|LS-XUBF|foo|0635A84D13|2|1CB0F33E1376C4C

2010/06/15 19:54:24.178|LS-XUBF|foo|0635A84D13|2|1CB0CC475A3ECA0
2010/06/18 22:17:08.041|LS-XUCF|foo|0635A84D13|2|1CB0F33E1DEDC8F

"transaction ConsoleId LogonId Item maxevents=2 endswith=LS-XUCF" looks similar to the above.

2010/06/15 19:54:24.178|LS-XUCF|foo|0635A84D13|2|1CB0CC475F2180D

2010/06/15 19:54:24.178|LS-XUBF|foo|0635A84D13|2|1CB0CC475A3ECA0
2010/06/18 22:17:08.041|LS-XUCF|foo|0635A84D13|2|1CB0F33E1DEDC8F

"transaction ConsoleId LogonId Item maxevents=2 endswith=LS-XUBF" gives me the right results for some reason, but I can't figure out why and this isn't technically want I want.

Tags (1)

Lowell
Super Champion

I'm wondering if your issue is due to your start/end event occurring on the same second, which could mean that the transaction command is seeing these events out of order which could be breaking your startwith/endswith logic, this is just a guess based on some things I've seen before...

The following is an ugly and inefficient hack, but it should prove/disprove my theory; and if correct, perhaps this will lead to a better solution.

your search | eval _time=if(searchmatch("LS-XUCF"), _time+1, _time) | sort _time | transaction ConsoleId LogonId Item startswith="LS-XUBF" endswith="LS-XUCF" | eval duration=duration-1

The basic idea is to bump your end event to occur after your start event (by shifting the end event by 1 second) and then resorting your events. We then subtract 1 from the duration to restore the proper duration of the transaction (if you care about that field).


UPDATE:

Whoops. Let's try this again. I copied your sample event logs to a file on my system to reproduce your transaction search. And I'm now convinced that my initial suspicion is correct, but my first example search was incorrect. (Got some of the logic wrong.)

| file /tmp/txn_test | rex "[^|]+\|(?<Event>[^|]+)\|(?<Item>[^| ]+)\s*\|(?<ConsoleID>[^| ]+)\s*\|(?<LogonID>[^| ]+)\s*\|" | eval _time=if(searchmatch("LS-XUBF"), _time+1, _time) | sort _time | transaction ConsoleId LogonId Item startswith="LS-XUBF" endswith="LS-XUCF"

A perhaps better solution would be to not mess with _time at all, but add a temporary field for sorting purposes, and then dropping it again. Perhaps something like this:

... | eval start_event=if(searchmatch("LS-XUBF"),1,0) | sort _time,start_event | fields - start_event | transaction ConsoleId LogonId Item startswith="LS-XUBF" endswith="LS-XUCF"

It's arguable which approach is "better" (it all comes down to whether or not you think screwing with _time is a bad idea or not) but I'll let that up to you.


BTW. I could also prove this theory using the on-the-fly file technique by simply moving the start/end events up and down. Which is essentially what we are doing with the sort. Remember that splunk returns all events in reverse time order (so latest events are first.) Therefore in your original sample listing the end events actually occur before the start events.

For the sake of clarity, when I used this as the contents of my /tmp/txn_test file:

2010/06/18 22:17:08.041|LS-XUCF|foo   |0635A84D13 |2        |1CB0F33E1DEDC8F
2010/06/18 22:17:08.041|LS-XUBF|foo   |0635A84D13 |2        |1CB0F33E1376C4C
2010/06/15 19:54:24.178|LS-XUCF|foo   |0635A84D13 |2        |1CB0CC475F2180D
2010/06/15 19:54:24.178|LS-XUBF|foo   |0635A84D13 |2        |1CB0CC475A3ECA0

And used this search: (and I got the correct results)

| file /tmp/txn_test | rex "[^|]+\|(?<Event>[^|]+)\|(?<Item>[^| ]+)\s*\|(?<ConsoleID>[^| ]+)\s*\|(?<LogonID>[^| ]+)\s*\|" | transaction ConsoleId LogonId Item startswith="LS-XUBF" endswith="LS-XUCF"

I am curious as to why your events are out of order in the first place. I thought that splunk preserved the proper order of your events. (I'm not certain of this.) I'm assuming that in your actual log file that the LS-XUBF line occurs before the LS-XUCF line. Are you using a load balancer / distributed search?

Lowell
Super Champion

You should see the latest events at the top (splunk stores events in reverse time order, and transaction assumes this.) Therefore in the web view, you must see the LS-XUCF (end) events appearing above the LS-XUBF (start) events for your transaction to work. Here is what gets even more confusing (at least at first): When looking at the output of the transaction command, each individual transaction will show the individual events in normal time order, so you will see the first event first, and the later events at the end; but the transactions themselves are in reverse time order.

0 Karma

ericdp
Explorer

I've tried simply using "| sort clientTimestamp" which appears to sort it properly, but I still get the same issue. If I use the sort command, should they show up in effectively reverse order, or in the order I expect them? i.e. should the event that happened chronologically first be at the top or bottom of the results list?

0 Karma

Lowell
Super Champion

Whoops. There was a typo in my first suggestion. See the update below. I ran the test on my system, so it should work for you.

0 Karma

ericdp
Explorer

No dice. I don't care about the duration field, FYI.

2010/06/18 22:17:08.041|LS-XUBF|foo|0635A84D13|2|1CB0F33E1376C4C
2010/06/15 19:54:24.178|LS-XUCF|foo|0635A84D13|2|1CB0CC475F2180
0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Splunk is officially part of Cisco

Revolutionizing how our customers build resilience across their entire digital footprint.   Splunk ...

Splunk APM & RUM | Planned Maintenance March 26 - March 28, 2024

There will be planned maintenance for Splunk APM and RUM between March 26, 2024 and March 28, 2024 as ...