Splunk Search

Join/Transaction with field comparison operators

richard_whiffen
Explorer

I have some data sources in splunk that are XML formated. The initial request:

<query id=12345-54321>
<Request_1 initialQueryTime="1294214640000"><SearchParms startNum="1" numResults="14">
<TextQuery type="Title">Children's Hour</TextQuery>
</Request_1>
</query>

The Response:

<query id=c12345-54321>
<?xml version="1.0" encoding="ISO-8859-1"?>
<Response_1>
<NumberOfMatches>0</NumberOfMatches>
<SearchResults>
</SearchResults>
</Response_1>
</query>

or:

<tvsquery id=12345-54321>
<?xml version="1.0" encoding="ISO-8859-1"?>
<Response_1>
<NumberOfMatches>2</NumberOfMatches>
<SearchResults>
<VODShow closedCaptioning="Y" dolby="Y" hd="Y" entitled="Y">
<ResultTitle>Graffiti Bridge HD</ResultTitle>
<AssetID>INTL0609000006823480</AssetID>
</VODShow>
<VODShow closedCaptioning="Y" dolby="Y" entitled="Y">
<ResultTitle>Graffiti Bridge HD</ResultTitle>
<AssetID>INTL0609000005278648</AssetID
</VODShow>
</SearchResults>
</Response_1>
</query>

So I have a field id that should be a transaction. If I do a simple search:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" | transaction id

The transactions marry up correctly by id. If I try to look for transactions with specific attributes, I don't get the results I"m expecting. I've tried several variations of the following (with both the transaction command and join command:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" NumberOfMatches < 0 | transaction id [search host="srchhost*" src="/path/to/tomcat/log/qlogs/*" "Request_1" | top TextQuery ]

I think I'm not looking at the problem the right way and am hoping someone out there can put me on the right path. I have a field id that is common to the search and the result set. I want to know what TextQuery people used that returned zero NumberOfMatches. So I ask for a result set of id values for zero matches. I then want to return the TextQuery field for every one of those IDs.

The individual queries work. I can get a list of the zero result IDs. I can get the TextQuery values for a given ID. I just can't do it in a single search pipeline.

Cheers, Rich

Tags (2)
0 Karma
1 Solution

sideview
SplunkTrust
SplunkTrust

Yep. This is a very common stumbling block when you're first getting started. You have to get away from old SQL-ish ideas of joining two different datasets. Instead you just mash them together from the very beginning with a big OR:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )

and then you pipe to transaction or to stats to put the mess together sensibly:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 ) | transaction id

1) Transaction

the transaction command is good for the use cases where, at the end of the day you need the actual multiline raw text of the various events all stuck together. Usually however you find that's not what you need and rather you just need various statistics and bits of text from the overall set of conjoined/conglommified events. In those cases the stats command is a much better tool.

2) stats

Here's an example that hopefully gives you the idea of how you could use stats:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )
| stats first(AssetId) as AssetId first(TextQuery) as TextQuery sum(NumberOfMatches) as totalMatches by id

3) join

As for the 'join' command, it's mostly evil. I'd avoid it unless you really really cant do it with stats or transaction.

Further reading:

http://answers.splunk.com/questions/8694/joining-two-log-files-that-have-a-common-field/8708#8708

http://answers.splunk.com/questions/822/simulating-a-sql-join-in-splunk/1717#1717

View solution in original post

sideview
SplunkTrust
SplunkTrust

Yep. This is a very common stumbling block when you're first getting started. You have to get away from old SQL-ish ideas of joining two different datasets. Instead you just mash them together from the very beginning with a big OR:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )

and then you pipe to transaction or to stats to put the mess together sensibly:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 ) | transaction id

1) Transaction

the transaction command is good for the use cases where, at the end of the day you need the actual multiline raw text of the various events all stuck together. Usually however you find that's not what you need and rather you just need various statistics and bits of text from the overall set of conjoined/conglommified events. In those cases the stats command is a much better tool.

2) stats

Here's an example that hopefully gives you the idea of how you could use stats:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )
| stats first(AssetId) as AssetId first(TextQuery) as TextQuery sum(NumberOfMatches) as totalMatches by id

3) join

As for the 'join' command, it's mostly evil. I'd avoid it unless you really really cant do it with stats or transaction.

Further reading:

http://answers.splunk.com/questions/8694/joining-two-log-files-that-have-a-common-field/8708#8708

http://answers.splunk.com/questions/822/simulating-a-sql-join-in-splunk/1717#1717

Get Updates on the Splunk Community!

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...