Splunk Search

Impossible not to use a join: Prove me wrong

mrstrozy
Path Finder

Hi,

I have a situation in which I cannot think of any other way to do it besides using a join. This is less than ideal due to the known restrictions and performance issues when using a join, so I thought I'd reach out and see if anyone else had a different take on it.

Currently we receive "events" that happen. Each event has an ID and a code associated with it. The trick is that there can be multiple events that correspond to a single ID, but each have different codes. It could look something like this:

Event 1: id=1 code=1
Event 2: id=1 code=5
Event 3: id=1 code=10
Event 4: id=2 code=1
Event 5: id=2 code=3
...

My goal is to limit the data set to only contain events in which the id of the event is associated to specific codes, and if it not then we discard them.

In the case above it might be something like — I only want events whose id contains code "5". This would give me only events 1, 2, and 3 since the id=1 contains the code "5".

My current way of doing this is by using the stats values command to get all codes per id, then using the where command like "where in(code, "5")" to get only id's with the code "5" in them and finally tabling the ID and joining all the other event information back onto the ids.

The reasoning for doing it this way is that I need to know the timestamps associated with each code so that I can "tell a story" of which codes came in and when, while limiting the id's I look at based on which codes are associated with them.

If anyone could help out that would be awesome.

Thanks!

0 Karma
1 Solution

FrankVl
Ultra Champion

Wouldn't this do the trick:

| eventstats values(code) as idCodes by id
| search idCodes=5

Run anywhere example:

| makeresults | eval id=1 | eval code=1
|append [ | makeresults | eval id=1 | eval code=5 ]
|append [ | makeresults | eval id=1 | eval code=10 ]
|append [ | makeresults | eval id=2 | eval code=1 ]
|append [ | makeresults | eval id=2 | eval code=3 ]
 | eventstats values(code) as idCodes by id
 | search idCodes=5

View solution in original post

0 Karma

macadminrohit
Contributor
| makeresults 
| eval Event1="1;1" 
| makemv Event1 delim=";" 
| eval Code=mvindex(Event1,1) 
| eval ID=mvindex(Event1,0) 
| append 
    [| makeresults 
    | eval Event2="1;5" 
    | makemv Event2 delim=";" 
    | eval Code=mvindex(Event2,1) 
    | eval ID=mvindex(Event2,0) ] 
| append 
    [| makeresults 
    | eval Event3="1;10" 
    | makemv Event3 delim=";" 
    | eval Code=mvindex(Event3,1) 
    | eval ID=mvindex(Event3,0) ] 
| append 
    [| makeresults 
    | eval Event4="2;1" 
    | makemv Event4 delim=";" 
    | eval Code=mvindex(Event4,1) 
    | eval ID=mvindex(Event4,0) ] 
| append 
    [| makeresults 
    | eval Event5="2;3" 
    | makemv Event5 delim=";" 
    | eval Code=mvindex(Event5,1) 
    | eval ID=mvindex(Event5,0) ] 
| table _time ID Code | eventstats values(Code) as Code_EXP by ID | mvexpand Code_EXP | search Code_EXP=5
0 Karma

DalJeanis
Legend

There are multiple ways to do that without a join. Here's the first two that came to mind.


First method - subquery

index=foo your search that gets all events here 
    [index=foo your search that gets only the code 5 events here | table id]
| ... the rest of your search ...

This method does run two queries. The one inside the square braces is run, producing a list of id values, then is formatted as...

( ( id="first value") OR ( id="second value") OR ... )

and dropped into the outer search. For more info on how this works, see the implicit format command.


Second method - eventstats

index=foo your search that gets all events here 
| eventstats max(eval(case(code="5","keepme"))) as keepme by id
| where isnotnull(keepme)
| the rest of your search

This method only does a single search. Once you have all the records together, the eventstats command calculates whether there are any code 5s in the results for each id. Then you drop the records that don't, and process the survivors however you want.

Depending on the number of results you are expecting, either one of these could be more or less efficient than the other. If code 5s are rare, then the first will be more performant. (This is especially true if the field value you are looking for is a rare token like zarfnix="beeblebrox", rather than a common one like code=5.)

0 Karma

FrankVl
Ultra Champion

Wouldn't this do the trick:

| eventstats values(code) as idCodes by id
| search idCodes=5

Run anywhere example:

| makeresults | eval id=1 | eval code=1
|append [ | makeresults | eval id=1 | eval code=5 ]
|append [ | makeresults | eval id=1 | eval code=10 ]
|append [ | makeresults | eval id=2 | eval code=1 ]
|append [ | makeresults | eval id=2 | eval code=3 ]
 | eventstats values(code) as idCodes by id
 | search idCodes=5
0 Karma

mrstrozy
Path Finder

I was laughing because almost immediately after I posted this I realized that I proved myself wrong when I remembered the "eventstats" function... I guess that's what happens when you stare at a screen for waaaay too long. Either way, thank you for your help! Much appreciated

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...