Splunk Search

How to search for field values that appears in another field and vice versa?

chensy
Engager

Basically what I'm trying to ask is, for instance if my data is like this

index, field1, field2
1, apple, boy
2, apple, girl
3, boy, apple
4, boy, girl
5, girl, apple

How do i format my query to give me a results that says:

1 matches with 3
2 matches with 5
3 matches with 1 (optional but good to have)
5 matches with 2 (optional but good to have)

Thanks for your help.

0 Karma
1 Solution

DalJeanis
Legend

This part makes your test data

| makeresults 
| eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girl,apple"
| makemv mydata 
| mvexpand mydata 
| makemv delim="," mydata 
| eval theindex=mvindex(mydata,0)
| eval field1=mvindex(mydata,1) 
| eval field2=mvindex(mydata,2)
| table theindex field1 field2

This part builds a synthetic key to match by, finds all matches, then reports on them.

| eval concatvalues=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
| stats values(theindex) as theindex by concatvalues
| where mvcount(theindex)>1
| eval saveindex = theindex
| mvexpand theindex
| mvexpand saveindex
| where theindex!=saveindex
| sort 0 theindex saveindex
| eval mymessage=theindex." matches with ".saveindex
| table mymessage

Note- the above will also work appropriately with multiples (more than two matches). For instance, a match-3 will result in 6 lines, a match-4 results in 12 lines, etc. To collapse those multiple lines into one line per index would require a little additional code as below -

 | makeresults 
 | eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girls,apple 6,boy,apple 7,apple,boy"
 | makemv mydata 
 | mvexpand mydata 
 | makemv delim="," mydata 
 | eval theindex=mvindex(mydata,0)
 | eval field1=mvindex(mydata,1) 
 | eval field2=mvindex(mydata,2)
 | table theindex field1 field2


 | eval concatvalues=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
 | stats values(theindex) as theindex by concatvalues
 | where mvcount(theindex)>1
 | eval saveindex = theindex
 | mvexpand theindex
 | mvexpand saveindex
 | where theindex!=saveindex
 | stats values(saveindex) as saveindex by theindex
 | eval saveindex=mvjoin(saveindex,", ")
 | sort 0 theindex
 | eval mymessage=theindex." matches with ".saveindex
 | table mymessage

produces

1 matches with 3, 6, 7
2 matches with 5
3 matches with 1, 6, 7
5 matches with 2
6 matches with 1, 3, 7
7 matches with 1, 3, 6

View solution in original post

DalJeanis
Legend

This part makes your test data

| makeresults 
| eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girl,apple"
| makemv mydata 
| mvexpand mydata 
| makemv delim="," mydata 
| eval theindex=mvindex(mydata,0)
| eval field1=mvindex(mydata,1) 
| eval field2=mvindex(mydata,2)
| table theindex field1 field2

This part builds a synthetic key to match by, finds all matches, then reports on them.

| eval concatvalues=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
| stats values(theindex) as theindex by concatvalues
| where mvcount(theindex)>1
| eval saveindex = theindex
| mvexpand theindex
| mvexpand saveindex
| where theindex!=saveindex
| sort 0 theindex saveindex
| eval mymessage=theindex." matches with ".saveindex
| table mymessage

Note- the above will also work appropriately with multiples (more than two matches). For instance, a match-3 will result in 6 lines, a match-4 results in 12 lines, etc. To collapse those multiple lines into one line per index would require a little additional code as below -

 | makeresults 
 | eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girls,apple 6,boy,apple 7,apple,boy"
 | makemv mydata 
 | mvexpand mydata 
 | makemv delim="," mydata 
 | eval theindex=mvindex(mydata,0)
 | eval field1=mvindex(mydata,1) 
 | eval field2=mvindex(mydata,2)
 | table theindex field1 field2


 | eval concatvalues=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
 | stats values(theindex) as theindex by concatvalues
 | where mvcount(theindex)>1
 | eval saveindex = theindex
 | mvexpand theindex
 | mvexpand saveindex
 | where theindex!=saveindex
 | stats values(saveindex) as saveindex by theindex
 | eval saveindex=mvjoin(saveindex,", ")
 | sort 0 theindex
 | eval mymessage=theindex." matches with ".saveindex
 | table mymessage

produces

1 matches with 3, 6, 7
2 matches with 5
3 matches with 1, 6, 7
5 matches with 2
6 matches with 1, 3, 7
7 matches with 1, 3, 6

chensy
Engager

@DalJeanis thank you for your help! works great.

1 question though, how can I then differentiate between duplicated entries from matched entries?

e.g.

1 matches with 3, 6 
1 is duplicate of 7 
2 matches with 5 
3 matches with 1, 7
3 is duplicate of 6 
5 matches with 2 
6 matches with 1, 7
6 is duplicate of 3
7 matches with 3, 6 
7 is duplicate of 1

appreciate your expertise in this area

0 Karma

DalJeanis
Legend

It's a little more complicated, probably could be tightened up but it works.

| makeresults 
| eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girl,apple 6,boy,apple"
| makemv mydata 
| mvexpand mydata 
| makemv delim="," mydata 
| eval theindex=mvindex(mydata,0)
| eval field1=mvindex(mydata,1) 
| eval field2=mvindex(mydata,2)
| table theindex field1 field2

| eval matchsame=field2."!!!!".field1
| eval matchall=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
| eventstats count as matchallcount by matchall
| where matchallcount>1
| stats list(theindex) as theindex list(matchsame) as matchsame by matchall
| eval theindex = mvzip(theindex,matchsame)
| table theindex
| eval saveindex=theindex
| mvexpand theindex
| mvexpand saveindex
| where theindex!=saveindex
| makemv delim="," theindex
| eval matchsame=mvindex(theindex,1)
| eval theindex=mvindex(theindex,0)
| makemv delim="," saveindex
| eval matchother=mvindex(saveindex,1)
| eval matchindex=mvindex(saveindex,0)
| table theindex matchsame matchindex matchother
| stats values(matchindex) as matchindex by theindex matchsame matchother
| sort 0 theindex
| eval matchstatement=if(matchsame=matchother," is a duplicate of "," matches with ") 
| eval matchindex=mvjoin(matchindex,", ")
| eval mymessage=theindex.matchstatement.matchindex
| table mymessage

producing the following results -

1 matches with 3, 6
2 matches with 5
3 is a duplicate of 6
3 matches with 1
5 matches with 2
6 is a duplicate of 3
6 matches with 1

By the way, there's nothing splunky or magical about those four exclamation points ("!!!!"). I use those as a delimiter simply because for my particular installation, that particular combination of characters is highly unlikely to happen in my data. If your organization deals with tweets or teens or excitable people, then substitute something else - six percent signs, or whatever.

0 Karma

niketn
Legend

You can apply bubble sort before adding the two fields. Using eval if perform a string comparison of field1 and field2 and whichever is smaller should be first field and bigger should be second field.

<Your Base Search returning id field1 and field2>
| eval sortedFields=if(field1>field2,field2+field1,field1+field2)
| stats count as match values(id) as id values(field1) as field1 values(field2) as field2 by sortedFields

All the rows with count greater than 1 are matches and id column has the IDs that matched.

| search match>1
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

niketn
Legend

@chensy... Please confirm if you were able to try out above and whether it solves the issue faced by you or not.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

chensy
Engager

hello @niketnilay, thanks for your help. Please see @DalJeanis solution from below

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...