Solved: How to search for field values that appears in ano...

chensy · ‎02-22-2017

Basically what I'm trying to ask is, for instance if my data is like this

index, field1, field2
1, apple, boy
2, apple, girl
3, boy, apple
4, boy, girl
5, girl, apple

How do i format my query to give me a results that says:

1 matches with 3
2 matches with 5
3 matches with 1 (optional but good to have)
5 matches with 2 (optional but good to have)

Thanks for your help.

DalJeanis · ‎02-26-2017

This part makes your test data

| makeresults 
| eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girl,apple"
| makemv mydata 
| mvexpand mydata 
| makemv delim="," mydata 
| eval theindex=mvindex(mydata,0)
| eval field1=mvindex(mydata,1) 
| eval field2=mvindex(mydata,2)
| table theindex field1 field2

This part builds a synthetic key to match by, finds all matches, then reports on them.

| eval concatvalues=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
| stats values(theindex) as theindex by concatvalues
| where mvcount(theindex)>1
| eval saveindex = theindex
| mvexpand theindex
| mvexpand saveindex
| where theindex!=saveindex
| sort 0 theindex saveindex
| eval mymessage=theindex." matches with ".saveindex
| table mymessage

Note- the above will also work appropriately with multiples (more than two matches). For instance, a match-3 will result in 6 lines, a match-4 results in 12 lines, etc. To collapse those multiple lines into one line per index would require a little additional code as below -

 | makeresults 
 | eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girls,apple 6,boy,apple 7,apple,boy"
 | makemv mydata 
 | mvexpand mydata 
 | makemv delim="," mydata 
 | eval theindex=mvindex(mydata,0)
 | eval field1=mvindex(mydata,1) 
 | eval field2=mvindex(mydata,2)
 | table theindex field1 field2


 | eval concatvalues=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
 | stats values(theindex) as theindex by concatvalues
 | where mvcount(theindex)>1
 | eval saveindex = theindex
 | mvexpand theindex
 | mvexpand saveindex
 | where theindex!=saveindex
 | stats values(saveindex) as saveindex by theindex
 | eval saveindex=mvjoin(saveindex,", ")
 | sort 0 theindex
 | eval mymessage=theindex." matches with ".saveindex
 | table mymessage

produces

1 matches with 3, 6, 7
2 matches with 5
3 matches with 1, 6, 7
5 matches with 2
6 matches with 1, 3, 7
7 matches with 1, 3, 6

View solution in original post

DalJeanis · ‎02-26-2017

This part makes your test data

| makeresults 
| eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girl,apple"
| makemv mydata 
| mvexpand mydata 
| makemv delim="," mydata 
| eval theindex=mvindex(mydata,0)
| eval field1=mvindex(mydata,1) 
| eval field2=mvindex(mydata,2)
| table theindex field1 field2

This part builds a synthetic key to match by, finds all matches, then reports on them.

| eval concatvalues=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
| stats values(theindex) as theindex by concatvalues
| where mvcount(theindex)>1
| eval saveindex = theindex
| mvexpand theindex
| mvexpand saveindex
| where theindex!=saveindex
| sort 0 theindex saveindex
| eval mymessage=theindex." matches with ".saveindex
| table mymessage

Note- the above will also work appropriately with multiples (more than two matches). For instance, a match-3 will result in 6 lines, a match-4 results in 12 lines, etc. To collapse those multiple lines into one line per index would require a little additional code as below -

 | makeresults 
 | eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girls,apple 6,boy,apple 7,apple,boy"
 | makemv mydata 
 | mvexpand mydata 
 | makemv delim="," mydata 
 | eval theindex=mvindex(mydata,0)
 | eval field1=mvindex(mydata,1) 
 | eval field2=mvindex(mydata,2)
 | table theindex field1 field2


 | eval concatvalues=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
 | stats values(theindex) as theindex by concatvalues
 | where mvcount(theindex)>1
 | eval saveindex = theindex
 | mvexpand theindex
 | mvexpand saveindex
 | where theindex!=saveindex
 | stats values(saveindex) as saveindex by theindex
 | eval saveindex=mvjoin(saveindex,", ")
 | sort 0 theindex
 | eval mymessage=theindex." matches with ".saveindex
 | table mymessage

produces

1 matches with 3, 6, 7
2 matches with 5
3 matches with 1, 6, 7
5 matches with 2
6 matches with 1, 3, 7
7 matches with 1, 3, 6

chensy · ‎02-26-2017

@DalJeanis thank you for your help! works great.

1 question though, how can I then differentiate between duplicated entries from matched entries?

e.g.

1 matches with 3, 6 
1 is duplicate of 7 
2 matches with 5 
3 matches with 1, 7
3 is duplicate of 6 
5 matches with 2 
6 matches with 1, 7
6 is duplicate of 3
7 matches with 3, 6 
7 is duplicate of 1

appreciate your expertise in this area

DalJeanis · ‎02-27-2017

It's a little more complicated, probably could be tightened up but it works.

| makeresults 
| eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girl,apple 6,boy,apple"
| makemv mydata 
| mvexpand mydata 
| makemv delim="," mydata 
| eval theindex=mvindex(mydata,0)
| eval field1=mvindex(mydata,1) 
| eval field2=mvindex(mydata,2)
| table theindex field1 field2

| eval matchsame=field2."!!!!".field1
| eval matchall=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
| eventstats count as matchallcount by matchall
| where matchallcount>1
| stats list(theindex) as theindex list(matchsame) as matchsame by matchall
| eval theindex = mvzip(theindex,matchsame)
| table theindex
| eval saveindex=theindex
| mvexpand theindex
| mvexpand saveindex
| where theindex!=saveindex
| makemv delim="," theindex
| eval matchsame=mvindex(theindex,1)
| eval theindex=mvindex(theindex,0)
| makemv delim="," saveindex
| eval matchother=mvindex(saveindex,1)
| eval matchindex=mvindex(saveindex,0)
| table theindex matchsame matchindex matchother
| stats values(matchindex) as matchindex by theindex matchsame matchother
| sort 0 theindex
| eval matchstatement=if(matchsame=matchother," is a duplicate of "," matches with ") 
| eval matchindex=mvjoin(matchindex,", ")
| eval mymessage=theindex.matchstatement.matchindex
| table mymessage

producing the following results -

1 matches with 3, 6
2 matches with 5
3 is a duplicate of 6
3 matches with 1
5 matches with 2
6 is a duplicate of 3
6 matches with 1

By the way, there's nothing splunky or magical about those four exclamation points ("!!!!"). I use those as a delimiter simply because for my particular installation, that particular combination of characters is highly unlikely to happen in my data. If your organization deals with tweets or teens or excitable people, then substitute something else - six percent signs, or whatever.

niketn · ‎02-24-2017

You can apply bubble sort before adding the two fields. Using eval if perform a string comparison of field1 and field2 and whichever is smaller should be first field and bigger should be second field.

<Your Base Search returning id field1 and field2>
| eval sortedFields=if(field1>field2,field2+field1,field1+field2)
| stats count as match values(id) as id values(field1) as field1 values(field2) as field2 by sortedFields

All the rows with count greater than 1 are matches and id column has the IDs that matched.

| search match>1

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

niketn · ‎02-25-2017

@chensy... Please confirm if you were able to try out above and whether it solves the issue faced by you or not.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

chensy · ‎02-26-2017

hello @niketnilay, thanks for your help. Please see @DalJeanis solution from below

How to search for field values that appears in another field and vice versa?

Introducing the 2024 SplunkTrust!

Introducing the 2024 Splunk MVPs!

Splunk Custom Visualizations App End of Life