Basically what I'm trying to ask is, for instance if my data is like this
index, field1, field2
1, apple, boy
2, apple, girl
3, boy, apple
4, boy, girl
5, girl, apple
How do i format my query to give me a results that says:
1 matches with 3
2 matches with 5
3 matches with 1 (optional but good to have)
5 matches with 2 (optional but good to have)
Thanks for your help.
This part makes your test data
| makeresults
| eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girl,apple"
| makemv mydata
| mvexpand mydata
| makemv delim="," mydata
| eval theindex=mvindex(mydata,0)
| eval field1=mvindex(mydata,1)
| eval field2=mvindex(mydata,2)
| table theindex field1 field2
This part builds a synthetic key to match by, finds all matches, then reports on them.
| eval concatvalues=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
| stats values(theindex) as theindex by concatvalues
| where mvcount(theindex)>1
| eval saveindex = theindex
| mvexpand theindex
| mvexpand saveindex
| where theindex!=saveindex
| sort 0 theindex saveindex
| eval mymessage=theindex." matches with ".saveindex
| table mymessage
Note- the above will also work appropriately with multiples (more than two matches). For instance, a match-3 will result in 6 lines, a match-4 results in 12 lines, etc. To collapse those multiple lines into one line per index would require a little additional code as below -
| makeresults
| eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girls,apple 6,boy,apple 7,apple,boy"
| makemv mydata
| mvexpand mydata
| makemv delim="," mydata
| eval theindex=mvindex(mydata,0)
| eval field1=mvindex(mydata,1)
| eval field2=mvindex(mydata,2)
| table theindex field1 field2
| eval concatvalues=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
| stats values(theindex) as theindex by concatvalues
| where mvcount(theindex)>1
| eval saveindex = theindex
| mvexpand theindex
| mvexpand saveindex
| where theindex!=saveindex
| stats values(saveindex) as saveindex by theindex
| eval saveindex=mvjoin(saveindex,", ")
| sort 0 theindex
| eval mymessage=theindex." matches with ".saveindex
| table mymessage
produces
1 matches with 3, 6, 7
2 matches with 5
3 matches with 1, 6, 7
5 matches with 2
6 matches with 1, 3, 7
7 matches with 1, 3, 6
This part makes your test data
| makeresults
| eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girl,apple"
| makemv mydata
| mvexpand mydata
| makemv delim="," mydata
| eval theindex=mvindex(mydata,0)
| eval field1=mvindex(mydata,1)
| eval field2=mvindex(mydata,2)
| table theindex field1 field2
This part builds a synthetic key to match by, finds all matches, then reports on them.
| eval concatvalues=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
| stats values(theindex) as theindex by concatvalues
| where mvcount(theindex)>1
| eval saveindex = theindex
| mvexpand theindex
| mvexpand saveindex
| where theindex!=saveindex
| sort 0 theindex saveindex
| eval mymessage=theindex." matches with ".saveindex
| table mymessage
Note- the above will also work appropriately with multiples (more than two matches). For instance, a match-3 will result in 6 lines, a match-4 results in 12 lines, etc. To collapse those multiple lines into one line per index would require a little additional code as below -
| makeresults
| eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girls,apple 6,boy,apple 7,apple,boy"
| makemv mydata
| mvexpand mydata
| makemv delim="," mydata
| eval theindex=mvindex(mydata,0)
| eval field1=mvindex(mydata,1)
| eval field2=mvindex(mydata,2)
| table theindex field1 field2
| eval concatvalues=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
| stats values(theindex) as theindex by concatvalues
| where mvcount(theindex)>1
| eval saveindex = theindex
| mvexpand theindex
| mvexpand saveindex
| where theindex!=saveindex
| stats values(saveindex) as saveindex by theindex
| eval saveindex=mvjoin(saveindex,", ")
| sort 0 theindex
| eval mymessage=theindex." matches with ".saveindex
| table mymessage
produces
1 matches with 3, 6, 7
2 matches with 5
3 matches with 1, 6, 7
5 matches with 2
6 matches with 1, 3, 7
7 matches with 1, 3, 6
@DalJeanis thank you for your help! works great.
1 question though, how can I then differentiate between duplicated entries from matched entries?
e.g.
1 matches with 3, 6
1 is duplicate of 7
2 matches with 5
3 matches with 1, 7
3 is duplicate of 6
5 matches with 2
6 matches with 1, 7
6 is duplicate of 3
7 matches with 3, 6
7 is duplicate of 1
appreciate your expertise in this area
It's a little more complicated, probably could be tightened up but it works.
| makeresults
| eval mydata="1,apple,boy 2,apple,girl 3,boy,apple 4,boy,girl 5,girl,apple 6,boy,apple"
| makemv mydata
| mvexpand mydata
| makemv delim="," mydata
| eval theindex=mvindex(mydata,0)
| eval field1=mvindex(mydata,1)
| eval field2=mvindex(mydata,2)
| table theindex field1 field2
| eval matchsame=field2."!!!!".field1
| eval matchall=if(field1<field2,field1."!!!!".field2,field2."!!!!".field1)
| eventstats count as matchallcount by matchall
| where matchallcount>1
| stats list(theindex) as theindex list(matchsame) as matchsame by matchall
| eval theindex = mvzip(theindex,matchsame)
| table theindex
| eval saveindex=theindex
| mvexpand theindex
| mvexpand saveindex
| where theindex!=saveindex
| makemv delim="," theindex
| eval matchsame=mvindex(theindex,1)
| eval theindex=mvindex(theindex,0)
| makemv delim="," saveindex
| eval matchother=mvindex(saveindex,1)
| eval matchindex=mvindex(saveindex,0)
| table theindex matchsame matchindex matchother
| stats values(matchindex) as matchindex by theindex matchsame matchother
| sort 0 theindex
| eval matchstatement=if(matchsame=matchother," is a duplicate of "," matches with ")
| eval matchindex=mvjoin(matchindex,", ")
| eval mymessage=theindex.matchstatement.matchindex
| table mymessage
producing the following results -
1 matches with 3, 6
2 matches with 5
3 is a duplicate of 6
3 matches with 1
5 matches with 2
6 is a duplicate of 3
6 matches with 1
By the way, there's nothing splunky or magical about those four exclamation points ("!!!!"). I use those as a delimiter simply because for my particular installation, that particular combination of characters is highly unlikely to happen in my data. If your organization deals with tweets or teens or excitable people, then substitute something else - six percent signs, or whatever.
You can apply bubble sort before adding the two fields. Using eval if perform a string comparison of field1 and field2 and whichever is smaller should be first field and bigger should be second field.
<Your Base Search returning id field1 and field2>
| eval sortedFields=if(field1>field2,field2+field1,field1+field2)
| stats count as match values(id) as id values(field1) as field1 values(field2) as field2 by sortedFields
All the rows with count greater than 1 are matches and id column has the IDs that matched.
| search match>1
@chensy... Please confirm if you were able to try out above and whether it solves the issue faced by you or not.
hello @niketnilay, thanks for your help. Please see @DalJeanis solution from below