I am trying to determine the right SPL to dig through a financial data set and look for duplicate entries. The data is generally unique, but occasionally a vendor submits a duplicate request, which causes problems downstream.
Test data:
id=11111,vendor=blah,name=tacoco,value=201,date="1/1/18"
id=11112,vendor=abc,name=jump,value=321,date="2/1/18"
id=11113,vendor=sneeze,name=china,value=421,date="3/1/18"
id=11114,vendor=alpha,name=pooch,value=521,date="4/1/18"
id=11115,vendor=splunk,name=tacos,value=221,date="5/1/18"
id=11116,vendor=internet,name=golf,value=621,date="6/1/18"
id=11117,vendor=office,name=mexico,value=721,date="7/1/18"
id=11118,vendor=splunk,name=tacos,value=221,date="5/1/18"
id=11119,vendor=random,name=burger,value=821,date="8/1/18"
id=11120,vendor=opera,name=browser,value=921,date="9/1/18"
I would like to create a search that identifies any case where vendor, name, value, and date all have the same values but id is different (for example, the two vendor=splunk rows above). There are other fields in the event data, but these are the ones I care about specifically.
Greetings @uhaba, try this run-anywhere search:
| makeresults
| eval id = "11111" ,
vendor = "blah" ,
name = "tacoco",
value = "201" ,
date = "1/1/18"
| append
[ | makeresults
| eval id = "11115" ,
vendor = "splunk" ,
name = "tacos",
value = "221" ,
date = "5/1/18" ]
| append
[ | makeresults
| eval id = "11118" ,
vendor = "splunk" ,
name = "tacos",
value = "221" ,
date = "5/1/18" ]
| stats count values(id) as ids by vendor name value date
| where count > 1
Output:
vendor  name   value  date    count  ids
splunk  tacos  221    5/1/18  2      11115
                                     11118
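To run the same logic against your real events instead of makeresults, something along these lines should work. This is only a sketch: the index and sourcetype names (finance, vendor_requests) are placeholders I made up, and it assumes id, vendor, name, value, and date are already extracted as fields. I also added dc(id) so that a row only matches when the ids actually differ, since count alone would also flag the same id indexed twice.

```spl
index=finance sourcetype=vendor_requests
| stats count dc(id) as distinct_ids values(id) as ids by vendor name value date
| where distinct_ids > 1
```

Keep in mind that stats drops any event where one of the by fields is missing, so events without a vendor, name, value, or date will not show up in the results.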