Splunk Dev

Remove common words from two fields and keep unique values

samvijay
New Member

Here is an interesting problem, I tried different approaches using regex, mvdbedup, coalesce etc.. it did not work. need guidance from experts.

I have two fields field1 and field2 from a same event, field1 has value of "I want to buy a book" field2 has value of "I want to buy a phone"

As you can see, the content of both the fields are same except the words book and phone. I want the result like below

field1     field2
book      phone

I am simplifying the problem, but in reality each fields can contain a paragraph, but there will be few words which are unique in each field, which I want to extract.

Tags (2)
0 Karma
1 Solution

knielsen
Contributor

Now that's a fun challenge! 🙂

I only got very, well, not elegant solutions. I am sure there will be a better answer, but anyway....

Are the strings guaranteed to have the same order of words except the different ones? Then this is an approach:

| makeresults | eval field1="I want to buy a book now" | eval field2="I want to buy a phone now" 
| makemv delim=" " field1 |makemv delim=" " field2| eval comb=mvzip(field1, field2) | mvexpand comb | rex field=comb "(?<field1>[^,]+),(?<field2>.+)" | where NOT field1=field2 | table field1 field2

If you got mixed order of words maybe, then this is an approach:

| makeresults | eval field1="I want to a book buy now" | eval field2="I want to buy a phone now" 
| makemv delim=" " field1 |makemv delim=" " field2 | eval field1sav=field1| mvexpand field1 | eval n=if(match(field2,field1),1,0) | where n=0 | mvexpand field2 | eval n=if(match(field1sav,field2),1,0) | where n=0 | table field1 field2

but that's still not stable for conditions like extra words in either string. But maybe this helps in finding a better solution. I am sure the regulars will jump in later. 😉

View solution in original post

0 Karma

niketn
Legend

@samvijay, this is answered already. I am just throwing in another option:

| makeresults 
| eval field1="I want to buy a book by today" 
| eval field2="I want to buy a phone by tomorrow"
| eval arrField1=split(field1," ")
| eval arrField2=split(field2," ")
| eval combined=mvzip(arrField1, arrField2)
| table combined
| mvexpand combined
| eval field1=replace(combined,"([^,]+),(.+)","\1")
| eval field2=replace(combined,"([^,]+),(.+)","\2")
| table field1 field2
| where field1!=field2
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

samvijay
New Member

Thanks @niketnilay

0 Karma

knielsen
Contributor

Now that's a fun challenge! 🙂

I only got very, well, not elegant solutions. I am sure there will be a better answer, but anyway....

Are the strings guaranteed to have the same order of words except the different ones? Then this is an approach:

| makeresults | eval field1="I want to buy a book now" | eval field2="I want to buy a phone now" 
| makemv delim=" " field1 |makemv delim=" " field2| eval comb=mvzip(field1, field2) | mvexpand comb | rex field=comb "(?<field1>[^,]+),(?<field2>.+)" | where NOT field1=field2 | table field1 field2

If you got mixed order of words maybe, then this is an approach:

| makeresults | eval field1="I want to a book buy now" | eval field2="I want to buy a phone now" 
| makemv delim=" " field1 |makemv delim=" " field2 | eval field1sav=field1| mvexpand field1 | eval n=if(match(field2,field1),1,0) | where n=0 | mvexpand field2 | eval n=if(match(field1sav,field2),1,0) | where n=0 | table field1 field2

but that's still not stable for conditions like extra words in either string. But maybe this helps in finding a better solution. I am sure the regulars will jump in later. 😉

0 Karma

samvijay
New Member

Thanks @knielsen, the string are guaranteed to have the same order, however there can be many mismatches like below, I need to get them in the same row separated by comma

| makeresults | eval field1="I want to buy a book may be now" | eval field2="I want to buy a phone may be tomorrow" 
 | makemv delim=" " field1 |makemv delim=" " field2| eval comb=mvzip(field1, field2) | mvexpand comb | rex field=comb "(?<field1>[^,]+),(?<field2>.+)" | where NOT field1=field2 | table field1 field2 field3


field1                                   field2
book, now                          phone, tomorrow
0 Karma

samvijay
New Member

Thanks @knielsen, that was really helpful

0 Karma

knielsen
Contributor

I'll feel probably stupid when someone posts a pretty solution, but you get the correct result with this:

| makeresults | eval field1="I want to buy a book may be now" | eval field2="I want to buy a phone may be tomorrow" 
  | makemv delim=" " field1 |makemv delim=" " field2| eval comb=mvzip(field1, field2) | mvexpand comb | rex field=comb "(?[^,]+),(?.+)" | where NOT field1=field2 | stats list(field1) as field1 list(field2) as field2 | eval field1=mvjoin(field1, ",") | eval field2=mvjoin(field2, ",")

🙂

0 Karma
Get Updates on the Splunk Community!

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...

Let’s Get You Certified – Vegas-Style at .conf24

Are you ready to level up your Splunk game? Then, let’s get you certified live at .conf24 – our annual user ...