How can I exclude a list of words in a table or file from a search that shows a table result?

rasamur
Engager

Good morning Splunk Community

I'm currently working on a way to use Splunk to show the most popular words used in a series of emails listed in a CSV file. The file has three main columns: subject, description, and topic. I am classifying each email with a topic I see as fitting the content of its subject and description. Now that I have several "manual" classifications, I want Splunk to tell me the most popular words by topic, excluding pronouns, prepositions, and any other word I consider unimportant for that topic.

I was able to find this content https://answers.splunk.com/answers/62413/how-to-extract-most-popular-words-from-the-source-data.html... in the community, but it only goes as far as listing the words and counting them. The problem remains that it counts words like "the, work, call" that I do not need, so I started excluding them manually by right-clicking and selecting "Exclude from search". This helps to some extent, but we are talking about 9000 words, so it will take forever. I then tried another approach and used "*" so that a word and a set of similar words are excluded together, but the list is not shrinking as fast as I expected.

My idea to resolve this is:

  1. Use the search filter from the content posted in this article.
  2. Create a table or list, or load an additional CSV file (whichever is best), in Splunk with the words I don't want.
  3. Do a type of operation ((SearchFilter) - (Table OR List OR File) = (Result)) ( (A, B, C) - (C) = (A, B))

My question is: how can I create a table or file, or use a loaded CSV file, to remove the words I don't want from the result that the filter in the article shows?

Search query in article:

source=mybook | sort -_time | rex mode=sed "s/(\.|,|;|=|\"|'|\(|\)|\[|\]| -|!|^-)/ /g" | eval word=_raw | makemv delim=" " word | mvexpand word | eval word=lower(word) | eval position=1 | streamstats sum(position) AS position | table position word | stats count min(position) max(position) by word

Best regards and I hope there is an answer to this question.

Thank you

1 Solution

gcusello
SplunkTrust

Hi rasamur,
If I understood correctly, you have a CSV file with a list of words, and you want to exclude from your results all the events that contain at least one of those words. Is that correct?

To do this, create a lookup (e.g. my_lookup.csv) with all your words (e.g. in a field called "word") and then run a search like this:

your_search NOT [ | inputlookup my_lookup.csv | rename word AS query | fields query ]
| ...

In this way, all the events that contain one of your words are excluded from your results.

Bye.
Giuseppe

woodcock
Esteemed Legend

This is by no means a complete answer, but it should really help. Do a Google search for "stemming", "lemmatization", and "sentiment" together with "splunk". I did find this app, which should contain a framework that gets you a long way towards your goal:

"Sentiment Analysis" https://splunkbase.splunk.com/app/1179/

rasamur
Engager

Thank you woodcock, I have installed it and right now I'm looking into how to use it and add my data.

rasamur
Engager

Hi Giuseppe, thank you for the input. One question, though: does the .csv file I am reading the words from need to have the words organized in a specific order, or can I just put the words in any cell of the first sheet and that would be all?
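
As a rough sketch of the layout such a lookup file usually needs: the first row holds the field name (assumed here to be "word", as in the answer above), followed by one word per row. The order of the rows should not matter. A small lookup of this shape could also be built from within Splunk. The three words and the file name my_lookup.csv below are placeholders, not part of the original thread:

| makeresults
| eval word=split("the,work,call", ",")
| mvexpand word
| table word
| outputlookup my_lookup.csv

Running "| inputlookup my_lookup.csv" afterwards should list one word per row, which is a quick way to check that the file was read as intended.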

rasamur
Engager

OK, I got it resolved by doing the following:

Mysearch | search (NOT [| inputlookup X.csv | rename test_words AS words | fields words ])

Thank you

rasamur
Engager

I've tried to add this part to the filter I posted, but I am still not able to make it work. It seems that adding NOT at the end, after the stats command, doesn't work. How can I add it?

This is the search you refer to as "your_search":

source=mybook | sort -_time | rex mode=sed "s/(\.|,|;|=|\"|'|\(|\)|\[|\]| -|!|^-)/ /g" | eval word=_raw | makemv delim=" " word | mvexpand word | eval word=lower(word) | eval position=1 | streamstats sum(position) AS position | table position word | stats count min(position) max(position) by word
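
A sketch of one way the exclusion might be applied after stats, assuming a lookup file my_lookup.csv whose column is named "word" (both names are placeholders). Here the subsearch keeps the field name "word" instead of renaming it to "query", so it should expand to conditions of the form word="the" OR word="work", and the trailing search command then drops those rows from the table. The field name returned by the subsearch has to match the field name in the results, and the lookup values are assumed to be lowercase because the search lowercases each word:

... | stats count min(position) max(position) by word
| search NOT [ | inputlookup my_lookup.csv | fields word ]

An alternative sketch under the same assumptions uses the lookup command and keeps only the rows that have no match in the file ("stopword" is a made-up output field name):

... | stats count min(position) max(position) by word
| lookup my_lookup.csv word OUTPUT word AS stopword
| where isnull(stopword)

Subsearches are subject to result limits, so for a word list running into the thousands the lookup variant may be the safer choice.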
