Splunk Search

How can I calculate the term frequency for all the words in a field's values?

mhqssyh
Explorer

I am trying to calculate term frequencies on a field. The field is defined as follows.
rex field=_raw "Notes : (?.*)"
The field is extracted correctly, but its values are free-form text with no fixed format, for example:

Notes :

Notes : Troubleshooting, I am simply reinstalling.
Notes : program would not start. I am reinstalling.
Notes : Made MacBook too slow
Notes : computer to slow when using the program! Need to install it into another!

There are thousands of lines like these, and I want to know the frequency of every word in the notes field. Is there a command that does this, or some other way to achieve it in Splunk?

Any ideas?
Thanks, Yi

1 Solution

jimodonald
Contributor

Here is a related post: http://answers.splunk.com/answers/62413/how-to-extract-most-popular-words-from-the-source-data.html

I think the REX from that post should get you going in the right direction. I've pasted the REX below. Please see the original for more details.

source=*mybook* | sort -_time | rex mode=sed "s/(\.|,|;|=|\"|'|\(|\)|\[|\]| -|!|^-)/ /g" | eval word=_raw | makemv delim=" " word | mvexpand word | eval word=lower(word) | eval position=1 | streamstats sum(position) AS position | table position word | stats count min(position) max(position) by word 
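
To count words in the extracted notes field only, rather than in all of _raw, a minimal sketch might look like the following. It is meant to be appended to your own base search. The field name notes is an assumption (the capture-group name is missing from the rex in the question), and the names word, total and tf are just illustrative. The sed expression replaces anything other than lowercase letters, digits and apostrophes with a space, which is a broader cleanup than the one in the linked post.

```spl
| rex field=_raw "Notes : (?<notes>.*)"
| eval word=lower(notes)
| rex mode=sed field=word "s/[^a-z0-9']+/ /g"
| makemv delim=" " word
| mvexpand word
| stats count by word
| eventstats sum(count) AS total
| eval tf=round(count/total, 4)
| sort - count
```

The stats line gives the raw count per word; the eventstats and eval lines add each word's share of all words, in case a relative term frequency is wanted instead of a plain count.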

jzapantis
Path Finder

I used the following syntax to count the frequency of terms in my field:

              | rename COMMENTS_4 AS text
              | rex mode=sed field=text "s/[,.!]/ /g"
              | makemv text
              | mvexpand text
              | eval wordCount = mvcount(text)
              | stats sum(wordCount) as "Word Map Text Analysis" by text

The line | rename COMMENTS_4 AS text just renames my field to "text". So, assuming you rename your own field to text, you can count the terms using the multivalue (MV*) commands.
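
After mvexpand, each result should hold a single word, so mvcount(text) evaluates to 1 and the sum is effectively a row count. A shorter sketch of the same idea, assuming the field has already been renamed to text and using frequency as an illustrative output name:

```spl
| rex mode=sed field=text "s/[,.!]/ /g"
| makemv text
| mvexpand text
| stats count AS frequency by text
| sort - frequency
```

The g flag on the sed expression matters here: without it only the first punctuation mark in each value is replaced.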

mhqssyh
Explorer

Thanks, jimodonald! I tried the REX and it works. Now I have another question: can I cluster similar words into one class, such as fast, quick, rapid, swift?

jzapantis
Path Finder

You have to use a lexicon. Look up the Node.js library for WordNet and upload that library. Then build a new app in Splunk. Once that is done, create a .js file that calls the WordNet library, then define a search manager in the .js file that returns your Splunk search. Loop through all the words and pass each one to the WordNet library to build a temporary synonym dictionary. You can optionally save this dictionary in a KV store and update it continually.

I know I didn't give details, but that's because it is a highly involved solution. It is possible, though. Start poking around with WordNet and its capabilities.

Keep in mind that all custom Splunk apps are basically Node.js apps, at least that is my current understanding. Community, let me know if I am wrong!

jimodonald
Contributor

Splunk is not going to know what words are synonyms. It could likely be done with a case statement or a lookup table. Either way the synonyms would need to be identified and linked back to a common word.
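
A minimal sketch of the case-statement option, assuming the one-word-per-result field is called word as in the search from the related post. The synonym list is the one from the question; the field name class and the choice of "fast" as the common word are illustrative only.

```spl
| eval class=case(word=="fast" OR word=="quick" OR word=="rapid" OR word=="swift", "fast", true(), word)
| stats count by class
| sort - count
```

Words outside the list fall through to the true() branch and keep their own value. For a longer list, the lookup option is likely easier to maintain: with a hypothetical lookup file synonyms.csv that has the columns word and class, the eval line could be replaced by | lookup synonyms.csv word OUTPUT class | eval class=coalesce(class, word).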
