How to use different languages in NLP Text Analytics app

celianouguier
Explorer

Hi,

I am using the NLP Text Analytics app with texts in English.

I have two questions about this app:

  1. Is there a way to use the command " | vader [...] " with other languages (French, for example)?
  2. Does the command " | TruncatedSVD [...] " take the language of the texts into consideration?

worshamn
Contributor
  1. Currently the answer is no. The lexicon for vader ($SPLUNK_HOME/nlp-text-analytics/bin/nltk_data/sentiment/vader_lexicon/vader_lexicon.txt) is strictly English. A later version of vaderSentiment (not the one packaged with NLTK) appears to have a translation service available (https://stackoverflow.com/a/45490928); I would need to look into it, and it may still not be ideal. Relatedly, I originally intended the app to be packaged with English support only, but this could change based on need/requests. Any language additions can be downloaded from http://www.nltk.org/nltk_data/ and placed in the appropriate folders (which are sometimes difficult to figure out) under $SPLUNK_HOME/nlp-text-analytics/bin/nltk_data/. Note, however, that other commands are affected too: for example, the cleantext command is also set to English, and I will need to provide options to adjust that in the future.
  2. Yes, TruncatedSVD works entirely on context (and normally requires a large corpus to be effective). It expects that the text has first been converted into a term-document matrix using the TF-IDF algorithm. Here is a rough visualization of how the math works: http://matpalm.com/lsa_via_svd/intro.html.
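
To illustrate point 1, here is a minimal sketch of why a lexicon-based scorer stays neutral on text in a language its lexicon does not cover. It is not the actual vader algorithm (which also handles negation, intensifiers, and score normalization), and it does not read the app's vader_lexicon.txt. The `TOY_LEXICON` values, `tokenize`, and `lexicon_score` are hypothetical names and numbers made up for this example.

```python
import re

# Hypothetical miniature lexicon in the spirit of an English-only
# sentiment lexicon: token -> valence. Values are invented for illustration.
TOY_LEXICON = {
    "good": 1.9,
    "great": 3.1,
    "love": 3.2,
    "bad": -2.5,
    "terrible": -2.1,
    "hate": -2.7,
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters (accents kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the valence of every token that appears in the lexicon.

    Returns (score, matched_tokens). Tokens missing from the lexicon
    contribute nothing, so text in an uncovered language scores 0.
    """
    matched = [tok for tok in tokenize(text) if tok in lexicon]
    score = sum(lexicon[tok] for tok in matched)
    return score, matched


english = "I love this app, the results are great"
french = "Cette application est vraiment géniale, je suis ravi"

print(lexicon_score(english))  # two tokens match, positive score
print(lexicon_score(french))   # no token matches, score stays at 0
```

The French sentence is clearly positive, yet no token is found in the English lexicon, so the score carries no signal. Supporting another language with this kind of approach would need a lexicon for that language (or a translation step before scoring).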

celianouguier
Explorer

Thank you so much for your answer @worshamn !
Do you know when a multilingual version of vader might be available?
I don't know how to do text analysis in French in Splunk, or whether there is an effective and easy workaround suited to my level of competence.
Thank you for answering my questions anyway!
