How to use different languages in NLP Text Analytics app

celianouguier
Explorer

Hi,

I am using the NLP Text Analytics app with texts in English.

I have two questions about this app:

  1. Is there a way to use the command " | vader [...] " with other languages (French, for example)?
  2. Does the command " | TruncatedSVD [...] " take the language of the texts into account?

worshamn
Contributor
  1. Currently the answer is no. The lexicon for vader ($SPLUNK_HOME/nlp-text-analytics/bin/nltk_data/sentiment/vader_lexicon/vader_lexicon.txt) is strictly English. It seems a later version of vaderSentiment (one that is not packaged with NLTK) may have a translation service available, which I would need to look into (https://stackoverflow.com/a/45490928), but that may still not be ideal. On a related note, I originally intended the app to be packaged with English support only, but of course this could change based on need or requests. Any language additions can be downloaded from http://www.nltk.org/nltk_data/ and placed in the appropriate folders (which are sometimes difficult to figure out) under $SPLUNK_HOME/nlp-text-analytics/bin/nltk_data/. However, other commands, for example cleantext, are also set to English, and I will need to provide options to adjust that in the future.
  2. Yes, TruncatedSVD works entirely on context (and normally requires a large corpus to be effective). It expects that the text has first been converted into a term-document matrix using the TF-IDF algorithm. Here is a rough visualization of how the math works: http://matpalm.com/lsa_via_svd/intro.html
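
To make point 2 a bit more concrete, here is a minimal pure-Python sketch of the TF-IDF term-document matrix that would be fed to an SVD step. It is not the app's actual code: the function name `tfidf_matrix`, the whitespace tokenizer, and the `log(N/df) + 1` weighting are all assumptions chosen for illustration, and the SVD itself is left out. The point it tries to show is that the matrix is built purely from token counts, so a French document simply gets its own columns rather than being "understood" as French.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows), where rows[i][j] is the TF-IDF weight
    of term vocab[j] in document i.

    Illustrative assumptions: lowercase whitespace tokenization and
    idf = log(n_docs / document_frequency) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = []
        for term in vocab:
            tf = counts[term] / len(toks) if toks else 0.0
            idf = math.log(n_docs / df[term]) + 1.0
            row.append(tf * idf)
        rows.append(row)
    return vocab, rows


# Two English documents and one French document (made-up sample text).
docs = [
    "the service is great",
    "the service is slow",
    "le service est lent",
]
vocab, rows = tfidf_matrix(docs)

for doc, row in zip(docs, rows):
    nonzero = {t: round(w, 3) for t, w in zip(vocab, row) if w > 0}
    print(doc, "->", nonzero)
```

In this toy example the French document only overlaps with the English ones on the token "service"; every other French word lands in a column the English documents never use. A decomposition applied afterwards only sees those co-occurrence patterns, which is also why a large corpus in the target language matters.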

celianouguier
Explorer

Thank you so much for your answer, @worshamn!
Do you know when a multilingual version of vader might be available?
I don't know how to do text analysis in French in Splunk, or whether there is an effective and easy workaround suited to my level of expertise.
Thank you for answering my questions anyway!
