How to use different languages in the NLP Text Analytics app

celianouguier
Explorer

Hi,

I am using the NLP Text Analytics app with texts in English.

I have two questions about this app:

  1. Is there a way to use the command " | vader [...] " with other languages (French, for example)?
  2. Does the command " | TruncatedSVD [...] " take the language of the texts into consideration?

worshamn
Contributor
  1. Currently the answer is no. The lexicon for vader ($SPLUNK_HOME/nlp-text-analytics/bin/nltk_data/sentiment/vader_lexicon/vader_lexicon.txt) is strictly English. A later version of vaderSentiment (not packaged with NLTK) may have a translation service available (https://stackoverflow.com/a/45490928); I would need to look into it, and it may still not be ideal. Relatedly, I originally intended the app to be packaged with English support only, but of course this could change based on need/requests. Any language additions can be downloaded from http://www.nltk.org/nltk_data/ and placed in the appropriate folders (which are sometimes difficult to figure out) under $SPLUNK_HOME/nlp-text-analytics/bin/nltk_data/. However, the cleantext command, for example, is also set to English, and I will need to provide options to adjust that in the future.
  2. Yes, TruncatedSVD works completely on context (and normally requires a large corpus to be effective) and expects that the text has first been converted into a term-document matrix using the TF-IDF algorithm. Here is a rough visualization of how the math works: http://matpalm.com/lsa_via_svd/intro.html
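
To make point 1 a bit more concrete, here is a toy sketch of lexicon-based scoring in plain Python. It is not the actual VADER algorithm or the app's code; the sample lexicon entries and the helper names `load_lexicon` and `lexicon_score` are made up for illustration. It only shows why a scorer backed by an English-only word list finds nothing to score in French text.

```python
# Toy illustration of lexicon-based sentiment scoring.
# NOT the real VADER algorithm: the entries and scores below are invented
# sample values in a simple "token<TAB>score" layout.
SAMPLE_LEXICON = "good\t1.9\nbad\t-2.5\ngreat\t3.1\nterrible\t-2.1\n"


def load_lexicon(text):
    """Parse 'token<TAB>score' lines into a {token: score} dict."""
    lexicon = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token, score = line.split("\t")[:2]
        lexicon[token] = float(score)
    return lexicon


def lexicon_score(sentence, lexicon):
    """Return (sum of scores, number of matched tokens) for a sentence.

    Tokens that are not in the lexicon contribute nothing.
    """
    tokens = sentence.lower().split()
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits), len(hits)


LEXICON = load_lexicon(SAMPLE_LEXICON)

english = lexicon_score("the service was great", LEXICON)
french = lexicon_score("le service était excellent", LEXICON)

print(english)  # (3.1, 1): "great" is the only token in the lexicon
print(french)   # (0, 0): no French token is in the English lexicon
```

The French sentence is clearly positive, but with no matching entries it comes out as neutral, which is roughly what to expect when running an English-only lexicon over non-English text.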

celianouguier
Explorer

Thank you so much for your answer, @worshamn!
Do you know when a multilingual version of vader might be available?
I don't know how to do text analysis in French in Splunk, or whether there is an effective and easy workaround at my level of expertise.
Thank you for answering my questions anyway!
