
Machine Learning Toolkit v3.4 model not returning results

teresachila
Path Finder

I just upgraded the MLTK from v2.2 to v3.4, along with the latest Python for Scientific Computing add-on. After the upgrade, my Random Forest model returns an empty result for some rows. (I apply the model to a few thousand rows each time.) At first I thought it was an input data problem, but when I took a row that had previously returned an empty result and ran it individually (i.e. with | head 1), the model returned a result. Then I thought the cause might be that the model was built in v2.2, so I rebuilt it (ran fit again) in v3.4. It still returned empty results for some rows, but for a different subset of rows this time.

Has anyone seen the same issue? Should I revert to the old version?

I don't see anything in search.log that will help, but I always see this:
09-10-2018 22:08:54.625 ERROR ChunkedExternProcessor - stderr: File "/opt/splunk/etc/apps/Splunk_ML_Toolkit/bin/util/search_util.py", line 114, in add_distributed_search_info
09-10-2018 22:08:54.625 ERROR ChunkedExternProcessor - stderr: raise RuntimeError('Failed to load model "%s": ' % (process_options['model_name']))
09-10-2018 22:08:54.625 ERROR ChunkedExternProcessor - stderr: KeyError: 'model_name'
09-10-2018 22:08:54.625 ERROR ChunkedExternProcessor - Error in 'apply' command: (KeyError) 'model_name'

Is it trying to distribute the apply command to the indexers? Can I run it locally on the search head, since all my input data (CSV and KV store) is on the search head?

1 Solution

teresachila
Path Finder

I think I found the issue. For some reason, the new version does not handle null, or anything close to null, being passed to the model. That includes the empty string (i.e. "", or len=0) and the string values "NA", "N/A", and "null". (The "NA" was returned by an external API.)

So far I have observed three different symptoms: 1) the model returns an empty prediction value with no other messages in the log, 2) the model fails with an error message about null values being passed, 3) the model returns a prediction, but with a warning message in search.log about a null value in the model. Which symptom appears depends on how many rows are being processed. If I apply the model to 1 row, it usually returns a prediction value. If I apply it to thousands of rows, it usually returns an empty value.

To remediate, I added this code:

| fillnull value="NoValue"
| foreach prefix_*  [eval <<FIELD>>=if(len(<<FIELD>>)=0 OR <<FIELD>>="N/A" OR <<FIELD>>="NA" OR <<FIELD>>="null","NoValue",<<FIELD>>)]
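
For anyone who would rather clean the data before it reaches Splunk (for example, in the script that calls the external API), here is a minimal Python sketch of the same normalization. The names `NULL_LIKE` and `normalize_row` are made up for illustration, and `prefix_` just mirrors the field prefix in the search above. Treat it as a rough equivalent, not a tested replacement for the SPL.

```python
# Hypothetical helper: mirrors the fillnull + foreach/eval logic above.
# String values that should be treated as "close to null" (exact, case-sensitive match).
NULL_LIKE = {"", "NA", "N/A", "null"}


def normalize_row(row, prefix="prefix_", placeholder="NoValue"):
    """Return a copy of `row` with null-like values replaced by `placeholder`."""
    cleaned = {}
    for field, value in row.items():
        if value is None:
            # Like `fillnull value="NoValue"`: applies to every field.
            cleaned[field] = placeholder
        elif field.startswith(prefix) and str(value) in NULL_LIKE:
            # Like the `foreach prefix_*` eval: only prefixed fields are checked.
            cleaned[field] = placeholder
        else:
            cleaned[field] = value
    return cleaned


row = {"prefix_color": "NA", "prefix_size": "", "prefix_shape": None,
       "host": "N/A", "prefix_ok": "red"}
print(normalize_row(row))
```

Note that `host` keeps its "N/A" here because, like the foreach in the search, the string check only touches fields that start with the prefix.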


grana_splunk
Splunk Employee

Is it a distributed or search head cluster setup? Are you using streaming apply on all the indexers? If yes, did you upgrade PSC on all the indexers? You need to recreate your model after upgrading the PSC version.

teresachila
Path Finder

It is set up for distributed search to multiple indexers. Not a search head cluster. How do I know if I'm using streaming apply? I only upgraded PSC on the search head, not the indexers.

grana_splunk
Splunk Employee

Open the mlspl.conf file at $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/default/mlspl.conf or $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/local/mlspl.conf and check whether streaming_apply is set to true.

Also, if you have upgraded the setup and streaming apply is true, please upgrade PSC on all your indexers.
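
If you want to script that check, here is a rough Python sketch that layers the contents of the default and local files and reports the value that wins. It assumes the setting is named `streaming_apply` and lives in a `[default]` stanza, and that the files are simple enough for Python's `configparser` to read; the function name `effective_setting` and the sample contents are made up for illustration. Splunk's own `splunk btool` output is the authoritative view of merged configuration.

```python
import configparser


def effective_setting(conf_texts, key, stanza="default"):
    """Return `key` from `stanza` after layering conf texts in order.

    Later texts override earlier ones, so pass default/ first and local/ last.
    Returns None if the stanza or key is absent.
    """
    # strict=False tolerates repeated stanzas/keys; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    for text in conf_texts:
        parser.read_string(text)
    if parser.has_option(stanza, key):
        return parser.get(stanza, key).strip()
    return None


# Made-up sample contents standing in for default/mlspl.conf and local/mlspl.conf.
default_conf = "[default]\nstreaming_apply = false\n"
local_conf = "[default]\nstreaming_apply = true\n"

print(effective_setting([default_conf, local_conf], "streaming_apply"))
```

To run it against real files, read each file's text with `open(path).read()` and pass the default copy first so that the local copy overrides it.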

teresachila
Path Finder

Thanks! streaming_apply is set to false.
