Splunk IT Service Intelligence

In the Machine Learning Toolkit, the apply command with probabilities=true returns very few results.

KrithikaRamakri
Explorer

Hi everyone, I am trying to apply logistic regression in Splunk to predict phishing. This is my query:

sourcetype="incoming_email"
| apply tfidf_sender | apply tfidf_subject | apply tfidf_sender_ip | apply tfidf_url | apply tfidf_Attachments_MD5
| apply test_model probabilities=true | table Sender Subject Sender_ip "predicted(Is Phishing)" "probability(Is Phishing=Yes)"

I am applying tfidf to the fields and then applying test_model, which is my logistic regression model. The probability value is populated for only a few results; for the rest it is empty. Can someone please help me populate this value? Also, is there any other way to identify which fields the logistic regression used to classify my email?


astein_splunk
Splunk Employee

The "Understanding fit and apply" page in the MLTK docs explains that, unlike fit, apply can handle null fields when applying a model to generate a predicted field. However, you may not get all of the algorithm's functionality (such as probabilities) if that functionality relies on good data.

Is it possible that the fields your logistic regression is being applied to are null? If so, the probability field may not be populated because there isn't a valid value for each field.
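
One way to check this is to count how many events are missing each input field before any of the apply steps run. The search below is only a sketch: it uses the three raw field names that appear in the table command in the question (Sender, Subject, Sender_ip) and assumes those are the fields the tfidf models were fit on. The names of the URL and attachment fields are not shown in the question, so they are left out and would need to be added by hand.

```
sourcetype="incoming_email"
| eval null_sender=if(isnull(Sender), 1, 0),
       null_subject=if(isnull(Subject), 1, 0),
       null_sender_ip=if(isnull(Sender_ip), 1, 0)
| stats count AS total_events,
        sum(null_sender) AS null_sender,
        sum(null_subject) AS null_subject,
        sum(null_sender_ip) AS null_sender_ip
```

If any of the null counts are close to total_events, that would be consistent with the empty probabilities. Note that isnull only catches fields that are absent from the event; fields that are present but contain an empty string are not counted here.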
