Splunk IT Service Intelligence

In the Machine learning toolkit, apply command with probabilities=true returns very few results.

KrithikaRamakri
Explorer

Hi everyone, I am trying to apply logistic regression in Splunk to predict phishing, this is my query:

sourcetype="incoming_email"
| apply tfidf_sender | apply tfidf_subject | apply tfidf_sender_ip | apply tfidf_url | apply tfidf_Attachments_MD5
| apply test_model probabilities=true | table Sender Subject Sender_ip "predicted(Is Phishing)" "probability(Is Phishing=Yes)"

I am applying tfidf on the fields followed by the test_model which is my logistic regression, the value for probability is populated only for a very few fields, for the rest of the fields it is empty. Can someone please help me on how to populate this value? Is there any other way to identify based on which fields, logistic regression has classified my email?

0 Karma

astein_splunk
Splunk Employee
Splunk Employee

When we look at "Understanding fit and apply" from the MLTK docs, we see that apply can use null fields, unlike fit, when applying models to generate an predicted field . However you may not get all the functionality of the algorithm (like probabilities) if those other functionalities are reliant on good data.

Is it possible that the fields you logistic regression is being applied to are null? So the probabilities field isn't being populated because there isn't a continuous/valid value for each field?

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...