How do I set a sampling ratio for the initial search in Splunk MLTK? Is there a specific SPL command for that?

ruwalbi
Explorer

I am trying to train a clustering model but keep running into the memory limit error because the data set is large. I would like to use event sampling, but I don't know the command for it.
How do I set a sampling ratio for the initial search in Splunk MLTK? Is there a specific SPL command for that?

skoelpin
SplunkTrust

How big is your sample data? Do you need to train on a sample this large? Why not train on a smaller sample set if it represents a good percentage of the data needed?

Are you sure you're not bumping into limits as opposed to running out of memory?

ruwalbi
Explorer

skoelpin, I have 500k observations. I want to limit this to a smaller set because I am only using an MLTK sandbox to judge whether MLTK is the right solution for us before configuring it in PROD.

Let me know if you have solution. Thanks!

skoelpin
SplunkTrust

The point I'm trying to make is: why sample a larger data set when you can just reduce the size of the training data set?

Are you sure you're not bumping into limits as opposed to running out of memory?

Lastly, the MLTK is a collection of libraries imported into Splunk. It will work if you give it the right data and ask the right questions.

ruwalbi
Explorer

When we just reduce the size of the training data set, the observations (rows/events) are not selected randomly. As a result, the data can't closely represent the whole population.
If we use sampling, the data is randomly selected and is more representative of our data set.

You are right, I am bumping into limits. I have already requested to increase the limit. In the meantime, I wanted to learn about how I can sample using SPL to serve the immediate needs.

skoelpin
SplunkTrust

You can use event sampling above the search bar to accomplish this. You can also use certain SPL techniques to do sampling, such as:

| eval samplingperc=20
| eval search=ceil(100/samplingperc)

This is meant to sample 20% of the data, that is, 1 event in 5.

Lastly, you can control these limits in the MLTK UI directly, under the Settings tab in the nav bar. If this answered your question, please accept it.

ruwalbi
Explorer

Thanks a lot.

ruwalbi
Explorer

These commands don't seem to work. Are there any limitations?
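
A likely reason, with a minimal sketch of an alternative: the two eval commands posted earlier only add the fields samplingperc (20) and search (5) to every event. Nothing in them discards events, so the result set keeps its full size. One way to keep a random subset in plain SPL is to give each event a random number and filter on it. The index, sourcetype, field names, k value, and model name below are hypothetical placeholders, not values from this thread:

index=my_index sourcetype=my_sourcetype
| eval rand=random() % 100
| where rand < 20
| fields - rand
| fit KMeans k=3 field_a field_b into sandbox_kmeans_model

random() returns a pseudo-random integer, so rand < 20 keeps roughly 20% of the events, and the exact count varies from run to run. MLTK also ships its own sample command. Assuming it is available in your MLTK version, something like | sample ratio=0.2 seed=42 in place of the eval/where pair should give a similar subset that is repeatable across runs.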
