Splunk Search

Random sampling ratio in subsearch (long OR list) only?

alancalvitti
Path Finder

Is it possible, via Splunk's Python SDK, to specify an event sampling ratio (say 1:1000), or some equivalent random evaluation, in a subsearch that returns a long OR expression, while specifying that the outer search does not sample?

For concreteness, the subsearch is:

[search index=my_index   |  rex "(?i)deviceId=(?P<DevId>[^ ]+)" | dedup DevId | return 1000000 $DevId]

This returns a long OR list, each term of which can match one or more events. It is critical to extract all events associated with each randomly sampled device.

0 Karma

DavidHourani
Super Champion

Hi @alancalvitti,

Event sampling applies to the results of your search. If you use a subsearch to generate an OR filter, the filter itself will not be subject to sampling, but the results of the filtered search will be.

As mentioned here : https://docs.splunk.com/Documentation/Splunk/latest/Search/Retrieveasamplesetofevents

If a search matches 1,000,000 events when sampling is not used, using a sample ratio value of 100 would result in returning approximately 10,000 events.

So whatever you filter on in your search will be applied as is and then the sampling will take place.
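
As a rough check on that arithmetic, here is a minimal Python sketch (plain Python, not Splunk code; the function name is made up for illustration). It assumes each matching event is kept independently with probability 1/ratio, so the actual count will only be approximately this value:

```python
def expected_sampled_events(matching_events: int, sample_ratio: int) -> float:
    """Expected number of events kept when each is retained with probability 1/sample_ratio."""
    if sample_ratio < 1:
        raise ValueError("sample_ratio must be a positive integer")
    return matching_events / sample_ratio


# The example from the docs: 1,000,000 matching events, sample ratio 100
print(expected_sampled_events(1_000_000, 100))  # 10000.0
```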

Hope that helps.

Cheers,
David

0 Karma

alancalvitti
Path Finder

Thanks, but I need the logic the other way around: sampling (with specified ratio) in subsearch, and no sampling in outer search. Is there a way to emulate this behavior?

0 Karma

DavidHourani
Super Champion

You can use modulus to do so in your subsearch, making it look something like this :

[search index=my_index | rex "(?i)deviceId=(?P<DevId>[^ ]+)" | dedup DevId | streamstats count as sampler | eval sampler=sampler%5 | where sampler=0 | return 1000000 $DevId]

This will use a fixed sampling rate of 20% (modulus 5).
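
To make the logic concrete, here is a small Python sketch that mimics the dedup, streamstats count, and modulus steps on a list of made-up device IDs. It is only an illustration of the idea, not how Splunk implements it, and the function name is hypothetical:

```python
def modulus_sample(device_ids, modulus):
    """Mimic: dedup DevId | streamstats count as sampler
    | eval sampler=sampler%modulus | where sampler=0."""
    seen = set()
    kept = []
    count = 0
    for dev in device_ids:
        if dev in seen:
            continue  # dedup: only the first occurrence of each ID counts
        seen.add(dev)
        count += 1  # running count starts at 1, like streamstats count
        if count % modulus == 0:
            kept.append(dev)
    return kept


ids = [f"dev{i}" for i in range(1, 21)]
print(modulus_sample(ids, 5))  # ['dev5', 'dev10', 'dev15', 'dev20']
```

Note that this keeps every fifth distinct device in the order the events come back, so the selection is systematic rather than random.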

0 Karma

alancalvitti
Path Finder

That's clever. That sampler strategy, coupled with the outer query to return events, seems to return reasonable results for short time spans, e.g. 1 hour, but when I increase the time range to, say, 24 hours, only a few events are matched (keeping the sampler rate fixed at, say, sampler%1000). Any idea why?

0 Karma

DavidHourani
Super Champion

It could be that the subsearch is timing out and returning whatever it has collected by then. Test how long the subsearch takes by checking the Job Inspector or by running it separately.
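
If the subsearch does turn out to be hitting a limit, one possible workaround from the Python side is to run the device ID search as its own job, sample the IDs in Python, and build the OR filter for the outer search yourself. The sketch below covers only the sampling and string-building steps. The function names are hypothetical, the IDs are made up, and it assumes the IDs contain no double quotes so they can be used as quoted search terms. Running the two searches through the SDK is left out.

```python
import random


def sample_device_ids(device_ids, ratio, seed=None):
    """Keep each distinct device ID with probability 1/ratio."""
    rng = random.Random(seed)
    distinct = sorted(set(device_ids))
    return [dev for dev in distinct if rng.randrange(ratio) == 0]


def build_or_filter(device_ids):
    """Join IDs into a quoted OR expression for the outer search."""
    if not device_ids:
        # An empty filter would make the outer search match every event
        raise ValueError("no device IDs were sampled")
    return "(" + " OR ".join(f'"{dev}"' for dev in device_ids) + ")"


all_ids = [f"dev{i:04d}" for i in range(5000)]  # stand-in for the first search's results
sampled = sample_device_ids(all_ids, ratio=100, seed=42)
if sampled:
    outer_search = f"search index=my_index {build_or_filter(sampled)}"
    print(len(sampled), "devices sampled")
```

The outer search is then run without sampling, so every event for each sampled device is returned.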

0 Karma