Splunk Search

How to get a random sample of IIS events each day for the last X days to build control charts?

reachskhm
New Member

In my IIS logs, suppose I have 60,000 transactions per 24 hours. How can I get a random sample of, say, 5,000 events? I need a random sample for each day for roughly the last 50 days. I want to build control charts based on response time (time_taken) from the IIS logs.

0 Karma

woodcock
Esteemed Legend

Here is a random sample macro I use:

From macros.conf:

[Random_Sample(1)]
args = RandomSamplePercentEventsToKeep
definition = eval RandomSampleSeed = random()\
| sort 0 -RandomSampleSeed\
| eventstats count AS RandomSampleTotalEventCount\
| eval RandomSampleNumberToKeep = ceil($RandomSamplePercentEventsToKeep$ * RandomSampleTotalEventCount / 100)\
| streamstats count AS RandomSampleSerialNumber\
| where RandomSampleSerialNumber<=RandomSampleNumberToKeep\
| fields - RandomSample*
iseval = 0
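
To use it for the question above, here is a minimal sketch (the index, sourcetype, and time range are assumptions; 8.4% of 60,000 events keeps roughly 5,000):

index=iis sourcetype=iis_logs earliest=-1d@d latest=@d
| `Random_Sample(8.4)`
| stats avg(time_taken) AS avg_time_taken stdev(time_taken) AS stdev_time_taken

Note that the macro samples across the whole result set, so for a per-day sample over the last 50 days you would either run it once per day or rework the eventstats/streamstats to operate per day.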
0 Karma

martin_mueller
SplunkTrust
SplunkTrust

The latest Splunk Cloud version has recently gotten an event sampling feature, so it'd be reasonable to assume that's coming to Splunk Enterprise some day as well.
http://docs.splunk.com/Documentation/Splunk/6.3.1511/Search/Retrieveasamplesetofevents

Until then, you could fake a sampling rate of 1:60 by only looking at a specific date_second, or a sampling rate of 1:30 by looking at two seconds, and so on. If your data is sufficiently well spread, this non-random sampling should work well enough.

For both sampling approaches, make sure you don't mess up your transactions if they consist of multiple events per transaction.

Alternatively, just run over all your data without sampling.
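
For example, a daily control-chart base over the full data set could look something like this (the index name and span are assumptions; time_taken is the IIS response-time field from the question):

index=iis earliest=-50d@d latest=@d
| bin _time span=1d
| stats avg(time_taken) AS avg_time stdev(time_taken) AS stdev_time BY _time
| eval ucl = avg_time + 3 * stdev_time, lcl = avg_time - 3 * stdev_time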

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

For 1:60 sampling, add date_second = 42 to your search. Any other second will do.

To check whether this gives you a reasonable sample, you could run statistics by second to see if there are any outliers, e.g. lots of events generated at second zero by cron jobs.

You should really consider running over your entire data set first, though. 60,000 events over 24 hours isn't that much if you have reference-spec hardware or better.
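
A quick sketch of both (the index name index=iis is an assumption):

index=iis date_second=42 earliest=-50d@d latest=@d
| bin _time span=1d
| stats avg(time_taken) AS avg_time stdev(time_taken) AS stdev_time BY _time

And to check the per-second distribution for outliers:

index=iis earliest=-24h
| stats count BY date_second
| sort -count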

0 Karma

reachskhm
New Member

Can you give me an example of how to fake the sampling?

0 Karma