Splunk Search

How to get a random sample of IIS events for each of the last X days to build control charts?

reachskhm
New Member

Suppose my IIS logs contain 60,000 transactions per 24 hours. How can I get a random sample of, say, 5,000 events per day? I need a separate random sample for each day over, say, the last 50 days. I want to build control charts based on response time (time_taken) from the IIS logs.

woodcock
Esteemed Legend

Here is a random sample macro I use:

From macros.conf:

# Keeps a random N percent of the incoming events, then removes the helper fields.
[Random_Sample(1)]
args = RandomSamplePercentEventsToKeep
definition = eval RandomSampleSeed = random()\
| sort 0 -RandomSampleSeed\
| eventstats count AS RandomSampleTotalEventCount\
| eval RandomSampleNumberToKeep = ceil($RandomSamplePercentEventsToKeep$ * RandomSampleTotalEventCount / 100)\
| streamstats count AS RandomSampleSerialNumber\
| where RandomSampleSerialNumber<=RandomSampleNumberToKeep\
| fields - RandomSample*
iseval = 0
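
A hedged usage sketch, not part of the original post: the macro takes a percentage, so keeping about 5,000 of 60,000 events means passing roughly 8.33. The `index=iis sourcetype=iis` filter is an assumed placeholder for wherever your IIS logs live.

```spl
index=iis sourcetype=iis earliest=-1d@d latest=@d
| `Random_Sample(8.33)`
```

The macro samples across the whole result set, so over a 50-day search it does not guarantee the same number of events for each day. One possible way to get a fixed count per day is to rank events randomly within each day and keep the first 5,000. The field names `sample_day`, `sample_rand`, and `sample_rank` are made up for this sketch, and the 3-sigma limits on the daily means are an assumption about what the control chart needs, not something stated in this thread.

```spl
index=iis sourcetype=iis earliest=-50d@d latest=@d
| eval sample_day = strftime(_time, "%Y-%m-%d")
| eval sample_rand = random()
| sort 0 sample_day sample_rand
| streamstats count AS sample_rank BY sample_day
| where sample_rank <= 5000
| stats count AS n avg(time_taken) AS mean_time_taken BY sample_day
| eventstats avg(mean_time_taken) AS center_line stdev(mean_time_taken) AS sd_of_means
| eval ucl = center_line + 3 * sd_of_means
| eval lcl = center_line - 3 * sd_of_means
```

Sorting every event over 50 days can be expensive, so it may be worth trying this on a few days first.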

martin_mueller
SplunkTrust

The latest Splunk Cloud version has recently gotten an event sampling feature, so it'd be reasonable to assume that's coming to Splunk Enterprise some day as well.
http://docs.splunk.com/Documentation/Splunk/6.3.1511/Search/Retrieveasamplesetofevents

Until then, you could fake a sampling rate of 1:60 by only looking at a specific date_second, or a sampling rate of 1:30 by looking at two seconds, and so on. If your events are spread evenly enough across each minute, this non-random sampling should work well enough.

For both sampling approaches, make sure you don't break up your transactions if each transaction consists of multiple events.

Alternatively, just run over all your data without sampling.

martin_mueller
SplunkTrust

For 1:60 sampling, add date_second=42 to your search. Any other second will do.

To check whether this gives you a reasonable sample, you could run statistics by second to see if there are any outliers, e.g. lots of events generated at second zero by cron jobs.
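
As a rough sketch of the date_second approach: the index and sourcetype names are assumptions, and `date_second` is only present when Splunk extracted the timestamp from the event text. The first search keeps about one event in 60 and summarizes time_taken per day.

```spl
index=iis sourcetype=iis earliest=-50d@d latest=@d date_second=42
| bin _time span=1d
| stats count AS n avg(time_taken) AS mean_time_taken stdev(time_taken) AS sd_time_taken BY _time
```

The second search counts events for each second of the minute over one day, so spikes such as a burst at second zero stand out before you pick a second to sample on.

```spl
index=iis sourcetype=iis earliest=-1d@d latest=@d
| stats count BY date_second
| sort 0 date_second
```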

You really should first consider running over your entire data set though. 60000 events over 24 hours really isn't that much if you have reference-spec hardware or better.

reachskhm
New Member

Can you give me an example of how to fake the sampling?
