Tuning SEDCMDs -- how do you measure gains?

twinspop
Influencer

I'm using SEDCMD to clean up (and reduce) IIS logs:

# remove all path info but the (unique) file name
SEDCMD-uritrim = s% /commonurlbase[^ ]*/% ./%
# reduce Chrome version (Chrome UAs also mention Safari, so a separate sed is needed?)
SEDCMD-chrome = s% Mozilla[^ ]*Chrome.([0-9.]*)[^ ]*% Chrome-\1%
# reduce agent name version
SEDCMD-agents = s% Mozilla[^ ]*(Safari|Firefox|MSIE).([0-9.]*)[^ ]*% \1-\2%
# trim sid query string from referral url
SEDCMD-reftrim = s%\.aspx\?s[iI][dD]=[^ ]*%.aspx%
# trim portal sign-on shenanigans from referral
SEDCMD-portalreftrim = s%/\!ut[^ ]*%%

I'm thinking of combining some of these and/or writing more efficient regexes for others. What are some options for measuring the performance effect of such changes? With 200+ GB of logs passing through daily, I want to be sure we're as efficient as we can be, given that the logs need to be 'cleaned' as outlined above.

Thanks,

jon

dwaddle
SplunkTrust

This might be rather difficult to measure. Assuming that your daily indexing volume stays mostly flat from day to day, you might be able to compare the CPU seconds used by the indexing process day over day. The main issue is that these regexes fire as events come in, potentially changing the raw value of each event. Each test of "does this regex match?" uses a minuscule amount of CPU time, and each substitution, if it does match, uses only a little more, so the effect of a single change is easily lost in the day-to-day noise.
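If volume is not quite flat, one option (my own addition to the suggestion above, not a Splunk feature) is to normalize by volume: compare CPU seconds per GB indexed over several baseline days against several days after the change. How you collect the per-day CPU seconds and GB indexed is left open here; the function names are hypothetical.

```python
from statistics import mean


def cpu_per_gb(days):
    """days: list of (indexer_cpu_seconds, gb_indexed) pairs, one per day.

    The figures are assumed to come from whatever monitoring you already
    have; nothing here reads Splunk data directly.
    """
    return [cpu / gb for cpu, gb in days]


def relative_change(baseline_days, trial_days):
    """Fractional change in mean CPU-seconds-per-GB; negative means cheaper."""
    before = mean(cpu_per_gb(baseline_days))
    after = mean(cpu_per_gb(trial_days))
    return (after - before) / before
```

Averaging several days on each side smooths out some noise, but a change of a few percent may still be indistinguishable from ordinary day-to-day variation.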

Your most accurate bet (which is a lot of work) would be to implement a simple regex profiler. We know that Splunk uses PCRE, which is open source. You could build a test harness to evaluate each of these regexes over a sample of several hundred thousand events, in a controlled fashion. No, it's not easy at all, but it would be more accurate than trying to measure it in situ on a running indexer.
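A minimal sketch of such a harness, with one substitution stated plainly: Python's `re` module is not PCRE, so the timings only approximate what Splunk would see and are best read as relative. It also assumes the SEDCMDs run in the order listed (Splunk's actual ordering is not confirmed here) and that the sample file holds one raw event per line. `profile` and the script name `sedprofile.py` are hypothetical.

```python
import re
import sys
import time

# The SEDCMDs from the question as (class name, regex, replacement), with the
# dot before "aspx" escaped. Python's re is a stand-in for PCRE, not PCRE.
SEDCMDS = [
    ("uritrim", r" /commonurlbase[^ ]*/", r" ./"),
    ("chrome", r" Mozilla[^ ]*Chrome.([0-9.]*)[^ ]*", r" Chrome-\1"),
    ("agents", r" Mozilla[^ ]*(Safari|Firefox|MSIE).([0-9.]*)[^ ]*", r" \1-\2"),
    ("reftrim", r"\.aspx\?s[iI][dD]=[^ ]*", r".aspx"),
    ("portalreftrim", r"/\!ut[^ ]*", r""),
]


def profile(events, rules=SEDCMDS):
    """Run every rule over every event in list order (assumed, see above).

    Returns (seconds, hits): total time spent in each rule, and how many
    events each rule actually changed. A SEDCMD without a trailing "g" flag
    replaces only the first match, hence count=1. Timing each call on its
    own adds a small, roughly equal overhead to every rule.
    """
    compiled = [(name, re.compile(rx), repl) for name, rx, repl in rules]
    seconds = {name: 0.0 for name, _, _ in compiled}
    hits = {name: 0 for name, _, _ in compiled}
    for event in events:
        for name, regex, repl in compiled:
            start = time.perf_counter()
            event, n = regex.subn(repl, event, count=1)
            seconds[name] += time.perf_counter() - start
            hits[name] += n
    return seconds, hits


def main(argv):
    if len(argv) != 2:
        print("usage: python sedprofile.py SAMPLE_EVENTS_FILE")
        return
    with open(argv[1], encoding="utf-8", errors="replace") as f:
        events = [line.rstrip("\n") for line in f]
    seconds, hits = profile(events)
    # Most expensive rule first.
    for name in sorted(seconds, key=seconds.get, reverse=True):
        print(f"{name:15} {seconds[name]:8.3f}s  changed {hits[name]} of {len(events)}")


if __name__ == "__main__":
    main(sys.argv)
```

A rule that rarely changes anything but still shows a noticeable share of the time is paying for failed match attempts on every event, which may make it the first candidate for a cheaper pattern.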

twinspop
Influencer

Well, I'm not so much interested in specific regex tips as in how to evaluate whether a new regex is helpful or hurtful. Log example:

2012-02-27 21:57:00 172.20.90.43 POST /websiterooturi/subfolder/somepage.aspx sID=abcdef1234567890ABCDEF 80 - 172.20.176.20 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+Trident/4.0;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.4506.2152;+.NET+CLR+3.5.30729) https://snazzywebsitedotcom/websiterooturi/subfolder/referringpage.aspx?sID=abcdef1234567890ABCDEF 200 1146 4870 375
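One way to judge a single candidate change, sketched in Python with the same caveats (Python's `re` standing in for PCRE, one event per line): first confirm that the new rules give identical output to the old ones on real events, then compare timings. The combined pattern below is a hypothetical example, not a tested replacement. It merges the chrome and agents SEDCMDs by making the prefix lazy, so the first browser name wins and the trailing Safari token in a Chrome UA is ignored. It also assumes chrome currently runs before agents. Whether the merge holds for every user agent in your data is exactly what the mismatch check is for. The script name `sedcompare.py` is hypothetical.

```python
import re
import sys
import timeit

# Current approach: two SEDCMDs, chrome assumed to run before agents.
SEPARATE = [
    (r" Mozilla[^ ]*Chrome.([0-9.]*)[^ ]*", r" Chrome-\1"),
    (r" Mozilla[^ ]*(Safari|Firefox|MSIE).([0-9.]*)[^ ]*", r" \1-\2"),
]
# Hypothetical candidate: one SEDCMD with a lazy "[^ ]*?" prefix, so the
# FIRST browser name in the user agent is captured.
COMBINED = [
    (r" Mozilla[^ ]*?(Chrome|Safari|Firefox|MSIE).([0-9.]*)[^ ]*", r" \1-\2"),
]


def compile_rules(rules):
    return [(re.compile(rx), repl) for rx, repl in rules]


def apply_rules(rules, event):
    # count=1 mirrors a SEDCMD without the trailing "g" flag.
    for regex, repl in rules:
        event = regex.sub(repl, event, count=1)
    return event


def mismatches(old, new, events):
    """Events for which the two rule sets produce different output."""
    return [e for e in events if apply_rules(old, e) != apply_rules(new, e)]


def best_time(rules, events, repeat=5):
    """Best-of-`repeat` seconds to run the rules once over all events."""
    def run():
        for e in events:
            apply_rules(rules, e)
    return min(timeit.repeat(run, number=1, repeat=repeat))


def main(argv):
    if len(argv) != 2:
        print("usage: python sedcompare.py SAMPLE_EVENTS_FILE")
        return
    with open(argv[1], encoding="utf-8", errors="replace") as f:
        events = [line.rstrip("\n") for line in f]
    old, new = compile_rules(SEPARATE), compile_rules(COMBINED)
    diff = mismatches(old, new, events)
    print(f"{len(diff)} of {len(events)} events differ")
    for e in diff[:5]:
        print("  ", e)
    print(f"separate {best_time(old, events):.3f}s, combined {best_time(new, events):.3f}s")


if __name__ == "__main__":
    main(sys.argv)
```

The timing comparison only means something once the mismatch count is zero on a large, representative sample; a faster rule that changes the output is simply a different rule.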

Masa
Splunk Employee

If you are really looking for a better regex, you'll need to provide sample logs.
