I am indexing web logs in Splunk, and I am trying to match the URI against a list of regexes to categorize the type of request. Currently I use a chain of replace commands:
index=weblog | replace *wp-login.php* with "WordPress Login" in uri_path | replace *wp-content* with "WordPress Content", *wp-include* with "WordPress Include", *wp-comment* with "WordPress Comment" in uri_path | replace *wp-admin* with "WordPress Admin Access" in uri_path | replace *wpad.dat* with "WebProxy AutoDetection" in uri_path | ...
What I would like to do instead is add a request_type field to each event that holds that category. The problem is that not every pattern can be expressed with a * wildcard; some of the request types I want to capture need a real regex. For example:
/[Mm][aA4][iIl1][1lL][eE3][rR].php
/[Mm][aA4][iIl1][eE3][1lL][rR].php
Is there a way to do this via a lookup table? I could do it with an external script, but I run into issues once I have more than a couple hundred patterns to look up: I see results while the list is small, but as the list grows, the lookup results start to disappear.
index=weblog | stats count by uri_path | lookup REQUEST_lookup uri_path OUTPUT request_type
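For reference, the external-script approach I mention could look roughly like the sketch below. It assumes the lookup is defined with uri_path as the input field and request_type as the output field, and that Splunk passes the rows to the script as CSV on stdin and reads CSV back from stdout. The function names and the "Mailer Probe" label are placeholders I made up for illustration; only the patterns themselves come from my searches above.

```python
import csv
import re
import sys

# Hypothetical pattern table: (compiled regex, request_type label).
# The first matching pattern wins, so put more specific rules first.
PATTERNS = [
    (re.compile(r"wp-login\.php"), "WordPress Login"),
    (re.compile(r"wp-content"), "WordPress Content"),
    (re.compile(r"wp-admin"), "WordPress Admin Access"),
    (re.compile(r"wpad\.dat"), "WebProxy AutoDetection"),
    # "Mailer Probe" is a made-up label for the two regex examples.
    (re.compile(r"/[Mm][aA4][iIl1][1lL][eE3][rR]\.php"), "Mailer Probe"),
    (re.compile(r"/[Mm][aA4][iIl1][eE3][1lL][rR]\.php"), "Mailer Probe"),
]


def classify(uri_path):
    """Return the label of the first matching pattern, or "" if none match."""
    for pattern, label in PATTERNS:
        if pattern.search(uri_path):
            return label
    return ""


def run(infile, outfile):
    """Read CSV rows, fill in request_type from uri_path, write CSV back."""
    reader = csv.DictReader(infile)
    fieldnames = list(reader.fieldnames or [])
    if "request_type" not in fieldnames:
        fieldnames.append("request_type")
    # extrasaction="ignore" drops stray columns instead of raising an error.
    writer = csv.DictWriter(outfile, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        row["request_type"] = classify(row.get("uri_path") or "")
        writer.writerow(row)


if __name__ == "__main__":
    run(sys.stdin, sys.stdout)
```

With this sketch, classify("/M4il3r.php") returns "Mailer Probe" and a path that matches nothing returns an empty string. It only illustrates the matching logic; it does not explain why results disappear as the list grows.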