Subsearch of regex-extracted data versus an index

neilstuartcraig
New Member

Hello all

I am trying to create a scheduled search that runs every 15 minutes over the window from -15m to now. The search uses regular expressions to extract fields from IIS logs (e.g. uri, query string, HTTP status code). I also want to include a subsearch against an index that stores the same extracted fields as the main search, although the index only holds data from 15 minutes ago and older. My aim is to raise an alert when "new" errors occur, i.e. errors that occurred in the last 15 minutes but do not appear in the index. This gives quick feedback on new website code deployments.

I thought I understood what needed to be done, but it is not working, so if anyone can suggest what I have done wrong, I would very much appreciate it.

My search, which includes the subsearch, is:

earliest_time=-15m sourcetype="IISLogs" (host="xxxx-vmweb-p04" OR host="xxxx-vmweb-p05" OR host="xxxx-vmweb-p06" OR host="xxxx-vmweb-p07" OR host="xxxx-vmweb-p08") | 
rex field=_raw "(?<year>\d{4})-(?<month>\d{1,2})-(?<date>\d{1,2}) (?<hours>\d{1,2}):(?<minutes>\d{1,2}):(?<seconds>\d{1,2}) (?<server_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (?<request_method>\w{3,4}) (?<uri>\S*) (?<query_string>.*) (?<server_port>\d{2,5}) - (?<remote_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (?<user_agent>.*) (?<http_status_code>\d{3}) (?<http_sub_status_code>\d{1,3}) (?<time_taken>\d{1,10}) (?<num_bytes>\d{1,10})$" | 
rex field=source "^\w{1}:.*W3SVC(?<site_id>\d{1,3}).*$" | 
cluster field=uri t=0.99 countfield=numErrors | 
sort -numErrors | 
search [search index="CLIENT_AAA__deploy_arch" | table uri] | 
table numErrors http_status_code site_id uri query_string

My index of historic errors is built from essentially the same search as above, but with a different time range (-30m to -15m, set via the schedule) and, of course, without the subsearch:

sourcetype="IISLogs" (host="xxxx-vmweb-p04" OR host="xxxx-vmweb-p05" OR host="xxxx-vmweb-p06" OR host="xxxx-vmweb-p07" OR host="xxxx-vmweb-p08") | 
rex field=_raw "(?<year>\d{4})-(?<month>\d{1,2})-(?<date>\d{1,2}) (?<hours>\d{1,2}):(?<minutes>\d{1,2}):(?<seconds>\d{1,2}) (?<server_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (?<request_method>\w{3,4}) (?<uri>\S*) (?<query_string>.*) (?<server_port>\d{2,5}) - (?<remote_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (?<user_agent>.*) (?<http_status_code>\d{3}) (?<http_sub_status_code>\d{1,3}) (?<time_taken>\d{1,10}) (?<num_bytes>\d{1,10})$" | 
rex field=source "^\w{1}:.*W3SVC(?<site_id>\d{1,3}).*$" | 
cluster field=uri t=0.99 countfield=numErrors | 
sort -numErrors | 
table numErrors http_status_code site_id uri query_string

I am unsure whether I have the subsearch in the right place, and also whether I need to somehow specify which fields to compare between the search and the subsearch.

I have a reasonable amount of experience with Splunk searches but have never used a subsearch before (as you can probably tell!).

All suggestions very much appreciated.

Thanks
Neil

lguinn2
Legend

I have to disagree about defining the field extractions at index time. Yes, define your field extractions in props.conf; that part is correct, but EXTRACT rules in props.conf are actually applied at search time, not index time.

As Stefano points out, having the field extractions defined will simplify your search dramatically. It will also make it easier to create reports and other items.

There is a built-in sourcetype named iis; if you use it, you may get the field extractions without needing to do anything at all. The IIS sourcetype sometimes shows up as iis-2, iis-3, etc., so if you still need to add field extractions to props.conf, you can use the following stanza header, and your field extractions will apply to all of the IIS variants:

[(?:::){0}iis*]
# your field extractions here
EXTRACT-e1=(?<year>\d{4})-(?<month>\d{1,2})-(?<date>\d{1,2}) (?<hours>\d{1,2}):(?<minutes>\d{1,2}):(?<seconds>\d{1,2}) (?<server_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (?<request_method>\w{3,4}) (?<uri>\S*) (?<query_string>.*) (?<server_port>\d{2,5}) - (?<remote_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (?<user_agent>.*) (?<http_status_code>\d{3}) (?<http_sub_status_code>\d{1,3}) (?<time_taken>\d{1,10}) (?<num_bytes>\d{1,10})$
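The search below groups by site_id, but EXTRACT-e1 only parses _raw, whereas Neil's site_id comes from the source path. props.conf can run an extraction against another field by appending "in <field>" to the regex. A sketch for the same stanza; the regex is a simplified version of Neil's rex field=source (it no longer checks the leading drive letter), and the class name "site" is an arbitrary choice:

```
[(?:::){0}iis*]
# site_id taken from the log file path, e.g. a path containing W3SVC1
EXTRACT-site = W3SVC(?<site_id>\d{1,3}) in source
```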

I would also propose this search:

earliest=-15m sourcetype="iis*" AND
(host="xxxx-vmweb-p04" OR host="xxxx-vmweb-p05" OR host="xxxx-vmweb-p06" OR host="xxxx-vmweb-p07" OR host="xxxx-vmweb-p08") AND
( [search index="CLIENT_AAA__deploy_arch" | table uri ] )
| stats count as numErrors by http_status_code site_id uri query_string
| sort -numErrors

This moves the subsearch into the initial search, which will be faster because events are filtered before any further processing. However, the subsearch over the CLIENT_AAA__deploy_arch index currently runs over all time. Perhaps you should include an earliest= parameter in the subsearch as well?
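As written, both Neil's search and the one above keep only the events whose uri already appears in CLIENT_AAA__deploy_arch, which is the opposite of alerting on new errors. A minimal sketch of the inverted filter, assuming the props.conf extractions above are in place (so uri, http_status_code, query_string and site_id exist at search time) and using an arbitrary 24-hour look-back for the subsearch:

```
earliest=-15m sourcetype="iis*"
(host="xxxx-vmweb-p04" OR host="xxxx-vmweb-p05" OR host="xxxx-vmweb-p06" OR host="xxxx-vmweb-p07" OR host="xxxx-vmweb-p08")
NOT [search index="CLIENT_AAA__deploy_arch" earliest=-24h | dedup uri | table uri]
| stats count as numErrors by http_status_code site_id uri query_string
| sort -numErrors
```

The dedup keeps the subsearch output small, which matters because subsearch results are truncated (10,000 rows by default, set in limits.conf). With the search scheduled every 15 minutes, the alert can then be set to trigger when the number of results is greater than 0.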

stefano_guidoba
Communicator

Hi Neil,

The simplest solution to your problem is to define field extractions for the iis sourcetype at indexing time by editing your app's props.conf file.
In $SPLUNK_HOME/etc/system/default you will also find Splunk's own props.conf, which contains the field extractions for Apache access_combined logs; these are very similar to IIS logs. You can take a look there and put your custom field extractions under $SPLUNK_HOME/etc/system/local or under your IIS app's folder (something like $SPLUNK_HOME/etc/apps/iis).

Remember to use the sourcetype name as the props.conf stanza header (for example [iis]) and to assign the same sourcetype in inputs.conf, so that when the logs are collected, the fields you want are extracted automatically.
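A minimal sketch of the two pieces described above, shown together for brevity although they live in two separate files. The monitor path is hypothetical (point it at wherever the W3SVC log folders actually are), and the whitelist line is optional:

```
# inputs.conf (hypothetical path; a monitored directory is read recursively)
[monitor://C:\inetpub\logs\LogFiles]
sourcetype = iis
whitelist = \.log$
disabled = false

# props.conf: the stanza header matches the sourcetype assigned above
[iis]
# EXTRACT-<class> = <regex> field extraction rules for IIS go here
```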

At that point, your subsearch should become much simpler.
Please refer to the Splunk docs for more information on subsearches.
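To see exactly what a subsearch passes to the outer search, you can run it on its own with the format command appended; Splunk applies the same formatting to subsearch results implicitly. A sketch, with an arbitrary 24-hour window:

```
index="CLIENT_AAA__deploy_arch" earliest=-24h
| dedup uri
| table uri
| format
```

It returns a single field named search whose value looks roughly like ( ( uri="/a" ) OR ( uri="/b" ) ) for two hypothetical URIs. Because the outer search matches on those field=value pairs, the field name in the subsearch has to match the field name in the outer search; if they differ, rename the field inside the subsearch.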
Regards,
Stefano
