I have a handful of different sourcetypes that all get written to log files in /var/log/app. I also have more than one index that the files can be sent to, depending on their type. From my testing, and from what I understand from the documentation, I can't have multiple [monitor] stanzas with whitelists/blacklists for logs that live in the same directory. Therefore, in order to assign sourcetype and index based on a filename regex, it would seem my choices are limited to the following:
An example of option 2 would be as follows:
Example log files:
inputs.conf (on the LWF)
[monitor:///var/log/app/]
whitelist = \w+\.(?:\d{4}-\d{2}-\d{2}|log)$
# I want to ignore zipped files
blacklist = \.(gz|bz2|z|zip)$
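To sanity-check which files this stanza would pick up, here is a small Python sketch that applies the same whitelist and blacklist regexes to file names mentioned in this thread. It assumes, as the inputs.conf documentation describes, that both regexes are matched against the full file path and that a blacklist match wins over a whitelist match. Python's `re` stands in for Splunk's PCRE engine, which should behave the same for patterns this simple. The `.gz` file name is a hypothetical example.

```python
import re

# Regexes copied from the inputs.conf stanza above.
WHITELIST = re.compile(r"\w+\.(?:\d{4}-\d{2}-\d{2}|log)$")
BLACKLIST = re.compile(r"\.(gz|bz2|z|zip)$")


def is_monitored(path: str) -> bool:
    """Return True if the path passes the whitelist and is not blacklisted."""
    return bool(WHITELIST.search(path)) and not BLACKLIST.search(path)


if __name__ == "__main__":
    for name in [
        "coolservice-billing.log",
        "coolservice-billing.2011-03-03",
        "coolservice-billing.2011-03-03.log",
        "coolservice-billing.2011-03-03.gz",  # hypothetical rotated, compressed file
    ]:
        print(name, is_monitored("/var/log/app/" + name))
```

For names like these the blacklist is a second line of defence: a trailing `.gz` already fails the whitelist's `$` anchor.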
props.conf (on my indexers)
# billing logs
# e.g. coolservice-billing.log or coolservice-billing.2011-03-03
# or coolservice-billing.2011-03-03.log
[source::\w+-billing\.(?:\d{4}-\d{2}-\d{2}|log)$]
sourcetype = custom-billing
TRANSFORMS-index = billingindex
priority = 200
# web logs
[source::\w+-(?:web|req)\.(?:\d{4}-\d{2}-\d{2}|log)$]
sourcetype = log4j
TRANSFORMS-index = mainindex
# log4j service logs
[source::\w+\.(?:\d{4}-\d{2}-\d{2}|log)$]
sourcetype = log4j
TRANSFORMS-index = mainindex
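Presumably `priority = 200` is there because the generic log4j regex would also match the `billing.log` tail of a name like `coolservice-billing.log`. The Python sketch below models that under stated assumptions: each stanza name is treated as an unanchored Python regex searched against the full source path (Splunk applies its own pattern rules to `source::` names), every matching stanza contributes its settings, and settings from a higher `priority` override lower ones, with an unset priority treated as 0. It is an illustration, not Splunk's implementation.

```python
import re

# (stanza regex, priority, settings) taken from the props.conf above.
# Treating stanza names as unanchored Python regexes is an assumption.
PROPS_STANZAS = [
    (r"\w+-billing\.(?:\d{4}-\d{2}-\d{2}|log)$", 200,
     {"sourcetype": "custom-billing", "TRANSFORMS-index": "billingindex"}),
    (r"\w+-(?:web|req)\.(?:\d{4}-\d{2}-\d{2}|log)$", 0,
     {"sourcetype": "log4j", "TRANSFORMS-index": "mainindex"}),
    (r"\w+\.(?:\d{4}-\d{2}-\d{2}|log)$", 0,
     {"sourcetype": "log4j", "TRANSFORMS-index": "mainindex"}),
]


def resolve_props(source: str) -> dict:
    """Merge settings from every matching stanza; higher priority wins."""
    matches = [(prio, settings) for pattern, prio, settings in PROPS_STANZAS
               if re.search(pattern, source)]
    merged = {}
    # Apply the lowest priority first so higher-priority stanzas overwrite it.
    for _prio, settings in sorted(matches, key=lambda m: m[0]):
        merged.update(settings)
    return merged


if __name__ == "__main__":
    print(resolve_props("/var/log/app/coolservice-billing.log"))
    print(resolve_props("/var/log/app/coolservice.log"))
```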
transforms.conf (on my indexers)
[billingindex]
REGEX = .*
DEST_KEY = _MetaData:Index
FORMAT = billingindex
[mainindex]
REGEX = .*
DEST_KEY = _MetaData:Index
FORMAT = mainindex
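Each transform above is an index-time rule: if `REGEX` matches the event, the `FORMAT` value is written to the key named by `DEST_KEY`, and `_MetaData:Index` is the key that decides the destination index. A minimal Python model of that behaviour, assuming an event is just a string and its metadata a dict (both simplifications), looks like this:

```python
import re

# Transform definitions copied from the transforms.conf above.
TRANSFORMS = {
    "billingindex": {"REGEX": r".*", "DEST_KEY": "_MetaData:Index", "FORMAT": "billingindex"},
    "mainindex": {"REGEX": r".*", "DEST_KEY": "_MetaData:Index", "FORMAT": "mainindex"},
}


def apply_transform(name: str, raw_event: str, metadata: dict) -> dict:
    """Return a copy of metadata with DEST_KEY set to FORMAT if REGEX matches."""
    rule = TRANSFORMS[name]
    result = dict(metadata)
    if re.search(rule["REGEX"], raw_event):
        result[rule["DEST_KEY"]] = rule["FORMAT"]
    return result


if __name__ == "__main__":
    print(apply_transform("billingindex", "2011-03-03 12:00:00 INFO charge ok",
                          {"_MetaData:Index": "main"}))
```

Because `.*` matches every event (even an empty one), each transform here routes unconditionally; the choice of index is made entirely by which `TRANSFORMS-index` the props.conf stanza names.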
So my questions are: is this the best (or only) way to do it, and will I suffer any indexing performance problems doing it this way?
If you are using 4.1 or newer, you can have multiple stanzas in inputs.conf where the whitelist is implied by the stanza name. For example:
[monitor:///var/log/app/\w+.log*]
sourcetype = log4j
index = main
[monitor:///var/log/app/\w+-(web|req).log*]
sourcetype = access_common
index = main
[monitor:///var/log/app/\w+-billing.log*]
sourcetype = custom_billing
index = billing
Note that the trailing * is needed to convince us to treat the pattern as a regex.
However, your props/transforms approach will work fine without any performance degradation.
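To see that each example file lands in exactly one of these stanzas, the sketch below hand-translates the three stanza paths into Python regexes. It assumes (as described elsewhere in this thread) that a plain `.` is a literal dot, that the trailing `*` matches anything except a path separator, and that the whole path must match. The translations are written by hand for illustration, not output from Splunk.

```python
import re

# Hand translations of the three monitor paths (assumed semantics:
# "." is literal, trailing "*" becomes "[^/]*", whole path must match).
STANZAS = [
    (r"/var/log/app/\w+\.log[^/]*", "log4j", "main"),
    (r"/var/log/app/\w+-(web|req)\.log[^/]*", "access_common", "main"),
    (r"/var/log/app/\w+-billing\.log[^/]*", "custom_billing", "billing"),
]


def stanzas_for(path: str) -> list:
    """Return (sourcetype, index) for every stanza whose pattern matches path."""
    return [(st, idx) for regex, st, idx in STANZAS if re.fullmatch(regex, path)]


if __name__ == "__main__":
    for name in ["coolservice.log", "coolservice-web.log",
                 "coolservice-billing.log.2011-03-07", "coolservice.1.log"]:
        print(name, stanzas_for("/var/log/app/" + name))
```

A name such as `coolservice.1.log` matches none of them, because `\w+` cannot cross the `.` before the `1`.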
Actually, the "sourcetype" statements in props.conf source stanzas are processed on the forwarders, just like inputs.conf, so if you were going to go that route, you would put them on the forwarders (see http://www.splunk.com/wiki/Where_do_I_configure_my_Splunk_settings%3F). But Stephen's method is probably best, especially since you are also setting "index".
Note that the simple regexes in your original answer do seem to work, but I could never get any complex [monitor:///] regexes (e.g. with optional capture groups) to work consistently.
The .* here is like .* in path globbing from a Unix shell, in that it's the literal . followed by anything up to the path separator. The way to interpret this is: if we see a "*" or a "...", we will transition to globbing mode. We first translate "*" to "[^/]*", "..." to ".*" and "." to "\.". At this point, any remaining regexes are left in as is. So the regex above will find files that start with one or more \w characters, followed by a literal ".", followed by any characters up to the end of the filename.
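The translation described above can be approximated in a few lines of Python. `monitor_pattern_to_regex` is a hypothetical helper written from that description, not Splunk's code: it turns `...` into `.*`, `*` into `[^/]*` and a lone `.` into `\.`, and passes everything else (including escapes such as `\w`) through untouched. Windows paths, where `\` is also the path separator, are deliberately not handled.

```python
import re


def monitor_pattern_to_regex(pattern: str) -> str:
    """Approximate the wildcard-to-regex translation described above (hypothetical)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            out.append(".*")              # "..." recurses through anything
            i += 3
        elif pattern[i] == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # keep escapes such as \w intact
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")           # anything up to the next path separator
            i += 1
        elif pattern[i] == ".":
            out.append(r"\.")             # a lone "." is a literal dot
            i += 1
        else:
            out.append(pattern[i])        # other regex syntax is left as is
            i += 1
    return "".join(out)


if __name__ == "__main__":
    regex = monitor_pattern_to_regex(r"/var/log/app/\w+.log*")
    print(regex)
    print(bool(re.fullmatch(regex, "/var/log/app/coolservice.log.2011-03-07")))
```

This also illustrates why no escaping of the `.` is needed: an unescaped dot is already turned into a literal one.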
I'm pulling my hair out trying to get this to work. I really don't understand how you can mix syntax. Your example [monitor://F:\var\log\app\\w+.*] uses a '.*' at the end, while the documentation shows that ellipses, '...', should be used in place of .* in a monitor stanza: http://www.splunk.com/base/Documentation/4.1.7/Admin/Specifyinputpathswithwildcards. Where can I find out definitively what regex syntax I can use in a monitor definition? Should I be able to use advanced regex features like capture groups and optional terms?
Right, we should be able to use [monitor:///var/log/app/\w+.*], [monitor:///var/log/app/\w+-web.*] and [monitor:///var/log/app/\w+-billing.*]. The Windows variants should be [monitor://F:\var\log\app\\w+.*] and so forth.
No, this is to expand the regexes above to allow for file rotation suffixes, and also to set up regexes for Windows paths. I should have just used the previous log names as an example, so:
coolservice.log, coolservice.1.log, coolservice.log.2011-03-07
coolservice-web.log, coolservice-web.1.log, coolservice-web.log.2011-03-07
etc. (same as original question)
And the Windows path for those logs will be F:\weblogs\
I can paste in what I've tried if you want. The Splunk debug logs show the regex getting parsed all funky (e.g. a \ becomes \\, so things like \w become \\w and don't work).
If the characteristic here is to distinguish the above files from service-* files, I'd suggest [monitor:///var/log/app/service.*].
Can you help me with some more complicated Windows path regexes? Say the logs are in F:\weblogs and the log names I'd like to match with a single monitor stanza regex are service.log, service.1.log and service.log.2011-03-04. Ideally this would be reusable for a bunch of other services. Let me know if this is better asked as a new question.
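Under the translation described in this thread (literal `.`, `*` matching anything but `/`), the suggested [monitor:///var/log/app/service.*] becomes roughly the regex `/var/log/app/service\.[^/]*`. The quick Python check below runs it against the file names from the question; the translated regex is an assumption, and `service-web.log` is a hypothetical name standing in for the service-* files.

```python
import re

# Assumed translation of [monitor:///var/log/app/service.*]
SERVICE_RE = re.compile(r"/var/log/app/service\.[^/]*")

for name in ["service.log", "service.1.log", "service.log.2011-03-04", "service-web.log"]:
    matched = SERVICE_RE.fullmatch("/var/log/app/" + name) is not None
    print(f"{name}: {matched}")
```

All three rotation forms match, while the service-* name does not, because the pattern requires a literal `.` right after `service`.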
No, I can't find that clearly documented, but I will make sure it gets into the docs. Also, there is no need to escape the '.'; we will treat it as a literal.
I'm testing this now. Is there a document somewhere that explains that putting a * at the end of a monitor stanza makes it treat the pattern as a regex? And is there a need to escape the '.' to make it a literal match, e.g. the '.' in [monitor:///var/log/app/\w+.log*]?
I designed this to not have any overlap. \w only matches [A-Za-z0-9_], so it will not match the "-".
You can absolutely have a blacklist per-stanza.
How are regex overlaps handled? For example, I think a log named coolservice-web.log will match the monitor regex for both your first and your second monitor stanza. Should I expect to be able to use a blacklist in each stanza?
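Two points from the replies can be checked quickly in Python: `\w` covers letters, digits and underscore but not `-`, and because the trailing `*` also matches suffixes like `.gz`, a per-stanza blacklist is still useful. The sketch assumes a hand translation of the first stanza's path and reuses the compression blacklist from the original question; the combination is illustrative, not a tested Splunk config.

```python
import re

# \w is [A-Za-z0-9_] for ASCII input: digits match, "-" does not.
print(bool(re.fullmatch(r"\w+", "service1")))
print(bool(re.fullmatch(r"\w+", "cool-service")))

# Assumed translation of [monitor:///var/log/app/\w+.log*] plus a per-stanza blacklist.
STANZA_RE = re.compile(r"/var/log/app/\w+\.log[^/]*")
BLACKLIST_RE = re.compile(r"\.(gz|bz2|z|zip)$")


def monitored_by_stanza(path: str) -> bool:
    """True if the stanza pattern matches the whole path and the blacklist does not."""
    return STANZA_RE.fullmatch(path) is not None and BLACKLIST_RE.search(path) is None


for name in ["coolservice.log.2011-03-07", "coolservice.log.2011-03-07.gz", "coolservice-web.log"]:
    print(name, monitored_by_stanza("/var/log/app/" + name))
```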