According to this document: Specify input paths with wildcards
The asterisk wildcard matches anything in that specific directory path segment. Unlike "...", "*" doesn't recurse through any subdirectories.
However, this doesn't seem to be the case.
For instance, I have many inputs like this:
[monitor:///usr/local/vnd/*/server/logs/stdout.log]
disabled=false
sourcetype=log4j
blacklist=data
I would think that this would only look at the first level of directories but the output of /opt/splunk/bin/splunk list monitor shows that there are thousands upon thousands of monitored directories. This seems to cause forwarding agents to use up to 1GB of memory. Am I doing something wrong? How do I limit the directory depth when using a wildcard?
Monitored Directories:
$SPLUNK_HOME/etc/apps/sample_app/logs
/opt/splunk/etc/apps/sample_app/logs/maillog
/opt/splunk/etc/apps/sample_app/logs/maillog.1
$SPLUNK_HOME/var/log/splunk
...
/usr/local/vnd/*/server/logs/stdout.log
/usr/local/vnd/application1
/usr/local/vnd/application1/java
/usr/local/vnd/application1/java/bin
/usr/local/vnd/application1/java/db
/usr/local/vnd/application1/java/demo
/usr/local/vnd/application1/java/include
/usr/local/vnd/application1/java/jre
/usr/local/vnd/application1/java/lib
/usr/local/vnd/application1/java/man
/usr/local/vnd/application1/java/sample
/usr/local/vnd/application1/logs
/usr/local/vnd/application1/resin
/usr/local/vnd/application1/resin/automake
/usr/local/vnd/application1/resin/bin
/usr/local/vnd/application1/resin/conf
/usr/local/vnd/application1/resin/contrib
/usr/local/vnd/application1/resin/lib
/usr/local/vnd/application1/resin/modules
/usr/local/vnd/application1/resin/php
/usr/local/vnd/application1/resin/webapps
/usr/local/vnd/application1/resin/win32
/usr/local/vnd/application1/server
...
This is a result of how wildcarding of monitored directories is implemented in Splunk. Splunk will descend to the directory of the longest non-wildcarded path from root, then enumerate all files below that, and filter out those that do not match the wildcard. In your case, for example, the files in /usr/local/vnd/application1/java will be enumerated because they are under /usr/local/vnd/ (the longest path prefix without a wildcard), but they will be excluded from being read because they won't match the full wildcard.
The result is that it should still only get the correct files, but it will be slower and use more resources than you'd expect to do so.
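To make the cost visible, here is a rough Python sketch of the enumerate-then-filter behaviour described above. It is only an illustration of that description, not Splunk's actual code; the function names (split_static_prefix, wildcard_to_regex, enumerate_then_filter) are made up for this example, and the translation of "*" to a single-segment match and "..." to an any-depth match is an assumption based on the documentation quoted in the question.

```python
import os
import re


def split_static_prefix(pattern):
    """Return the longest leading directory path that contains no wildcard."""
    static = []
    for part in pattern.split("/"):
        if "*" in part or "..." in part:
            break
        static.append(part)
    return "/".join(static) or "/"


def wildcard_to_regex(pattern):
    """Translate '*' (within one path segment) and '...' (any depth) to a regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            out.append(".*")
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")


def enumerate_then_filter(pattern):
    """Walk everything under the static prefix, then keep only matching files."""
    root = split_static_prefix(pattern)
    regex = wildcard_to_regex(pattern)
    visited_dirs, matched = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        visited_dirs.append(dirpath)  # every directory is visited, match or not
        for name in filenames:
            path = os.path.join(dirpath, name)
            if regex.match(path):
                matched.append(path)
    return visited_dirs, matched
```

For the pattern /usr/local/vnd/*/server/logs/stdout.log, the static prefix is /usr/local/vnd, so the walk covers every directory under it (java, resin, and so on) even though only the stdout.log files under */server/logs end up in the matched list. That is consistent with the long "Monitored Directories" listing in the question.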
While there actually is a reason it is implemented this way (to do with allowing full PCRE regex on wildcard paths), this method for handling wildcards does indeed suck in cases like yours, and I encourage you to file a bug/ER with Splunk.
Update:
There are a couple of ways to try to work around this:
* If you know the directories specifically, or can reasonably enumerate all the possible ones, create stanzas for each one. This deals with the problem by removing wildcards completely.
It would run exactly the same, and do the exact same thing. Updating the answer with more suggestions.
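For the one-stanza-per-directory workaround mentioned above, writing the stanzas by hand gets tedious with many applications. A small script could generate them. The following is a minimal sketch, assuming the layout from the question (one directory per application under a common base); explicit_stanzas is a hypothetical helper, not part of Splunk, and the settings are simply copied through as key=value lines.

```python
import os


def explicit_stanzas(base_dir, suffix, settings):
    """Build one [monitor://...] stanza per first-level directory under base_dir.

    Only directories that actually contain base_dir/<name>/<suffix> get a stanza,
    so the generated inputs contain no wildcards at all.
    """
    stanzas = []
    for name in sorted(os.listdir(base_dir)):
        candidate = os.path.join(base_dir, name, suffix)
        if os.path.isfile(candidate):
            lines = ["[monitor://%s]" % candidate]
            lines += ["%s=%s" % (key, value) for key, value in settings.items()]
            stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas)
```

Calling explicit_stanzas("/usr/local/vnd", "server/logs/stdout.log", {"disabled": "false", "sourcetype": "log4j", "blacklist": "data"}) and writing the result to inputs.conf would replace the single wildcard stanza with explicit ones. The obvious trade-off is that the script has to be re-run whenever an application directory is added or removed.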
That's odd, because I actually don't get the correct files. I'm deploying apps that actually have a lot of monitor inputs like the one listed in the OP, so maybe there is some kind of overlap? Can I use full regex support then? Would something like this work?
[monitor:///usr/local/vnd/[^\]+/server/logs/stdout.log]
I would think that would be faster and use less resources as well, no? If, of course, it's possible.