Getting Data In

Why is my wildcard descending into directories?

pheezy
Explorer

According to this document: Specify input paths with wildcards

The asterisk wildcard matches anything
in that specific directory path
segment.

Unlike "...", "*" doesn't recurse
through any subdirectories.

However, this doesn't seem to be the case.

For instance, I have many inputs like this:

[monitor:///usr/local/vnd/*/server/logs/stdout.log]
disabled=false
sourcetype=log4j
blacklist=data

I would think that this would only look at the first level of directories but the output of /opt/splunk/bin/splunk list monitor shows that there are thousands upon thousands of monitored directories. This seems to cause forwarding agents to use up to 1GB of memory. Am I doing something wrong? How do I limit the directory depth when using a wildcard?

Monitored Directories:
    $SPLUNK_HOME/etc/apps/sample_app/logs
            /opt/splunk/etc/apps/sample_app/logs/maillog
            /opt/splunk/etc/apps/sample_app/logs/maillog.1
    $SPLUNK_HOME/var/log/splunk
...
    /usr/local/vnd/*/server/logs/stdout.log
            /usr/local/vnd/application1
            /usr/local/vnd/application1/java
            /usr/local/vnd/application1/java/bin
            /usr/local/vnd/application1/java/db
            /usr/local/vnd/application1/java/demo
            /usr/local/vnd/application1/java/include
            /usr/local/vnd/application1/java/jre
            /usr/local/vnd/application1/java/lib
            /usr/local/vnd/application1/java/man
            /usr/local/vnd/application1/java/sample
            /usr/local/vnd/application1/logs
            /usr/local/vnd/application1/resin
            /usr/local/vnd/application1/resin/automake
            /usr/local/vnd/application1/resin/bin
            /usr/local/vnd/application1/resin/conf
            /usr/local/vnd/application1/resin/contrib
            /usr/local/vnd/application1/resin/lib
            /usr/local/vnd/application1/resin/modules
            /usr/local/vnd/application1/resin/php
            /usr/local/vnd/application1/resin/webapps
            /usr/local/vnd/application1/resin/win32
            /usr/local/vnd/application1/server
...
1 Solution

gkanapathy
Splunk Employee

This is a result of how wildcarding of monitored directories is implemented in Splunk. Splunk descends to the longest non-wildcarded path from the root, enumerates all files below it, and filters out those that do not match the wildcard. In your case, for example, the files in /usr/local/vnd/application1/java are enumerated because they are under /usr/local/vnd/ (the longest path prefix without a wildcard), but they will be excluded from being read because they don't match the full wildcard.

The result is that it should still only get the correct files, but it will be slower and use more resources than you'd expect to do so.
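To make the behavior described above concrete, here is a minimal Python sketch of it. This is not Splunk's code. The function names are hypothetical, the translation of "*" to `[^/]*` and "..." to `.*` is an assumption based on how the wildcard documentation describes them, and POSIX-style paths are assumed.

```python
import os
import re


def longest_literal_prefix(pattern: str) -> str:
    """Return the leading part of the path that contains no wildcard segment."""
    literal = []
    for segment in pattern.split("/"):
        if "*" in segment or "..." in segment:
            break
        literal.append(segment)
    return "/".join(literal) or "/"


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a monitor path into a regex (assumed mapping, see lead-in)."""
    out = []
    for part in re.split(r"(\.\.\.|\*)", pattern):
        if part == "...":
            out.append(".*")      # recurses through any number of directories
        elif part == "*":
            out.append("[^/]*")   # stays within a single path segment
        else:
            out.append(re.escape(part))
    return re.compile("^" + "".join(out) + "$")


def scan(pattern: str):
    """Walk everything under the literal prefix, then filter by the regex.

    Returns (visited_dirs, matched_files). The first list is what makes the
    scan expensive; the second is what would actually be read.
    """
    root = longest_literal_prefix(pattern)
    regex = wildcard_to_regex(pattern)
    visited_dirs, matched_files = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        visited_dirs.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if regex.match(path):
                matched_files.append(path)
    return visited_dirs, matched_files
```

Calling `scan("/usr/local/vnd/*/server/logs/stdout.log")` in this sketch walks every directory under /usr/local/vnd, including the java and resin trees, even though only the stdout.log files can ever match. That corresponds to the long "Monitored Directories" listing in the question.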

While there actually is a reason this is implemented this way (to do with allowing full PCRE regex on wildcard paths), this method for handling wildcards does indeed suck in cases like yours, and I encourage you to file a bug/ER with Splunk.


Update:

There are a couple of ways to try to work around this:

  • If you know the names of the individual subdirectories represented by * specifically, or can reasonably enumerate all the possible ones, create a stanza for each one. This deals with the problem by removing wildcards completely.
  • If you don't know them ahead of time, periodically run a separate script that looks for the directories you want and creates symbolic links to them. Put these links in another dedicated location, and monitor that other location. This deals with the problem by removing the non-matching directories from Splunk's path so it doesn't see them. Instead, your script does that work.
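Both workarounds above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names, the `server/logs` layout, and the link directory are illustrative and not part of Splunk. The generated stanzas copy only `disabled` and `sourcetype` from the question, and the symlink approach assumes the forwarder follows symbolic links when monitoring.

```python
import os


def render_stanzas(app_names, base="/usr/local/vnd", sourcetype="log4j"):
    """Workaround 1: one wildcard-free monitor stanza per known subdirectory."""
    stanzas = []
    for name in sorted(app_names):
        stanzas.append(
            f"[monitor://{base}/{name}/server/logs/stdout.log]\n"
            "disabled=false\n"
            f"sourcetype={sourcetype}\n"
        )
    return "\n".join(stanzas)


def refresh_links(src_root, link_root, subpath="server/logs"):
    """Workaround 2: symlink each <src_root>/<app>/<subpath> into link_root.

    Intended to be run periodically (for example from cron). Returns the list
    of links created on this run.
    """
    os.makedirs(link_root, exist_ok=True)

    # Find the log directories that currently exist, keyed by app name.
    wanted = {}
    for entry in os.scandir(src_root):
        target = os.path.join(entry.path, subpath)
        if entry.is_dir() and os.path.isdir(target):
            wanted[entry.name] = target

    # Remove links whose target has disappeared or changed.
    for name in os.listdir(link_root):
        link = os.path.join(link_root, name)
        if os.path.islink(link) and wanted.get(name) != os.readlink(link):
            os.unlink(link)

    # Create any links that are missing.
    created = []
    for name, target in sorted(wanted.items()):
        link = os.path.join(link_root, name)
        if not os.path.lexists(link):
            os.symlink(target, link)
            created.append(link)
    return created
```

With the second function, the monitor input would point at the link directory instead of /usr/local/vnd, so the forwarder only ever sees the log directories and none of the java or resin trees.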

gkanapathy
Splunk Employee

It would run exactly the same, and do the exact same thing. Updating the answer with more suggestions.


pheezy
Explorer

That's odd, because I actually don't get the correct files. I'm deploying apps that actually have a lot of monitor inputs like the one listed in the OP, so maybe there is some kind of overlap? Can I use full regex support then? Would something like this work?
[monitor:///usr/local/vnd/[^\]+/server/logs/stdout.log]

I would think that would be faster and use less resources as well, no? If, of course, it's possible.
