Easy one, this, but I can't seem to get it right.
I'm monitoring a series of directories which are rsync'd from other servers. Splunk, being ever so efficient, is managing to index the temporary "." files that rsync creates, as well as the files after they arrive. This has resulted in rather a lot of unnecessary data.
The answer, to me, should be either whitelists or blacklists.
For one of the directories, I can whitelist, as the files are all "blah.log" and thus "blah\.log$" should work fine.
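A quick way to sanity-check a whitelist like that outside Splunk is Python's re module. This is a sketch under two assumptions: that Splunk tests the regex against the file's full path with search (not full-match) semantics, and that rsync's temporary files follow its default ".<name>.<random suffix>" naming. The paths are hypothetical.

```python
import re

# Whitelist with the dot escaped, so it only matches a literal "."
WHITELIST = re.compile(r"blah\.log$")

# Hypothetical paths: the finished file and an rsync-style temporary copy
# (assumed default temp name: ".<name>.<random suffix>").
paths = [
    "/data/incoming/blah.log",
    "/data/incoming/.blah.log.Xk3p9Q",
]

for path in paths:
    # re.search scans the whole string, so the pattern can match at any depth
    accepted = WHITELIST.search(path) is not None
    print(f"{path}: {'indexed' if accepted else 'skipped'}")
```

Because the random suffix comes after ".log", the "$" anchor is what keeps the temporary copy out.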
However, in other directories the files are named all sorts of things, and there's no easy regex to whitelist. So a blacklist should do the trick. But I can't seem to get a regex working for "any file starting with a ."
Hints?
There was something definitely amiss with the ability to parse recursive directories and use whitelists/blacklists, so I've gone ahead and created a monitor stanza in my inputs.conf for each of the 8 files. That was the only thing that got Splunk to actually show the content of those files in a query.
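Writing eight near-identical stanzas by hand is error-prone, so they can be generated. A minimal Python sketch, assuming the file in each logs directory is named "submarine.out" (a guess based on the whitelist in the directory stanza, not something the thread confirms) and reusing that stanza's settings minus the whitelist, which is redundant once each stanza names a single file. Only three of the directories are listed in the thread, so the rest would need adding.

```python
# Directories listed in the thread; the other locations would be appended here.
LOG_DIRS = [
    "/Volumes/A/b/c/cluster3/data/instance/box-4/logs",
    "/Volumes/A/b/c/cluster2/data/instance/box-3/logs",
    "/Volumes/A/b/c/cluster2/data/instance/box-1/logs",
]
FILE_NAME = "submarine.out"  # assumption: same file name in every directory

# Settings copied from the original directory stanza, without the whitelist.
SETTINGS = {
    "crcSalt": "<SOURCE>",
    "disabled": "false",
    "followTail": "0",
    "host": "strawberry",
    "index": "submarine",
    "sourcetype": "log4j",
}


def monitor_stanza(path: str) -> str:
    """Build one [monitor://...] stanza for a single absolute file path."""
    # An absolute path starts with "/", giving the usual "monitor:///..." form.
    lines = [f"[monitor://{path}]"]
    lines += [f"{key} = {value}" for key, value in SETTINGS.items()]
    return "\n".join(lines)


conf = "\n\n".join(monitor_stanza(f"{d}/{FILE_NAME}") for d in LOG_DIRS)
print(conf)
```

The output can be pasted into inputs.conf; keeping the list in one place makes it easier to add or drop a box later.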
blacklist = /\.[^/]+$
should do it
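To see what that pattern does, here it is tried against a few hypothetical paths in Python, assuming (as above) that Splunk tests it against the full path. It matches a "/" followed by a literal "." and then non-slash characters up to the end, i.e. a final path component that starts with a dot.

```python
import re

# "/" + literal "." + one or more non-"/" characters + end of string
BLACKLIST = re.compile(r"/\.[^/]+$")

# Hypothetical paths, mapped to whether they should be blacklisted
samples = {
    "/Volumes/A/b/c/logs/.app.log.Gd7x2a": True,   # rsync-style temp file
    "/Volumes/A/b/c/logs/app.log": False,          # normal file
    "/Volumes/A/b/.hidden/app.log": False,         # normal file in a dot directory
}

for path, expected in samples.items():
    blocked = BLACKLIST.search(path) is not None
    assert blocked == expected, path
    print(f"{'blocked' if blocked else 'kept'}: {path}")
```

The "[^/]+" part is what stops a dot directory higher up the path from blacklisting everything beneath it.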
What does your current regex look like? Make sure you're not forgetting to put a backslash in front of the dot, or it will be treated as a wildcard that matches any character.
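A two-line Python illustration of the difference, using a hypothetical file name that is not "blah.log":

```python
import re

path = "/var/log/blahXlog"  # hypothetical file that is NOT named blah.log

unescaped = re.search(r"blah.log$", path)  # "." matches any character, including "X"
escaped = re.search(r"blah\.log$", path)   # "\." matches only a literal dot

print(bool(unescaped), bool(escaped))  # True False
```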
Have you tried just:
blacklist=^\.
(For older versions of Splunk, use _blacklist instead of blacklist.)
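Whether "^\." works depends on which string Splunk tests it against. My understanding of the inputs.conf documentation is that the regex is matched against the file's full path rather than just its name; assuming that, this Python sketch (hypothetical path) shows why the anchored form may never fire:

```python
import re

# Hypothetical full path of an rsync temporary file
path = "/Volumes/A/b/c/logs/.app.log.Gd7x2a"
name = path.rsplit("/", 1)[-1]  # ".app.log.Gd7x2a"

anchored = re.compile(r"^\.")

# Against the bare file name, "^\." matches...
print(bool(anchored.search(name)))  # True
# ...but against the full path, "^" anchors at the leading "/", so it never matches.
print(bool(anchored.search(path)))  # False
```

If the full-path assumption holds, anchoring on the last "/" instead of "^" is what makes the pattern work.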
I've put in gkanapathy's regex for now, but I think something is wrong with my whitelist. Is there any potential interaction between whitelists and monitoring directories that have subdirectories (it's in the subdirectories that my files are)?
I now have:
[monitor:///Volumes/A/b/c]
crcSalt = <SOURCE>
disabled = false
followTail = 0
host = strawberry
index = submarine
whitelist = submarine\.out$
sourcetype = log4j
However, my files are actually located in:
/Volumes/A/b/c/cluster3/data/instance/box-4/logs
/Volumes/A/b/c/cluster2/data/instance/box-3/logs
/Volumes/A/b/c/cluster2/data/instance/box-1/logs
And so on: a list of about 8 locations, but since they're all under "c", I just pointed Splunk at that.
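One way to separate the regex from the directory recursion is to test the whitelist on its own against the deep paths. A Python sketch, assuming Splunk matches against the full path and that the files are named "submarine.out" (inferred from the whitelist, not stated in the thread):

```python
import re

WHITELIST = re.compile(r"submarine\.out$")

# Directories from the thread; the file name is an assumption.
LOG_DIRS = [
    "/Volumes/A/b/c/cluster3/data/instance/box-4/logs",
    "/Volumes/A/b/c/cluster2/data/instance/box-3/logs",
    "/Volumes/A/b/c/cluster2/data/instance/box-1/logs",
]

for d in LOG_DIRS:
    path = f"{d}/submarine.out"
    # The pattern is anchored only at the end, so directory depth is irrelevant.
    print(bool(WHITELIST.search(path)), path)
```

If every line prints True, the pattern itself is not what excludes the files, which points back at how the directory tree is being traversed.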
According to the inputstatus Tailing Processor URL, it has found "c" and some files in "c" that did not match the whitelist, but there's no indication that it found any data further down the path, and that data is definitely not in the index (yesterday's data is, from before I made this whitelist change).