Hi,
I'm running the 4.2.3 Universal Forwarder (UF) on AIX.
I have a folder structure like
/inputs/b/1/2/34/...
/inputs/b/1/2/3
/inputs/b/1/2/35
/inputs/b/1/2/36
/inputs/b/1/2/36/file092.log
/inputs/b/1/2/36/file123.log
/inputs/c/1/2/37/
/inputs/c/1/2/34/...
/inputs/c/1/2/3
/inputs/c/1/2/35
/inputs/c/1/2/36
/inputs/c/1/2/36/file092.log
/inputs/c/1/2/36/file123.log
/inputs/c/1/2/37/
...
/inputs/d/1/2/34/...
/inputs/d/1/2/3
/inputs/d/1/2/35
/inputs/d/1/2/36
/inputs/d/1/2/36/file092.log
/inputs/d/1/2/36/file123.log
/inputs/d/1/2/37/.../
...
Where the dots are folders or subfolders, up to thousands of them, nested.
At each level, every folder contains many files.
This is on a few hundred servers.
How do I specify a monitor:// stanza that will not cause Splunk to scan every folder? I'm seeing lots of statx() calls and high CPU when I use the stanza below:
[monitor:///inputs/*/1/2/36/file[0-9]{3}.log]
index = main
crcSalt =
So I tried to force a pure regex with:
[monitor:///inpu*/[a-zA-Z0-9\/]+file[0-9]+.log]
index = main
crcSalt =
Getting
DEBUG TailingProcessor - Adding implicit whitelist '^/input[^/]*/[a-zA-Z0-9\/]+file[0-9]+.log$' on path 'monitor://'.
According to the DEBUG log this seems to do the job (it does not hit all the unwanted folders), but my files are not picked up. What am I missing, and is it possible to achieve the desired result on 4.2.3 on AIX?
Thanks in advance
Antonio
With [monitor:///inputs/*/1/2/36/file[0-9]{3}.log]
The problem is that Splunk has to scan all the files and folders under "/inputs/*/"
in order to apply the whitelist/blacklist regex to "/1/2/36/file[0-9]{3}.log".
The only way I know to avoid this is to have a monitor stanza for each specific path:
[monitor:///inputs/a/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/b/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/c/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/d/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/e/1/2/36/file[0-9]{3}.log]
…
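With a few hundred servers, writing these stanzas by hand gets tedious, so they could be generated. The following is only a sketch, assuming Python is available on the host; the function name build_monitor_stanzas and its parameters are made up for illustration and are not part of Splunk. It expands only the single wildcard level under /inputs and emits one stanza per directory that actually exists.

```python
import glob
import os


def build_monitor_stanzas(root="/inputs", subpath="1/2/36",
                          file_pattern="file[0-9]{3}.log", index="main"):
    """Return inputs.conf text with one monitor stanza per matching directory.

    Hypothetical helper: only <root>/*/<subpath> is expanded, so the deep
    sibling trees are never walked.
    """
    stanzas = []
    for directory in sorted(glob.glob(os.path.join(root, "*", subpath))):
        if not os.path.isdir(directory):
            continue  # ignore plain files that happen to share the name
        stanzas.append(
            "[monitor://{0}/{1}]\nindex = {2}\n".format(
                directory, file_pattern, index)
        )
    return "\n".join(stanzas)


if __name__ == "__main__":
    # Review the output before pasting it into inputs.conf.
    print(build_monitor_stanzas())
```

The output would need to be regenerated whenever a new directory appears under /inputs, which is the trade-off against a single wildcard stanza.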
This code is not specific to AIX at all; it is pretty much identical on all UNIX types.
For an input like
[monitor:///inputs/*/1/2/36/file[0-9]{3}.log]
We have to set up a watch on /inputs, because everything beyond this point is a pattern match.
We do use PCRE partial-match testing, so if we reach a directory that PCRE can tell us will never match, no matter how much additional text is added, we can skip it.
Thus, for example if we find a dir such as
/inputs/q/2
we should be able to skip over this, because no matter how much additional text is added, the 2 will never match the 1 in the regex.
However, I'm a little unclear about the case of
/inputs/q/1/2/3
I think we try to force this to fail to match here by adding a slash after the directory name, but I'm not certain. We might descend into this directory. I would recommend testing locally in a simple setup.
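Before testing against Splunk itself, the pruning idea can be reasoned about with a small model. This is not Splunk's code and not real PCRE partial matching (Python's standard re module has no partial-match mode); it is a sketch that assumes each directory name is compared as a complete path segment, which is the effect the trailing slash would have. SEGMENTS and could_still_match are hypothetical names.

```python
import re

# Hypothetical per-segment regexes for /inputs/*/1/2/36/file[0-9]{3}.log,
# with '*' written as [^/]* (the translation visible in the DEBUG line).
SEGMENTS = ["inputs", "[^/]*", "1", "2", "36", r"file[0-9]{3}\.log"]


def could_still_match(directory):
    """True if a file below `directory` might still match the full pattern."""
    parts = [p for p in directory.split("/") if p]
    if len(parts) >= len(SEGMENTS):
        return False  # a directory this deep leaves no room for the file name
    # Every directory name must fully match its own pattern segment.
    return all(re.fullmatch(seg, part) for seg, part in zip(SEGMENTS, parts))


if __name__ == "__main__":
    for d in ["/inputs/q/2", "/inputs/q/1/2/3", "/inputs/q/1/2/36"]:
        print(d, could_still_match(d))
```

Under that assumption /inputs/q/1/2/3 is pruned because "3" is not a complete match for "36". Whether the real tailing code behaves this way is exactly what the local test should confirm.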
Yann's answer to specify only the exact dirs you want observed will certainly work.
As for your attempted workaround, I think it's a little sketchy to ask Splunk to monitor /. However, your regex, which gets built out as ^/input[^/]*/[a-zA-Z0-9/]+file[0-9]+.log$, is very permissive. It allows any sequence of directory names that contain only ASCII alphanumerics, followed by a numbered filename. This regex should allow tailing to look at every single file in the hierarchy.
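How permissive that regex is can be checked outside Splunk. The sketch below uses Python's re module rather than PCRE; for this particular pattern the two should behave the same, but treat that as an assumption. The sample paths are made up, modelled on the layout in the question.

```python
import re

# The implicit whitelist as built out from the workaround stanza.
IMPLICIT_WHITELIST = re.compile(r"^/input[^/]*/[a-zA-Z0-9/]+file[0-9]+.log$")

# Hypothetical paths modelled on the layout in the question.
SAMPLE_PATHS = [
    "/inputs/b/1/2/36/file092.log",  # wanted
    "/inputs/b/1/2/35/file092.log",  # unwanted directory, still matches
    "/inputs/c/9/9/9/9/file1.log",   # arbitrary depth, still matches
    "/inputs/d/1/2/36/file123xlog",  # unescaped '.' matches any character
    "/inputs/d/1/2/36/notes.txt",    # rejected: name is not file<digits>
]


def matches(path):
    """True if the path satisfies the implicit whitelist."""
    return IMPLICIT_WHITELIST.match(path) is not None


if __name__ == "__main__":
    for path in SAMPLE_PATHS:
        print(path, matches(path))
```

Only the last sample is rejected, so the whitelist does nothing to restrict which directories under /inputs are considered.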