Large number of subfolders: Splunk calls statx() hundreds of times on folders not meant to be monitored

abonuccelli_spl
Splunk Employee

Hi,

4.2.3 UF on AIX

I have a folder structure like

/inputs/b/1/2/34/...
/inputs/b/1/2/3
/inputs/b/1/2/35
/inputs/b/1/2/36
/inputs/b/1/2/36/file092.log
/inputs/b/1/2/36/file123.log
/inputs/c/1/2/37/
/inputs/c/1/2/34/...
/inputs/c/1/2/3
/inputs/c/1/2/35
/inputs/c/1/2/36
/inputs/c/1/2/36/file092.log
/inputs/c/1/2/36/file123.log
/inputs/c/1/2/37/
...
/inputs/d/1/2/34/...
/inputs/d/1/2/3
/inputs/d/1/2/35
/inputs/d/1/2/36
/inputs/d/1/2/36/file092.log
/inputs/d/1/2/36/file123.log
/inputs/d/1/2/37/.../
...

The dots stand for folders or subfolders, up to thousands of them, nested.
At each level, every folder contains many files.
This is on a few hundred servers.

How do I specify a monitor:// stanza that will not cause Splunk to scan every folder? I am seeing lots of statx() calls and high CPU when I use the stanza below:

[monitor:///inputs/*/1/2/36/file[0-9]{3}.log]
index = main
crcSalt =

So I tried to force a pure regex with

[monitor:///inpu*/[a-zA-Z0-9\/]+file[0-9]+.log]
index = main
crcSalt =

Getting

DEBUG TailingProcessor - Adding implicit whitelist '^/input[^/]*/[a-zA-Z0-9\/]+file[0-9]+.log$' on path 'monitor://'.

According to the DEBUG log, this seems to do the job (it does not hit all the unwanted folders), but my files are not picked up. What am I missing, and is it possible to achieve the desired result with 4.2.3 on AIX?

Thanks in advance


yannK
Splunk Employee

Antonio

With [monitor:///inputs/*/1/2/36/file[0-9]{3}.log]

The problem is that Splunk has to scan all the files and folders in "/inputs/*/"
in order to apply the whitelist/blacklist regex to "/1/2/36/file[0-9]{3}.log".

The only way I know to avoid this is to have a monitor stanza for each specific path:

[monitor:///inputs/a/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/b/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/c/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/d/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/e/1/2/36/file[0-9]{3}.log]
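With a few hundred servers, writing one stanza per path by hand gets tedious. Below is a minimal Python sketch that generates the stanzas from a list of first-level directory names. The function name `build_stanzas` and its parameters are hypothetical, the `index = main` line is taken from the question, and how the directory names are collected (for example by listing /inputs on each host) is left as an assumption.

```python
def build_stanzas(dir_names, root="/inputs",
                  tail="1/2/36/file[0-9]{3}.log", index="main"):
    """Return inputs.conf text with one monitor stanza per first-level directory.

    dir_names: names found directly under `root` (e.g. "a", "b", "c").
    tail:      the fixed part of the path below each of those directories.
    """
    blocks = []
    for name in sorted(dir_names):
        # root already starts with "/", so this yields "monitor:///inputs/..."
        blocks.append(f"[monitor://{root}/{name}/{tail}]\nindex = {index}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(build_stanzas(["a", "b", "c", "d", "e"]))
```

The output can be pasted into inputs.conf; any other settings from the original stanza would need to be added to each generated block.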

jrodman
Splunk Employee

This code is not specific to AIX at all; it is pretty much identical on all UNIX types.

For an input like

[monitor:///inputs/*/1/2/36/file[0-9]{3}.log]

We have to set up a watch on /inputs, because everything beyond this point is a pattern match.
We do use PCRE partial-match testing, so that if we reach a directory that PCRE can tell us will never match, no matter how much additional text is added, we can skip it.

Thus, for example if we find a dir such as

/inputs/q/2

we should be able to skip over this, because no matter how much additional text is added, the 2 will never match the 1 in the regex.

However, I'm a little unclear about the case of

/inputs/q/1/2/3

I think we try to force this to fail to match here by adding a slash after the directory name, but I'm not certain. We might descend into this directory. I would recommend testing locally in a simple setup.
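To make the pruning idea concrete, here is a rough Python sketch. It is not Splunk's implementation and does not use PCRE partial matching (Python's standard `re` module has no such feature). Instead it compares each directory name, as a whole, against the corresponding segment of the pattern, which corresponds to the "add a slash after the directory name" behaviour described above. `DIR_SEGMENTS` and `should_descend` are hypothetical names, and the `*` wildcard is assumed to translate to `[^/]*`.

```python
import re

# Directory part of /inputs/*/1/2/36/file[0-9]{3}.log, one regex per path segment.
# Assumption: the "*" wildcard is treated as [^/]*.
DIR_SEGMENTS = ["inputs", "[^/]*", "1", "2", "36"]


def should_descend(directory):
    """Return True if some file below `directory` could still match the pattern."""
    parts = [p for p in directory.split("/") if p]
    if len(parts) > len(DIR_SEGMENTS):
        # Deeper than the pattern allows: nothing below can match.
        return False
    # Every directory name seen so far must fully match its pattern segment.
    return all(re.fullmatch(seg, part) for seg, part in zip(DIR_SEGMENTS, parts))


if __name__ == "__main__":
    for d in ["/inputs/q", "/inputs/q/2", "/inputs/q/1/2/3", "/inputs/q/1/2/36"]:
        print(d, should_descend(d))
```

Under this whole-segment assumption, /inputs/q/2 and /inputs/q/1/2/3 are both skipped. A matcher that only looks at the raw text without the trailing slash could still treat "3" as a possible prefix of "36", which is the uncertain case mentioned above.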

Yann's answer to specify only the exact dirs you want observed will certainly work.

As for your attempted workaround, I think it's a little sketchy to ask Splunk to monitor /. However, your regex, which gets built out as ^/input[^/]*/[a-zA-Z0-9/]+file[0-9]+.log$, is very permissive. It allows any sequence of directory names that contain only ASCII alphanumerics, followed by a numbered filename. This regex should allow tailing to look at every single file in the hierarchy.
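The permissiveness is easy to check outside Splunk. The sketch below runs the whitelist quoted in the DEBUG line against a few made-up paths using Python's `re` module, which should behave like PCRE for a pattern this simple. `STRICT` is a hypothetical tighter pattern added only for comparison, and the sample paths are invented for illustration.

```python
import re

# Whitelist as it appears in the DEBUG output quoted in the question.
PERMISSIVE = re.compile(r"^/input[^/]*/[a-zA-Z0-9\/]+file[0-9]+.log$")

# Hypothetical tighter pattern, for comparison only: fixed 1/2/36 directories,
# exactly three digits, and an escaped dot before "log".
STRICT = re.compile(r"^/inputs/[^/]+/1/2/36/file[0-9]{3}\.log$")

# Invented sample paths.
PATHS = [
    "/inputs/b/1/2/36/file092.log",  # the kind of file that is wanted
    "/inputs/b/1/2/35/file7.log",    # wrong directory, wrong digit count
    "/inputs/c/9/9/9/file1xlog",     # accepted because "." matches any character
]


def matches(pattern, paths):
    """Return the subset of paths that the compiled pattern accepts."""
    return [p for p in paths if pattern.search(p)]


if __name__ == "__main__":
    print("permissive:", matches(PERMISSIVE, PATHS))
    print("strict:    ", matches(STRICT, PATHS))
```

The permissive pattern accepts all three paths, while the stricter one accepts only the first.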
