Splunk Search

Large number of subfolders : Splunk is calling hundreds of statx() on folders not meant to be monitored

abonuccelli_spl
Splunk Employee

Hi,

4.2.3 UF on AIX

I have a folder structure like

/inputs/b/1/2/34/...
/inputs/b/1/2/3
/inputs/b/1/2/35
/inputs/b/1/2/36
/inputs/b/1/2/36/file092.log
/inputs/b/1/2/36/file123.log
/inputs/c/1/2/37/
/inputs/c/1/2/34/...
/inputs/c/1/2/3
/inputs/c/1/2/35
/inputs/c/1/2/36
/inputs/c/1/2/36/file092.log
/inputs/c/1/2/36/file123.log
/inputs/c/1/2/37/
...
/inputs/d/1/2/34/...
/inputs/d/1/2/3
/inputs/d/1/2/35
/inputs/d/1/2/36
/inputs/d/1/2/36/file092.log
/inputs/d/1/2/36/file123.log
/inputs/d/1/2/37/.../
...

Where the dots stand for further folders and subfolders, nested up to thousands deep.
Each folder at every level contains many files.
This layout exists on a few hundred servers.

How do I specify a monitor:// stanza that will not cause Splunk to scan every folder? I see lots of statx() calls and high CPU when I use the stanza below:

[monitor:///inputs/*/1/2/36/file[0-9]{3}.log]
index = main
crcSalt =

So I've tried to force a pure regex with:

[monitor:///inpu*/[a-zA-Z0-9\/]+file[0-9]+.log]
index = main
crcSalt =

I'm getting:

DEBUG TailingProcessor - Adding implicit whitelist '^/input[^/]*/[a-zA-Z0-9\/]+file[0-9]+.log$' on path 'monitor://'.

According to the DEBUG log this seems to do the job (it no longer hits the unwanted folders), but my files are not picked up. What am I missing, and is it possible to achieve the desired result on 4.2.3 on AIX?

Thanks in advance

1 Solution

yannK
Splunk Employee

Antonio

With [monitor:///inputs/*/1/2/36/file[0-9]{3}.log]

The problem is that splunk will have the scan all the files/folders in" /inputs/*/"
in order to apply the whitelist/blacklist regex on "/1/2/36/file[0-9]{3}.log"

The only way I know to avoid this is to define a monitor stanza for each specific path:

[monitor:///inputs/a/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/b/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/c/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/d/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/e/1/2/36/file[0-9]{3}.log]
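Since the concrete paths all follow the same shape, one way to keep the per-path stanza list maintainable is to generate it from the directories that actually exist. This is a hypothetical helper, not part of Splunk; the function name and the idea of appending to a local inputs.conf are assumptions, and shell glob expansion at generation time replaces the wildcard Splunk would otherwise have to expand at scan time:

```shell
# gen_stanzas ROOT — print one monitor stanza per existing ROOT/*/1/2/36
# directory, so Splunk never has to walk the wildcard itself.
gen_stanzas() {
  for d in "$1"/*/1/2/36; do
    [ -d "$d" ] || continue
    printf '[monitor://%s/file[0-9]{3}.log]\nindex = main\n\n' "$d"
  done
}
```

You could then run something like `gen_stanzas /inputs >> $SPLUNK_HOME/etc/system/local/inputs.conf` (and re-run it when new top-level folders appear), at the cost of having to regenerate the config as the tree changes.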


jrodman
Splunk Employee

This code is not special to AIX at all; it behaves pretty much identically on all UNIX types.

For an input like

[monitor:///inputs/*/1/2/36/file[0-9]{3}.log]

We have to set up a watch on /inputs, because everything beyond this point is a pattern match.
We do use PCRE partial-match testing, so that if we reach a directory that PCRE can tell us will never match, no matter how much additional text is added, we can skip it.

Thus, for example, if we find a directory such as

/inputs/q/2

we should be able to skip over this, because no matter how much additional text is added, the 2 will never match the 1 in the regex.
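The pruning described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Splunk source: the monitor path is split into per-segment patterns (the `*` becomes `[^/]*`), and any subdirectory whose path prefix can no longer match is dropped before it is ever descended into, which is what keeps the statx() count down:

```python
import os
import re

# Per-segment patterns for monitor:///inputs/*/1/2/36/file[0-9]{3}.log
# (names and structure are assumptions for illustration only).
SEGMENT_PATTERNS = ["inputs", r"[^/]*", "1", "2", "36"]
FILE_RE = re.compile(r"file[0-9]{3}\.log$")

def could_still_match(segments):
    """True if a path starting with these segments might still match."""
    if len(segments) > len(SEGMENT_PATTERNS):
        return False
    return all(re.fullmatch(p, s)
               for p, s in zip(SEGMENT_PATTERNS, segments))

def find_logs(root):
    """Walk root, pruning dead branches, and return matching files."""
    base = os.path.dirname(root.rstrip("/"))
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        segments = os.path.relpath(dirpath, base).split(os.sep)
        # Prune: like a PCRE partial-match failure, /inputs/q/2 can
        # never grow into a match, so nothing below it is ever visited.
        dirnames[:] = [d for d in dirnames
                       if could_still_match(segments + [d])]
        if len(segments) == len(SEGMENT_PATTERNS):
            hits.extend(os.path.join(dirpath, f)
                        for f in filenames if FILE_RE.search(f))
    return sorted(hits)
```

In this sketch the `/inputs/q/1/2/3` case from the next paragraph is pruned as well, because `3` fails `fullmatch` against `36`; whether the real tailer behaves the same way is exactly the open question below.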

However, I'm a little unclear about the case of

/inputs/q/1/2/3

I think we try to force this to fail to match here by adding a slash after the directory name, but I'm not certain. We might descend into this directory. I would recommend testing locally in a simple setup.

Yann's answer to specify only the exact dirs you want observed will certainly work.

As for your attempted workaround, I think it's a little sketchy to ask Splunk to monitor /. However, your regex, which gets built out as ^/input[^/]*/[a-zA-Z0-9/]+file[0-9]+.log$, is very permissive. It allows any sequence of directory names containing only ASCII alphanumerics, followed by a numbered filename. This regex should allow tailing to look at every single file in the hierarchy.
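To see just how permissive that implicit whitelist is, you can check it with Python's re module (whose behavior agrees with PCRE for this pattern). The sample paths are made up for illustration:

```python
import re

# The implicit whitelist Splunk built from the workaround stanza.
wl = re.compile(r"^/input[^/]*/[a-zA-Z0-9\/]+file[0-9]+.log$")

# It matches the intended files...
assert wl.match("/inputs/b/1/2/36/file092.log")

# ...but also any alphanumeric directory chain ending in a numbered
# file, so the tailer must still inspect the whole hierarchy.
assert wl.match("/inputs/zz/9/9/9/9/9/file1.log")

# And the unescaped '.' before 'log' matches any character at all.
assert wl.match("/inputs/b/1/2/36/file092xlog")
```

Escaping the dot (`\.log`) and anchoring the directory levels explicitly would tighten the pattern, but it would not change the scanning cost, since the wildcard levels still force the walk.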
