
Tailing processor and rsync (dot files) - blacklist?

howyagoin
Contributor

Easy one, this, but I can't seem to get it right.

I'm monitoring a series of directories that are rsync'd from other servers. Splunk, being ever so efficient, is indexing the temporary dot files that rsync creates during each transfer, as well as the files themselves once they arrive. This has resulted in rather a lot of unnecessary data.

The answer, to me, should be either whitelists or blacklists.

For one of the directories I can use a whitelist, as the files are all named "blah.log", so "blah\.log$" should work fine.

However, in other directories the files are named all sorts of things, and there's no easy regex to whitelist them, so a blacklist should do the trick. But I can't seem to get a regex working for "any file whose name starts with a dot".

Hints?
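A quick way to sanity-check the whitelist idea outside Splunk is to run the pattern against sample paths. The sketch below uses Python's `re.search` as a stand-in for Splunk's matcher, assuming (as the inputs.conf spec describes) that whitelist and blacklist patterns are tested against each file's full path rather than just its name. The paths, including the rsync temporary name `.blah.log.a1B2c3`, are hypothetical examples; rsync's default temporary files follow a `.name.XXXXXX` pattern. The point it illustrates: `blah\.log$` already rejects the temporary file, because its random suffix means the path no longer ends in `.log`.

```python
import re

# Whitelist from the question, with the dot escaped so it matches a literal "."
WHITELIST = re.compile(r"blah\.log$")

# Hypothetical sample paths: the finished file, and the temporary dot file
# rsync writes while the transfer is still in progress.
paths = [
    "/data/incoming/blah.log",
    "/data/incoming/.blah.log.a1B2c3",
]

def is_whitelisted(path: str) -> bool:
    """True if the regex matches anywhere in the full path (unanchored search)."""
    return WHITELIST.search(path) is not None

for p in paths:
    print(p, is_whitelisted(p))
```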

Accepted solution

howyagoin
Contributor

Something was definitely amiss with monitoring recursive directories while using whitelists/blacklists, so I've gone ahead and created a separate monitor stanza in my inputs.conf for each of the 8 files. That was the only thing that got Splunk to actually show the contents of those files in a query.

gkanapathy
Splunk Employee

blacklist = /\.[^/]+$

should do it
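Why `/\.[^/]+$` works: it needs a literal `/`, then a literal `.`, then one or more non-slash characters running to the end of the path, so it only matches when the last path component (the file name) begins with a dot. The minimal Python check below assumes, as above, that Splunk applies the pattern to the full path; the sample paths are made up. Note that it skips any dot file in the monitored tree, not only rsync's, while a dot directory higher up the path does not trigger it.

```python
import re

# gkanapathy's blacklist: a "/" followed by "." and then non-slash characters
# to the end of the string, i.e. a final path component that starts with a dot.
BLACKLIST = re.compile(r"/\.[^/]+$")

def is_blacklisted(path: str) -> bool:
    """True if the file name at the end of the full path starts with a dot."""
    return BLACKLIST.search(path) is not None

# Hypothetical paths mapped to the expected blacklist result.
samples = {
    "/data/incoming/server.log": False,         # normal file: indexed
    "/data/incoming/.server.log.x7Kq2Z": True,  # rsync temp file: skipped
    "/data/.hidden/server.log": False,          # dot directory, normal file name
}

for path, expected in samples.items():
    assert is_blacklisted(path) == expected, path
    print(f"{path}: {'blacklisted' if is_blacklisted(path) else 'indexed'}")
```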

southeringtonp
Motivator

What does your current regex look like? Make sure you're not forgetting to put a backslash in front of the dot, or it will be treated as a wildcard that matches any character.

Have you tried just:

blacklist=^\.

(For older versions of Splunk, use _blacklist instead of blacklist)
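One caveat with `^\.`, assuming Splunk matches the pattern against the full file path as the inputs.conf spec describes: `^` anchors to the start of the whole path, which for a monitored file is `/`, so the pattern would never fire. It only behaves as intended when applied to the bare file name. A short Python sketch of the difference, using a hypothetical rsync temporary file name:

```python
import os
import re

ANCHORED = re.compile(r"^\.")              # southeringtonp's suggestion
LAST_COMPONENT = re.compile(r"/\.[^/]+$")  # gkanapathy's suggestion

path = "/data/incoming/.app.log.Qz81Lm"  # hypothetical rsync temp file

# Against the full path, "^\." needs a dot as the very first character,
# but absolute paths begin with "/", so there is no match.
print(bool(ANCHORED.search(path)))                    # False
# Against just the file name it does match...
print(bool(ANCHORED.search(os.path.basename(path))))  # True
# ...while the slash-anchored pattern works on the full path directly.
print(bool(LAST_COMPONENT.search(path)))              # True
```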

howyagoin
Contributor

I've put in gkanapathy's blacklist for now, but I think something is wrong with my whitelist. Is there any potential interaction between whitelists and monitoring directories that have subdirectories? It's in the subdirectories that my files live.

I now have:


[monitor:///Volumes/A/b/c]
crcSalt = <SOURCE>
disabled = false
followTail = 0
host = strawberry
index = submarine
whitelist = submarine\.out$
sourcetype = log4j

However, my files are actually located in:

/Volumes/A/b/c/cluster3/data/instance/box-4/logs
/Volumes/A/b/c/cluster2/data/instance/box-3/logs
/Volumes/A/b/c/cluster2/data/instance/box-1/logs

And so on: about 8 locations in all, but since they're all under "c", I just pointed Splunk at that.

According to the inputstatus Tailing Processor URL, Splunk has found "c" and some files in "c" that did not match the whitelist, but there's no indication of any files further down the path, and the data is definitely not in the index (yesterday's data, from before I made this whitelist change, is).
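Taken on its own, the whitelist pattern should not care how deep the files sit: if Splunk applies it to each file's full path, as the inputs.conf spec describes, `submarine\.out$` matches at any depth below the monitored directory. The Python sketch below checks this for the directories listed above. The file name `submarine.out` is an assumption inferred from the whitelist, and `gc.log` is a hypothetical sibling file; neither is stated in the post. If the files still are not indexed, the regex itself is probably not the cause.

```python
import re

WHITELIST = re.compile(r"submarine\.out$")  # from the [monitor://...] stanza

# Directories listed in the post; the file name "submarine.out" is an assumption.
log_dirs = [
    "/Volumes/A/b/c/cluster3/data/instance/box-4/logs",
    "/Volumes/A/b/c/cluster2/data/instance/box-3/logs",
    "/Volumes/A/b/c/cluster2/data/instance/box-1/logs",
]

for d in log_dirs:
    full_path = f"{d}/submarine.out"
    # Unanchored search over the full path: depth below /Volumes/A/b/c is irrelevant.
    print(full_path, bool(WHITELIST.search(full_path)))

# A hypothetical sibling file in the same directory is rejected, as intended.
print(bool(WHITELIST.search("/Volumes/A/b/c/cluster2/data/instance/box-1/logs/gc.log")))
```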
