Getting Data In

Monitoring files: how does Splunk count the size?

Vladimir
Path Finder

Hi,

I configured a directory for monitoring in inputs.conf ([monitor://path_to_dir]) and a separate index for this folder several days ago. Everything is OK except one thing: the total size of the files is ~500 MB, but Splunk shows (in index activity -> index volume) that it is indexing ~800 MB per hour. How is that possible? There are only 10 MB of new logs per day. Does Splunk resend the whole file if it has changed (even if only one row was added)?
The total number of events is ~800-900 per hour. My rsyslog index, with ~12,000-15,000 events/h, grows by only ~100 MB/h.

I have the same situation with one more monitored folder.
Splunk v4.3.

1 Solution

Lamar
Splunk Employee

It shouldn't resend the whole file again. It should only send the newly added part of the file.

Keep in mind that when you monitor a path, Splunk recurses into the directory and all directories below it (this is the default behavior). Is that what is happening here?

Additionally, are there files buried deep in that directory that might be causing your file size to blow up?

Without having intimate knowledge of your environment, I'm having to hypothesize about what might be occurring here.

Lamar
Splunk Employee

Keep in mind that your whitelist/blacklist needs to be in regex form. So, you would want:


whitelist = \.log$
blacklist = \.zip$

This should work a bit better for what you're trying to accomplish.
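To illustrate why the glob form fails, here is a minimal sketch using Python's re module as a stand-in for Splunk's regex matching (an assumption; the two engines are not identical). The file paths are hypothetical, and the filter is simplified: a file is kept when it matches the whitelist and does not match the blacklist.

```python
import re

def matches(pattern, path):
    """Return True/False for a regex search, or None if the pattern does not compile."""
    try:
        return re.search(pattern, path) is not None
    except re.error:
        return None

# Hypothetical file paths, for illustration only.
paths = ["/var/log/app/app.log", "/var/log/app/old_logs.zip"]

whitelist = r"\.log$"   # regex: path ends in ".log"
blacklist = r"\.zip$"   # regex: path ends in ".zip"

# Simplified filter: keep files that match the whitelist and not the blacklist.
monitored = [p for p in paths if matches(whitelist, p) and not matches(blacklist, p)]

# The glob "*.log" is not a valid regex here: "*" has nothing to repeat.
glob_as_regex = matches("*.log", "/var/log/app/app.log")
```

Here monitored contains only the .log path, and glob_as_regex is None because the glob-style pattern does not compile as a regex in Python.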

Vladimir
Path Finder

Hm, it doesn't work.
I can still see in the _internal index that Splunk is polling data from the archive. The current configuration is:

followTail = 1
recursive = false
disabled = 0
whitelist = *.log
blacklist = *.zip #tried to exclude somehow zip files 🙂

Lamar
Splunk Employee

It would in fact.

Vladimir
Path Finder

Will recursive = false help in this case?

Vladimir
Path Finder

There aren't any subfolders, but I figured out that there are several archive files (*.zip with old files), and it looks like (in metrics.log) Splunk unzipped them and indexed the contents... arrrhh
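If archives are being unpacked and indexed, as described above, the indexed volume can be much larger than the size of the files on disk. A rough way to check how much data the .zip files expand to is sketched below. The function name archive_sizes is hypothetical, and the sketch only inspects the archives; it does not reproduce how Splunk itself measures volume.

```python
import zipfile
from pathlib import Path

def archive_sizes(directory):
    """Return (name, bytes on disk, bytes after extraction) for each .zip in directory."""
    results = []
    for path in sorted(Path(directory).glob("*.zip")):
        with zipfile.ZipFile(path) as archive:
            # file_size is the uncompressed size recorded for each archive member.
            extracted = sum(info.file_size for info in archive.infolist())
        results.append((path.name, path.stat().st_size, extracted))
    return results
```

Calling archive_sizes on the monitored directory and comparing the last two values for each archive gives a sense of how much extra data the old files would contribute once unpacked.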
