Getting Data In

Monitoring a large number of files

joonradley
Path Finder

We have a server that generates 100k log files a day. The logs must be forwarded to an indexer. Due to the critical nature of the server, we can only install a light forwarder. The files only need to be loaded once; ongoing monitoring is not needed.

Using monitor slows the server to a crawl, and we cannot use batch because the data must be preserved (batch deletes files after indexing). Sadly, we cannot copy the files to another directory for batch input.

I tried using fschange, but it does not forward the actual file contents to the indexer when sendCookData=false.

Any ideas?


eashwar
Communicator

Hello, you've got a spelling error!!

sendCookedData = false

I am learning Splunk!! I set up a forwarder and an indexer, and they are working perfectly. The forwarded logs get indexed in the main index, which is the default.

I want to know how to index the data in a custom index.

Thanks in advance.


stefandagerman
Path Finder

How about you create your own topic, given the completely different nature of your question, once you have determined that the Splunk documentation at http://docs.splunk.com/Documentation/Splunk/latest/admin/inputsconf does not answer it?

Please don't hijack threads, as it is unlikely that you will get a response.


brianirwin
Path Finder

With monitor, the setting I would look at is time_before_close; it exists to tell Splunk not to close a file until x seconds after the last write. The default is 3 seconds, and with only 86,400 seconds in a day, just opening and closing 100K files uses up more time than you have (100,000 files × 3 seconds = 300,000 seconds).

Looking at the manual, it seems that when you override this for monitor in inputs.conf you can only set it to an integer, so even if you go down to 1 you will still be in trouble.

You could try setting time_before_close = 1, but with 100K files it is still going to take longer than you want.
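
For reference, a minimal sketch of what that override might look like in inputs.conf on the forwarder; the monitored path /var/log/app is just a placeholder, not something from the thread:

[monitor:///var/log/app]
# time_before_close takes whole seconds; 1 is as low as it goes
time_before_close = 1

Even at 1 second per file, 100,000 files means 100,000 seconds of close delay, which is still more than the 86,400 seconds in a day.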

To the earlier point, you may need to tarball or cat x number of files together and send the result to a separate directory where you sinkhole/batch them, or do anything else to reduce the number of files to be eaten; a sketch follows below. If nothing else, I think your inode tables will thank you if you can combine some of these files.
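
As a rough sketch of that combine-then-batch idea (every path and name here is a placeholder, not from the thread): a scheduled job concatenates finished files into a staging directory, and a batch input then indexes and deletes only the combined copies, leaving the originals untouched:

# hypothetical cron step, run e.g. hourly:
#   cat /var/log/app/done/*.log > /opt/splunk_batch/app-$(date +%H).log
[batch:///opt/splunk_batch]
# batch requires sinkhole: files are deleted once they are indexed
move_policy = sinkhole
sourcetype = app_logs

Because only the staged copies are sinkholed, the original files are preserved.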

Genti
Splunk Employee

Perhaps you could tarball the files into a .gz archive and have Splunk monitor that instead.
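
A hedged sketch of that approach (the archive name and paths are placeholders): a nightly job packs the day's files into one archive and drops it where a monitor stanza is watching, since Splunk unpacks and indexes archive files it monitors:

# hypothetical nightly step:
#   tar -czf /opt/splunk_drop/app-$(date +%F).tar.gz /var/log/app/*.log
[monitor:///opt/splunk_drop]
sourcetype = app_logs

One caveat: if an existing archive is modified, Splunk re-indexes the whole thing, so writing a fresh dated filename each day avoids duplicates.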
