Getting Data In

How can we systematically index many large directories with log files?

ddrillic
Ultra Champion

We have a case in which the client has directories, each containing a couple of thousand log files, like -

2016120300
2016120301
2016120302
2016120303
2016120304
2016120305
2016120306
2016120307
2016120308
2016120309
2016120310
2016120311
2016120312
2016120313
2016120314
2016120315
2016120316
2016120317
2016120318
2016120319
2016120320
2016120321
2016120322
2016120323

This set of directories holds one day of data, as it's organized by YYYYMMDDHH.

When we pointed the forwarder at the whole tree as-is (about two months of data), it didn't index all of it.

We would now like to index one directory at a time, verify the data is indexed, and move on. Is it possible to build a script to do this?

We are trying to avoid manually setting a monitor stanza for each directory (an hour of data), bouncing the forwarder, indexing the data, and moving on to the next hour - that's a bit too much.

Any ideas?


Claw
Splunk Employee

The problem is that you have too many file descriptors to manage.


Manage how many files you are working with....

  1. Can you move these logs to another location after you have ingested them?
  2. Can you copy them to another specific location long enough to ingest them?
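A minimal shell sketch of option 1, assuming you archive each hour directory once its events are confirmed in the index (the paths and the `archive_hour` helper name are illustrative, not from the thread):

```shell
#!/bin/sh
# Hypothetical helper: move one ingested hour-directory out of the
# monitored tree so the forwarder no longer tracks its file handles.
# Both path arguments are placeholders.
archive_hour() {
    src_root=$1   # e.g. /logs
    dst_root=$2   # e.g. /logs_archive
    hour=$3       # e.g. 2016120300
    mkdir -p "$dst_root"
    mv "$src_root/$hour" "$dst_root/$hour"
}
```

You would run it per directory only after confirming the hour's events are searchable.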

Allow more file handles….

You should check how many file handles you have configured on your machine. It is possible that you are seeing this problem because you are running out of file handles.

https://answers.splunk.com/answers/13313/how-to-tune-ulimit-on-my-server.html

But be aware that if you use a large number of file handles, it will take a long time to start the Splunk forwarder (I have seen an hour in bad scenarios) and a long time to stop it.

Setting the ulimit to a very high number could be an acceptable way to get the files loaded initially.


Skip looking at older files.

After you get the old files ingested, you could set the following value in inputs.conf to tell the forwarder to ignore files older than some threshold. This still means that your system has to look at all of the file handles to see how old they are.

ignoreOlderThan = 1s  --> ignores any file older than one second
ignoreOlderThan = 1m  --> ignores any file older than one minute
ignoreOlderThan = 1h  --> ignores any file older than one hour
ignoreOlderThan = 1d  --> ignores any file older than one day
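In context, the setting sits inside the monitor stanza in inputs.conf; a sketch (the path, index, and threshold are placeholders):

```ini
[monitor:///logs]
index = your_index
ignoreOlderThan = 7d
```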

Set up a scripted input using the excellent Splunk Add-On Builder.

http://blogs.splunk.com/2016/12/07/easily-create-mod-inputs-using-splunk-add-on-builder-2-0-part-iv/

You will be responsible for writing the logic in the script to make this possible, but with this approach you are in control.
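A rough sketch of what that script's logic might look like, walking the hour directories in chronological order and handing each file to the forwarder's `splunk add oneshot` command. The paths, index, and sourcetype are assumptions, and the `echo` stands in for the real splunk invocation (shown in the comment) so the loop can be exercised without a forwarder installed:

```shell
#!/bin/sh
# Hypothetical driver: iterate YYYYMMDDHH directories in sorted
# (chronological) order and process every regular file in each one.
oneshot_all() {
    root=$1   # e.g. /logs
    for hour in $(ls "$root" | sort); do
        for f in "$root/$hour"/*; do
            [ -f "$f" ] || continue
            # The real call would be something like:
            # /opt/splunkforwarder/bin/splunk add oneshot "$f" \
            #     -index your_index -sourcetype your_sourcetype
            echo "oneshot $f"
        done
    done
}
```

Between hours you could add a search-based check that the previous hour's events are present before moving on.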


ddrillic
Ultra Champion

We are working on the folders/files organization.

For now, we are looking for a way to ingest this set of 1.5 million pending files. I understand from you that a scripted input using the Splunk Add-On Builder is one way.


ddrillic
Ultra Champion

Is there a way to do it from the forwarder's command line without installing a UI?


gcusello
SplunkTrust

Hi ddrillic,
I don't know why your forwarder doesn't index all your directories - could you share your inputs.conf file?
Anyway, you should have something like this (if your files are *.log):

[monitor:///logs/.../*.log]
index = your_index
sourcetype = your_sourcetype
disabled = 0

Using the three dots (...) you pick up the logs in all subfolders of your log folder.
If you have many logs, the load process will certainly be slow (possibly hours), but after the necessary time you'll have all the logs in your index.

Bye.
Giuseppe


ddrillic
Ultra Champion

Thank you, but the thing is that there are roughly 1.6 million files to index, so I'm looking for a gradual approach to indexing them...
