
How to monitor a folder for newest files only? (files with a change date of 2015, and then only monitor changes after that)

hartfoml
Motivator

I want to monitor a folder that has 24 thousand files. I only want to collect data from the files that have a change date of 2015 and then only monitor the changes after that.

I guess I will have to do an initial collection from the files with a change date of this year, then change the collection to only monitor files that change.

Any help would be appreciated

Location for the files is c:\log\data

1 Solution

woodcock
Esteemed Legend

You can use ignoreOlderThan, but beware that it does not work the way most people expect: once Splunk ignores a file for the first time, the file is effectively blacklisted and will never be examined again, even if new data is written to it!

http://answers.splunk.com/answers/242194/missing-events-from-monitored-logs.html

Also read this:

http://answers.splunk.com/answers/57819/when-is-it-appropriate-to-set-followtail-to-true.html

I have used the following hack to solve this problem:

Create a new directory somewhere else (/destination/path/) and point the Splunk forwarder there. Then set up a cron job that creates soft links to files in the real directory (/source/path/) for any file that has been touched in the last 5 minutes (or whatever your threshold is), like this:

*/5 * * * * cd /source/path/ && /bin/find . -maxdepth 1 -type f -mmin -5 | /bin/sed "s/^..//" | /usr/bin/xargs -I {} /bin/ln -fs /source/path/{} /destination/path/{}

The nice thing about this hack is that you can create a similar cron job to remove the links for files that have not changed in a while (if the forwarder has too many files to sort through, even ones with no new data, it will slow WAY down), and if those files ever do get touched, the first cron job will add them back!
Don't forget to set up that second cron job to delete the soft links, using whatever logic lets you be sure that a file will never be used again, or you will end up with tens of thousands of files here, too.
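
The second cron job described above could look roughly like the following. This is only a sketch under stated assumptions, not something from the original answer: the function name prune_links, the script path in the usage line, and the one-day threshold are all hypothetical, and it assumes a find that supports -L, -maxdepth and -mmin (GNU find does). It removes only the soft links, never the real files.

```shell
#!/bin/sh
# Sketch: prune soft links whose target file is stale or missing.
# prune_links is a hypothetical helper name, not part of Splunk.

# prune_links DEST_DIR MAX_AGE_MINUTES
prune_links() {
    dest=$1
    max_age_min=$2
    for link in "$dest"/*; do
        # Only ever touch soft links; skip anything else in the directory.
        [ -L "$link" ] || continue
        # Dangling link (target no longer exists): remove it.
        if [ ! -e "$link" ]; then
            rm -f -- "$link"
            continue
        fi
        # With -L, find tests the target's modification time, not the link's.
        if [ -n "$(find -L "$link" -maxdepth 0 -mmin +"$max_age_min")" ]; then
            rm -f -- "$link"   # removes the link only; the target is untouched
        fi
    done
}

# Run directly as: prune_links.sh /destination/path 1440
if [ "$#" -eq 2 ]; then
    prune_links "$1" "$2"
fi
```

A matching crontab entry might be "0 * * * * /path/to/prune_links.sh /destination/path 1440" (script location assumed). Keep the pruning threshold well above the 5-minute linking window so the two jobs do not fight over the same files.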


hartfoml
Motivator

That's what I needed.

I think I can put the "ignoreOlderThan" attribute in inputs.conf.

Then, after collecting all the historical data from 2015, I can change the monitor to only tail files that are created, added, or changed. If someone puts in old files, then I might get some old info.

Does this sound right?

When I change to monitoring with the tail function, I should not get any of the older files, since they have not changed in years. If the files do somehow change, I should be able to capture them.

Oh, I think these files are overwritten, not appended to. Will tail still work with overwrites?


somesoni2
SplunkTrust

I don't think you need the tail; simple monitoring will do just fine. Also, you can keep the ignoreOlderThan setting on (note the exact spelling; "ingnoreOlderThan" is a typo, so don't copy it), because any new file or any change will bring the modification date within your ignoreOlderThan limit, so the file will get ingested.

I have not used the followTail setting, but given the caveat in the documentation about its usage, I wouldn't suggest using it on an ongoing basis.


somesoni2
SplunkTrust

Not sure if there is any straightforward way to do this. You might have to use the ignoreOlderThan attribute in inputs.conf, with a value based on the current date, to include only the files modified in 2015 (e.g. 259 days as of today). If you don't expect any files with a 2015 modification date to be dropped in later, this should do it.
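
To make the "value based on current date" idea concrete, here is a small sketch that counts the days since a cutoff date and prints a monitor stanza using that count. The helper names days_between and print_stanza are made up for illustration, the script assumes GNU date (for -d), and the monitored path is the one given in the question; check the stanza against the inputs.conf documentation before using it.

```shell
#!/bin/sh
# Sketch: derive an ignoreOlderThan value covering everything since a cutoff date.
# Assumes GNU date; days_between and print_stanza are hypothetical helpers.

# days_between START END -> whole days from START to END (YYYY-MM-DD, UTC)
days_between() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $(( (end - start) / 86400 ))
}

# print_stanza PATH DAYS -> an inputs.conf monitor stanza on stdout
print_stanza() {
    printf '[monitor://%s]\nignoreOlderThan = %sd\n' "$1" "$2"
}

# Run directly as: make_stanza.sh 2015-01-01
if [ "$#" -eq 1 ]; then
    today=$(date -u +%Y-%m-%d)
    print_stanza 'c:\log\data' "$(days_between "$1" "$today")"
fi
```

For example, days_between 2015-01-01 2015-09-17 gives 259, which matches the figure quoted above. The value is a rolling window, so it has to be recomputed if the stanza is written on a different day.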
