Getting Data In

Data Input: Monitor a directory for new files and delete when indexed

vivsplunk
Engager

I'm trying to use "Monitor Files & Directories" as a data input. I have two data input sources:

  1. One is a script that runs every 10 min and writes a data file to the Splunk file system (/opt/splunk/var/ps_search/)
  2. The second data input is "Monitor Files & Directories", which is supposed to look under the /opt/splunk/var/ps_search directory and index all the incoming files (rough stanza sketch below).
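For reference, my monitor stanza in inputs.conf looks roughly like this (the path is the one above; the sourcetype is just illustrative):

```
# inputs.conf -- monitor the directory the script writes into
[monitor:///opt/splunk/var/ps_search]
sourcetype = csv
disabled = false
```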

The incoming files are of "csv" type and have unique file names (a timestamp in the file name). I see only the first csv file getting indexed and not the subsequent ones generated by the script. I've read http://answers.splunk.com/questions/4103/directory-monitoring-not-picking-up-new-files and http://www.splunk.com/base/Documentation/latest/Admin/Monitorfilesanddirectories, but I'm not sure what else I need to do. A few questions:

  1. In the documentation it says the monitor only checks for new files every 24 hours - is that right? How else can I make it continuously look for new files in the directory? Do I need to use crawl?

  2. Is it possible to use monitor to do the above and delete each file once it has been indexed (similar to using a sinkhole)?

In my case once a file is copied into the directory it's not changed, so I basically just want to delete it once Splunk has indexed it.

Genti
Splunk Employee
  1. No, if the docs say that then they need to be corrected. The Splunk monitor actually checks directories every second (unless it's backed up with work, in which case it may check a little less often).
  2. Yes, you can do one of the following:
    • Use `[batch://]` instead of `[monitor://]` (see the sketch after this list)
    • Save your script output to /opt/splunk/var/spool/splunk, which acts as a sinkhole: Splunk will index everything and then delete the files
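For the batch option, a minimal sketch of the stanza, assuming the directory from your post (`move_policy = sinkhole` is what tells Splunk to delete each file after indexing it; the sourcetype is illustrative):

```
# inputs.conf -- index each file once, then delete it
[batch:///opt/splunk/var/ps_search]
move_policy = sinkhole
sourcetype = csv
```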

However, I think you should look into why your monitor is not reading the additional csv files that are being created. Check your `splunkd.log` for any messages related to this. Perhaps the files are too similar and you are hitting a CRC check issue: Splunk computes a CRC over the beginning of each file (the first 256 bytes by default), and if that portion is identical across files it assumes it has already seen the file and doesn't index it. In that case, look into `crcSalt` in `inputs.conf`.
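A minimal sketch of the `crcSalt` fix, assuming the same monitored directory: `crcSalt = <SOURCE>` mixes each file's full path into the CRC, so files whose first bytes are identical are still treated as distinct files.

```
# inputs.conf -- salt the CRC with the file's path so files with
# identical headers are not mistaken for the same file
[monitor:///opt/splunk/var/ps_search]
crcSalt = <SOURCE>
```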

Please read inputs.conf.spec for more information on `[batch://]` and `crcSalt`.

Genti
Splunk Employee

The spool directory is like Big Brother, always watching for files being dropped there. Once it reads a file, it deletes it. Think of it as a sinkhole. You can always try it out: dump some files there and you'll experience it first hand. The only issue with this method, I think, is that you can't really specify the source, sourcetype, or which index the data should go to. I believe 4.2 will have some improvements in this area.
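Under the hood, the spool directory is itself just a batch sinkhole defined in Splunk's default inputs.conf, roughly:

```
# $SPLUNK_HOME/etc/system/default/inputs.conf (shipped default, roughly)
[batch://$SPLUNK_HOME/var/spool/splunk]
move_policy = sinkhole
```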


vivsplunk
Engager

Thanks. I added `crcSalt = <SOURCE>` in inputs.conf and it started indexing the other csv files. I guess the problem (as you suggested) was that all my csv files start with a header row, and the first row (with almost 10 column names) is the same in every file. I'm not sure whether Splunk should treat csv files differently, since they can legitimately share the same header row. Anyway, that part works for now.

My other issue is how to delete the indexed files while still keeping continuous inputs. The `batch` option seems to work only once. Would writing to the spool directory continuously pick up new files, just like monitor?
