FTP download is the only way this particular system allows us to access its logs. Files are dumped into the FTP area and then never changed, so they are perfect for a Splunk batch folder. But we need to get them over to the Splunk box first.
Does anyone have a scripted input for monitoring an FTP site? I assume it would have to keep a listing of files already downloaded, but also keep an eye on that listing so it doesn't get too large.
If you must access it via FTP, a couple of solutions may work. Both "mount" an FTP server as a local disk via a "drive" or "mountpoint":
http://curlftpfs.sourceforge.net/ (for linux)
http://www.webdrive.com/products/webdrive/index.html (for windows/mac)
I don't know whether these are robust enough for Splunk to support a monitor:// input, but I think they would work just fine with a batch input.
Anthony's solution works well if you have the ability to install the Splunk universal forwarder on the server. It keeps track of the files it has sent you for indexing.
Another option is to monitor the log files remotely, which achieves basically the same functionality without installing additional software.
As for the other part of your question, about tracking what's been downloaded: Splunk will keep track of what it has ingested. As for watching the size, I had a similar problem that I solved with a small script, kicked off every night at one minute past midnight, that deletes the previous day's files.
That is actually what I do. I read the files and, with a script, delete everything that is over 1 day old at one minute past midnight every day.
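For anyone who wants a starting point, here is a minimal sketch of that kind of cleanup script in Python. It is not the script described above. The function name `delete_old_files` and the one-day default are my own assumptions, and it judges age by each file's modification time.

```python
import os
import time


def delete_old_files(directory, max_age_seconds=24 * 60 * 60, now=None):
    """Delete regular files in `directory` older than `max_age_seconds`.

    Age is measured from the file's modification time. Returns the sorted
    names of the files that were removed. (Hypothetical helper, not part
    of Splunk.)
    """
    now = time.time() if now is None else now
    # Snapshot the listing first so we are not deleting while iterating.
    with os.scandir(directory) as it:
        entries = list(it)
    removed = []
    for entry in entries:
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

A cron schedule of `1 0 * * *` would run it at one minute past midnight. Note that downloaded files usually get a fresh modification time on the local box, so "older than one day" here means one day since download, not since the source system wrote the file.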
I had been planning on using a Splunk Batch directory to immediately index and delete downloaded files, but I guess I could use a Monitor directory instead and remove old files periodically.
I still need to have a script that doesn't re-download every file every time. Does anyone have any examples?
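Here is a rough sketch of one way to do that in Python with the standard `ftplib` module; treat it as a starting point rather than a tested production script. The host, credentials, paths, and function names are all made up for illustration. It assumes the FTP directory contains only plain files, that `NLST` on the server returns bare file names, and that the staging and destination directories are on the same filesystem. It records downloaded names in a small JSON file and drops names that have disappeared from the server, so the listing can't grow without bound.

```python
import json
import os
from ftplib import FTP


def load_seen(state_path):
    """Return the set of file names recorded by earlier runs."""
    try:
        with open(state_path) as fh:
            return set(json.load(fh))
    except FileNotFoundError:
        return set()


def save_seen(state_path, seen):
    """Persist the set of already-downloaded file names."""
    with open(state_path, "w") as fh:
        json.dump(sorted(seen), fh)


def fetch_new_files(ftp, staging_dir, dest_dir, state_path):
    """Download files not seen before; return the names downloaded.

    `ftp` is a logged-in ftplib.FTP object already in the right directory.
    Each file is written to `staging_dir` first and then moved into
    `dest_dir`, so Splunk never sees a half-written file.
    """
    seen = load_seen(state_path)
    remote = set(ftp.nlst())
    downloaded = []
    for name in sorted(remote - seen):
        tmp_path = os.path.join(staging_dir, name)
        with open(tmp_path, "wb") as fh:
            ftp.retrbinary("RETR " + name, fh.write)
        # Rename is atomic only when both directories share a filesystem.
        os.replace(tmp_path, os.path.join(dest_dir, name))
        downloaded.append(name)
    # Forget names no longer on the server so the state file stays small.
    save_seen(state_path, (seen | set(downloaded)) & remote)
    return downloaded


def run(host, user, password, remote_dir, staging_dir, dest_dir, state_path):
    """Connect, fetch anything new, and disconnect (all arguments hypothetical)."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        return fetch_new_files(ftp, staging_dir, dest_dir, state_path)
```

Run `run(...)` from cron or as a Splunk scripted input, and point the batch (or monitor) input at `dest_dir`. Because the state file, not the local directory, decides what counts as new, it doesn't matter if a batch input deletes the files after indexing them.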
Why not put a Splunk instance on the box to send the data back to the central Splunk server? Install it, turn it into a light forwarder, and configure it to watch the folder.
I am a fan of the Splunk Forwarder. However, the data is generated on a closed system (no OS access) and FTP download is the only way we have to gather this data.