Getting Data In

Monitoring FTP gz files

alextsui
Path Finder

Hi,
I am planning a Splunk deployment that involves indexing a large number of gz files uploaded via FTP from multiple sources.
Can I configure Splunk to monitor the directories containing these files directly?
My concern is that the directory I want Splunk to monitor is the FTP upload folder, and the files are in gz format. Will Splunk be confused when it sees a gz file that is still in the middle of a transfer? Can I monitor the upload folder directly with Splunk? The plan is to use a universal forwarder for the monitoring.

Thanks


Takajian
Builder

Whether you can monitor the upload folder directly with Splunk depends on your environment.

If files that are still being transferred via FTP carry a temporary extension such as ".inprogress", you can keep Splunk from indexing them until the transfer is done. In that case, use a blacklist setting like the one below.

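In inputs.conf, add "blacklist = \.inprogress$" to the monitor stanza for the upload folder. The blacklist value is a regular expression matched against the file path, so the dot is escaped to match a literal period.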
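A quick way to sanity-check a blacklist pattern before putting it into inputs.conf is to try it against a few sample paths. The sketch below uses Python's re module rather than Splunk's own regex engine, so treat it as an approximation; the helper name is_blacklisted and the sample paths are made up for illustration.

```python
import re

# Pattern intended for the blacklist setting, with the dot escaped so it
# matches a literal "." before "inprogress" at the end of the path.
BLACKLIST = re.compile(r"\.inprogress$")


def is_blacklisted(path: str) -> bool:
    """Return True if the path matches the blacklist pattern (hypothetical helper)."""
    return BLACKLIST.search(path) is not None


if __name__ == "__main__":
    # Illustrative paths only; substitute names your FTP server really produces.
    for sample in ("/ftp/upload/app.log.gz.inprogress", "/ftp/upload/app.log.gz"):
        print(sample, "skipped" if is_blacklisted(sample) else "monitored")
```

This only helps if the FTP server or client really renames the file once the upload finishes; otherwise the pattern never stops matching and the file is never picked up.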

If files in transfer do not carry a temporary extension like the one above, you should not monitor the upload folder directly. In that case, use a script outside of Splunk to move each uploaded file into the monitored directory once the upload is complete.
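A minimal sketch of such a mover script in Python follows. It is only one possible approach and rests on assumptions that are not in the post: the names move_completed, gzip_is_complete and quiet_seconds are hypothetical, a file counts as finished once it has not been modified for quiet_seconds and decompresses cleanly to the end, and the upload and monitored directories are on the same filesystem so that os.replace is a plain rename.

```python
import gzip
import os
import time
import zlib
from pathlib import Path


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Truncated or corrupt file, most likely an upload still in progress.
        return False


def move_completed(upload_dir, monitored_dir, quiet_seconds=60):
    """Move finished .gz uploads into the monitored directory; return moved paths."""
    moved = []
    now = time.time()
    for src in sorted(Path(upload_dir).glob("*.gz")):
        if not src.is_file():
            continue
        # Skip files that were written to recently; they may still be uploading.
        if now - src.stat().st_mtime < quiet_seconds:
            continue
        if not gzip_is_complete(src):
            continue
        dst = Path(monitored_dir) / src.name
        if dst.exists():
            # Do not overwrite a file that was already handed over.
            continue
        os.replace(src, dst)  # rename; assumes both dirs share a filesystem
        moved.append(dst)
    return moved
```

Run from cron or a similar scheduler, for example as move_completed("/path/to/upload", "/path/to/monitored"), with Splunk monitoring only the second directory. Both paths here are placeholders.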

Hope this helps.


Takajian
Builder

Splunk uncompresses a gz file before indexing it, then indexes the uncompressed text. You can monitor the FTP upload folder directly and Splunk will index the uploaded files, but I do not recommend it. If the FTP transfer runs into a network connectivity issue, the FTP client tries to resend the file. Splunk then cannot distinguish whether the file is a new one or one it has already indexed: because Splunk identifies files with a hash algorithm, it cannot tell whether the re-sent gz file is new or not. That means Splunk may index duplicate events. I have faced this issue before, so I do not recommend it.
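If you stage files with a script outside of Splunk anyway, one way to guard against re-sent files is to remember a digest of each file's content and skip anything already seen. The sketch below is an assumption-laden illustration, not how Splunk itself identifies files: content_digest, is_resend and the in-memory seen_digests set are hypothetical, and a real script would need to persist the digests between runs. It hashes the decompressed content because the gzip header can carry a timestamp, so two archives of the same data are not guaranteed to be byte-identical.

```python
import gzip
import hashlib


def content_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a gzip file's decompressed content."""
    h = hashlib.sha256()
    with gzip.open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_resend(path, seen_digests):
    """Return True if this content was seen before; otherwise record it."""
    digest = content_digest(path)
    if digest in seen_digests:
        return True
    seen_digests.add(digest)
    return False
```

A staging script would call is_resend on each completed upload and only move the file into the monitored directory when it returns False.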


alextsui
Path Finder

Thanks for your reply. How does the gz format factor into your answer? I mean, if the files were regular text files, could I monitor the FTP upload folder directly, given that there would be no special file extension to distinguish completed files from files in transit?
