Getting Data In

Monitoring FTP gz files

alextsui
Path Finder

Hi,
I am planning a Splunk deployment that involves indexing a large number of gz files delivered via FTP from multiple sources.
Can I configure Splunk to monitor the directories containing these files directly?
My concern is that the directory I want Splunk to monitor is the FTP upload folder, and the files are in gz format. Will Splunk be confused when it sees a gz file that is still being transferred? Can I monitor the upload folder directly with Splunk? The plan is to use a universal forwarder for the monitoring.

Thanks

1 Solution

Takajian
Builder

Whether you can monitor the upload folder directly with Splunk depends on your environment.

If files that are still being transferred via FTP carry a temporary extension such as ".inprogress", you can keep Splunk from indexing them until the transfer is done. Use a blacklist setting like the one below for that case.

In inputs.conf, add "blacklist = \.inprogress$" to the monitor stanza for the upload folder.

If files in transit do not carry such an extension, you should not monitor the upload folder directly. In that case, use a script outside of Splunk to move each uploaded file to a monitored directory once the transfer is complete.
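A minimal sketch of such a mover script, in Python, might look like the following. It is only an illustration under stated assumptions: the function names, the 60-second quiet period, and the idea of treating a gz file as finished once it has stopped changing and decompresses cleanly are all hypothetical choices, not something Splunk or your FTP server provides.

```python
import gzip
import shutil
import time
import zlib
from pathlib import Path


def is_complete_gzip(path):
    """Return True if the whole gzip stream decompresses without error.

    A file that is still being uploaded is normally truncated, so reading
    it to the end raises EOFError (or a gzip/zlib error).
    """
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(1024 * 1024):
                pass
        return True
    except (EOFError, OSError, zlib.error):
        return False


def move_finished_uploads(upload_dir, monitored_dir, min_age_seconds=60.0):
    """Move finished .gz uploads into the directory that Splunk monitors.

    Assumption: a file is "finished" when it is non-empty, has not been
    modified for min_age_seconds, and passes is_complete_gzip().
    """
    upload_dir = Path(upload_dir)
    monitored_dir = Path(monitored_dir)
    monitored_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    now = time.time()
    for path in sorted(upload_dir.glob("*.gz")):
        if not path.is_file():
            continue
        info = path.stat()
        if info.st_size == 0 or now - info.st_mtime < min_age_seconds:
            continue  # empty or possibly still being written
        if not is_complete_gzip(path):
            continue  # truncated or corrupt, leave it for the next run
        target = monitored_dir / path.name
        if target.exists():
            continue  # never overwrite a file that was already handed over
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

You would run something like this from cron or a scheduled task and point the monitor input at the target directory instead of the FTP upload folder. Keeping both directories on the same filesystem helps, because the move is then a rename and Splunk never sees a half-copied file.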

Hope this helps.

Takajian
Builder

Splunk uncompresses a gz file before indexing it, then indexes the uncompressed text. You can monitor the FTP upload folder directly and Splunk will index the uploaded files, but I do not recommend it. If the FTP transfer hits a network connectivity issue, the FTP client tries to resend the file. Splunk identifies files with a hash algorithm, and with a resent gz file it cannot tell whether the file is new or has already been indexed. That means Splunk may index duplicate events. I have faced this issue before, so I do not recommend it.
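If you stage files with a script outside of Splunk, one way to guard against resent files is to remember a digest of each file's uncompressed content and skip anything already seen. The sketch below is an assumption of mine and not how Splunk itself identifies files; the names content_digest and is_new_upload are hypothetical, and in practice the set of seen digests would have to be persisted between runs.

```python
import gzip
import hashlib


def content_digest(path):
    """Return the SHA-256 hex digest of the uncompressed content of a gz file."""
    digest = hashlib.sha256()
    with gzip.open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_new_upload(path, seen_digests):
    """Return True and record the digest if this content was not seen before.

    seen_digests is a plain set here; a real script would store it on disk.
    """
    digest = content_digest(path)
    if digest in seen_digests:
        return False  # same events were already handed to Splunk
    seen_digests.add(digest)
    return True
```

Hashing the uncompressed content rather than the gz bytes means a file that was recompressed before being resent is still recognized as a duplicate.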

alextsui
Path Finder

Thanks for your reply. How does the gz format factor into your answer? I mean, if the files were regular text files, could I monitor the FTP upload folder directly, given that there would be no special file extension to distinguish completed files from files in transit?
