Hadoop Connect - how to change the polling frequency of HDFS file lists to a longer duration?

splunkears
Path Finder

I'm using Hadoop Connect, but it's putting pressure on the name node by issuing recursive file listings (-lsr) too frequently. How do we change the frequency to, say, every 5 or 10 minutes rather than every minute?
I'm looking for something similar to "auto" in Hadoop DB connect / settings, where we can configure the poll frequency.

Thanks.

1 Solution

Ledion_Bitincka
Splunk Employee

I am assuming here that the lsr load is coming from the indexing component (modular inputs). Let us know if that is not the case.

Unfortunately, the polling frequency for HDFS-based inputs is not exposed as a configuration variable. However, you can easily modify it in the bin/hdfs.py file by changing the sleep interval in run():

def run():

    config = get_config()
    ....

            # check every 60 seconds for new entries
            time.sleep(60)
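
A minimal sketch of one way to make that edit less brittle, assuming the loop in run() looks like the excerpt above. Instead of typing a new number into time.sleep(), the interval is read from an environment variable. HDFS_POLL_INTERVAL and poll_interval_seconds are hypothetical names used for illustration only; neither is part of Hadoop Connect.

```python
import os

DEFAULT_POLL_INTERVAL = 60  # seconds; the value hard-coded in the original loop


def poll_interval_seconds(env=os.environ, default=DEFAULT_POLL_INTERVAL):
    """Return the polling interval in seconds.

    HDFS_POLL_INTERVAL is a hypothetical environment variable used only for
    illustration; Hadoop Connect does not define it.
    """
    raw = env.get("HDFS_POLL_INTERVAL", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset or non-numeric: fall back to the original 60 seconds.
        return default
    # Reject zero or negative values so the loop never spins without pausing.
    return value if value > 0 else default


# Inside run(), the fixed call time.sleep(60) would then become:
#     time.sleep(poll_interval_seconds())
```

With the variable set to 300 or 600 the listing would run every 5 or 10 minutes. Simply editing the literal 60 works just as well; the helper only keeps the value out of the code path that an app upgrade replaces.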


Ledion_Bitincka
Splunk Employee

I'd recommend that you follow the docs on modular inputs and then trace how "whitelist"/"blacklist" are defined and used, as a model for your own parameter. As with any change to an app's default resources, you'd have to be careful when upgrading the app, as the new version would overwrite any changes you have made.
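
As a rough sketch of what a per-input interval could look like once such a parameter reaches the script: the code below assumes each input's settings arrive as a dict of strings, the way "whitelist"/"blacklist" would. The parameter name poll_interval, the helper names, and the stanza names are all hypothetical, and the parameter would still need to be declared in the modular input definition before Splunk passes it through.

```python
DEFAULT_POLL_INTERVAL = 60  # seconds, matching the hard-coded sleep in hdfs.py


def interval_for_input(params, default=DEFAULT_POLL_INTERVAL):
    """Return the poll interval for one input's parameter dict.

    `poll_interval` is a hypothetical parameter name, not one that
    Hadoop Connect defines.
    """
    try:
        value = int(params.get("poll_interval", default))
    except (TypeError, ValueError):
        # Missing, empty, or non-numeric: keep the original behaviour.
        return default
    return value if value > 0 else default


def next_due(last_run, params):
    """Earliest time (epoch seconds) at which this input should be listed again."""
    return last_run + interval_for_input(params)


# Example: two inputs with different intervals (stanza names are made up).
inputs = {
    "hdfs://namenode/logs/app": {"whitelist": r"\.log$", "poll_interval": "600"},
    "hdfs://namenode/logs/sys": {"whitelist": r"\.log$"},
}
intervals = {name: interval_for_input(p) for name, p in inputs.items()}
```

How the value gets applied depends on whether hdfs.py runs one loop per input or a single loop over all inputs, which the excerpt does not show. In the first case the sleep can use interval_for_input() directly; in the second, something like next_due() would be needed to skip inputs that are not yet due.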


splunkears
Path Finder

Hi, thanks for the answer. Yes, your assumption is correct. It's coming from an indexed HDFS input folder.

Regarding the fix, could you please suggest what changes need to be made to introduce a sleep time variable per indexed HDFS input folder?

Thanks again.
