Hadoop Connect: how to change the polling frequency of HDFS file lists to a longer interval?

splunkears
Path Finder

I'm using Hadoop Connect, but it is putting pressure on the NameNode with too many frequent recursive file-listing requests (-lsr). How do we change the frequency to, say, every 5 or 10 minutes rather than every minute?
I'm looking for something similar to "auto" in Hadoop DB connect / settings, where we can configure the poll frequency.

Thanks.

1 Solution

Ledion_Bitincka
Splunk Employee

I am assuming here that the -lsr load is coming from the indexing component (modular inputs). Let us know if that is not the case.

Unfortunately, the polling frequency for HDFS-based inputs is not exposed as a configuration variable. However, you can easily modify it in the bin/hdfs.py file:

# excerpt from bin/hdfs.py (run() starts near line 276; the sleep is near line 324)
def run():

    config = get_config()
    ....

            # check every 60 seconds for new entries
            time.sleep(60)
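
As a minimal sketch of that change, assuming the loop in bin/hdfs.py looks roughly like the excerpt above, the hard-coded 60 could be pulled out into a named constant. POLL_INTERVAL_SECONDS, poll_forever, and the check_for_new_entries callback are hypothetical names used for illustration; they are not part of Hadoop Connect.

```python
import time

# Hypothetical constant standing in for the hard-coded 60 in the excerpt.
POLL_INTERVAL_SECONDS = 300  # 5 minutes


def poll_forever(check_for_new_entries, interval=POLL_INTERVAL_SECONDS,
                 sleep=time.sleep, max_polls=None):
    """Call check_for_new_entries(), then sleep for `interval` seconds, repeatedly.

    `sleep` and `max_polls` exist only so the loop can be exercised without
    waiting; with the defaults it runs until the process is stopped.
    Returns the number of polls performed.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        check_for_new_entries()
        polls += 1
        sleep(interval)
    return polls


if __name__ == "__main__":
    # Dry run: three polls with a zero-second interval.
    poll_forever(lambda: print("listing HDFS paths..."), interval=0, max_polls=3)
```

In the real file, only the time.sleep(60) line would need to change, for example to time.sleep(POLL_INTERVAL_SECONDS).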


Ledion_Bitincka
Splunk Employee

I'd recommend that you follow the docs on modular inputs and then trace how "whitelist"/"blacklist" are defined and used, as a model for a new per-input setting. As with any other change to an app's default resources, you'd have to be careful when upgrading the app, because the new version would overwrite any changes you have made.
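
For a per-input interval, one possible approach (a sketch, not the app's actual code) is to add a parameter alongside whitelist/blacklist and read it from the configuration XML that Splunk passes to a modular input. The parameter name poll_interval, the stanza names, and the helper below are assumptions made for illustration, and the XML layout follows the general modular inputs format, so it should be checked against the docs and against how hdfs.py parses its own config.

```python
import xml.etree.ElementTree as ET

DEFAULT_POLL_INTERVAL = 60  # seconds, matching the current hard-coded value

# Hypothetical input configuration; real stanza names and params may differ.
SAMPLE_CONFIG = """<input>
  <configuration>
    <stanza name="hdfs://namenode:8020/logs/a">
      <param name="poll_interval">300</param>
    </stanza>
    <stanza name="hdfs://namenode:8020/logs/b">
      <param name="whitelist">.*log$</param>
    </stanza>
  </configuration>
</input>"""


def poll_intervals_by_stanza(config_xml, default=DEFAULT_POLL_INTERVAL):
    """Map each stanza name to its poll interval in seconds.

    Stanzas without a valid, positive `poll_interval` param get `default`.
    """
    root = ET.fromstring(config_xml)
    intervals = {}
    for stanza in root.iter("stanza"):
        value = default
        for param in stanza.findall("param"):
            if param.get("name") != "poll_interval":
                continue
            try:
                parsed = int((param.text or "").strip())
            except ValueError:
                parsed = default
            value = parsed if parsed > 0 else default
        intervals[stanza.get("name")] = value
    return intervals


if __name__ == "__main__":
    for name, seconds in poll_intervals_by_stanza(SAMPLE_CONFIG).items():
        print(name, seconds)
```

The polling loop could then sleep for the interval belonging to the input it is serving. A new parameter would presumably also have to be declared wherever whitelist/blacklist are declared for the input.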


splunkears
Path Finder

Hi, thanks for the answer. Yes, your assumption is correct. It's coming from an indexed HDFS input folder.

Regarding the fix, could you please suggest what changes need to be made to introduce a sleep-time variable per indexed HDFS input folder?

Thanks again.
