Why does Splunk use FileContext for monitoring Hadoop?

rangasv
New Member

Hi,

From the Splunk user manual for the Hadoop app, I found that Splunk uses FileContext to collect metrics from Hadoop services.

Since JMX provides the same metrics, why does Splunk use FileContext?

Because FileContext requires parsing log data, it may perform worse than JMX.

Could you please clarify?

Thanks,

Ranga

pierre4splunk
Splunk Employee

Hi Ranga,

For HadoopOps, we evaluated several approaches to metrics collection: a JMX client, HTTP endpoints, a custom SplunkContext, and Hadoop's FileContext. We chose FileContext for version 1.0 based on performance and resource footprint, ease of setup, and stable behavior across Hadoop versions:

  • Performance/Footprint - Overhead from the metrics logs was negligible, since forwarders were already processing other log files. Other tradeoffs were inconclusive (e.g., disk space vs. an extra process invocation).

  • Ease of Setup - FileContext was the simplest to set up, especially for admins unfamiliar with the JMX stack. Enabling JMX or HTTP meant securing access to another remote interface, which added overhead for pilot projects.

  • Stability - FileContext was more stable than the other approaches. Hadoop JMX was missing the MBean for "mapred" metrics in early releases, while the HTTP servlet broke in 0.20.203 until it was replaced by a different endpoint in 0.20.205. SplunkContext was ruled out for the same reasons; GangliaContext broke more than once as the metrics system was overhauled between Hadoop Metrics and Hadoop Metrics2.
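To give a rough sense of why parsing the metrics logs is cheap, here is a minimal Python sketch of splitting one metrics line into fields. It assumes a line layout of `context.record: key=value, key=value`; the sample line, host name, and the helper name `parse_metrics_line` are all hypothetical, and this is not how the HadoopOps App or Splunk itself extracts fields.

```python
def parse_metrics_line(line):
    """Split one 'context.record: k1=v1, k2=v2' line into its parts.

    The line layout is an assumption made for illustration only.
    """
    name, _, body = line.strip().partition(": ")
    context, _, record = name.partition(".")
    fields = {}
    for pair in body.split(", "):
        key, sep, value = pair.partition("=")
        if not sep:
            continue  # skip fragments that have no '='
        try:
            # Keep numbers numeric so they can be charted or aggregated.
            fields[key] = float(value) if "." in value else int(value)
        except ValueError:
            fields[key] = value  # non-numeric values (e.g. host names) stay strings
    return context, record, fields


# Hypothetical sample line, not taken from a real cluster.
sample = "dfs.datanode: hostName=dn01, bytes_read=1024, blocks_written=3"
context, record, fields = parse_metrics_line(sample)
print(context, record, fields)
```

Each line is handled with a handful of string splits, which is consistent with the point above that the extra load on a forwarder already tailing other log files is small.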

IMPORTANT:
The assumptions above may have changed since 1.0 and may not apply to your Hadoop operations! The most successful customers use the HadoopOps App as a starting point for configuration; Splunk can easily support metrics collection using whichever approach you prefer...

rangasv
New Member

Thanks for the detailed answer.

- Ranga