Hadoop Connect with Apache Hadoop 1.0.3

ramonpin
New Member

I'm configuring my cluster on the latest version of Hadoop Connect, following the video on the application's Splunkbase page: http://www.splunk.com/view/SP-CAAAHBZ

Even though the Hadoop version I'm using is the same as the one in the video, I get an error when trying to save the cluster configuration.

After filling in all the cluster information, I get: "Failed to get remote Hadoop version (namenode=headnode, port=50070): 'Version' keyword is not found."

I'm running on CentOS 5.5.

Is there any known reason for this?

Thanks,
Ramón Pin

ramonpin
New Member

Hi everyone. This problem finally seems to be resolved. Our Hadoop machines are not listed in our DNS; we use /etc/hosts to assign names to them. It seems that the application issues a DNS request for the machine name instead of resolving it through /etc/hosts, even though all the Hadoop commands and processes use /etc/hosts normally. We configured the Hadoop URL with the headnode's IP and it registered the cluster.
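
For anyone hitting the same thing, the difference is easy to see from a shell (the IP and exact output below are illustrative for our headnode):

$ getent hosts headnode   # resolves through nsswitch.conf, so /etc/hosts is consulted
192.168.1.10    headnode
$ nslookup headnode       # queries the DNS server directly, bypassing /etc/hosts
** server can't find headnode: NXDOMAIN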

Ledion_Bitincka
Splunk Employee

Can you please file a support case and include a diag so we can take a look at the log files as well?

kosako2007
New Member

We'll do it as soon as we can. Thank you for your support.

Ledion_Bitincka
Splunk Employee

Hadoop Connect tries to find the version of the cluster by using JMX or a fallback mechanism. What does this URL return in your environment:

 http://[namenode-host]:50070/jmx?qry=*adoop:service=NameNode,name=NameNodeInfo
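
For example, from the Splunk server (using the headnode host from this thread):

$ curl 'http://headnode:50070/jmx?qry=*adoop:service=NameNode,name=NameNodeInfo'

The JSON response should contain a "Version" key, which is the keyword your error message says is missing.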

ramonpin
New Member

{
  "beans" : [ {
    "name" : "Hadoop:service=NameNode,name=NameNodeInfo",
    "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
    "Threads" : 27,
    "HostName" : "headnode",
    "Used" : 362116714496,
    "Version" : "1.0.3, r1335192",
    "Total" : 570697924608,
    "UpgradeFinalized" : true,
    "Free" : 175745486848,
    "Safemode" : "",
    "NonDfsUsedSpace" : 32835723264,
    "PercentUsed" : 63.451557,
    "PercentRemaining" : 30.794834,
    "TotalBlocks" : 4495,
    "TotalFiles" : 7345,
  ...}

I cut the result short to fit the comment size limit.
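
Pulling out just the field the app is after (same endpoint, filtered with grep):

$ curl -s 'http://headnode:50070/jmx?qry=*adoop:service=NameNode,name=NameNodeInfo' | grep Version
"Version" : "1.0.3, r1335192",

So the "Version" keyword is definitely present when the endpoint is queried directly.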

kosako2007
New Member

I tried with nc from the Splunk machine:

$ nc -vz headnode 50070
Connection to headnode 50070 port [tcp/*] succeeded!

I also tried HDFS access as described in the video tutorial:

$ /hadoop-1.0.3/bin/hadoop dfs -ls /
drwxr-xr-x - hadoop supergroup 0 2012-05-21 13:29 /_distcp_logs_y8txnu
drwxr-xr-x - hadoop supergroup 0 2012-07-25 09:00 /benchmarks
drwxr-xr-x - hadoop supergroup 0 2013-03-11 16:05 /user

csharp_splunk
Splunk Employee

You've verified you have connectivity to that host on that port? No firewall issues? Setup is generally pretty straightforward. I've not seen that error previously, so I'm guessing this is due to a basic environment issue like connectivity. By verified connectivity, I mean using telnet or nc to verify you can actually open a TCP connection to that host and port.
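
For example, from the Splunk server (headnode and the port are taken from your error message):

$ nc -vz headnode 50070
$ telnet headnode 50070

If either of those fails or hangs, it's a network or firewall problem rather than a Hadoop Connect one.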
