Hadoop Connect with Apache Hadoop 1.0.3

ramonpin
New Member

I'm configuring my cluster on the latest version of Hadoop Connect, following the video on the application's Splunkbase page: http://www.splunk.com/view/SP-CAAAHBZ

Even though the Hadoop version I'm using is the same as the one in the video, I'm getting an error when trying to save the cluster configuration.

After filling in all the cluster information, I get: "Failed to get remote Hadoop version (namenode=headnode, port=50070): 'Version' keyword is not found."

I'm running on CentOS 5.5.

Is there any known reason for this?

Thanks,
Ramón Pin

ramonpin
New Member

Hi everyone. This problem finally seems to be resolved. Our Hadoop machines are not listed in our DNS; we use /etc/hosts to assign names to them. It seems the application issues a DNS request for the machine name instead of resolving it through /etc/hosts, even though all the Hadoop commands and processes use /etc/hosts normally. We configured the Hadoop URL using the headnode's IP and it registered the cluster.
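
For anyone hitting the same symptom, here is a minimal sketch (not part of Hadoop Connect) to tell the two resolution paths apart; "headnode" is the hostname from this thread, and dnspython is a third-party package:

import socket
import dns.resolver  # third-party: pip install dnspython

HOST = "headnode"  # hostname from this thread; substitute your own

# Path 1: the system resolver, which consults /etc/hosts (per nsswitch.conf)
print("system resolver:", socket.gethostbyname(HOST))

# Path 2: a raw DNS query that bypasses /etc/hosts entirely
try:
    answer = dns.resolver.resolve(HOST, "A")
    print("DNS server:", answer[0].to_text())
except dns.resolver.NXDOMAIN:
    print("DNS server: no record; only /etc/hosts knows this name")

If the first call succeeds but the second raises NXDOMAIN, the name is only known to /etc/hosts, which matches the behavior described above.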

Ledion_Bitincka
Splunk Employee

Can you please file a support case and include a diag so we can take a look at the log files as well?

kosako2007
New Member

We'll do it as soon as we can. Thank you for your support.

Ledion_Bitincka
Splunk Employee

Hadoop Connect tries to find the version of the cluster by using JMX or a fallback mechanism. What does this URL return in your environment:

 http://[namenode-host]:50070/jmx?qry=*adoop:service=NameNode,name=NameNodeInfo
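
For context, here is a minimal Python sketch (not Hadoop Connect's actual implementation) of what that check amounts to: fetch the NameNodeInfo JMX bean and read the "Version" field that the error message says is missing.

import json
from urllib.request import urlopen

NAMENODE = "headnode"  # hostname from this thread; use your NameNode host

url = (f"http://{NAMENODE}:50070/jmx"
       "?qry=*adoop:service=NameNode,name=NameNodeInfo")

with urlopen(url, timeout=10) as resp:
    info = json.load(resp)

# The error in this thread means this "Version" key could not be found.
print("Remote Hadoop version:", info["beans"][0]["Version"])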

ramonpin
New Member

{
  "beans" : [ {
    "name" : "Hadoop:service=NameNode,name=NameNodeInfo",
    "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
    "Threads" : 27,
    "HostName" : "headnode",
    "Used" : 362116714496,
    "Version" : "1.0.3, r1335192",
    "Total" : 570697924608,
    "UpgradeFinalized" : true,
    "Free" : 175745486848,
    "Safemode" : "",
    "NonDfsUsedSpace" : 32835723264,
    "PercentUsed" : 63.451557,
    "PercentRemaining" : 30.794834,
    "TotalBlocks" : 4495,
    "TotalFiles" : 7345,
    ...}

I cut the result short to fit within the comment size limit.

kosako2007
New Member

I tried with nc from the Splunk machine:

$ nc -vz headnode 50070
Connection to headnode 50070 port [tcp/*] succeeded!

I also tried HDFS access as described in the video tutorial:

$ /hadoop-1.0.3/bin/hadoop dfs -ls /
drwxr-xr-x - hadoop supergroup 0 2012-05-21 13:29 /_distcp_logs_y8txnu
drwxr-xr-x - hadoop supergroup 0 2012-07-25 09:00 /benchmarks
drwxr-xr-x - hadoop supergroup 0 2013-03-11 16:05 /user

csharp_splunk
Splunk Employee

You've verified you have connectivity to that host on that port? No firewall issues? Setup is generally pretty straightforward. I've not seen that error previously, so I'm guessing this is due to a basic environment issue like connectivity. By verified connectivity, I mean using telnet or nc to verify you can actually open a TCP connection to that host and port.
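
If telnet or nc aren't handy, the same check is a few lines in any language; here is a minimal Python sketch, with the host and port taken from the error message earlier in this thread:

import socket

# Host and port from the error message earlier in this thread.
HOST, PORT = "headnode", 50070

# Open a plain TCP connection, the same check `nc -vz` or telnet performs.
try:
    with socket.create_connection((HOST, PORT), timeout=5):
        print("TCP connection to %s:%d succeeded" % (HOST, PORT))
except OSError as err:
    print("TCP connection to %s:%d failed: %s" % (HOST, PORT, err))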
