About techdiverdown

techdiverdown · ‎11-05-2014

Here is a working configuration for Mapr 4.0.1 using MR2 (YARN). In addition, the mapr client needed to be setup correctly on the Splunk client box using configure.sh. [provider:psb_mapr] vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar vix.env.HADOOP_HOME = /opt/mapr/hadoop/hadoop-2.4.1 vix.env.JAVA_HOME = /usr/lib/jvm/java-7-oracle vix.family = hadoop vix.fs.default.name = maprfs:/// vix.mapreduce.framework.name = yarn vix.splunk.home.hdfs = /user/splunk vix.yarn.resourcemanager.address = hadoop-dev-00.ngid.xxx.net:8032 vix.yarn.resourcemanager.scheduler.address = hadoop-dev-00.ngid.xxx.net:8030 vix.yarn.application.classpath = /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore-0.1.jar:/opt/mapr/lib/libprotodefs.jar:/opt/mapr/lib/baseutils-0.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar vix.splunk.impersonation = 0 vix.env.MAPREDUCE_USER =

techdiverdown · ‎11-03-2014

OK it seems to work now. I changed the HADOOP_HOME setting to this: /opt/mapr/hadoop/hadoop-0.20.2 also the correct file system url is maprfs:///

techdiverdown · ‎11-03-2014

I tried this, same error. I have it set to maprfs://hadoop-dev-00.ngid.centurylink.net:7222 and I still receive the same error.

techdiverdown · ‎10-31-2014

Good Catch, I updated the classpath to this: /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore-0.1.jar:/opt/mapr/lib/libprotodefs.jar:/opt/mapr/lib/baseutils-0.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar Same error....

techdiverdown · ‎10-31-2014

1) ON the hadoop nodes, i can run mapr jobs fine. 2) On the splunk node I can see hdfs, and I can stream files from Hunk as well fine. Here is the config from indexes.conf [provider:psb_mapr] vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar vix.env.HADOOP_HOME = /opt/mapr/hadoop/hadoop-2.4.1 vix.env.JAVA_HOME = /usr/lib/jvm/java-7-oracle vix.family = hadoop vix.fs.default.name = maprfs:/// vix.mapreduce.framework.name = yarn vix.splunk.home.hdfs = /user/splunk vix.yarn.resourcemanager.address = hadoop-dev-00.ngid.centurylink.net:8032 vix.yarn.resourcemanager.scheduler.address = hadoop-dev-00.ngid.centurylink.net:8030 vix.yarn.application.classpath = /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/common/lib/:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/common/:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs/lib/:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/hdfs/:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/lib/:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/lib/:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/mapreduce/:/contrib/capacity-scheduler/.jar:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/:/opt/mapr/hadoop/hadoop-2.3.0/share/hadoop/yarn/lib/*

techdiverdown · ‎10-31-2014

Using Mapr Version 4.01 (MR2) and Hunk 6.2. Configured a mapr provider and virtual index. Simple search works (index=my_virtual_index) but when I add a condition and a MR2 job tries to kick off in MapR I get the following error: [psb_mapr] Error while running external process, return_code=255. See search.log for more info [psb_mapr] JobStartException - Failed to start MapReduce job. Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_litf-mom.ip.qwest.net_1414779195.4344_0 ] and [ Does not contain a valid host:port authority: HS_IP:10020 ]

techdiverdown · ‎06-27-2014

Snappy lib 1.0.2, using python-snappy from github. $hadoop version Hadoop 2.3.0-cdh5.0.2 Subversion git://github.sf.cloudera.com/CDH/cdh.git -r 8e266e052e423af592871e2dfe09d54c03f6a0e8 Compiled by jenkins on 2014-06-09T16:20Z Compiled with protoc 2.5.0 From source with checksum 75596fe27f833e512f27fbdaaa7b0ab This command was run using /usr/lib/hadoop/hadoop-common-2.3.0-cdh5.0.2.jar

techdiverdown · ‎06-27-2014

Dumb question - How do I upload these logs? I cannot paste them into this window.

techdiverdown · ‎06-27-2014

The above command works fine. hadoop fs -ls hdfs://cloudera-node0:8020/user/netflow/2014-05-24-17-30-01.csv.snappy 14/06/27 15:02:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Found 1 items -rwxr-xr-x 3 netflow netflow 67108864 2014-06-27 14:54 hdfs://cloudera-node0:8020/user/netflow/2014-05-24-17-30-01.csv.snappy

techdiverdown · ‎06-25-2014

I have an existing virtual index with some data and it works fine. I decided to compress the data with snappy and I moved this data to another directory in HDFS. I then created a new virtual index to read the compressed data and I get the following: 06-25-2014 10:03:22.810 INFO ERP.psb_cloudera - Exception in thread "main" java.lang.OutOfMemoryError: Java heap space 06-25-2014 10:03:22.811 INFO ERP.psb_cloudera - at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123) Please advise how to increase the memory in Hunk, assuming that I use the vix.env.HADOOP_HEAPSIZE. I pushed this to 1024 from 512 and still getting the same error. I could increase more, I am wondering why the needed increase if the compress/decompress is mostly CPU cycles. My original files were 2014-05-24-16-00.01.csv, the new files are 2014-05-24-16-00.01.csv.snappy. The size was about 300 MB per file, ow the size is approx. 60-80 MB per file. By the way the job seems to die within a few seconds even if i remove all but one of the files in the directory. *ADDITIONAL INFO** If I use gzip compression, everything works fine. I will try bzip2 and lzo as well. I believe I want a splitable compression for the file storage in HDFS so that can be seen from this link: http://comphadoop.weebly.com/index.html

techdiverdown · ‎06-23-2014

Cloudera CDH 5.02 and Hunk 6.2 I believe, or whatever is the latest version of both. Anyway, I was trying to turn on snappy compression, which i did from Cloudera, but there are several compression settings that should be pushed from the job level. So here are the configs in the Virtual Index Provider, please verify they are correct. So far I see no performance benefit. vix.mapreduce.output.fileoutputformat.compress.codec = org.apache.hadoop.io.compress.SnappyCodec vix.mapreduce.output. fileoutputformat. compress.type = BLOC vix.mapreduce.output.fileoutputformat. compress = true Are these correct and if so, why am I seeing no performance difference? I am processing netflow data, each file is about 300 MB for 15 minutes of netflow data. So using a date range and verbose mode, it takes about 10 minutes to process 94 files X 300MB per file. Note the netflow data is not compresses in HDFS. If these need to be compressed on HDFS I assume LZO or Snappy? Thanks.

techdiverdown · ‎05-27-2014

Using CDH5 (MR2) and Hunk 6.1 on Centos 6.4... I have my netflow ascii data in the HDFS file system in 15 minute increments with each day being a higher level directory and each file having 15 minutes of netflow data. Something like this: /user/netflow/2015-05-25/asciiflow2014-05-25-02-45-01.csv /user/netflow/2015-05-25/asciiflow2014-05-25-03-00-01.csv .. .. /user/netflow/2015-05-26/asciiflow2014-05-26-02-45-01.csv .. Given this I am wondering about the virtual index configuration I have, listed below, is correct? I seem to search the same amount of time no mater what the time period is.... Time Capturing Regex is "/user/netflow/(\d+)-(\d+)-(\d+)/" Time Format is "yyyyMMdd" Time Adjustment is 15 Minutes?? Time Range is 1 day ??

techdiverdown · ‎05-27-2014

I have created a virtual index with CDH5 and Hunk 6.1. A simple query like the following: index=tomnetflow destination_address="71.214.56.38" runs about 28 minutes on our small, 5 node lab cluster with 32GB of memory and 3 HDFS nodes. There is some 35 million netflow records. My question is twofold: 1) When i run the same query over and over, the performance is very linear. Would I expect an index to be created somewhere so subsequent queries run faster? 2) In terms of improving Splunk/Hunk/Hadoop performance, if I segregate the data into directories in HDFS based on date for example (2014-05-26, 2014-05-27) will performance increase (provided i narrow my search to last 24 hours for example)? Thank You.

techdiverdown · ‎02-19-2014

In the configuration section you should have the following: namenode (x.x.x.x:8020) where x.x.x.x is your namenode ip hadoop home (/opt/hadoop-2.2.0 on my centos box, this is the base directory where your hadoop client is installed) java home (/usr/local/java/jdk1.6.0_31/ for me, your should be something like this: C:\Program Files\Java\jre or wherever you root java install is.

Posts	17
Solutions	1
Karma Given	1
Karma Received	4
Member Since	‎08-12-2013

Online Status	Offline
Date Last Visited	‎06-05-2020 02:04 AM

Why am I getting a "Failed to start MapReduce Job"...

Hunk Job get OutOfMemory Error

I would like to use compression when submitting th...

Creating Hunk 6.1 Virtual Index

Does Hunk create an index?

Re: Why am I getting a "Failed to start MapReduce ...

Re: Why am I getting a "Failed to start MapReduce ...

Re: Why am I getting a "Failed to start MapReduce ...

Re: Why am I getting a "Failed to start MapReduce ...

Re: Why am I getting a "Failed to start MapReduce ...

Why am I getting a "Failed to start MapReduce Job"...

Re: Hunk Job get OutOfMemory Error

Re: Hunk Job get OutOfMemory Error

Re: Hunk Job get OutOfMemory Error

Hunk Job get OutOfMemory Error

I would like to use compression when submitting th...

Creating Hunk 6.1 Virtual Index

Does Hunk create an index?

Re: Error configuring Hadoop Connect on Windows.