All Apps and Add-ons

Splunk Hadoop Connect -- Detailed explanation of how the scheduled export job works?

newbie2tech
Communicator

Hi Team,

Can someone explain how the Splunk Hadoop Connect export job works?

I have a scheduled export job that moves data from Splunk to Hadoop, set to run every 5 minutes. I can see the job running every 5 minutes, but the data is exported only once every 30 minutes, i.e. a new file appears in Hadoop only every 30 minutes, and the transfer statements in the HadoopConnect log files show the same.

Questions

1) Where can I find a copy of the export files on the search head? If they are kept, in what location, and how long are they retained by default?

In HadoopConnect.log I can see the following transfer statement:

cli - Args: ['/sbclocal/apps/splunk/hadoop-2.6.0-cdh5.10.1/bin/hadoop', 'fs', '-moveFromLocal', '/apps/splunk/splunk/var/run/splunk/dispatch/1505830828.289/dump/20170919/fehj50a895481caf96c7d7503c340c504_1505828952_1505830752_2_0.csv.gz', 'hdfs://kbc/user/abcd/20170919/fehj50a895481caf96c7d7503c340c504_1505828952_1505830752_2_0.csv.gz.hdfs']

However, I could not locate them at $SPLUNK_HOME/var/run/splunk/dispatch/1505830828.289/dump/20170919.
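One likely reason the files cannot be found locally: `hadoop fs -moveFromLocal` (visible in the log line above) moves the file rather than copying it, so the local `.csv.gz` is deleted from the dispatch dump directory once the transfer succeeds. Below is a minimal Python sketch of that move semantics; the paths and the `move_from_local` function are illustrative, not the actual Hadoop Connect internals.

```python
import os
import shutil
import tempfile

def move_from_local(local_path, dest_path):
    """Simulate 'hadoop fs -moveFromLocal': copy to the destination,
    then delete the local source, so no local copy remains."""
    shutil.copy2(local_path, dest_path)   # transfer to the "HDFS" side
    os.remove(local_path)                 # then remove the local file

# Demo with made-up paths in a temp directory
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "export_0.csv.gz")
dst = os.path.join(tmp, "hdfs_export_0.csv.gz")
with open(src, "wb") as f:
    f.write(b"example payload")

move_from_local(src, dst)
print(os.path.exists(src))   # False: the local file is gone after the move
print(os.path.exists(dst))   # True: the file now lives only at the destination
```

This would explain why the dispatch directory looks empty right after a successful transfer.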

2) Will the exported files remain available on the search head, or are they deleted after the export? If they remain, for how long?

3) The documentation contains the statement below; what does it mean?

As the job runs, the Splunk platform processes chunks of data received from the search and creates compressed files, locally on the search head. These files are moved to HDFS or the mounted file system if they reach 64MB or if cumulatively they consume more than 1GB, or the search finishes successfully.

Link to Documentation -->http://docs.splunk.com/Documentation/HadoopConnect/1.2.1/DeployHadoopConnect/ExporttoHDFS
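The documented behavior quoted above amounts to a simple flush policy: a local compressed file is moved to HDFS when it reaches 64 MB, when all buffered files together exceed 1 GB, or when the search finishes. A small sketch of that decision logic (function name and structure are illustrative, not the actual Hadoop Connect code; the thresholds are the documented defaults):

```python
FILE_LIMIT_BYTES = 64 * 1024 * 1024          # move a file once it reaches 64 MB
CUMULATIVE_LIMIT_BYTES = 1024 * 1024 * 1024  # or when all local files exceed 1 GB

def should_move(current_file_bytes, total_local_bytes, search_finished):
    """Decide whether buffered export files should be moved to HDFS,
    per the three conditions in the documentation."""
    return (current_file_bytes >= FILE_LIMIT_BYTES
            or total_local_bytes >= CUMULATIVE_LIMIT_BYTES
            or search_finished)

# A 10 MB file from a still-running search stays on the search head:
print(should_move(10 * 1024**2, 10 * 1024**2, False))   # False
# ...but is moved as soon as the search finishes successfully:
print(should_move(10 * 1024**2, 10 * 1024**2, True))    # True
```

So a small export that never hits either size threshold only shows up in HDFS when its search completes.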

Any other information about how it works would be great to know for troubleshooting this issue.


rdagan_splunk
Splunk Employee

Running the job every 5 minutes will not help you if the search itself takes 30 minutes to finish.
As you highlighted, unless the job produces more than 64 MB (the default) of gzipped data, Splunk will wait for the search to finish before creating and moving the GZ file.
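A toy timeline makes the arithmetic concrete: if each search takes 30 minutes, 5-minute triggers that fire while a search is still running produce nothing new (a simplifying assumption for this illustration), so files land in HDFS only about every 30 minutes, matching what the poster observes.

```python
SCHEDULE_MIN = 5          # job is scheduled every 5 minutes
SEARCH_DURATION_MIN = 30  # but each search takes 30 minutes to complete

busy_until = 0
export_times = []
for t in range(0, 120, SCHEDULE_MIN):    # two hours of 5-minute triggers
    if t >= busy_until:                  # a new search starts only when idle
        busy_until = t + SEARCH_DURATION_MIN
        export_times.append(busy_until)  # file appears when the search finishes

print(export_times)   # [30, 60, 90, 120]: one export roughly every 30 minutes
```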

0 Karma