Hi Team,
Can someone explain how the Splunk Hadoop Connect export job works?
I have a scheduled export job that moves data from Splunk to Hadoop, scheduled to run every 5 minutes. I can see the job runs every 5 minutes, but the data is exported only once every 30 minutes, i.e. a file appears in Hadoop only every 30 minutes, and the transfer statements in the HadoopConnect log files match that.
Questions
1) Where can I find a copy of the export files on the search head? If they are kept there, in what location, and how long are the files retained by default?
In HadoopConnect.log I can see the following transfer statement:
cli - Args: ['/sbclocal/apps/splunk/hadoop-2.6.0-cdh5.10.1/bin/hadoop', 'fs', '-moveFromLocal', '/apps/splunk/splunk/var/run/splunk/dispatch/1505830828.289/dump/20170919/fehj50a895481caf96c7d7503c340c504_1505828952_1505830752_2_0.csv.gz', 'hdfs://kbc/user/abcd/20170919/fehj50a895481caf96c7d7503c340c504_1505828952_1505830752_2_0.csv.gz.hdfs']
However, I could not locate the files at $SPLUNK_HOME/var/run/splunk/dispatch/1505830828.289/dump/20170919
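For what it's worth, the dump path in the log line can be reconstructed from the job's search ID (SID) and the date. This is a small sketch with a hypothetical helper name, based only on the path layout visible in the log line above, not on Hadoop Connect internals:

```python
import os


def dispatch_dump_dir(splunk_home, sid, date_str):
    """Build the per-job dump directory where Hadoop Connect stages
    compressed export chunks before 'hadoop fs -moveFromLocal' ships
    them to HDFS. (Hypothetical helper; the layout is inferred from
    the log line above.)"""
    return os.path.join(splunk_home, "var", "run", "splunk",
                        "dispatch", sid, "dump", date_str)


# The job from the log line above:
print(dispatch_dump_dir("/apps/splunk/splunk", "1505830828.289", "20170919"))
# -> /apps/splunk/splunk/var/run/splunk/dispatch/1505830828.289/dump/20170919
```

Note the staging directory lives under var/run/splunk/dispatch (as in the log), not var/run/dispatch, which may be why the files seemed missing — though since -moveFromLocal moves rather than copies, the chunks also disappear from this directory as soon as the transfer succeeds.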
2) Are the exported files kept on the search head, or are they deleted after export? If they are kept, for how long?
3) The documentation contains the statement below. What does it mean in practice?
As the job runs, the Splunk platform processes chunks of data received from the search and creates compressed files, locally on the search head. These files are moved to HDFS or the mounted file system if they reach 64MB or if cumulatively they consume more than 1GB, or the search finishes successfully.
Link to documentation: http://docs.splunk.com/Documentation/HadoopConnect/1.2.1/DeployHadoopConnect/ExporttoHDFS
Any other information about how this works would be great to know for troubleshooting the issue.
Running it every 5 minutes will not help you if the search itself takes 30 minutes to finish.
As you highlighted, unless the job produces more than 64MB (the default) of gzipped data, Splunk will wait for the job to finish before the GZ file is shipped out.
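To put that in numbers, here is a rough model of why a 5-minute schedule still yields roughly one file per 30 minutes when each search run takes 30 minutes. This is my own illustration, assuming schedule ticks that fall while the search is still running produce no new file:

```python
import math


def effective_export_interval(schedule_min, search_duration_min):
    """Approximate minutes between files landing in HDFS: if the
    search outlasts the schedule, a new file only appears at the
    first schedule tick at or after the search completes.
    (Illustrative model, not the actual Splunk scheduler.)"""
    return max(schedule_min,
               math.ceil(search_duration_min / schedule_min) * schedule_min)


print(effective_export_interval(5, 30))  # 30-minute search -> files roughly every 30 minutes
print(effective_export_interval(5, 3))   # search fits inside the schedule -> files every 5 minutes
```

So the fix is either to shorten the search (e.g. a narrower time range or more selective query) or to accept that the schedule interval is a lower bound, not a guarantee.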