I'm trying to figure out how much additional delay Hadoop Connect would add to my existing Splunk log latency to get data into Hadoop. Most of my logs are available in Splunk within seconds of being created. How long before they would be available in Hadoop? Minutes, hours, days?
Thank you
With Hadoop Connect export, the shortest scheduling interval allowed is every 5 minutes.
So at most, a search starts every 5 minutes. As the job runs, Splunk processes chunks of data received from the search and writes compressed files locally on the search head. These files are moved to HDFS when a file reaches 64 MB, when the local files together take up more than 1 GB, or when the search completes successfully.
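The three move conditions above can be written as a single check. This is only an illustrative sketch of the rule as described, not Hadoop Connect's actual code; the function name and the byte-based thresholds are assumptions for illustration.

```python
MB = 1024 * 1024

# Thresholds as described above (hypothetical constants, not Splunk settings)
FILE_LIMIT_BYTES = 64 * MB        # a single compressed file reaching 64 MB
CUMULATIVE_LIMIT_BYTES = 1024 * MB  # local files together exceeding 1 GB


def should_move_to_hdfs(current_file_bytes: int,
                        pending_local_bytes: int,
                        search_completed: bool) -> bool:
    """Hypothetical helper: True when the staged files would be moved to HDFS."""
    return (current_file_bytes >= FILE_LIMIT_BYTES
            or pending_local_bytes > CUMULATIVE_LIMIT_BYTES
            or search_completed)


# A small search still in progress keeps its file on the search head.
print(should_move_to_hdfs(1 * MB, 1 * MB, search_completed=False))
```

The practical consequence is that a small export is usually flushed by the "search completed" condition rather than by either size limit.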
Therefore, for a short search that returns few results, I would expect a new file to land in HDFS roughly every 6 minutes. For larger result sets, it takes longer for a file to grow to 64 MB and for that 64 MB file to be moved into HDFS.
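As a rough back-of-the-envelope estimate, the extra delay for an event is the time until the next scheduled export, plus the search runtime, plus the upload time. The sketch below assumes a 5-minute schedule, a 1-minute search and a negligible upload. Those inputs are illustrative guesses, not measured values, so substitute your own timings.

```python
def hdfs_latency_range(interval_min: float = 5.0,
                       search_min: float = 1.0,
                       upload_min: float = 0.0) -> tuple[float, float]:
    """Hypothetical estimate of (best, worst) extra minutes before data is in HDFS.

    best:  the event is picked up by a run that is just starting
    worst: the event just misses a run and waits a full schedule interval
    """
    best = search_min + upload_min
    worst = interval_min + search_min + upload_min
    return best, worst


print(hdfs_latency_range())  # assumed 5-minute schedule, 1-minute search
```

With those assumptions the delay is on the order of minutes, in line with the roughly 6-minute figure above. Large exports push the upper bound up through longer search and upload times.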