We have a forwarder with 12 CPUs and 12 GB of memory.
We have not yet set parallelIngestionPipelines.
We have a lot of CSV files (over 40,000) to index almost daily and frequently see delays in them being indexed.
Recently it has been taking close to 6-7 hours for the CSV files to be indexed.
In this regard, can we increase parallelIngestionPipelines to 2, and is this CPU/memory sufficient to handle the new setting?
Seeking suggestions.
I would try to move up to 6 pipelines, but BY FAR the most important thing is that you are doing fast and efficient deletion of the files. It is probably the case that your files are atomic and are dropped onto the filesystem in a complete state that never changes, right? If so, be sure you use [batch]
with move_policy = sinkhole
so that Splunk itself deletes each file as it consumes it. If Splunk has thousands of already-indexed files to sort through, it will NEVER be able to keep up.
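A minimal inputs.conf sketch of that setup (the directory path and sourcetype are placeholders, not from the thread):

```ini
# inputs.conf on the forwarder
# [batch] reads each file once and, with sinkhole, deletes it afterwards.
# Only use this when files are complete (atomic) at drop time.
[batch:///data/csv_dropbox]
move_policy = sinkhole
sourcetype = my_csv
disabled = false
```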
You look to have enough resources to increase the pipelines (you could try 8).
Additionally, I would configure:
MAX_DAYS_AGO (the value depends on how the data arrives)
crcSalt = <SOURCE> (if the files are never renamed)
Are your files local to the UF, and how do they arrive (there could be a race-condition issue here, delaying the collection)? How do they get purged?
If the number of files to scan and the delay are still not acceptable after tuning, then you will probably have to rethink how your collection works to improve the situation.
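Those two settings would look roughly like this (path and sourcetype are placeholders; MAX_DAYS_AGO lives in props.conf, crcSalt in inputs.conf):

```ini
# inputs.conf -- include the full path in the CRC so files with
# identical first lines (e.g. the same CSV header) are not skipped
# as already-indexed
[monitor:///data/csv_dropbox]
crcSalt = <SOURCE>
sourcetype = my_csv

# props.conf -- raise the age cutoff if events can arrive old
# (default is 2000 days)
[my_csv]
MAX_DAYS_AGO = 3650
```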
Thanks for the inputs. I will look at your options.
I do have crcSalt = <SOURCE> and MAX_DAYS_AGO, but we are not purging the files.
I will use batch mode to purge them; I believe that should resolve my problem.
Yeah, good point, this is exactly what I think I am going through.
Batch mode, which can read and delete, looks like a good option, so that only un-indexed files remain to be scanned, i.e. fewer files.
This is a very good point; let me try this and get back.
Meanwhile, is there any limit on how many files a forwarder can process comfortably? Is there any standard for this, considering I have 12 CPUs and 12 GB of RAM?
3 cores per 2 pipelines is what I have heard, which maxes you out at around 6.
Splunk PS (Professional Services) involvement is required to go to 3 or more pipelines.
You need to check how the UF is behaving while it is delayed. The following site is helpful:
https://wiki.splunk.com/Community:Troubleshooting_Monitor_Inputs
If you increase the pipelines, 6 hours may become 3 hours, but that is not a real solution.
You need to confirm at which stage the delay is occurring:
https://wiki.splunk.com/Community:HowIndexingWorks
Queue state
host=your_hostname source="*metrics.log*" group=queue
| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size)
| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size)
| eval fill_perc=round((curr/max)*100,2)
| timechart max(fill_perc) by name useother=false limit=15
Block status
host=your_hostname source="*metrics.log*" group="queue" blocked=true
tailreader status
host=your_hostname source="*metrics*" group=tailingprocessor name="tailreader*"
| eval host=case(isnull(ingest_pipe),host,1=1,host."_".ingest_pipe)
| timechart max(max_queue_size) max(current_queue_size) max(files_queued) sum(new_files_queued) max(fd_cache_size)
batchreader status
host=your_hostname source="*metrics*" group=tailingprocessor name="batchreader*"
| eval host=case(isnull(ingest_pipe),host,1=1,host."_".ingest_pipe)
| timechart max(max_queue_size) max(current_queue_size) max(files_queued) sum(new_files_queued) max(fd_cache_size)
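To see which queue blocks most often over time, the blocked-queue search above can be extended with a timechart (a sketch along the same lines as the other searches; fields as reported in metrics.log):

```
host=your_hostname source="*metrics.log*" group=queue blocked=true
| timechart count by name
```

A queue that blocks persistently points at the stage downstream of it as the bottleneck.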
HiroshiSatoh, here is the behavior I see when I add new files for data input, usually CSV files:
They are not detected under "Number of files"; it shows blank for a very long time, 6 hours or more, as described. Once I see the number of files there, the indexing is immediate and I see the data.
The delay is in the forwarder detecting the files, so I was guessing that there are too many files for Splunk to scan, and this is what delays detecting them.
As mentioned, once these are detected and I see numbers under "Number of files", the indexing happens quickly.
Here is where I see the issue:
Splunk web -> Settings -> Data Inputs -> Files & Directories -> look for my input files -> the "Number of files" column is blank -> this stays blank for more than 6 hours -> once I can see numbers here, the indexing is quick.
I'll also check the points you mentioned above.
The general rule of thumb is at least 1.5 cores per pipeline. So 12 cores should be more than sufficient to enable 2 pipelines. Of course this all depends on what else that machine is doing that takes up CPU cores. So have a look at current CPU and Memory consumption and see if there is sufficient capacity left (and of course keep a close eye on it after adding the extra pipeline).
To what extent it will resolve your problem is a valid question, but that is a separate topic altogether.
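For reference, the pipeline count is set in server.conf on the forwarder and takes effect after a restart (2 being the value discussed here):

```ini
# server.conf on the forwarder
[general]
parallelIngestionPipelines = 2
```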
Hi Frank, thank you for your input, that was good info. I still have to do some more reading before implementing this.
How are the CSV files generated? If a single file is large and its processing is delayed, adding a second pipeline will not help.
Please check which process is actually delayed using the DMC.
Pretty new to the DMC; could you help me with where I should start looking for the CSV indexing delay issue? There are too many options to look at. The CSV files aren't large in size, but large in number.
Or can you direct me to a DMC tutorial/document?
Thanks for the document link