Splunk 6.4: Hunk + Hive: Inconsistent removal of files in dispatch

burwell
SplunkTrust

We have Splunk 6.4 and are using Hunk + Hive. Our jobs produce 100,000+ files in dispatch.

What is the expected behavior for removing files in dispatch?

I have seen older files in dispatch get removed when I run a new job (yeah!), but not always. More often than not, the files stay around. I have to schedule a script to remove the files, and sometimes I cannot even keep one day's worth of files.

Thanks.

1 Solution

kschon_splunk
Splunk Employee

I assume you are referring to dispatch dirs on HDFS? If so, some of the files in the dispatch dir are deleted when the search completes, but some stay in place, so that the search head can re-read them if necessary.

Once the corresponding dispatch dir on the search head is no longer present, the dispatch dir on HDFS is eligible to be deleted. As you noted, this happens when a new search is run. A "reaper" daemon thread will be launched, which crawls the HDFS dispatch area, looking for dirs that no longer correspond to searches the SH is managing, and deleting them.
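The selection step described above can be sketched roughly as follows. This is only an illustration in Python, not Hunk's actual code: the function name `find_orphaned_dispatch_dirs` and the sample search IDs are hypothetical, and it assumes each HDFS dispatch dir is named after the search it belongs to.

```python
from typing import Iterable, List, Set


def find_orphaned_dispatch_dirs(hdfs_dirs: Iterable[str], active_sids: Set[str]) -> List[str]:
    """Return HDFS dispatch dir names with no matching search on the search head.

    Assumption: a dir is eligible for deletion exactly when its name is not
    among the searches the search head is still managing.
    """
    return sorted(d for d in hdfs_dirs if d not in active_sids)


# Made-up directory names and search IDs, for illustration only.
hdfs_dirs = ["1461000000.1", "1461000300.2", "1461000600.3"]
active_sids = {"1461000600.3"}

orphans = find_orphaned_dispatch_dirs(hdfs_dirs, active_sids)
print(orphans)  # ['1461000000.1', '1461000300.2']
```

Anything still listed on the search head is left alone, which is why a long TTL on the SH side keeps the HDFS copy around as well.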

Your dispatch directories could be persisting for a couple of reasons:
1) The dispatch dir on the SH still exists. The TTL for a search varies depending on different properties of the search. This blog post has some more info: http://blogs.splunk.com/2012/09/12/how-long-does-my-search-live-default-search-ttl/

2) The reaper thread is a daemon, so it will not outlive the search it is associated with. A short search may not give the reaper enough time to completely delete all expired dispatch dirs.
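To illustrate the second point, here is a small Python analogy (again, not Hunk's implementation; the function name, timings, and directory names are all invented). A daemon thread works through a backlog of expired dirs while a short "search" runs, and we count how far it got when the search finishes.

```python
import threading
import time


def run_short_search(expired_dirs, search_seconds, delete_seconds):
    """Simulate a search that starts a daemon reaper and then finishes."""
    deleted = []

    def reaper():
        for d in expired_dirs:
            time.sleep(delete_seconds)  # stand-in for one slow HDFS delete
            deleted.append(d)

    threading.Thread(target=reaper, daemon=True).start()
    time.sleep(search_seconds)  # stand-in for the search itself
    # The search is done. If the process exits at this point, the daemon
    # thread is stopped wherever it is, so only these dirs have been removed.
    return list(deleted)


expired = [f"dir_{i}" for i in range(50)]
done = run_short_search(expired, search_seconds=0.1, delete_seconds=0.05)
print(f"deleted {len(done)} of {len(expired)} expired dirs before the search ended")
```

With a large backlog (100,000+ files in your case) and mostly short searches, each run only chips away at a small part of it, which would match what you are seeing.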


ddrillic
Ultra Champion

We saw this issue consistently with "older" versions of Hunk, and we ended up sizing the dedicated MapR volume at 3/4 of a terabyte. With 6.3.3, the dispatch directory on HDFS stays tiny regardless of the query volume. Maybe something is off with 6.4...

burwell
SplunkTrust

Thanks Keith. I was referring to files in HDFS.

Since there are so many files, and they use up many inodes, I was more aware of the files staying around.

Thanks for this information.
