Hello folks,
My forwarders monitor several thousand Oracle logs daily that rotate out at a high frequency. As such, my fishbucket index is growing at a steady pace; it currently sits at 200MB+ on my forwarders. I understand that this is considered small, relatively speaking, but due to policies in place, I can't allow the Splunk forwarder to take up this much space on the system it sits on. Is there a way to delete records out of the fishbucket and reclaim space? I am well aware that this could lead to reindexing. Just an FYI.
If you're using a forwarder, you can run 'splunk clean eventdata' from $SPLUNK_HOME/bin and it'll reset the fishbucket as well as any other data you've collected. Since you're not indexing and are aware that it could lead to reindexing, I suppose this is a good option for you. As an aside, the issue of not being able to control the fishbucket size has been raised in SPL-56516 and should be addressed in a future release of the product.
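For reference, here is a minimal sketch of that sequence (assuming a default $SPLUNK_HOME and that a brief outage of the forwarder is acceptable; the clean command must be run while Splunk is stopped and will prompt for confirmation unless you pass -f):

    cd $SPLUNK_HOME/bin
    ./splunk stop
    # wipes the fishbucket along with any other locally stored event data,
    # so previously monitored files will be reindexed from the beginning
    ./splunk clean eventdata
    ./splunk start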
To delete specific entries from the btree, see this post:
http://splunk-base.splunk.com/answers/54147/how-can-i-trigger-the-re-indexing-of-a-single-file
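The linked post describes targeting individual fishbucket entries with the btprobe utility. As a rough, hedged sketch of the kind of invocation it covers (the exact flags can differ by version, and the log path is just a placeholder, so verify against the post before relying on it):

    # reset the fishbucket record for a single monitored file so it gets re-read,
    # typically with Splunk stopped so the running process doesn't rewrite the record
    $SPLUNK_HOME/bin/splunk stop
    $SPLUNK_HOME/bin/splunk cmd btprobe -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file /path/to/your/oracle.log --reset
    $SPLUNK_HOME/bin/splunk start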
In Splunk 6.0+, the btree/fishbucket files have a size ceiling that is maintained. If the fishbucket files grow over a configurable ceiling, they are moved from $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db to $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db/save. We then populate a new, empty btree upon request -- entries we actually use are copied from the 'save' version.
Ultimately this means that the fishbucket's on-disk size will be bounded to 2x the ceiling.
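In other words, after a rollover you end up with two copies side by side, each bounded by the ceiling (paths as described above):

    $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db/       <- active btree, grows up to the ceiling
    $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db/save/  <- previous btree, consulted as needed while the new one fills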
If you need to resolve a current problem where the file is already very large (say, 10GB), note that your current btree/fishbucket data is copied to 'save', so the space is not immediately reclaimed. In that case you can resolve your space concerns by lowering the ceiling (see the configuration below), restarting so the oversized btree is moved aside to 'save', and then, once the new btree has been populated with the entries you still need, removing the 'save' copy.
At this point your disk usage for btree/fishbucket will be constrained to 2x the limit.
In 6.0.x we use the maxTotalDataSizeMB value for the [fishbucket] index to configure this limit. In the release after 6.0.x (the next major release) there will be a dedicated setting in limits.conf for this purpose.
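A hedged sketch of what that 6.0.x configuration might look like, assuming the stanza name given above and an example ceiling of 100MB (an illustration, not a recommendation; pick a value that fits your policy):

    # indexes.conf (example values; verify the effective stanza name with
    # './splunk btool indexes list' before relying on this)
    [fishbucket]
    maxTotalDataSizeMB = 100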
If you're using a UF, you cannot run 'splunk clean eventdata' because the UF's index databases are disabled. Instead, you have to stop Splunk and delete the $SPLUNK_HOME/var/lib/splunk/fishbucket directory.
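A minimal sketch of that on a *nix UF (assuming the fishbucket path quoted earlier in this thread; the forwarder will re-read all monitored files from the start afterwards):

    $SPLUNK_HOME/bin/splunk stop
    # removes all read-pointer records, so expect reindexing of monitored files
    rm -rf $SPLUNK_HOME/var/lib/splunk/fishbucket
    $SPLUNK_HOME/bin/splunk start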
Note that cleaning the fishbucket deletes all records of which files have been monitored and how far into each one Splunk has read. As a result, the UF starts monitoring every log file from the first line again, so it is a challenge to avoid duplicate events. And once duplicate events are indexed, it is another challenge to keep one copy of each duplicated event and delete the rest.