Getting Data In

Reduce fishbucket size

RicoSuave
Builder

Hello folks,
My forwarders monitor several thousand Oracle logs daily that rotate out at a high frequency. As such, my fishbucket index is growing at a steady pace; it currently sits at 200MB+ on my forwarders. I understand that this is considered small, relatively speaking, but due to policies in place, I can't allow the Splunk forwarder to take up this much space on the system it is sitting on. Is there a way to delete records out of the fishbucket and reclaim space? I am well aware that this could lead to reindexing, just an FYI.

1 Solution

jbsplunk
Splunk Employee

If you're using a forwarder, you can run 'splunk clean eventdata' from $SPLUNK_HOME/bin and it'll reset the fishbucket as well as any other data you've collected. Since you're not indexing and are aware that it could lead to reindexing, I suppose this is a good option for you. As an aside, the issue of not being able to control the fishbucket size has been raised in SPL-56516 and should be addressed in a future release of the product.
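
For reference, a rough command sketch of that on the forwarder (assuming a default install layout; Splunk has to be stopped before running 'clean', and you'll be prompted to confirm):

    cd $SPLUNK_HOME/bin
    ./splunk stop
    ./splunk clean eventdata    # resets the fishbucket along with any other data collected on this instance
    ./splunk start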

To delete specific entries from the btree, see this post:

http://splunk-base.splunk.com/answers/54147/how-can-i-trigger-the-re-indexing-of-a-single-file


jrodman
Splunk Employee

In Splunk 6.0+, the btree/fishbucket files have a size ceiling that is maintained. If the fishbucket files grow over a configurable ceiling, they are moved from $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db to $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db/save. We then populate a new, empty btree upon request -- entries we actually use are copied from the 'save' version.

Ultimately this means that your size will be bounded to 2x the ceiling.

If you need to resolve a current problem where the file is very large (let's say 10GB), we will copy your current btree/fishbucket data to 'save', so the space will not be immediately improved. In this case you can resolve your space concerns in the following way:

  • Run your system with 6.0+; you should see the 'save' subdirectory appear with the large file still in it
  • Wait for the system to check every file in the monitored locations; you can validate that it has caught up with an all-time real-time search for that host, or you can just wait a day or so (a command sketch of the remaining steps follows this list)
  • Stop Splunk
  • Delete, or move out the 'save' subdirectory
  • Start Splunk

At this point your disk usage for the btree/fishbucket should be constrained to 2x the limit.
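
A sketch of the stop/move/start steps above (using the default fishbucket location mentioned earlier; adjust the backup destination to your environment):

    cd $SPLUNK_HOME/bin
    ./splunk stop
    # move the oversized 'save' copy out of the way (or delete it outright with rm -rf)
    mv $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db/save /path/to/backup/
    ./splunk start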

In 6.0.x we use the maxTotalDataSizeMB value for the fishbucket index in indexes.conf to configure this limit. In the next major release after 6.0.x there will be a dedicated setting in limits.conf for this purpose.
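
As an illustration only (the stanza name is assumed here to be the internal fishbucket index, [_thefishbucket], and the value is arbitrary), the 6.0.x override would go in a local indexes.conf along these lines:

    # $SPLUNK_HOME/etc/system/local/indexes.conf (assumed location; restart Splunk after editing)
    [_thefishbucket]
    maxTotalDataSizeMB = 100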


Masa
Splunk Employee

If you're using a UF, you cannot run 'splunk clean eventdata' because the UF's index databases are disabled. You have to stop Splunk and delete the $SPLUNK_HOME/var/lib/splunk/fishbucket directory.

Note that cleaning the fishbucket deletes all records of which files were monitored and how much of each was read. As a result, the UF starts monitoring each log file from the first line again, so it is a challenge to avoid duplicate events. And once duplicate events are indexed, it is another challenge to keep one copy of each and delete the rest.
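
A rough sketch of that sequence on the UF (using the default fishbucket path; every monitored file will be re-read from the beginning afterwards):

    cd $SPLUNK_HOME/bin
    ./splunk stop
    rm -rf $SPLUNK_HOME/var/lib/splunk/fishbucket
    ./splunk start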
