Hi
We recently had a problem with one type of our indexed log files suddenly being recognized as binary.
This is the message we saw in splunkd.log: WARN FileClassifierManager - Invalid file: /xy/mylog.log, reason: binary.
We don't know why this happened. It is an XML log, and although the format may have changed, this type of file was indexed before.
So we changed props.conf on the indexer and added the following parameter for our sources: NO_BINARY_CHECK = true. We already had this parameter previously: CHARSET = ISO-8859-2
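For reference, the two settings described above would look roughly like this in props.conf. The stanza name is an assumption taken from the path in the warning message; the actual source pattern we used is not shown here.

```
# props.conf (stanza name is illustrative; use your own source pattern)
[source::/xy/mylog.log]
# Skip the binary-file check so the file is not rejected with "reason: binary"
NO_BINARY_CHECK = true
# Character set the log is written in
CHARSET = ISO-8859-2
```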
We only saw very few events and a lot of warning messages: WARN UTF8Processor - Using charset UTF-8 for events from 'source::xxx|host::xxx|remoteport::33270', as the monitor is believed over the raw text which may be ISO-8859-2
That made us think that our CHARSET setting (and therefore NO_BINARY_CHECK as well) was not being applied. After reading this article in the wiki: http://www.splunk.com/wiki/Where_do_I_configure_my_Splunk_settings%3F we moved the props.conf settings to our light forwarders.
The ISO-8859-2 warnings disappeared and one log file of that type got indexed, but we have several such files. Some were missing.
I ended up deleting the fishbucket index on the light forwarder and all files are indexed properly now.
So I'm guessing that something in the fishbucket prevented those files from being indexed. After reading this (old) blog post from Andrea Longo: http://blogs.splunk.com/2008/08/14/what-is-this-fishbucket-thing/ I was hoping that I could search the _fishbucket index (on the light forwarder) and remove entries for the files that are not being indexed if I have a similar case in the future.
My first question is: Is this a workable approach, or have I misunderstood the problem? Is there a better way to resolve such issues?
My second question is: The fishbucket index exists on all our instances, but it is empty (viewing indexes from the Splunk Manager). How do I enable it on the indexer, and is it possible to enable it and make it searchable on a SplunkLightForwarder somehow?
Thank you for helping me.
Edit-- Enabling the following debug settings in $SPLUNK_HOME/etc/log.cfg helps show whether new data from a file is detected by Splunk:
category.FileInputTracker=DEBUG
category.selectProcessor=DEBUG
category.TailingProcessor=DEBUG
This is documented in: http://www.splunk.com/wiki/Community:Troubleshooting_Monitor_Inputs
The big problem with the fishbucket stuff that Andrea wrote about is that it does not apply in 4.x and up. It's accurate if you have a 3.x forwarder, but 4.x no longer stores the data in a Splunk index (it wasn't a good idea in the first place, though it was convenient for some purposes), but rather in the splunk_private_db inside the fishbucket index location. You can kind of examine the data using the $SPLUNK_HOME/bin/btprobe tool, but it's not that helpful, in particular because we are now only storing the hash and position, and not recording any of the other information that used to be in the fishbucket index.
I think there might be some plans to add back some tools and info to get some of this functionality back, but you might want to file ERs on it.
All you need to do is change the maxKBps setting in limits.conf to increase that.
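A minimal sketch of that change, assuming the setting lives in the [thruput] stanza of limits.conf on the forwarder. The value 512 is only an illustrative choice, not a recommendation from this thread; pick whatever your network and indexer can handle.

```
# limits.conf on the forwarder
[thruput]
# Maximum forwarding rate in KB per second.
# 512 is an illustrative value; 0 removes the limit entirely.
maxKBps = 512
```

The forwarder needs a restart for the new limit to take effect.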
Thank you for the quick reply. The main problem we had was that our forwarder was limited to sending at 256KBps, and the server sometimes needs slightly more than that, so for some logs we occasionally didn't see any new events for almost 30 minutes. It took us a while to figure out, but things look better now.
That's a great question, Chris. I'm looking forward to a great answer too; I'd like to understand the fishbucket index better as well.