Hello Team,
splunk/_internaldb/db is indexing high volumes of internal logs in our environment (8-10 GB per day). This started only recently; we never had issues with the internal db logs before, and no default settings have been changed. The only difference I can see under the db logs is a lot of .lock files:
-rw------- 1 uatfxspl fxituser 15 Jan 26 16:50 hot_v1_1944.lock
drwx--x--x 3 uatfxspl fxituser 4096 Jan 26 16:50 db_1453827013_1453826043_1944
drwx--x--x 3 uatfxspl fxituser 4096 Jan 26 17:07 db_1453828033_1453826997_1945
drwx--x--x 3 uatfxspl fxituser 4096 Jan 26 17:24 db_1453829051_1453828016_1946
Please let me know if you need more details.
There are a lot of issues that can cause an increase in _internal volume. A good start to determining the cause of the problem would be to run a few Splunk searches:
index=_internal | timechart count by host
index=_internal | timechart count by sourcetype
Run these searches far enough back in time that you can see when the problem started and which host or sourcetype is generating most of the data. Then run more specific searches against that host or sourcetype. For example, if you see a large increase in volume from host xyz, find out which log file it is coming from:
index=_internal host=xyz | timechart count by sourcetype
Once you've found the point at which the volume increase started and which host/sourcetype caused it, you can start examining events. Are you seeing the same error repeatedly? Is this a new sourcetype that came from an add-on? Is this from a Splunk server or a forwarder? Once you have this information, then we can help diagnose the problem.
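If most of the volume turns out to be splunkd logs, a drill-down by severity and component usually points at the noisy subsystem directly. A rough sketch (the log_level and component fields below are the ones Splunk normally extracts for sourcetype=splunkd; adjust if your extractions differ):
index=_internal sourcetype=splunkd (log_level=WARN OR log_level=ERROR) | stats count by component | sort - count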
Thanks guys. Appreciate your help. Found the real culprit: one of the forwarders kept trying to connect to the indexer and failing, and so was logging high volumes of error messages.
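In case anyone else hits this, a search along these lines should surface that kind of forwarder connection failure (the component name here is an assumption based on what splunkd typically uses for output/forwarding errors; check what actually appears in your events):
index=_internal sourcetype=splunkd component=TcpOutputProc log_level=ERROR | timechart count by host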
index=_internal | timechart count by sourcetype -- excellent idea ;-)
Have you enabled debug? Which source/sourcetype consumes the major part of the _internal index?
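One rough way to check that by raw event size rather than event count (just a sketch; it sums len(_raw) per sourcetype as an approximation of indexed volume):
index=_internal | eval bytes=len(_raw) | stats sum(bytes) AS total_bytes by sourcetype | sort - total_bytes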