Knowledge Management

Is all indexed data stored in the 'defaultdb' folder?

aywong
Path Finder

I have been looking at the 'Indexing Volume' data on my Splunk server, and the reported volumes don't seem to match what is in my defaultdb folders.

I have set it so that all hot buckets only last a day, and roll straight into warm buckets. When I check a certain date of indexing volume, it may say that I have indexed 300 MB of data that day, but when I check my warm bucket (in the folders) for that date, it only has 100 MB of data.

The total size of the defaultdb folder does not match, or even come close to, the total reported by Splunk's 'Indexing Volume' data.

Why might this happen? Should they be matching?

1 Solution

jbsplunk
Splunk Employee

Not all indexed data is stored in 'defaultdb'. It could be that you're indexing data into other indexes. Defaultdb is the 'main' index, where data is sent if you haven't specified any other index. If you've got a custom index, data will be written to that index under $SPLUNK_HOME/var/lib/splunk/, not to 'defaultdb'. You've also got Splunk's internal indexes, but that data shouldn't be counted by the indexing volume page.
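
For example (just a sketch; the index name, monitor path, and sourcetype here are made up), an input routed to a custom index will write its buckets under that index's directory instead of defaultdb:

# inputs.conf -- hypothetical input routed to a custom index
[monitor:///var/log/myapp/app.log]
index = myapp_index
sourcetype = myapp

# indexes.conf -- buckets for this index land under
# $SPLUNK_HOME/var/lib/splunk/myapp_index/, not .../defaultdb/
[myapp_index]
homePath   = $SPLUNK_HOME/var/lib/splunk/myapp_index/db
coldPath   = $SPLUNK_HOME/var/lib/splunk/myapp_index/colddb
thawedPath = $SPLUNK_HOME/var/lib/splunk/myapp_index/thaweddb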

Keep in mind that warm buckets only contain data that is not going to be written to again, so if you're looking at warm buckets that contain data for a specific time period, there could also be hot buckets that are still being written and haven't yet been rolled to warm.
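
If you want to see how much of an index is still sitting in hot buckets, something like this works (a sketch using the dbinspect search command; swap in your own index name):

| dbinspect index=main
| stats sum(sizeOnDiskMB) AS diskMB, count AS buckets BY state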

Data is also compressed when it is written to disk, so although you're charged for 300 MB, it isn't a 1:1 storage ratio. The figure that's usually tossed around is ~50%.
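
If you'd rather measure the ratio on your own data than trust the rule of thumb, something like this gives a rough number (a sketch; it assumes dbinspect reports rawSize in bytes and sizeOnDiskMB in megabytes):

| dbinspect index=main
| stats sum(rawSize) AS rawBytes, sum(sizeOnDiskMB) AS diskMB
| eval rawMB = round(rawBytes / 1024 / 1024, 1)
| eval pct_of_raw = round(diskMB / rawMB * 100, 1)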

It's also possible to configure index retention to clear out data older than a certain age or beyond a certain size. If that is done, you won't be able to tell the indexing volume for a given day from the size of the index.
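
Retention is set per index in indexes.conf. For example (hypothetical values, just to show the two limits):

# indexes.conf -- hitting either limit freezes (by default, deletes) the oldest buckets
[main]
frozenTimePeriodInSecs = 2592000
maxTotalDataSizeMB = 500000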

For these reasons, I wouldn't recommend using physical space on disk to determine how much data you've indexed. The page you're looking at hits a REST endpoint that counts the total indexed data in non-internal indexes from midnight until the time you view the page.
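
If you need a trustworthy daily volume figure, the license usage log is a better source than disk size. A common search looks like this (a sketch assuming the standard license_usage.log type=Usage events on the license master; it isn't necessarily the exact query behind that page):

index=_internal source=*license_usage.log type=Usage
| eval MB = b / 1024 / 1024
| timechart span=1d sum(MB) AS indexed_MB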

If you think you're having problems, start with this page:

http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume

bmacias84
Champion

How many indexes do you have configured? Do you have different paths configured for your hot, warm, cold, and frozen storage? Also, the raw data is compressed, so the indexing throughput may not match the size on your volumes. What settings have you applied in your indexes.conf? Keep in mind that these settings are applied on an index-by-index basis. I am guessing you have configured each of your indexes with maxHotIdleSecs = 86400.

There are a number of settings within your indexes that control bucket rotation based on bucket size or age in seconds.

Other settings you might want to look at are maxHotSpanSecs and rotatePeriodInSecs; see the sketch below.
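
All of these live in the index's stanza in indexes.conf, something like this (illustrative values only, not recommendations):

# indexes.conf -- hypothetical stanza showing the rotation settings mentioned above
[main]
# roll a hot bucket to warm after 24h with no new data arriving
maxHotIdleSecs = 86400
# roll a hot bucket once its events span more than 24h
maxHotSpanSecs = 86400
# roll a hot bucket once it reaches the auto-sized limit
maxDataSize = auto
# how often splunkd checks whether buckets need to roll
rotatePeriodInSecs = 60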

Ayn
Legend

It can definitely achieve that compression ratio.

aywong
Path Finder

I am currently in the testing phase, so I have just one index configured. I have no different paths for my buckets, just the main one. My settings are basically:

maxHotIdleSecs = 86400
maxHotBuckets = 3
maxWarmDBCount = 30
maxDataSize = 1024
frozenTimePeriodInSecs = 2592000

And can compression really account for a difference of more than 10x? The indexing volume page says I indexed 1044 MB of data on one date, but the bucket for that date is only 106 MB in size.
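
One way to sanity-check the 1044 MB vs 106 MB gap (a sketch; the date is a placeholder, and startEpoch/endEpoch are the event time range each bucket covers) would be to list every bucket that overlaps that day and compare raw size to size on disk:

| dbinspect index=main
| eval day_start = strptime("2013-01-15", "%Y-%m-%d")
| eval day_end = day_start + 86400
| where startEpoch < day_end AND endEpoch > day_start
| table bucketId, state, rawSize, sizeOnDiskMB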
