Splunk Search

Search to Provide Days in Hot/WarmDB

jodros
Builder

In our environment, we have a CIFS share that is used to store all colddb. Warm is rolled to cold when the hot/warm volume hits 200GB.

maxVolumeDataSizeMB = 204800

I know how much data is being stored locally on the indexers versus on the CIFS share, but I would like to know how many days' worth of data is stored locally on the indexers. I am about to do some basic calculations based on what we typically index in a day, but that does not factor in the compression that Splunk applies on disk.

I was wondering if a search could be constructed that shows the oldest timestamp of data in the warm buckets; that would be exactly what I need. The reason I need this is to satisfy our auditing department, which is not comfortable without knowing exactly how many days are stored locally versus on the CIFS share.
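As a starting point for those basic calculations, a rough sketch against the license usage log (which reports pre-compression volume, so it is only an estimate of daily indexed data, not what lands on disk) would be:

index=_internal source=*license_usage.log type=Usage
| timechart span=1d sum(b) as bytes
| eval GB_per_day=round(bytes/1024/1024/1024, 2)
| fields _time GB_per_day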

Thanks in advance.

1 Solution

jodros
Builder

So I got this idea from a Splunk support engineer. He told me that the warm bucket directories are named with the start and end times, in epoch time, of the data they contain. He also told me that Splunk logs an event each time it moves a warm bucket to cold. So with that in mind, if I search for the most recent roll from warm to cold and parse out the latest epoch time for that bucket, that gives me an indication of the earliest data still in warm. I have constructed a search that breaks this out per index per host. Let me know if my logic is flawed. I know there might be a gap between the last date in the newly rolled cold bucket and the oldest warm bucket, but it at least gives an estimate.

index=_internal sourcetype=splunkd bucketmover move completed | rex field=_raw "/opt/splunk/var/lib/splunk/(?<index>\w+)/" | rex field=_raw "/db_(?<time_cron>\d+)_\d+_" | dedup index host | eval time_human=strftime(time_cron, "%m-%d-%Y %H:%M:%S") | table _time host index time_human
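For the auditors, the same search can be extended to express roughly how many days back that rollover point is. This is only a sketch, and days_local is just an estimate of the data still held in hot/warm:

index=_internal sourcetype=splunkd bucketmover move completed
| rex field=_raw "/opt/splunk/var/lib/splunk/(?<index>\w+)/"
| rex field=_raw "/db_(?<time_cron>\d+)_\d+_"
| dedup index host
| eval days_local=round((now() - time_cron) / 86400, 1)
| eval time_human=strftime(time_cron, "%m-%d-%Y %H:%M:%S")
| table _time host index time_human days_local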


lpolo
Motivator

I would recommend using |dbinspect. See the example presented in the next answer.


lpolo
Motivator

You can use:

|dbinspect index=name_of_your_index state=warm

Example for the main index:

|dbinspect index=main state=warm

Make sure to select "all time" as your time period in your query.

For more information, take a look at the dbinspect command documentation.

Once you are familiar with the dbinspect command, you will be able to use it to obtain the result set you are looking for. Here is an example:

|dbinspect index=main|convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(earliestTime) as earliestTime|convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(latestTime) as latestTime|stats min(earliestTime) as earliestTime max(latestTime) as latestTime sum(sizeOnDiskMB) as sizeOnDiskMB dc(path) as NumberOfBuckets by state|eval diff_hours=(latestTime-earliestTime)/3600|eval earliestTime=strftime(earliestTime,"%m/%d/%Y:%H:%M:%S")|eval latestTime=strftime(latestTime,"%m/%d/%Y:%H:%M:%S")

The result set for this query is something like this:

state   earliestTime          latestTime            sizeOnDiskMB     NumberOfBuckets
hot     09/15/2006:18:47:20   03/24/2012:12:00:00   10043.741711     10
warm    02/19/2012:00:00:01   03/21/2012:03:59:00   92646.695278     20
cold    01/03/2006:16:35:20   02/27/2012:17:00:00   204971.245710    586

I prefer to use the dbinspect command. It offers precise information about the state of the buckets found in each index.

Regards,
Lp

gjanders
SplunkTrust

For anyone using Splunk 6.4.x or a newer 6.x release, the query changes slightly from lpolo's original answer; the updated query is:
|dbinspect index=main| convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(startEpoch) as earliestTime|convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(endEpoch) as latestTime|stats min(earliestTime) as earliestTime max(latestTime) as latestTime sum(sizeOnDiskMB) as sizeOnDiskMB dc(path) as NumberOfBuckets by state|eval diff_hours=(latestTime-earliestTime)/3600|eval earliestTime=strftime(earliestTime,"%m/%d/%Y:%H:%M:%S")|eval latestTime=strftime(latestTime,"%m/%d/%Y:%H:%M:%S")

The change is that startEpoch and endEpoch replace the earliestTime and latestTime fields.
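If the goal is the original question (how many days of data are still in hot/warm on the indexers), a sketch along these lines should work, run over "All time" and assuming a 6.x-or-later dbinspect where startEpoch and endEpoch come back as epoch seconds; index=main is just a placeholder:

| dbinspect index=main
| search state=hot OR state=warm
| stats min(startEpoch) as earliestEpoch max(endEpoch) as latestEpoch sum(sizeOnDiskMB) as sizeOnDiskMB dc(path) as NumberOfBuckets
| eval days_local=round((latestEpoch - earliestEpoch) / 86400, 1)
| eval earliestTime=strftime(earliestEpoch, "%m/%d/%Y:%H:%M:%S")
| eval latestTime=strftime(latestEpoch, "%m/%d/%Y:%H:%M:%S")
| table earliestTime latestTime days_local sizeOnDiskMB NumberOfBuckets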

David
Splunk Employee

At least as of 6.0, you can run dbinspect from a search head and it will hit every indexer.
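In a distributed run, each dbinspect row should also carry a splunk_server field, so the oldest warm data can be broken out per indexer; a rough sketch, again assuming the newer startEpoch field:

| dbinspect index=main
| search state=warm
| stats min(startEpoch) as earliestEpoch dc(path) as warmBuckets by splunk_server
| eval oldest_warm=strftime(earliestEpoch, "%m/%d/%Y:%H:%M:%S")
| table splunk_server oldest_warm warmBuckets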

lpolo
Motivator

You have to run the command locally. You can create a script to collect this data, or log in to each indexer and run the command via its web interface.


jodros
Builder

We have a distributed environment with 4 indexers. Is dbinspect able to report in a distributed environment? I have read that it is not.
