Splunk Search

Search to Provide Days in Hot/WarmDB

jodros
Builder

In our environment, we have a CIFS share that is used to store all colddb. Warm is rolled to cold when the hot/warm volume hits 200GB.

maxVolumeDataSizeMB = 204800

I know how much data is being stored locally on the indexers versus on the CIFS share, but I would like to know how many days' worth of data is stored locally on the indexers. I am about to do some basic calculations based on what we typically index in a day, but that does not factor in the compression that Splunk applies on disk.

I was wondering if a search could be constructed that shows the oldest timestamp of data in the warm buckets; that would be exactly what I need. The reason I need this is to satisfy our auditing department, which is not comfortable without knowing exactly how many days are stored locally versus on the CIFS share.
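As a starting point for those basic calculations, a rough sketch against the license usage log (which reports pre-compression volume, so it is only an estimate of daily indexed data, not what lands on disk) would be:

index=_internal source=*license_usage.log type=Usage
| timechart span=1d sum(b) as bytes
| eval GB_per_day=round(bytes/1024/1024/1024, 2)
| fields _time GB_per_day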

Thanks in advance.

1 Solution

jodros
Builder

So I got this idea from a Splunk support engineer. He told me that the warm bucket directories are named with the start and end times, in epoch time, of the data they contain. He also told me that Splunk logs an event each time it moves a warm bucket to cold. So with that in mind, if I search for the most recent roll from warm to cold and parse out the latest epoch time for that bucket, that gives me an indication of the earliest data still in warm. I have constructed a search that breaks this out per index per host. Let me know if my logic is flawed. I know there might be a gap between the last date in the newly rolled cold bucket and the oldest warm bucket, but it at least gives an estimate.

index=_internal sourcetype=splunkd bucketmover move completed | rex field=_raw "/opt/splunk/var/lib/splunk/(?<index>\w+)/" | rex field=_raw "/db_(?<time_cron>\d+)_\d+_" | dedup index host | eval time_human=strftime(time_cron, "%m-%d-%Y %H:%M:%S") | table _time host index time_human
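For the auditors, the same search can be extended to express roughly how many days back that rollover point is. This is only a sketch, and days_local is just an estimate of the data still held in hot/warm:

index=_internal sourcetype=splunkd bucketmover move completed
| rex field=_raw "/opt/splunk/var/lib/splunk/(?<index>\w+)/"
| rex field=_raw "/db_(?<time_cron>\d+)_\d+_"
| dedup index host
| eval days_local=round((now() - time_cron) / 86400, 1)
| eval time_human=strftime(time_cron, "%m-%d-%Y %H:%M:%S")
| table _time host index time_human days_local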


lpolo
Motivator

I would recommend using |dbinspect. See the example presented in the next answer.


lpolo
Motivator

You can use:

|dbinspect index=name_of_your_index state=warm

Example for the main index:

|dbinspect index=main state=warm

Make sure to select "all time" as your time period in your query.

For more information, take a look at the dbinspect command documentation.

Once you are familiar with the dbinspect command, you will be able to use it to obtain the result set you are looking for. Here is an example:

|dbinspect index=main|convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(earliestTime) as earliestTime|convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(latestTime) as latestTime|stats min(earliestTime) as earliestTime max(latestTime) as latestTime sum(sizeOnDiskMB) as sizeOnDiskMB dc(path) as NumberOfBuckets by state|eval diff_hours=(latestTime-earliestTime)/3600|eval earliestTime=strftime(earliestTime,"%m/%d/%Y:%H:%M:%S")|eval latestTime=strftime(latestTime,"%m/%d/%Y:%H:%M:%S")

The result set for this query is something like this:

state   earliestTime          latestTime            sizeOnDiskMB     NumberOfBuckets
hot     09/15/2006:18:47:20   03/24/2012:12:00:00   10043.741711     10
warm    02/19/2012:00:00:01   03/21/2012:03:59:00   92646.695278     20
cold    01/03/2006:16:35:20   02/27/2012:17:00:00   204971.245710    586

I prefer to use the dbinspect command. It offers precise information about the state of the buckets found in each index.

Regards,
Lp

gjanders
SplunkTrust

For anyone using Splunk 6.4.x or a newer 6.x release, the query changes slightly from lpolo's original answer; the updated query is:
|dbinspect index=main| convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(startEpoch) as earliestTime|convert timeformat="%m/%d/%Y:%H:%M:%S" mktime(endEpoch) as latestTime|stats min(earliestTime) as earliestTime max(latestTime) as latestTime sum(sizeOnDiskMB) as sizeOnDiskMB dc(path) as NumberOfBuckets by state|eval diff_hours=(latestTime-earliestTime)/3600|eval earliestTime=strftime(earliestTime,"%m/%d/%Y:%H:%M:%S")|eval latestTime=strftime(latestTime,"%m/%d/%Y:%H:%M:%S")

The change is that startEpoch and endEpoch replace the earliestTime and latestTime fields.
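If the goal is the original question (how many days of data are still in hot/warm on the indexers), a sketch along these lines should work, run over "All time" and assuming a 6.x-or-later dbinspect where startEpoch and endEpoch come back as epoch seconds; index=main is just a placeholder:

| dbinspect index=main
| search state=hot OR state=warm
| stats min(startEpoch) as earliestEpoch max(endEpoch) as latestEpoch sum(sizeOnDiskMB) as sizeOnDiskMB dc(path) as NumberOfBuckets
| eval days_local=round((latestEpoch - earliestEpoch) / 86400, 1)
| eval earliestTime=strftime(earliestEpoch, "%m/%d/%Y:%H:%M:%S")
| eval latestTime=strftime(latestEpoch, "%m/%d/%Y:%H:%M:%S")
| table earliestTime latestTime days_local sizeOnDiskMB NumberOfBuckets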

David
Splunk Employee

At least as of 6.0, you can run dbinspect from a search head and it will hit every indexer.
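In a distributed run, each dbinspect row should also carry a splunk_server field, so the oldest warm data can be broken out per indexer; a rough sketch, again assuming the newer startEpoch field:

| dbinspect index=main
| search state=warm
| stats min(startEpoch) as earliestEpoch dc(path) as warmBuckets by splunk_server
| eval oldest_warm=strftime(earliestEpoch, "%m/%d/%Y:%H:%M:%S")
| table splunk_server oldest_warm warmBuckets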

lpolo
Motivator

You have to run the command locally. You can create a script to collect this data, or log in to each indexer and run the command via its web interface.


jodros
Builder

We have a distributed environment with 4 indexers. Is dbinspect able to report in a distributed environment? I have read that it is not.
