How to list all indexes with Time, Index Name, Size, and NumOfEvents for each index?

flee
Path Finder

Hi, I'd like to get a list of all indexes that shows the data in the following format for a given time span, such as the last 7 days:

_time indexName IndexedVolumeSizeInMBofTheDay NumOfEventsOfTheDay

For example:
2015-11-20 myIndex-A 1234 1000
2015-11-20 myIndex-B 567 300
2015-11-20 myIndex-X 543 250
...
2015-11-21 myIndex-A 9876 2000
2015-11-21 myIndex-B 3542 341
2015-11-21 myIndex-X 18332 6723
...
I found the following search on this site, but its output seems to be limited to about 13 columns, and it doesn't show all indexes. We have more than 140 indexes! Is there a way to make this search produce output in the above format, or something similar, and show all indexes?

index=_internal source=*metrics.log group=per_index_thruput series=* | eval MB = round(kb/1024,2) | timechart sum(MB) as MB by series

Thanks for your help.

lguinn2
Legend

The metrics log probably doesn't have the information you need, as it samples the data - it is not complete.

This is not exactly what you asked for, but it is correct and complete. It examines the buckets in each index and calculates the number of events, the size on disk and the raw data size. It will run quickly. If your buckets roll more often than once per day, then this may match a day's worth of data fairly accurately...

| dbinspect index=* | search index!=_* | fields bucketId endEpoch eventCount sizeOnDiskMB startEpoch index rawSize
| where endEpoch > relative_time(now(), "-1d@d")
| stats min(startEpoch) as startEpoch max(endEpoch) as endEpoch sum(eventCount) as EventCount sum(sizeOnDiskMB) as "Size On Disk (MB)" sum(rawSize) as rSize by index
| eval "Raw Data Size (MB)"=round(rSize/1024/1024,2) | eval "Size On Disk (MB)"=round('Size On Disk (MB)',2)
| eval "Time Range (hrs)" = round((endEpoch - startEpoch)/3600,2)
| eval "End Time"=strftime(endEpoch,"%x %X") | eval "Start Time"=strftime(startEpoch,"%x %X")
| table index "Start Time" "End Time" "Time Range (hrs)" EventCount "Raw Data Size (MB)" "Size On Disk (MB)"

Note that the second line is where the actual time range is chosen. The selection says "choose buckets where the latest event in the bucket is within the last day." If you used startEpoch instead of endEpoch, Splunk would select only index buckets that had been started within the last day.

HOORAY! UPDATE to the UPDATE!! dbinspect now works in a distributed environment! Yay!

[OLD UPDATE] dbinspect does not work properly in a distributed environment IN OLDER VERSIONS OF SPLUNK - it needs to be run on each indexer. However, there is an answer that addresses this:
https://answers.splunk.com/answers/6147/how-to-generate-a-report-on-multiple-indexes.html

flee
Path Finder

Good news! In what version of Splunk did dbinspect start working in a distributed environment? Thanks for the update!

lstewart_splunk
Splunk Employee

flee, I think that was version 6.0. And you are welcome 🙂

flee
Path Finder

Thanks for your explanations lguinn! That helped. The data needs to be within the date/time range specified. Other data points like number of events and Size on Disk are optional for my case. It doesn’t need to match the license usage either.

Actually, the volume size for indexes from metrics.log would be sufficient for what I need. I'm able to get a report on all indexes by adding limit=0; without this parameter, the report is limited to 10 indexes.

index=_internal source=*metrics.log group=per_index_thruput series=* | eval MB = round(kb/1024,2) | timechart sum(MB) as MB by series limit=0
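A variant closer to the row-per-day, row-per-index layout from the original question might look like the sketch below. It assumes that per_index_thruput events carry an ev field holding the event count alongside kb; the output field names are only illustrative. Because metrics.log is sampled, low-volume indexes may still be missing from the result.

```
index=_internal source=*metrics.log group=per_index_thruput series=*
| bin _time span=1d
| stats sum(kb) as kb sum(ev) as NumOfEvents by _time series
| eval IndexedVolumeSizeInMB = round(kb/1024, 2)
| rename series as indexName
| table _time indexName IndexedVolumeSizeInMB NumOfEvents
```

Using stats with "by _time series" instead of timechart keeps one row per day and index, so the number of indexes does not turn into a very wide table.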

Thanks again for your help!

lguinn2
Legend

Thanks to @lstewart_splunk for updating me on the dbinspect command!

flee
Path Finder

Thanks lguinn. When I used where startEpoch > relative_time(now(), "-1d@d"), it returned data indexed both today and yesterday, and only for a small subset of the roughly 50 indexes that have data. How do I define an exact From and To date-time boundary? Is there a way to list all indexes regardless of whether any data was indexed in the given date/time range?

We have clustered indexers. When the dbinspect command is run on a clustered Search Head, does it run against all indexers in the cluster, or does the command need to be run on each indexer?

I also noticed that some SOS and DMC panels on indexes use _internal *metrics.log. Why would those tools use metrics.log to pull index-related data if the data is not complete, as you mentioned?

Thank you.

lguinn2
Legend

Yes, the time range for dbinspect cannot be exact. The timerange is used to identify any buckets that have data in the timerange - but the reporting is based on the entire bucket, which can certainly have data outside the timerange. If you use the dbinspect command, there is no way around this.
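As a rough sketch of that bucket selection, the search below keeps every bucket whose time span overlaps a window running from the start of the day seven days ago to the start of today. The fromTime and toTime names and the window itself are assumptions chosen for illustration; the totals still cover whole buckets, so they can include events outside the window.

```
| dbinspect index=*
| eval fromTime = relative_time(now(), "-7d@d"), toTime = relative_time(now(), "@d")
| where startEpoch < toTime AND endEpoch >= fromTime
| stats count as Buckets sum(eventCount) as EventCount sum(sizeOnDiskMB) as SizeOnDiskMB by index
| eval SizeOnDiskMB = round(SizeOnDiskMB, 2)
```

Testing both startEpoch and endEpoch catches buckets that began before the window but still hold events inside it.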

Many apps (including the DMC) and admins (including me) use the metrics log to get a handle on "what's going on." Looking at the most active data feeds or indexes or whatever is usually all the information that is needed. However, if you have low-volume objects, they will probably not appear in the metrics log. So don't expect this data to be complete - for example, you can't match it to the license usage.

If you are looking for license usage, there is a log for that: license_usage.log
However, it will not tell you everything that you've asked for, such as disk space consumed or number of events per day.
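For daily licensed volume per index, a minimal sketch against that log could look like the following. It assumes the type=Usage events are searchable from your search head and that they carry the b (bytes) and idx (index name) fields; the output names are illustrative.

```
index=_internal source=*license_usage.log type=Usage
| bin _time span=1d
| stats sum(b) as bytes by _time idx
| eval MB = round(bytes/1024/1024, 2)
| rename idx as indexName
| table _time indexName MB
```

This reports licensed ingest volume only, not disk space or event counts.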
