We are having bucket performance issues, and it looks like the cause is a host that is consistently sending data "from the past". I know which bucket is causing the issue, so is it possible to search by bucket?
search bucket=hot_v1_431
Answered my own question...
Just go to $SPLUNK_HOME/var/lib/splunk/<index>/db/<bucket> and view the Hosts.data file.
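If you want to spot the suspect bucket by its time range rather than its ID, the bucket directory names can help too. The sketch below assumes the usual on-disk naming: hot buckets are named hot_v1_<id> (as in hot_v1_431 above), and warm/cold buckets are named db_<newest>_<oldest>_<id> with epoch-second times, sometimes followed by _<guid> on clustered indexers. Those conventions, and the helper names parse_bucket_dir and list_buckets, are assumptions for illustration, not an official Splunk API.

```python
import os
import re
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional

# Assumed naming conventions (not an official API):
#   hot buckets:        hot_v1_<id>
#   warm/cold buckets:  db_<newest epoch>_<oldest epoch>_<id>[_<guid>]
HOT_RE = re.compile(r"^hot_v1_(\d+)$")
DB_RE = re.compile(r"^db_(\d+)_(\d+)_(\d+)(?:_[0-9A-Fa-f-]+)?$")


class BucketInfo(NamedTuple):
    name: str
    bucket_id: int
    newest: Optional[datetime]  # None for hot buckets: the name carries no time range
    oldest: Optional[datetime]


def _utc(epoch: int) -> datetime:
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


def parse_bucket_dir(name: str) -> Optional[BucketInfo]:
    """Return the bucket ID and (for warm/cold buckets) the time span encoded in the name."""
    m = HOT_RE.match(name)
    if m:
        return BucketInfo(name, int(m.group(1)), None, None)
    m = DB_RE.match(name)
    if m:
        newest, oldest, bucket_id = (int(g) for g in m.groups())
        return BucketInfo(name, bucket_id, _utc(newest), _utc(oldest))
    return None  # not a bucket directory (e.g. some other file under db/)


def list_buckets(db_dir: str) -> List[BucketInfo]:
    """Parse every bucket directory under e.g. $SPLUNK_HOME/var/lib/splunk/<index>/db."""
    buckets = []
    for entry in os.scandir(db_dir):
        if entry.is_dir():
            info = parse_bucket_dir(entry.name)
            if info is not None:
                buckets.append(info)
    return buckets
```

Sorting the warm/cold entries by their oldest time, e.g. sorted((b for b in list_buckets(path) if b.oldest is not None), key=lambda b: b.oldest), puts buckets holding unusually old events at the front.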
Yes, you can.
The _bkt field is available. Sadly, you can't use it as a search term in the first part of a search (before the first |), but you can filter on it after that. You'll need to know the index name, the bucket ID, and the GUID of the server itself. In 4.x instances, the GUID is the guid parameter in the [general] stanza of server.conf. In 5.x, it's stored in $SPLUNK_HOME/etc/instance.cfg.
I was able to run this search for bucket ID 22 of the summary index:
index=summary | where _bkt="summary~22~4F582768-7B38-4768-95EA-EC3D491A8A23"
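Assembling the _bkt value by hand is easy to get wrong, so here is a small Python sketch that reads the GUID and builds the search in the index~bucketID~GUID form used above. It assumes the GUID appears as a guid = <value> line under a [general] stanza (instance.cfg on 5.x, server.conf on 4.x); the caller supplies the file path. The names parse_guid, read_guid and bkt_search are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional


def parse_guid(conf_text: str) -> Optional[str]:
    """Return the guid key from the [general] stanza of a .conf/.cfg file's text."""
    stanza = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            stanza = line[1:-1].strip()  # entering a new stanza
            continue
        key, sep, value = line.partition("=")
        if sep and stanza == "general" and key.strip() == "guid":
            return value.strip()
    return None


def read_guid(path: str) -> Optional[str]:
    """Read the GUID from instance.cfg (5.x) or server.conf (4.x) at the given path."""
    return parse_guid(Path(path).read_text())


def bkt_search(index: str, bucket_id: int, guid: str) -> str:
    """Build a search that filters on _bkt after the first pipe."""
    bkt = f"{index}~{bucket_id}~{guid}"
    return f'index={index} | where _bkt="{bkt}"'
```

For example, bkt_search("summary", 22, "4F582768-7B38-4768-95EA-EC3D491A8A23") reproduces the summary-index search above.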
Brilliant! I have been looking for a way to do this for a long time. Thank you.
An easy way to find the timestamps for events is to use the metadata command in combination with the stats command. You can use any metadata type, but your final query might look like:
| metadata type=hosts | stats min(recentTime) as last_event by host | sort +last_event
The above search finds the timestamp of the last event sent to Splunk by each host, sorted by the epoch (UTC) timestamp. You can do the same for sources and sourcetypes.
To make the timestamp human-readable:
| metadata type=hosts | stats min(recentTime) as last_utc by host | convert ctime(last_utc) as last_event
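To see what the stats/sort/convert pipeline does to the metadata rows, here is a plain-Python sketch over illustrative (host, recentTime) pairs; the data and the function name last_event_by_host are made up. It keeps the minimum recentTime per host, sorts ascending like sort +last_event, and formats with %m/%d/%Y %H:%M:%S, which I believe is convert's default ctime format. Splunk renders times in the user's timezone, whereas this sketch formats in UTC.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


def last_event_by_host(rows: Iterable[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Mimic: stats min(recentTime) by host | sort ascending | convert ctime(...)."""
    earliest: Dict[str, int] = {}
    for host, recent_time in rows:
        # keep the smallest recentTime seen for each host (stats min ... by host)
        if host not in earliest or recent_time < earliest[host]:
            earliest[host] = recent_time
    ordered = sorted(earliest.items(), key=lambda kv: kv[1])  # oldest first
    return [
        (host, datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%m/%d/%Y %H:%M:%S"))
        for host, ts in ordered
    ]
```

Hosts that have not sent anything recently float to the top of the result, which is the same signal you would look for in the Splunk output.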
Yes, I would be interested in this. Thanks!
Hosts.data will certainly show you the set of hosts whose events contributed to the bucket. With a bit of work you can also parse the time ranges yourself from the per-host fields in that file. However, there isn't really an exposed way to search on specific buckets. There's an internal field that represents the source bucket, so it is possible, but the names it uses are not the filesystem names. If this seems genuinely useful, I can go rediscover it. I'm not sure this machinery is intended to be used, or even useful, outside of troubleshooting.