Getting Data In

What's an Invalid_hot_6 doing in my index? Should I do something about it?

jrodman
Splunk Employee

In my index, in the warm directory, I have some buckets like db_1274392278_1271804233_0, some hot_v1_1, and then this one invalid_hot_2. What is an invalid hot? Should I be concerned? Should I delete it?

1 Solution

jrodman
Splunk Employee

Presuming you already know what indexing buckets are...

A Splunk hot bucket is changed into an invalid_hot bucket when Splunk detects that its metadata files (Sources.data, Hosts.data, SourceTypes.data) are corrupt or incorrect. Two kinds of incorrect data are detected: the time ranges may be incorrect, or the event counts may be incorrect. We believe that the time ranges are usually at fault.

An invalid hot bucket is mostly ignored by the index from this point on. Since we don't trust it, we don't want to put more data in it, and we do not search it. Invalid hot buckets are not currently (4.1.x) automatically recovered or automatically managed in any way.

Invalid hots do not count as hot or warm for index management purposes (the maximum number of allowed hot buckets, the maximum number of allowed warm buckets). Thus, they will not negatively affect the flow of data through the system, but at the time of this writing (4.1.3) they can consume more disk storage than expected, because the normal data is stored in addition to the invalid hot data.

In some cases this is inconsequential. Historically, there have been versions where Splunk decided a bucket was invalid too early, when it was still an empty directory. While a nuisance, there is no harm in that scenario. You can safely delete such an empty invalid hot. (These were generated by versions around 4.0.3. If you are running one of those, please upgrade.)

In other cases, real data had already arrived in the hot bucket before it was determined to be problematic.
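To tell the harmless empty invalid hots apart from the ones holding real data, and to see how much disk they occupy, something like the following Python sketch could help. It is only an illustration: the function name `scan_invalid_hots` is made up here, and "empty" is assumed to mean "contains no files at all".

```python
from pathlib import Path


def scan_invalid_hots(index_db_dir):
    """Return (path, total_bytes, is_empty) for each invalid_hot_* directory.

    index_db_dir is the directory that holds the bucket directories.
    "Empty" here means the directory tree contains no files (an assumption).
    """
    results = []
    for entry in sorted(Path(index_db_dir).iterdir()):
        # Only look at directories whose name starts with invalid_hot_
        if not (entry.is_dir() and entry.name.lower().startswith("invalid_hot_")):
            continue
        files = [p for p in entry.rglob("*") if p.is_file()]
        total_bytes = sum(p.stat().st_size for p in files)
        results.append((entry, total_bytes, len(files) == 0))
    return results
```

Calling `scan_invalid_hots(...)` on the directory that contains your buckets returns one tuple per invalid hot. The ones flagged empty are candidates for deletion, and the rest are candidates for the recovery steps below.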

The corrective action for an invalid hot is to:

  1. Attempt to rebuild the metadata from the rawdata information (recover-metadata)
  2. If successful, rename the bucket as a warm bucket to rejoin the splunk index proper.

The recover-metadata command is destructive. It will clobber the existing .data files in a bucket. I recommend making a copy of these files before running recover-metadata, even if their only use is for forensic purposes.
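A minimal sketch of that backup step in Python, assuming the metadata files sit directly in the bucket directory and end in `.data`. The function name `backup_data_files` and the backup location are hypothetical.

```python
import shutil
from pathlib import Path


def backup_data_files(bucket_dir, backup_dir):
    """Copy every *.data file from bucket_dir into backup_dir.

    Returns the names of the files that were copied. Only the top level
    of the bucket is scanned (an assumption about where the files live).
    """
    bucket = Path(bucket_dir)
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(bucket.glob("*.data")):
        # copy2 also preserves timestamps, which may matter for forensics
        shutil.copy2(src, dest / src.name)
        copied.append(src.name)
    return copied
```

Check that the returned list includes the files you expect (Sources.data, Hosts.data, SourceTypes.data) before running anything destructive.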

To run recover-metadata, run `splunk cmd recover-metadata path/to/your/invalid_hot_5`. Hopefully, it tells you that it worked.
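If you have several invalid hots to process, the same command can be driven from Python. This is a sketch only: the helper names are invented, `splunk` is assumed to be on your PATH, and the command's output is returned for you to read, since I am not assuming that the exit status alone tells you whether recovery worked.

```python
import subprocess


def build_recover_command(bucket_path, splunk_bin="splunk"):
    """Build the argument list for: splunk cmd recover-metadata <bucket>."""
    return [splunk_bin, "cmd", "recover-metadata", str(bucket_path)]


def run_recover_metadata(bucket_path, splunk_bin="splunk"):
    """Run recover-metadata on one bucket; return (exit_status, output_text).

    Read the output yourself to confirm that the recovery succeeded.
    """
    cmd = build_recover_command(bucket_path, splunk_bin)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr
```

Passing the arguments as a list avoids shell quoting problems if the bucket path contains spaces.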

If recover-metadata is successful, rename the bucket as it would normally be named (link to script forthcoming) and all should be well.
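Until that script is linked, here is a rough Python sketch of the rename. It assumes warm buckets are named `db_<latest time>_<earliest time>_<id>`, as in the example bucket db_1274392278_1271804233_0 from the question, and that the number in `invalid_hot_N` is the bucket id to reuse. Both are assumptions, the function names are hypothetical, and you have to supply the recovered time range yourself.

```python
import re
from pathlib import Path


def warm_bucket_name(latest_epoch, earliest_epoch, bucket_id):
    """Build a warm bucket name of the form db_<latest>_<earliest>_<id>."""
    latest, earliest = int(latest_epoch), int(earliest_epoch)
    if latest < earliest:
        raise ValueError("latest time must not precede earliest time")
    return f"db_{latest}_{earliest}_{int(bucket_id)}"


def rename_to_warm(invalid_hot_dir, latest_epoch, earliest_epoch):
    """Rename invalid_hot_<N> to a warm bucket name, reusing N as the id.

    Reusing N as the bucket id is an assumption. Refuses to overwrite
    an existing directory.
    """
    src = Path(invalid_hot_dir)
    match = re.fullmatch(r"invalid_hot_(\d+)", src.name, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not an invalid hot bucket: {src.name}")
    dest = src.with_name(warm_bucket_name(latest_epoch, earliest_epoch, match.group(1)))
    if dest.exists():
        raise FileExistsError(dest)
    src.rename(dest)
    return dest
```

For example, `warm_bucket_name(1274392278, 1271804233, 0)` gives `db_1274392278_1271804233_0`, matching the bucket named in the question.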

jrodman
Splunk Employee

Mind if I pull this and stash it on the splunk.com wiki somewhere? I'm unclear why the case matters, since I thought we smashed case for search purposes. Is it a display issue?

Lowell
Super Champion

For anyone interested: I wrote a little Python script that attempts to restore the proper case in your recovered metadata files by using your index-level metadata files. Splunk stores all of your metadata values in lowercase in the index and only preserves the case in your .data files, which get overwritten when you run recover-metadata. This script tries to fix that. (BTW, I haven't tried this in 4.x, but it should work fine.) Here is the script: http://pastebin.ca/1481049
