data model storage and backups

jeff
Contributor

A question from my backup guys that I couldn't find a good answer to in the docs: I don't understand how data model data is laid out on disk. Indexes with a data model defined have a datamodel_summary directory:

[splunk@splunk3 splunk]$ ll ./firewall
total 60
drwx------.  37 splunk splunk  4096 May  2 13:26 colddb
drwx------. 340 splunk splunk 24576 May  2 13:35 datamodel_summary
drwx------. 306 splunk splunk 20480 May  3 10:06 db
drwx------.   2 splunk splunk  4096 Aug 17  2013 thaweddb

In the _internaldb index directory, I have one of these plus another "summary" directory that looks like it's somehow associated with the Splunk Deployment Monitor app:

[splunk@splunk3 splunk]$ ll _internaldb/
total 532
drwx------. 2216 splunk splunk 126976 May  3 09:47 colddb
drwx------. 2519 splunk splunk 184320 May  3 09:55 datamodel_summary
drwx------.  306 splunk splunk  28672 May  3 10:08 db
drwx------. 2519 splunk splunk 184320 May  3 09:50 summary
drwx------.    2 splunk splunk   4096 Aug 16  2013 thaweddb

[splunk@splunk3 splunk]$ ll _internaldb/summary/998_163BFC27-2C4C-4CDE-83CD-F8B48C29BA80/20D17CF6-2E61-47A1-B3A4-FF57509916DF/
total 596
drwx------. 2 splunk splunk 32768 Dec 23 05:10 splunk_deployment_monitor_nobody_1a56f43bf8d5bf20
drwx------. 2 splunk splunk 32768 Dec 23 05:10 splunk_deployment_monitor_nobody_26e747c470c62ba8
<snip several lines />
drwx------. 2 splunk splunk 24576 Jan 11 14:08 splunk_deployment_monitor_nobody_NSd0dc3ea132443bbf

From the backup perspective, the backups are throwing thousands of errors each night for nonexistent files (they were there when the drive was scanned, but gone by the time they were to be backed up). I'm fairly sure it's okay to tell them to exclude the datamodel_summary (and summary) directories entirely, since they can be recreated after a restore, but for my own sanity I'd like to understand the structure a bit more.

  1. Can we exclude the datamodel_summary directories from backup?
  2. What is that extra summary directory in _internaldb all about? Can it be excluded as well?

helge
Builder

Exclude the datamodel_summary directories from backup.
If you restore an index, Splunk automatically rebuilds the data model acceleration summaries (that is what datamodel_summary stores).
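
As a rough sketch of what such an exclusion could look like, assuming the backup is (or can be approximated by) a plain tar run: the script below builds a throwaway mock of the directory layout from the question and archives it while skipping both summary directories. The mock paths and file names are made up for illustration; a real backup product will have its own exclude syntax.

```shell
#!/bin/sh
# Sketch only: archives a mock index tree while skipping the
# acceleration summary directories. All paths here are stand-ins.
set -eu

SPLUNK_DB_MOCK="$(mktemp -d)"   # stand-in for the real index root (assumption)
ARCHIVE="$(mktemp)"

# Mock layout mirroring the listings in the question.
mkdir -p "$SPLUNK_DB_MOCK/firewall/db" \
         "$SPLUNK_DB_MOCK/firewall/colddb" \
         "$SPLUNK_DB_MOCK/firewall/datamodel_summary" \
         "$SPLUNK_DB_MOCK/_internaldb/db" \
         "$SPLUNK_DB_MOCK/_internaldb/summary"
touch "$SPLUNK_DB_MOCK/firewall/db/bucket_file" \
      "$SPLUNK_DB_MOCK/firewall/datamodel_summary/accel_file" \
      "$SPLUNK_DB_MOCK/_internaldb/summary/report_accel_file"

# Exclude any directory named datamodel_summary or summary.
# The --exclude options must come before the path being archived.
tar -cf "$ARCHIVE" -C "$SPLUNK_DB_MOCK" \
    --exclude='datamodel_summary' \
    --exclude='summary' \
    .

# List what actually went into the archive.
tar -tf "$ARCHIVE"
```

The patterns match on directory name alone, so anything else that happens to be called summary under the archived root would be skipped too; tighter patterns may be preferable on a real system.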

lmyrefelt
Builder

The summary directory you see is for summary data; this one seems to have been generated by the Deployment Monitor app.

Your tsidx files should go in the datamodel_summary directory unless you tell Splunk otherwise in indexes.conf (look for tstatsHomePath or similar).

By default summary data should go to $SPLUNK_HOME/var/lib/splunk/database/summary

lmyrefelt
Builder
  1. As for backups: if I remember correctly, the data is generated when you first run or use the data models in Pivot, so there should be no point in backing it up. If you create your own data models for your data, you should back up the data model configuration.

lmyrefelt
Builder

indexes.conf - tstatsHomePath for data model acceleration summaries (datamodel_summary)
indexes.conf - tsidxStatsHomePath for tscollect namespace data
indexes.conf - summaryHomePath for report acceleration summaries (summary)
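
To check whether any of these settings move summary data somewhere the backup exclusions would miss, something like the following could be used. This is only a sketch: the stanza is a made-up sample written to a temp file (the volume name and the /opt/splunk_summaries paths are assumptions, not recommendations), and on a real system you would point the grep at your own indexes.conf files instead.

```shell
#!/bin/sh
# Sketch only: the stanza below is a made-up sample, not a recommended config.
set -eu

CONF="$(mktemp)"
cat > "$CONF" <<'EOF'
[volume:summaries]
path = /opt/splunk_summaries

[firewall]
homePath = $SPLUNK_DB/firewall/db
coldPath = $SPLUNK_DB/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
tstatsHomePath = volume:summaries/firewall/datamodel_summary
summaryHomePath = /opt/splunk_summaries/firewall/summary
EOF

# Print only the settings that relocate summary data, so the backup
# exclusions can be pointed at the right directories.
SUMMARY_PATHS="$(grep -E '^(tstatsHomePath|tsidxStatsHomePath|summaryHomePath)[[:space:]]*=' "$CONF")"
printf '%s\n' "$SUMMARY_PATHS"
```

If nothing is printed for your real configuration, the summaries are presumably in their default locations next to the index directories, as in the listings in the question.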

martin_mueller
SplunkTrust

I think the summary directory is related to report acceleration being turned on for a search owned by nobody in the splunk_deployment_monitor app. I also think neither kind of acceleration needs to be backed up, because they contain nothing unique, only summaries of existing index data.
