Solved: Are data model summaries linked to the original events? Can tstats access them?

gabriel_vasseur · 07-19-2016

With tstats, I can't seem to get access to the original events. Even in "verbose" mode, the "Events" tab contains only what I regard as stripped-down versions of the original events. Is there a way to access the original events? (other than the hack of adding the "_raw" field to the data model...)

The way I understand accelerated data model summaries is that they are basically independent traditional databases with a rigid schema: they contain only the values for the fields you specified in the data model definition. They are independent of the indexes and your data, which is why they are quick and don't offer access to the original events.

However, I have just been trying to locate where the accelerated data model summaries are held (to figure out size requirements), and apparently they are held in the same place as the indexes. To me that doesn't make any sense. Data models are by definition index-agnostic, and the data in a model may come from many different indexes, so why keep them together? And if they are kept together, why don't the tsidx files hold pointers to the original events in the index?

dshpritz · 07-19-2016

Accelerated Data Models are like summary indexes. That is, they contain bits and pieces of events (the fields that the data model includes), but they contain neither the original events nor pointers to them. You are correct: unlike Splunk's normal late-binding schema, data models are a little more rigid, and as such can offer speed when reporting.

Typically, when drilling down from the accelerated data to the actual events, the root search for the data model is combined with the values of the accelerated fields and the time range to get back to the original events. However, this runs a new search that pulls the event data (possibly multiple events) based on the summary data; it is not a lookup of "the event that relates to this accelerated entry". I hope that makes sense.
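To make the drilldown idea concrete, here is a minimal Python sketch of how such a follow-up search could be assembled from one summary row. Everything in it is an assumption for illustration: the function name `build_drilldown`, the root constraint `index=firewall sourcetype=fw_traffic`, the field names, and the one-hour bucket are all hypothetical, and the time bounds are returned separately so they can be passed as the search's time range rather than embedded in the query.

```python
def build_drilldown(root_search, fields, bucket_start, span_seconds):
    """Return a search string plus time bounds covering one summary row."""
    def quote(value):
        # Escape backslashes and double quotes so the value survives inside "..."
        return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'

    # One field="value" term per accelerated field, in a stable order
    terms = [f"{name}={quote(value)}" for name, value in sorted(fields.items())]
    return {
        "search": " ".join([f"search {root_search}"] + terms),
        "earliest_time": bucket_start,                # epoch seconds
        "latest_time": bucket_start + span_seconds,   # end of the time bucket
    }


job = build_drilldown(
    root_search="index=firewall sourcetype=fw_traffic",  # hypothetical root constraint
    fields={"src": "10.1.2.3", "dest_port": 443},        # hypothetical accelerated fields
    bucket_start=1468915200,                             # illustrative epoch timestamp
    span_seconds=3600,
)
print(job["search"])
# search index=firewall sourcetype=fw_traffic dest_port="443" src="10.1.2.3"
```

The resulting search can match several events in that hour, which is the point made above: it is a fresh search narrowed by the summary values, not a pointer to one specific event.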

It's important to remember that accelerated data models are summaries, and as such a single accelerated entry in the data model should correspond to more than one event. If you make the data model too specific, you end up with a bloated data model that takes up a lot of space. The data model included with the Palo Alto app is an example of this: because it includes very specific fields, like an ID with each event, you end up with a very large accelerated store.

For information on where data model accelerations are stored, you can check out the Knowledge Manager Manual. For a custom data model, it's harder to say what the usage might look like, but for the models included in the Common Information Model App, you can take a look at this page in the Knowledge Manager Manual as well as the Deployment Planning section of the ES documentation.
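The effect of a per-event ID on summary size can be illustrated with a toy Python model. This is only a sketch of the cardinality argument, not of how Splunk actually lays out its summary files; the function `distinct_terms`, the field names, and the synthetic events are all made up for the example.

```python
def distinct_terms(events, fields):
    """Count the distinct field=value pairs a summary would have to store."""
    return len({(f, e[f]) for e in events for f in fields})


# 1,000 synthetic events: 5 sources, 2 actions, and a unique ID per event
events = [
    {"src": f"10.0.0.{i % 5}", "action": "allowed" if i % 2 else "blocked", "session_id": i}
    for i in range(1000)
]

low = distinct_terms(events, ["src", "action"])
high = distinct_terms(events, ["src", "action", "session_id"])
print(low)   # 7
print(high)  # 1007
```

With only the low-cardinality fields, the number of distinct values stays small no matter how many events arrive; adding the unique ID makes it grow with every event.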

HTH

helge · 07-30-2016

Take a look at my article series on accelerated data models. It should answer all the questions you asked in detail:
https://helgeklein.com/blog/2015/10/splunk-accelerated-data-models-part-1/

gabriel_vasseur · 08-05-2016

Interesting read, thanks!
