
Are data model summaries linked to the original events? Can tstats access them?

gabriel_vasseur
Contributor

With tstats, I can't seem to get access to the original events. Even in "verbose" mode, the "Events" tab contains only what I regard as stripped-down versions of the original events. Is there a way to access the original events? (other than the hack of adding the "_raw" field to the data model...)

As I understand it, accelerated data model summaries are basically independent traditional databases with a rigid schema: they contain only the values of the fields specified in the data model definition. They are independent of indexes and of your data, which is why they are fast and don't offer access to the original events.

However, while trying to locate where the accelerated data model summaries are stored (to figure out size requirements), I found that they are apparently stored alongside the indexes. To me that doesn't make sense. Data models are by definition index-agnostic, and the data in a model may come from many different indexes, so why store them together? And if they are stored together, why don't the tsidx files hold pointers to the original events in the index?

Accepted Solution

dshpritz
SplunkTrust

Accelerated Data Models are like summary indexes. That is, they contain bits and pieces of events (the fields that the data model includes), but they do not contain the original events, nor pointers to them. You are correct: unlike Splunk's normal late-binding schema, data models are somewhat more rigid, and as such they can offer speed when reporting.

Typically, when you drill down from the accelerated data to the actual events, the data model's root search is combined with the values of the accelerated fields and the time range to get back to the original events. However, this runs a new search that pulls the event data (possibly multiple events) matching the summary data; it is not a lookup of "the event that relates to this accelerated entry". I hope that makes sense.
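To make the drill-down idea concrete, here is a minimal Python sketch that assembles such a follow-up search string. It assumes a hypothetical root search (`index=firewall sourcetype=pan:traffic`), a summary row given as a dict of field values, and a time bucket taken from a `tstats ... by _time span=...` style result; the helper names are illustrative only, not a Splunk API. It also assumes the summary field names match the event field names, which is not always the case for data model fields that carry a dataset prefix.

```python
def spl_quote(value: str) -> str:
    """Wrap a value in double quotes for an SPL field=value term,
    escaping backslashes and embedded double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_drilldown_search(root_search: str, summary_row: dict,
                           bucket_start: int, span_seconds: int) -> str:
    """Build a new raw-event search from one summary row (hypothetical helper).

    The summary holds no pointer to the events, so we re-run the data
    model's root search, constrained by the summarized field values and
    the time bucket the row came from.
    """
    # Parenthesize so any OR inside the root search stays grouped.
    terms = [f"({root_search})"]
    for field, value in sorted(summary_row.items()):
        terms.append(f"{field}={spl_quote(str(value))}")
    # Epoch-second time bounds for the bucket this row summarizes.
    terms.append(f"earliest={bucket_start}")
    terms.append(f"latest={bucket_start + span_seconds}")
    return " ".join(terms)


if __name__ == "__main__":
    row = {"src": "10.0.0.5", "action": "blocked"}
    print(build_drilldown_search("index=firewall sourcetype=pan:traffic",
                                 row, 1700000000, 3600))
    # prints: (index=firewall sourcetype=pan:traffic) action="blocked" src="10.0.0.5" earliest=1700000000 latest=1700003600
```

Because one summary row can stand for several events, a search built this way may return more than one event, which is exactly the "new search, not a pointer" behaviour described above.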

It's important to remember that accelerated data models are summaries, so a single entry in the accelerated data should summarize more than one event. If you make the data model too specific, you end up with a bloated acceleration summary that takes up a lot of space. The data model included with the Palo Alto app is an example of this: because it includes very specific fields, such as an ID for each event, its accelerated store becomes very large.
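The cardinality point can be illustrated with a small Python model that treats each distinct combination of data model field values as one summary entry. This is a deliberate simplification that does not reflect the real tsidx file layout; it only shows how field choice drives entry count. The events and the `session_id` field are hypothetical, standing in for a per-event ID like the one in the Palo Alto data model.

```python
from collections import Counter


def summarize(events, fields):
    """Simplified model: one summary entry per distinct combination of
    the chosen field values, counting how many events each entry covers."""
    return Counter(tuple(event.get(f) for f in fields) for event in events)


# Hypothetical firewall events; session_id is unique per event.
events = [
    {"src": "10.0.0.5", "action": "allowed", "session_id": 1},
    {"src": "10.0.0.5", "action": "allowed", "session_id": 2},
    {"src": "10.0.0.5", "action": "blocked", "session_id": 3},
    {"src": "10.0.0.7", "action": "allowed", "session_id": 4},
    {"src": "10.0.0.7", "action": "allowed", "session_id": 5},
    {"src": "10.0.0.7", "action": "allowed", "session_id": 6},
]

if __name__ == "__main__":
    coarse = summarize(events, ("src", "action"))
    fine = summarize(events, ("src", "action", "session_id"))
    print(len(events), len(coarse), len(fine))  # prints: 6 3 6
```

With only `src` and `action`, six events collapse into three entries; adding the per-event ID yields one entry per event, so the summary grows with the raw event count instead of staying compact.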

For information on where data model accelerations are stored, check out the Knowledge Manager Manual. For a custom data model it's harder to predict what the usage might look like, but for the models included in the Common Information Model app, take a look at this page in the Knowledge Manager Manual as well as the Deployment Planning section of the ES documentation.

HTH


helge
Builder

Take a look at my article series on accelerated data models. It should answer all the questions you asked in detail:
https://helgeklein.com/blog/2015/10/splunk-accelerated-data-models-part-1/

gabriel_vasseur
Contributor

Interesting read, thanks!
