Why am I getting a mismatch between Summary Index values and the original values?

nl65
Explorer

I have the following search which works fine:

sourcetype=my_sourcetype some_filter | bucket _time span=1d | timechart count by some_field

Verifying the decomposition into 1h bins also works and matches the above:

sourcetype=my_sourcetype some_filter | bucket _time span=1h | sitimechart count by some_field | bucket _time span=1d | timechart count by some_field

I created a saved search:

sourcetype=my_sourcetype some_filter | bucket _time span=1h | sitimechart count by calc_severity

Then I back-filled the summary index using:

./splunk cmd python fill_summary_index.py -app my_app -name my_search -et my_start_time -lt my_end_time -dedup true -auth my_user:my_pass

This populated the summary index.

However, trying to regenerate the chart using:

index=summary source=my_source | bucket _time span=1d | timechart count by some_field

produces a wrong chart: the values are much greater than the originals.

It seems to me that the values in the summary index are inflated by a factor related to the permutations of the values of some_field.

Please advise.

Thanks


halr9000
Motivator

A few observations:

  1. You don't need to call bucket explicitly on _time, because timechart does this as needed (docs). It may be confusing things here, so why not move such reporting commands out of your populating search? Or do you really want to modify _time twice?
  2. The search used to retrieve results from a summary index has to match the search that populated it, or the results will differ. This answer explains it well.
  3. Make sure that summary indexing is what you want! Report acceleration and data model acceleration are more flexible and often (but not always) replace what was previously done with summary indexing. This docs page talks about the differences.
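
To make points 1 and 2 concrete, here is a rough sketch of a matching pair of searches. It reuses the placeholder names from the question (my_sourcetype, some_filter, my_source, calc_severity) and has not been run against the poster's data, so treat it as an illustration rather than a verified fix. The populating search drops the explicit bucket and lets sitimechart bin _time through its span argument:

```
sourcetype=my_sourcetype some_filter | sitimechart span=1h count by calc_severity
```

The reporting search then uses the same command without the "si" prefix, with the same aggregation and the same split-by field that went into the summary. As far as I know, a coarser span on the reporting side is fine:

```
index=summary source=my_source | timechart span=1d count by calc_severity
```

Note that the saved search in the question splits by calc_severity while the reporting search splits by some_field. If those are really two different fields, that alone would keep the two sides from matching.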

nl65
Explorer

Thanks much, switched to accelerated.
