Discrepancy between raw search and accelerated data model search

responsys_cm
Builder

I have a customer who has tasked me with coming up with a strategy for verifying that the output of data model searches matches the corresponding search on raw events. They are OK with running this search hourly and having it look back over a relative time window (like: earliest=-15m@m latest=-10m@m) for sampling purposes.
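
For a scheduled check like this, the sampling window can also be written into the searches themselves instead of being taken from the time picker. This is only a minimal sketch, assuming the same five-minute window and the _internal index and internal_server data model used in the test described below. On the raw side the time modifiers sit in the base search; on the tstats side they go in the WHERE clause.

index=_internal earliest=-15m@m latest=-10m@m
| stats count BY host

| tstats count FROM datamodel=internal_server WHERE earliest=-15m@m latest=-10m@m BY host

These are two separate searches; pinning the window in each one makes it explicit that both sides look at the same slice of time.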

In order to test this, I accelerated the default internal_server data model on my lab system. The goal was to create a search that utilizes the high level tags of the data model, pipe that to stats, and then use tstats with the append=t option to add the same statistical calculations from the data model and compare them.

So, I used the "Advanced" date and time range option in the search bar to set the range from -15m@m to -10m@m. Here is the search:

index=_internal source=scheduler.log OR source=metrics.log OR source=splunkd.log OR source=license_usage.log OR source=splunkd_access.log
| stats count AS raw_count, dc(processor) AS raw_dc_processor by host
| tstats append=t prestats=t summariesonly=t count, dc(server.processor) FROM datamodel=internal_server BY host
| stats values(raw_dc_processor) AS raw_dc_processor, dc(server.processor) AS dm_processor, values(raw_count) AS raw_count, count AS dm_count by host

Here's the weirdness... When I run that search over a relative time window as defined by the "Advanced" option in the timerange picker, the count of events on the data model is always 1 higher than the count on raw events. If I hard code a particular 5 minute time window, the raw search, the "| datamodel" search, and the "| tstats" search all return the same count of events.

It's only when I combine the raw search and tstats search into the same search with the "Advanced" relative time that I see a 1 event delta between the two.

This lab machine is basically sitting there, so it isn't really dealing with a massive number of _internal Splunk events.

If I use the "Advanced" option in the time picker, does the top-level search impose the same earliest/latest time constraints on both the raw events and the tstats search? Can anyone explain the consistent difference of one event between the two searches?

I may try the same search using appendcols and see if the results are different, but I'd love to figure out why the count of events between what should be identical searches is always one off.
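
The appendcols variant mentioned above might look like the following sketch. It is untested here, and the names dm_count, dm_processor, and delta are illustrative only. Because appendcols pastes the subsearch rows next to the main search rows by position rather than by a key, this version drops the split by host; with "by host" on both sides, the rows would have to come back in the same order for the columns to line up.

index=_internal source=scheduler.log OR source=metrics.log OR source=splunkd.log OR source=license_usage.log OR source=splunkd_access.log
| stats count AS raw_count, dc(processor) AS raw_dc_processor
| appendcols [ | tstats summariesonly=t count AS dm_count, dc(server.processor) AS dm_processor FROM datamodel=internal_server ]
| eval delta = dm_count - raw_count

A non-zero delta would flag a mismatch for the sampled window.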

Thanks for your help!

tiagofbmm
Influencer

Hi

Yes, the time picker applies to everything in that search, so that is not the source of the inconsistency.
I also don't think it happens only with relative time ranges. I just tried it with a fixed period and got the same 1 event discrepancy.

I was curious about this, so I tried it myself and found differences too, but I think they are due to summariesonly.
Using summariesonly=t in such a precise comparison may be misleading, as the most recent events may not be summarized yet.

That said, the data model results having 1 more event doesn't really make sense to me either. Still investigating.

responsys_cm
Builder

Part of the reason I chose -50 to -45 min as the time range was so that it was pretty much guaranteed that the data model acceleration was done -- at least under normal conditions.

I believe I tried it with both summariesonly=t and without it and still had the one event discrepancy.

tiagofbmm
Influencer

Notice that a normal append won't produce that strange behaviour:

index=_internal source=*scheduler.log OR source=*metrics.log OR source=*splunkd.log OR source=*license_usage.log OR source=*splunkd_access.log 
| stats count AS raw_count, dc(processor) AS raw_dc_processor by host 
| append [ | tstats prestats=t summariesonly=t count, dc(server.processor) FROM datamodel=internal_server BY host 
| stats values(raw_dc_processor) AS raw_dc_processor, dc(server.processor) AS dm_processor, values(raw_count) AS raw_count, count AS dm_count by host]
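
One possible reading of the off-by-one, offered as an assumption and not confirmed anywhere in this thread: with append=t prestats=t, the row that the first stats already produced for each host stays in the pipeline next to the tstats prestats rows, so the final "stats count" may count that row as one extra event. If that is what is happening, a sketch that sidesteps it is to let tstats finalize its own numbers (no prestats) and then merge the two result sets on host. The names dm_count, dm_processor, and delta are illustrative.

index=_internal source=*scheduler.log OR source=*metrics.log OR source=*splunkd.log OR source=*license_usage.log OR source=*splunkd_access.log
| stats count AS raw_count, dc(processor) AS raw_dc_processor by host
| append [ | tstats summariesonly=t count AS dm_count, dc(server.processor) AS dm_processor FROM datamodel=internal_server BY host ]
| stats values(raw_count) AS raw_count, values(raw_dc_processor) AS raw_dc_processor, values(dm_count) AS dm_count, values(dm_processor) AS dm_processor by host
| eval delta = dm_count - raw_count

This yields one row per host with both counts side by side, and delta should be 0 when the accelerated data model and the raw events agree for the window.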