Monitoring Splunk

How does Splunk choose events to scan?

danielrusso1
Path Finder

How does Splunk determine which events to scan in order to find results?

For example, say I run a query to find a particular System log event code on a particular server (over a 7 day period):

index=main sourcetype="WinEventLog:System" host=server2 EventCode="33"

I get back 8178 results by scanning 8450 events.

How did Splunk choose the 8450 events to scan?

If I search the total number of System log events over the same period (regardless of event code):

index=main sourcetype="WinEventLog:System" host=dtweb2p*

I get back 10,000 results by scanning 14,589 events.

I understand why I received more results, obviously. What I do not understand is why the universe of events that Splunk searched increased as well. To me it would seem logical that the number of System log events to search would be the same.


Ayn
Legend

It's actually the other way around. Splunk has no concept of fields until very late in the search process (except for index-time fields), so it searches for the values you supply and THEN checks whether any of the values it found are bound to the field you specify. See more in @dwaddle's EXCELLENT answer on this here: http://splunk-base.splunk.com/answers/54207/slow-search-when-evaluating-a-numeric-value?page=1&focus...

So in your first search, you have a number of fields that you use to filter which events Splunk should retrieve. Several of these (index, sourcetype and host) are already set at index time, so Splunk will filter on both field and value right away. One field, EventCode, is extracted at search time. Splunk looks in the index for events containing the value "33", and THEN checks whether that value is bound to the field EventCode. So when it says it has scanned 8450 events and found 8178 matching results, the events that were scanned but did not match are the ones that contain the value "33" somewhere other than the EventCode field.
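To make the scanned-versus-matched difference concrete, here is a rough Python sketch of the two-step process described above. It is a toy model and not Splunk's actual implementation: the sample events, the tokenizer, and the `key=value` field extraction are all made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample events; real WinEventLog data looks different.
EVENTS = [
    "EventCode=33 Message=Disk warning",
    "EventCode=33 Message=Retry 7",
    "EventCode=7036 Message=Service entered state 33",
    "EventCode=7036 Message=Service running",
]

def tokenize(raw):
    """Split raw text into lowercase tokens (a simplified stand-in for index-time segmentation)."""
    return set(re.findall(r"[a-z0-9]+", raw.lower()))

def build_index(events):
    """Map each token to the ids of the events that contain it."""
    index = defaultdict(set)
    for event_id, raw in enumerate(events):
        for token in tokenize(raw):
            index[token].add(event_id)
    return index

def extract_fields(raw):
    """Search-time field extraction: pull key=value pairs out of the raw text."""
    return dict(re.findall(r"(\w+)=(\S+)", raw))

def search(events, index, field, value):
    # Step 1: the index only knows about values, so fetch every event
    # that contains the value anywhere. These are the "scanned" events.
    candidates = sorted(index.get(value.lower(), set()))
    # Step 2: extract fields from each candidate and keep only the events
    # where the value is actually bound to the requested field.
    results = [events[i] for i in candidates
               if extract_fields(events[i]).get(field) == value]
    return len(candidates), results

INDEX = build_index(EVENTS)
scanned, results = search(EVENTS, INDEX, "EventCode", "33")
print(f"{len(results)} results by scanning {scanned} events")
```

In this sketch the third event is scanned because "33" appears in its message text, but it is dropped once the fields are extracted, which is the same kind of gap as 8450 scanned versus 8178 returned.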

EDIT: As for your second search, I suspect Splunk needs to look at a bunch of events to see which have a matching host field because of your wildcard.

jtrucks
Splunk Employee

I suspect that in this case the first search draws on a population of events that have the field EventCode with any value (even null), while the second search draws on a population of events both with and without the field EventCode.

--
Jesse Trucks
Minister of Magic

danielrusso1
Path Finder

You would think so; however, the EventCode field appears in 100% of the results for the second search.
