Strange behaviour with | table *

FrankSPL · ‎11-11-2017

Hi all,

I have some issues with the results from using | table *

I start with a simple data selection:

sourcetype=senssordata sensortype="sens*"
This gives me 108 events as a result,
with two different sensortype values, namely "sens1" and "sens-B".

Of course, this should give me the same result:

sourcetype=senssordata sensortype="sens1" OR sensortype="sens-B"
And it does: it returns the same 108 events.

So far, so good.
Now the strange issue appears.

sourcetype=senssordata sensortype="sens1" OR sensortype="sens-B" | table *
or (sourcetype=senssordata sensortype="sens1" OR sensortype="sens-B" | fieldsummary)

versus

sourcetype=senssordata sensortype="sens*" | table *
or (sourcetype=senssordata sensortype="sens*" | fieldsummary)

These two queries give different output!
The two field summaries are not equal, and the two table * outputs are not equal,
even though both initial data selections return the same events.

The output of the second query contains many more fields, and those fields don't seem to exist in the events.
The first query seems to output valid data, but the second should do exactly the same.

Any ideas?
Can this be explained or is this a bug?

DalJeanis · ‎11-14-2017

The answer is that, when you are doing sensortype=sens*, the system is doing an expansion of all the fields from the other sensortypes before eliminating those sensortypes that don't match. This leaves a bunch of NULL fields.

Of course, table * is not best practice anyway -- it is much better to use only the fields that you need for any given query, and to put them in an explicit fields command after the first pipe, to minimize the amount of extraction done by the system.
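
As a minimal sketch of that advice, the search below restricts extraction to a named set of fields right after the first pipe. The field name sensor_value is hypothetical and is not taken from the original post; substitute whichever fields your events actually carry.

```spl
sourcetype=senssordata sensortype="sens*"
| fields _time, sensortype, sensor_value
| table _time, sensortype, sensor_value
```

With an explicit field list like this, the wildcard and the OR variants of the search should return the same columns, because only the named fields are kept.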

For an understanding of why this unexpected behavior is not a bug, you have to understand how searches and bloom filters actually work under the covers.

If you look at slide 22 of this .conf2017 presentation by MVP Martin Müller (@martin_mueller) at https://conf.splunk.com/files/2017/slides/fields-indexed-tokens-and-you.pdf

...then you will see this wording...

▶ Default assumption: Field values are whole indexed tokens
▶ exception=java.lang.NullPointerException becomes [ AND java lang NullPointerException ]
▶ Actual field extractions and post-filtering happens after loading raw events

So basically, for the event selection, sensortype=sens* initially becomes AND sens*, so the initial part of the search is going to find all events that have sens* somewhere in them. That is going to literally be every record with a sensortype= in its _raw, since sens* will pick up the token sensortype. It will also pick up any other fields that happen to have values starting with sens.

Since you are coding | table * , the system cannot optimize to the fields you are asking for and EVERY field has to be expanded. Once that all gets expanded, the ones where sensortype!=sens* get dropped, but the search still knows all the fields that were created/extracted for any of the events.

cmerriman · ‎11-14-2017

Are you running sourcetype=senssordata sensortype="sens1" OR sensortype="sens-B" | fieldsummary separately and comparing it to sourcetype=senssordata sensortype="sens*" | fieldsummary ? I only ask because you shouldn't be able to have a |table * or (sourcetype.... without the query erroring out. I'm wondering if either part of your query is missing or I'm misunderstanding something.
