Solved: Why does splunk add junk to the front of a search ...

landen99 · ‎06-26-2015

With the simplest search:

index=checkpoint action=accept | head 1

The normalizedSearch (under Job Inspect, 8.34s) is:

litsearch index=checkpoint ( ( ( sourcetype=opsec_audit ) AND ( ( ( ( ( ( sourcetype=WinRegistry ) AND ( ( registry_type=accept ) ) ) OR ( ( sourcetype=fs_notification ) AND ( ( action=accept ) ) ) ) OR ( vendor_action=accept ) ) ) ) ) ) OR ( ( ( ( sourcetype=fe_json ) AND ( ( "alert.action"=accept ) ) ) OR ( ( sourcetype=fe_xml ) AND ( ( "alerts.alert.action"=accept ) ) ) OR ( ( source="/nsm/bro/logs/current/notice.log" ) AND ( ( EXTRA_FIELD_18=accept ) ) ) ) OR ( action=accept ) ) | litsearch index=checkpoint action=accept | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | prehead limit=1 null=false keeplast=false

A slight modification of the search to put the field search after the first pipe makes the junk go away:

    index=checkpoint accept | search action=accept | head 1

The normalizedSearch (under Job Inspect, 3.1s) is now:

litsearch index=checkpoint accept | search action=accept | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | prehead limit=1 null=false keeplast=false

This is true even when he value "accept" is not before the first pipe.

Why does Splunk insert junk into the normalized search with a field search before the first pipe? The junk increases search time and in some cases where "NOT" OR "!" it can return "no results".

Splunk Version
6.2.3
Splunk Build
264376
Current App
Search & Reporting

dwaddle · ‎06-26-2015

The "junk" is an expansion of reverse lookups that happens because of various CIM-compliant TAs you have installed. I'll explain the concept of reverse lookups, but first let's go back to automatic lookups. Say you have a sourcetype bob defined with an automatic lookup. In props.conf:

[bob]
LOOKUP-actions = boblookup someinputfield OUTPUT action

And in boblookup.csv you have:

someinputfield,action
potato,deny
tomato,accept
blueberry,accept

Now your normal expectation is that when you do a search on sourcetype=bob that events will be matched against the lookup field and have a new field named action when someinputfield has a value of "tomato, potato, or blueberry". But, this same thing can be applied in reverse too.

If you search on action=accept, then Splunk can look through all of its config files and reason-out something like this:

Sourcetype bob has a lookup that outputs a field named action based on this CSV file. I see here in the CSV file that action=accept is returned whenever sometinputfield=blueberry or someinputfield=tomato. So there is an equivalency here:

( sourcetype = bob  AND ( someinputfield = blueberry OR someinputfield = tomato ) )

This is the fundamental step of a reverse lookup - the goal is to attempt to make automatic lookup fields searchable. This is a necessary evil for CIM-compliant apps like Enterprise Security because of how often they use automatic lookups to normalize field names and values.

There's a whole longer discussion here about the performance impacts around this. While it made your example situation slower, there are many other counter examples where this approach (up to a point) speeds things up.

View solution in original post

dwaddle · ‎06-26-2015

The "junk" is an expansion of reverse lookups that happens because of various CIM-compliant TAs you have installed. I'll explain the concept of reverse lookups, but first let's go back to automatic lookups. Say you have a sourcetype bob defined with an automatic lookup. In props.conf:

[bob]
LOOKUP-actions = boblookup someinputfield OUTPUT action

And in boblookup.csv you have:

someinputfield,action
potato,deny
tomato,accept
blueberry,accept

Now your normal expectation is that when you do a search on sourcetype=bob that events will be matched against the lookup field and have a new field named action when someinputfield has a value of "tomato, potato, or blueberry". But, this same thing can be applied in reverse too.

If you search on action=accept, then Splunk can look through all of its config files and reason-out something like this:

Sourcetype bob has a lookup that outputs a field named action based on this CSV file. I see here in the CSV file that action=accept is returned whenever sometinputfield=blueberry or someinputfield=tomato. So there is an equivalency here:

( sourcetype = bob  AND ( someinputfield = blueberry OR someinputfield = tomato ) )

This is the fundamental step of a reverse lookup - the goal is to attempt to make automatic lookup fields searchable. This is a necessary evil for CIM-compliant apps like Enterprise Security because of how often they use automatic lookups to normalize field names and values.

There's a whole longer discussion here about the performance impacts around this. While it made your example situation slower, there are many other counter examples where this approach (up to a point) speeds things up.

dwaddle · ‎06-26-2015

I'm not necessarily suggesting disabling anything - that's up to you, but you'll need to understand how that may affect the workings of existing searches. If you do have a lookup defined on [host::*] then yes there's a chance for reverve lookups to have weird effects. There are things in limits.conf that attempt to place boundaries on the behaviour though.

Reverse lookups are a bit of an obscure topic, so a support case may be a good idea in order to better understand what you can do with them.

Also, I'm not sure that measuring "time to first event" is necessarily the best way to approach comparing performance. Part of the goal of the reverse lookup approach is to turn dense searches into less dense ones. Doing a | head 1 of a dense search is almost always going to be quick, but it doesn't demonstrate how long it takes to get back all of the events.

landen99 · ‎09-29-2015

Last conf2015 session explained that field alias and calc. fields on CIM fields also cause this issue (in addition to auto-lookups).

kskujawa · ‎03-24-2016

Thanks! ...

landen99 · ‎06-26-2015

SO are you saying that I should find lookup tables defined in auto-lookups with the "action" field? And if I disable the auto-lookup then the junk will go away?

What if I have a lookup on src against a table listing all src ips and their hostnames defined for host::*? Will that include every src ip in the table as the junk in front of the normalizedSearch for every sourcetype defined in inputs.conf? Am I not completely misunderstanding your explanation here?

landen99 · ‎06-26-2015

Also, I looked in my auto-lookups and noticed fs_notification (among the many things having "action" field in them) in this line:

fs_notification : LOOKUP-action_for_fs_notification endpoint_change_vendor_action_lookup vendor_action OUTPUT action

No owner
Splunk_TA_windows

Also, I imagine that Splunk_TA_Windows is probably widely used and that most people will be seeing this issue. Do we need to disable that app for that particular entry to stop messing with our searches?

woodcock · ‎06-26-2015

FYI, I checked this on my system and it adds totally different (and much less, as well as much less-strange) baggage. I assume that search performance is significantly different between the 2 as well?

landen99 · ‎07-02-2015

On my local search head (on my laptop) tied to the corporate license, I search:
index=checkpoint action=accept | head 1

And get:
litsearch index=checkpoint ( action=accept OR ( sourcetype="Script:TimesyncStatus" ) ) | litsearch index=checkpoint action=accept | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | prehead limit=1 null=false keeplast=false

TimesyncStatus belongs to Splunk_TA_windows which is on this search head. Why does this normalized search have so much less? Could it be that other TAs are complicating the matter on the other search head? How should I proceed to discover what is causing this and why?

dwaddle · ‎07-02-2015

Yes, it is a function of the number of TAs you have installed and the configuration they contain. Whenever a app (TAs, SAs, etc) contains an automatic lookup - no matter what the app is - and Splunk can resolve that the automatic lookup has an output field named the same as one of your search terms, it will attempt to make a reverse lookup. The more apps you have, and the more automatic lookups those apps define, the more of this activity you'll see.

THIS DOES NOT MEAN that you disable those automatic lookups. They are there for a reason, in order to make data CIM compliant. Breaking how TAs make your data CIM compliant in order to achieve search performance improvements is a bit like burning down the house to get rid of the spider.

If you need to fix the performance problems, then I would suggest discussing with support how this can be made better, without breaking CIM.

landen99 · ‎06-26-2015

correct. As shown above, 3.1s vs 8.34s for pulling just one event (head 1).

Why does splunk add junk to the front of a search when a field search is defined before the first pipe?

Introducing Splunk Enterprise 9.2

Adoption of RUM and APM at Splunk

Routing logs with Splunk OTel Collector for Kubernetes