Why do I get different Search Behavior depending on fields.conf?

mzorzi
Splunk Employee

I'm running a search based on a field extracted at search time using props.conf.

I've noticed that without a fields.conf, my search works fine. But if I create a fields.conf and specify INDEXED=true, I have to add a * at the end of the value I'm searching for.

An example:

My props.conf looks like:

    [myst]
    EXTRACT-dbflds = ^\\[?< locationid>.\*?\] \[?< hostname>.\*?\] \[?< database>.\*?\] \[?< instance>.\*?\] \[?< pid>.\*?\] \[?< thread>.\*?\] \[(.\*?)\]

My source file is like:

    [2010-04-09 17:29:51,085] [asia123] [bighost] [dbprod] [pango] [pid675] [open.connection] 
    [2010-04-09 18:49:52,063] [europe345] [smallhost] [dbdev] [acaia] [pid987] [close.transaction] 

When there is no fields.conf at all, or there is a fields.conf with INDEXED=false in every field stanza, my search:

    sourcetype=myst instance=pango

works correctly.

But if there is a fields.conf that specifies INDEXED=true, my previous search doesn't return any results, while this one does:

    sourcetype=myst instance="pango*"

Why does the behavior differ?

2 Solutions

jrodman
Splunk Employee

The setting INDEXED=true for a field, which is not the default, tells Splunk that the field was created at index time and is stored in a special way in the index (specifically, the string instance::pango is indexed). Since your fields are created via search-time extractions, this setting is incorrect. When you add a wildcard to the value, Splunk apparently abandons the requirement that the value be locatable as an index-time field (though this surprises me).

In short, this setting is simply not correct for your configuration, which is why it does not work. Note that the value strings are indexed regardless (INDEXED_VALUE=true), so there isn't really an expected performance cost to dropping INDEXED=true.
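
To make the mismatch concrete, here is a rough toy model in Python. It is only a sketch of the idea described above, not Splunk's actual index format or search code: the tokenizer, the per-event term set, and the function names `indexed_terms` and `search_instance` are all hypothetical. It assumes each bracketed value becomes one raw indexed term and that nothing ever writes an `instance::pango` term, since the field is extracted at search time.

```python
import re

# Toy data taken from the sample source file in the question.
EVENTS = [
    "[2010-04-09 17:29:51,085] [asia123] [bighost] [dbprod] [pango] [pid675] [open.connection]",
    "[2010-04-09 18:49:52,063] [europe345] [smallhost] [dbdev] [acaia] [pid987] [close.transaction]",
]

# Search-time extraction of the fifth bracketed value as `instance`.
INSTANCE_RE = re.compile(r"^(?:\[[^\]]+\] ){4}\[(?P<instance>[^\]]+)\]")


def indexed_terms(event):
    """Hypothetical tokenizer: every bracketed value becomes one raw term."""
    return set(re.findall(r"\[([^\]]+)\]", event))


def search_instance(value, indexed):
    """Return events matching instance=<value> under the toy model."""
    hits = []
    for event in EVENTS:
        terms = indexed_terms(event)
        if indexed:
            # INDEXED=true: look only for the index-time term "instance::<value>".
            if f"instance::{value}" in terms:
                hits.append(event)
        else:
            # Default: find the raw value term, then confirm with the extraction.
            match = INSTANCE_RE.match(event)
            if value in terms and match and match.group("instance") == value:
                hits.append(event)
    return hits


print(len(search_instance("pango", indexed=False)))  # 1
print(len(search_instance("pango", indexed=True)))   # 0
```

The model deliberately leaves out the wildcard case, because the thread does not establish why `pango*` still matches.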

Lowell
Super Champion

This is technically more of a comment than an answer (both jrodman and gkanapathy covered the topic well), but comments have limited formatting, so I'm posting it here.

BTW, it looks like your regex got scrambled when you posted your question.

With that said, you may want to consider the following regex tweaks:

  • Drop the trailing unnamed group, which probably isn't necessary. (Or give it a name if you want it.)
  • Non-greedy dot matching is normally a good thing (compared to a plain .*), but it can still be costly because of backtracking within the regex engine. It may be more efficient to say "stop matching once you find a ]". As long as you don't have nested square brackets, you should be able to match your values using [^\]]+ rather than .*?.

This should work well for you:

    EXTRACT-dbflds = ^[^\]]+\] \[(?<locationid>[^\]]+)\] \[(?<hostname>[^\]]+)\] \[(?<database>[^\]]+)\] \[(?<instance>[^\]]+)\] \[(?<pid>[^\]]+)\] \[(?<thread>[^\]]+)\]

Just some thoughts. (Hopefully this will actually be formatted correctly when it gets posted...)
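
One quick way to sanity-check the suggested pattern outside Splunk is to run it against the two sample lines from the question. The sketch below uses Python's `re` module, which is an assumption of convenience: Python spells named groups `(?P<name>...)` rather than the `(?<name>...)` form used in props.conf, so the pattern is transcribed accordingly. The helper name `extract` is hypothetical.

```python
import re

# Same pattern as the EXTRACT above, with (?<name>...) rewritten as
# (?P<name>...) because Python's re module requires that spelling.
PATTERN = re.compile(
    r"^[^\]]+\] \[(?P<locationid>[^\]]+)\] \[(?P<hostname>[^\]]+)\] "
    r"\[(?P<database>[^\]]+)\] \[(?P<instance>[^\]]+)\] "
    r"\[(?P<pid>[^\]]+)\] \[(?P<thread>[^\]]+)\]"
)

SAMPLE = [
    "[2010-04-09 17:29:51,085] [asia123] [bighost] [dbprod] [pango] [pid675] [open.connection] ",
    "[2010-04-09 18:49:52,063] [europe345] [smallhost] [dbdev] [acaia] [pid987] [close.transaction] ",
]


def extract(line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


for line in SAMPLE:
    fields = extract(line)
    print(fields["instance"], fields["thread"])
# pango open.connection
# acaia close.transaction
```

The leading `^[^\]]+\]` consumes the timestamp without capturing it, so the six named groups line up with the six remaining bracketed values.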

jrodman
Splunk Employee

Wow, regex optimization tips. I've always meant to do some experimentation with timing regex behavior but doing it in a performant enough language to get tight results seemed too boring.

gkanapathy
Splunk Employee

You should not be setting anything in fields.conf for search-time extracted fields. Setting INDEXED=true tells Splunk to look for your field as a separately stored and indexed field. It won't be unless it was index-time extracted and stored. (Which, BTW, is rarely recommended.)

The fact that it works when you add the wildcard is either a bug or a special-casing of wildcard behavior.
