Splunk Search

Fields correctly shown in results but not searchable well.

vaijpc
Communicator

Howdy! So I've been playing around with splunk and all of a sudden something that was working Friday afternoon has stopped working Monday morning.

Running on Windows, 4.1.6. The source in question is something like C:\foo\bar\host\date\port13.stat.

As well as the host, I want to extract the 'port' number and make it an easily searchable field, rather than filtering on 'source=' every time.

So I set up a field extraction applying to the sourcetype: "\port(?<port>\d+).stat in source"

This has been working perfectly fine. If I search by that source file I will get all my results. Checking the available fields column on the left, I can also click the little button next to 'port' and see 100% of these results have "13" as their 'port'. I can also see all 7000+ of them.

However, if I then click on the '13' (so my search now looks like source="C:\foo\bar\host1\december\port13.stat" port="13"), I get only 500 results. Oddly, there is about one day every week where I seem to get a few hundred results, whereas the rest are just one or two.

Any ideas? I can't think for the life of me what I have changed to break this and have tried 'clean eventdata' and reindexing.

Paolo_Prigione
Builder

You can solve that by adding the following to your $SPLUNK_HOME/etc/system/local/fields.conf:

[port]
INDEXED_VALUE = false

to tell Splunk that its index does not contain port's values.

Here's why... by default, when you specify a search such as:

port=13 | ...

to increase performance Splunk will automatically translate that to:

13 port=13

which means "search for all events having a "13" in their index, then filter them to pick just those having the field "port" equal to 13, and discard the others (which must have a "13" in some other part of the text)".

So, the basic -default- assumption is "field values are present in the index". In your case, this is just not true, as the "source" field is not in the event content, so its value has not been indexed as a token of the event. By modifying fields.conf you can change the default behaviour so that Splunk's search will just be "extract everything, then compute the port fields and pick only the port=13 events".

Just to give another example, if your events were something like

2011-01-23 13:05:51 ERROR723 Desc="asdasdasddas"

and you had a field extraction like

ERROR(?<errcode>\d+)

Then again, searching for errcode=723 would give you no results, as the Splunk index only contains ERROR723 and not 723 alone. Modifying fields.conf again would solve the problem.
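To make the "token first, field second" behaviour concrete, here is a rough Python sketch of the idea. It is not Splunk's implementation: the `tokens` function is a simplified stand-in for the real segmenter, and `search`, `extract_errcode` and the second sample event are made up for illustration.

```python
import re

# Simplified stand-in for Splunk's segmentation: split on anything that is
# not a letter, digit, or underscore. The real segmenter rules are richer.
def tokens(raw):
    return {t.lower() for t in re.split(r"[^A-Za-z0-9_]+", raw) if t}

def extract_errcode(raw):
    # Search-time extraction from the example above: ERROR(?<errcode>\d+)
    m = re.search(r"ERROR(?P<errcode>\d+)", raw)
    return m.group("errcode") if m else None

def search(events, value, indexed_value=True):
    """Return events whose extracted errcode equals value.

    indexed_value=True  models the default: first keep only events that
                        contain the value as a standalone token.
    indexed_value=False models INDEXED_VALUE = false: check every event.
    """
    candidates = events
    if indexed_value:
        candidates = [e for e in events if value.lower() in tokens(e)]
    return [e for e in candidates if extract_errcode(e) == value]

events = [
    '2011-01-23 13:05:51 ERROR723 Desc="asdasdasddas"',
    # Hypothetical event: contains the token 723, but errcode is 404.
    '2011-01-23 13:06:02 ERROR404 Desc="took 723 ms"',
]

default_hits = search(events, "723")                    # token pre-filter on
scan_hits = search(events, "723", indexed_value=False)  # pre-filter off
print(len(default_hits), len(scan_hits))
```

In this model the default search drops the ERROR723 event at the token step (its only matching token is "error723"), and the second event passes the token step but fails the field comparison. With the pre-filter switched off, the ERROR723 event is found.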

gkanapathy
Splunk Employee

Edited to add a new answer. The original answer is below; it is still valid, but incomplete:

The below workaround works, but in newer versions (5.0 and later), you can also set up a fields.conf setting:

[port]
INDEXED_VALUE = source::*port<VALUE>*

In this case, you are telling Splunk what to look for in the index when the field port has a given <VALUE>. Here, you want to look in the indexed field source.

The advantage of this is that it preserves the standard search syntax and thus allows default chart drilldowns to work.
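As a rough mental model of that setting (a sketch only, not how Splunk implements it), you can picture the `<VALUE>` placeholder being replaced by the searched value and the resulting wildcard pattern being matched against indexed terms. The helper names and the `indexed_terms` layout below are invented for illustration, and the second source path is a made-up neighbour of the one in the question.

```python
from fnmatch import fnmatchcase

def index_term(template, value):
    # Substitute the searched value into the INDEXED_VALUE template.
    return template.replace("<VALUE>", value)

def candidate_events(events, template, value):
    # Keep events with at least one indexed term matching the expanded
    # wildcard pattern (compared case-insensitively in this sketch).
    pattern = index_term(template, value).lower()
    return [
        e for e in events
        if any(fnmatchcase(term.lower(), pattern) for term in e["indexed_terms"])
    ]

events = [
    {"indexed_terms": [r"source::C:\foo\bar\host1\december\port13.stat"]},
    {"indexed_terms": [r"source::C:\foo\bar\host1\december\port7.stat"]},
]

template = "source::*port<VALUE>*"
hits = candidate_events(events, template, "13")
print(index_term(template, "13"), len(hits))
```

Note that in this model a search for port=1 would also pick up the port13 file as a candidate, because `*port1*` matches it; the later comparison on the extracted port field is what would weed it out.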

-----original answer------

The explanation that Paolo gives is correct. The problem is that Splunk assumes your field value is a separate token in the event, so a search for port=13 internally turns into a search for port=13 AND 13. This searches for events containing the token 13, i.e., the string needs to have segmenter characters on each side. By setting INDEXED_VALUE = false for the port field, Splunk will simply scan every event and check the field, i.e., it will basically be doing a "grep" rather than using Splunk search.

My recommended workaround if this causes unacceptably slow performance is to create a macro:

[port(1)]
args = p
definition = (port$p$ AND port="$p$")

You would then, in your search string use `port(13)` to search instead of port=13, e.g.:

sourcetype=mysourcetype other "term" `port(13)` field1=value2 | stats count
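If it helps to see what the macro does to the search string, here is a small Python sketch of the substitution. It only imitates the idea of macro expansion; the `MACROS` table and `expand` function are hypothetical, and the definition is written with its closing quote in place.

```python
import re

# Hypothetical, simplified macro table mirroring the macros.conf stanza above.
MACROS = {
    ("port", 1): {"args": ["p"], "definition": '(port$p$ AND port="$p$")'},
}

def expand(search):
    # Replace each `name(arg, ...)` call with its definition, substituting
    # every $arg$ placeholder with the supplied value.
    def repl(m):
        name = m.group(1)
        values = [v.strip() for v in m.group(2).split(",")]
        macro = MACROS[(name, len(values))]
        text = macro["definition"]
        for arg, val in zip(macro["args"], values):
            text = text.replace(f"${arg}$", val)
        return text
    return re.sub(r"`(\w+)\(([^)]*)\)`", repl, search)

expanded = expand(
    'sourcetype=mysourcetype other "term" `port(13)` field1=value2 | stats count'
)
print(expanded)
```

The point of the expansion is that the search now carries the literal term port13 alongside the field comparison, instead of relying on a bare 13 token.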

The disadvantage is that this is a new syntax that must be learned and, worse, click-throughs in the Splunk UI and charts do not know about it and thus won't use the macro.

vaijpc
Communicator

Aha thanks for that, could very well be useful.

Right now I'm only running through a few MB a day (with a backlog) and almost all data has this field from the source. It may be a performance issue in larger deployments, but I'm not worried about that currently on my little laptop and, regardless, yes, that differing syntax isn't nice.

I'm going to wander off now and perhaps read up on that whole "search-time" vs "index time" argument...

If only I could have my cake and eat it!

vaijpc
Communicator

Aha awesome, fixed!

Apologies if this is a common issue, I have a funny feeling I saw this explained elsewhere but dismissed it due to never having used 'INDEXED_VALUE' before. I wonder what I changed to break things...

Now go check everything else is ok.

Paolo_Prigione
Builder

Oops, sorry, I've misspelled the property name: it is INDEXED_VALUE that you are interested in setting to false. I am modifying my answer accordingly... Sorry for that.

vaijpc
Communicator

I've also tried adding it to apps/search/local/fields.conf and no luck.

vaijpc
Communicator

I'm afraid that hasn't worked.

I also tried naming it 'EXTRACT-port', restarting, rebuilding indexes etc.

FYI, in case it is of relevance, I have two extracts for 'port' from different sourcetypes. The other one is from inside the file/event, rather than from the source.
