I have a log that has multiple fields and values and each event has a different set of fields and values. To handle that, I'm using a transforms stanza with a REGEX to separately extract the field and value at search time. The transform seems to be working as expected as Splunk shows all of my fields on the left side with all of their values. But that's where it stops working.
When I try to actually use one of the extracted fields in search, I get very odd behavior. If I do a search with field=value, I get no results, even if I use Splunk's built-in extraction from the field list on the left side to construct my search string. However, if I add an asterisk (*) to the end of the field=value search, then I get results. This makes me think my REGEX is extracting a bit more than it should, but I can't see any extra characters or non-printable ones.
Here is a sample event that is being extracted correctly:
2014-03-12 11:26:32,389 INFO SSID:AA87309DKj9911FFFFACDD [pool-10251-thread-1] SERVICE_KEY=5688 SERVICE=myService INDEX_POS=0 APPLICATION_ID==APPID~ACCOUNT_NUM==123456789~CUST_SUBTYPE==R~CUST_TYPE==I~ENV_CODE==ENV~MARKET_CODE==123~OPERATOR_ID==123456~ORIGIN_SYSTEM==APP~PSUBMKTGRP_ROW_COUNT==12~RUN_DATE==20140312~SUBMKT_SUB_MARKET_CODE==ABC~TRANSACTION_MODE==O~
And here is my stanza from transforms:
REGEX = ([A-Z0-9_]*?)==([^~]*?)~
FORMAT = $1::$2
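For readers following along, here is a minimal sketch of how a stanza like this is typically wired up for search-time extraction. The stanza name double_equals_kv and the sourcetype my_app_log are hypothetical placeholders, not names from my actual configuration.

```
# transforms.conf
# Stanza name is a hypothetical placeholder.
[double_equals_kv]
REGEX = ([A-Z0-9_]*?)==([^~]*?)~
FORMAT = $1::$2

# props.conf
# Sourcetype name is a hypothetical placeholder.
[my_app_log]
REPORT-double_equals_kv = double_equals_kv
```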
In this case, Splunk properly pulls out all the field names (APPLICATION_ID, ACCOUNT_NUM, CUST_SUBTYPE, etc), and the values are also correct as the left side list of fields shows. But if my search is something like APPLICATION_ID=APPID, I'll get no results. However, simply making the search APPLICATION_ID=APPID* will work.
Because Splunk is able to properly extract field names and values in the left side in verbose mode, but then fails in search mode, this makes me think this could be a bug in Splunk. And potentially it's related to my data having double equal signs. The reason for the double equal signs is to prevent Splunk from trying to auto extract since in some cases these fields can contain an equal sign as part of the value.
Hopefully that's enough information for someone to give me some pointers. Thanks.
Regardless of the field extraction, searching for the word APPID doesn't find your event while searching for APPID* does, right?
If that's the case, you're being hampered by a performance optimization Splunk makes. It assumes every field value is an indexed token, which yours is not. You can stop Splunk from making that assumption in fields.conf; see http://docs.splunk.com/Documentation/Splunk/6.0.2/admin/fieldsconf for reference.
You could set INDEXED_VALUE=false for your field, forcing Splunk to do a full-text search for your value...
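As a rough sketch, assuming the field in question is APPLICATION_ID from the sample event, the fields.conf entry would look something like this (one stanza per affected field):

```
# fields.conf
# Tell Splunk not to assume the value of this field is an indexed token.
[APPLICATION_ID]
INDEXED_VALUE = false
```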
Or, you could use the fact that your values are preceded by an equals sign, so they are the start of an indexed token. I believe you might get away with setting INDEXED_VALUE=s/$/*/ for the field. The benefit of this is that Splunk will still use the indexed values for performance gains; your users just don't need to add the asterisk themselves.
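Again only a sketch, and untested on my side: the same stanza with the sed-style value, using APPLICATION_ID as the example field. The idea is that the sed expression appends an asterisk to the value, so a search for APPLICATION_ID=APPID looks up the indexed term APPID* instead of APPID.

```
# fields.conf
# Untested assumption: rewrite the searched value into a prefix match.
[APPLICATION_ID]
INDEXED_VALUE = s/$/*/
```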
Some background: http://blogs.splunk.com/2011/10/07/cannot-search-based-on-an-extracted-field/
thanks for this, solved my problem too
So you're saying that I have to exhaustively list all fields that I want to be able to search in fields.conf? I need to be able to search any of them which makes this a daunting task.
Let's say I have 200 fields. So I'll have to put 200 different field definitions in fields.conf. If I'm going to do that, couldn't I just define all the possible REGEX patterns in props or transforms as a regular extract with a field name? And if my assumption is correct, is making them normal extractions better from a performance perspective in terms of them being indexed according to normal heuristics?
You can check that yourself, see the blog post I included in the answer for background.
Oh, and one other thing to add that might make a difference. If I search for a dynamically extracted field whose value is a single character, then my search will work. Based on my example event above, if I search for CUST_SUBTYPE=R, then that will work.
Does this behavior still match what you referred to above?
And thanks for the help.
Nah, only for fields where you don't want INDEXED_VALUE=true, which is the default. Basically, only touch that value if you run into the problem you describe, and only touch it for those specific fields.
One thing that doesn't seem to make much sense is the [
Will I need to define the INDEXED_VALUE property for each of these?
Yes, if I search for APPID, I also get no results found.