I have set up a search-time field extraction that pulls several fields out of a URL in a log file.
My problem is that one of these fields is extracted for some events but not for others, for no apparent reason. Here are two examples. The field is extracted from the first event but not from the second:
1.1.9.1 - [20/Mar/2011:17:39:37 -0700] 15625 "some.web.site" GET "/myaccount/videos/B004CZXC54.flv" "" 307 - "medusa" "-" "Python-urllib/2.6" "2.2.2.2"
1.1.9.1 - [20/Mar/2011:18:10:45 -0700] 0 "some.web.site" GET "/myaccount/videos/B003QMJAXM.flv" "" 307 - "medusa" "-" "Python-urllib/2.6" "2.2.2.2"
The field I'm trying to extract is the one corresponding to the "myaccount" part. As you can see, the two events are extremely similar, yet the first shows the field and the second doesn't.
The odd thing about this is that:
* If I pipe my search into | extract reload=T, I can see the missing field for all results.
* There are a number of fields after this missing field (for the "videos" part, "B003QM.." part, "flv" part, etc.) that are extracted fine.
The original regular expression was quite complex but I stripped it down to something simple that still shows the problem:
/(?<medusa_account_alias>[^/]+)/(?<medusa_restype>videos|images)
The problem field is the medusa_account_alias field. The fields following it seem to be extracted ok.
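To rule out the regex itself, here is a minimal sketch that runs the stripped-down pattern against both sample events outside Splunk. It assumes Python's re module as a stand-in for Splunk's regex engine (the named groups are written as (?P<name>...) instead of (?<name>...)), and the names EVENTS and extract are made up for this check:

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
PATTERN = re.compile(
    r'/(?P<medusa_account_alias>[^/]+)/(?P<medusa_restype>videos|images)'
)

# The two sample events, copied verbatim from above.
EVENTS = [
    '1.1.9.1 - [20/Mar/2011:17:39:37 -0700] 15625 "some.web.site" GET '
    '"/myaccount/videos/B004CZXC54.flv" "" 307 - "medusa" "-" '
    '"Python-urllib/2.6" "2.2.2.2"',
    '1.1.9.1 - [20/Mar/2011:18:10:45 -0700] 0 "some.web.site" GET '
    '"/myaccount/videos/B003QMJAXM.flv" "" 307 - "medusa" "-" '
    '"Python-urllib/2.6" "2.2.2.2"',
]


def extract(event):
    """Return the named groups of the first match, or an empty dict."""
    match = PATTERN.search(event)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    for event in EVENTS:
        print(extract(event))
```

Both events match and yield the same medusa_account_alias value here, which suggests the pattern is not what distinguishes the two events.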
Any ideas would be greatly appreciated. Is this some kind of bug in Splunk, or am I missing something?
Can you please provide the props.conf/transforms.conf stanzas that are responsible for performing the extractions and field aliasing?
The problem is caused by a field alias I have defined.
What I want is to have medusa_account_alias filled either from the above regex, or from another field ("accountId") extracted for another format of the log row. I used an alias from accountId to medusa_account_alias, and this caused the problem.
How else can I achieve this, i.e. have a single field that gets filled in two disjoint cases?
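The behavior I'm after amounts to "take the first source that has a value". A minimal sketch of that logic, assuming each event's extracted fields are available as a plain dict (the function name fill_alias and the sample values are hypothetical, for illustration only). Splunk's eval command has a coalesce function with the same first-non-null semantics, though whether it suits this setup is an assumption on my part:

```python
def fill_alias(event_fields):
    """Pick medusa_account_alias from the regex field if present,
    otherwise fall back to accountId; return None if neither is set."""
    for name in ("medusa_account_alias", "accountId"):
        value = event_fields.get(name)
        if value:  # skip missing or empty values
            return value
    return None


if __name__ == "__main__":
    # Hypothetical field dicts for the two log formats.
    print(fill_alias({"medusa_account_alias": "myaccount"}))
    print(fill_alias({"accountId": "12345"}))
```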
Also, this doesn't explain why Splunk's behavior was so arbitrary: why would it generate medusa_account_alias for one event and not for the other?