I've been trying to do a search-time field extraction using an EXTRACT- setting in props.conf.
From the props.conf docs (http://docs.splunk.com/Documentation/Splunk/5.0.2/Admin/Propsconf), it appears that there are two ways to perform a search-time extraction with EXTRACT: either against the _raw field or against a specific field.
When I perform the extraction on a specific field (using the 'in' syntax), it doesn't run unless I add '| extract reload=T' to my search:
EXTRACT-extractDomain = (?<domain>(?:(?:(?:[^\.]+\.)?(?<tld>(?:[^\.\s]{2})(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$ in questionname
When I remove the 'in questionname' portion, so that the extraction runs against _raw, it runs every time and doesn't require '| extract reload=T':
EXTRACT-extractDomain = (?<domain>(?:(?:(?:[^\.]+\.)?(?<tld>(?:[^\.\s]{2})(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$
Has anyone else run into this problem? In this case I can rewrite my extraction to work on _raw, but in other cases I'm working on it would be much more convenient to apply the regex to a single field.
The problem is most likely that your first extraction runs before the questionname field has been extracted, so there's nothing to extract from. When you run "| extract reload=T" separately, it happens after all automatic extractions have already been applied, so the questionname field exists at that point.
Extractions are applied in alphabetical order of their class names; it might be per sourcetype or global, I forget which. Either way, EXTRACT-a runs before EXTRACT-b, so if you have, for instance, EXTRACT-extractDomain and EXTRACT-questionname, extractDomain runs first and you get the problem you're seeing.
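To make the ordering problem concrete, here is a toy model in Python. It is an illustration under stated assumptions, not Splunk's actual implementation: it assumes EXTRACT classes run in plain sorted order of their names, and that an "in <field>" extraction simply finds nothing if that field hasn't been produced yet. The regexes are the ones from the question, converted from PCRE's (?<name>...) to Python's (?P<name>...) syntax. The event text is made up to fit the question1 regex, and running the pass a second time is only a rough stand-in for what "| extract reload=T" does.

```python
import re

# Toy model (an assumption, not Splunk's code): EXTRACT classes run in sorted
# order of their names, and an "in <field>" extraction only sees fields that
# earlier classes have already produced.
# Python's re uses (?P<name>...) where Splunk's PCRE accepts (?<name>...).
QUESTION1 = r"\] (?P<questiontype>\w+)\s+(?P<questionname>.*)"
DOMAIN = (r"(?P<domain>(?:(?:(?:[^\.]+\.)?(?P<tld>(?:[^\.\s]{2})"
          r"(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$")


def extraction_pass(fields, extractions):
    """Apply each (regex, source_field) pair in sorted class-name order."""
    fields = dict(fields)
    for name in sorted(extractions):
        regex, source = extractions[name]
        if source not in fields:
            continue  # source field not extracted yet: nothing to match against
        match = re.search(regex, fields[source])
        if match:
            # Keep only the named groups that actually matched.
            fields.update({k: v for k, v in match.groupdict().items() if v is not None})
    return fields


# Hypothetical event text, shaped only to fit the question1 regex.
RAW = "0C94 PACKET 01A2 UDP Rcv 10.0.0.5 Q [0001 D NOERROR] A www.example.com."

original = {"extractDomain": (DOMAIN, "questionname"), "question1": (QUESTION1, "_raw")}
renamed = {"zzExtractDomain": (DOMAIN, "questionname"), "question1": (QUESTION1, "_raw")}

first = extraction_pass({"_raw": RAW}, original)   # extractDomain sorts before question1
second = extraction_pass(first, original)          # second pass: questionname now exists
fixed = extraction_pass({"_raw": RAW}, renamed)    # zzExtractDomain sorts after question1

print("domain" in first)     # False
print(second.get("domain"))  # example.com
print(fixed.get("domain"))   # example.com
```

In the first pass extractDomain is skipped because questionname doesn't exist yet; either a later pass or a class name that sorts after question1 lets it match.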
Exactly correct!
Using btool, I was able to see the order in which the extractions are applied, and it confirmed what you said:
EXTRACT-extractDomain = (?<domain>(?:(?:(?:[^\.]+\.)?(?<tld>(?:[^\.\s]{2})(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$ in questionname
EXTRACT-opcode = (?<operation>[ R]) (?<opcode>.) \[(?<hexflags>[0-9A-Fa-f]+) (?<flags>....) (?<response>[^\]]+)\]
EXTRACT-protocol = (?<packetid>[0-9A-Fa-f]*) (?<protocol>UDP|TCP) (?<direction>\w+) (?<src_ip>[0-9A-Fa-f\.\:]+)\s+
EXTRACT-question1 = \] (?<questiontype>\w+)\s+(?<questionname>.*)
EXTRACT-question2 = \] (?<questionname>[^\s]*)$
EXTRACT-threadid = (?<threadid>[0-9A-Fa-f]+)\s+(?<context>PACKET)
When I renamed the class to zzExtractDomain, it worked great, because questionname has already been extracted by the time it runs.
Thanks!
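A quick way to sanity-check a rename before editing props.conf is to sort the class names yourself. This Python sketch takes the class names from the btool listing above and assumes Splunk orders them the way Python sorts plain strings (code-point order); that assumption is not confirmed by this thread. Under that assumption, an uppercase prefix such as "ZZ" would sort before lowercase names, so a lowercase prefix like "zz" is the safer choice.

```python
# Class names from the btool listing; the ordering rule is an assumption
# (Python's default code-point string order).
classes = ["extractDomain", "opcode", "protocol", "question1", "question2", "threadid"]

before = sorted(classes)
after = sorted("zzExtractDomain" if c == "extractDomain" else c for c in classes)

print(before[0])  # extractDomain  (runs before question1/question2 set questionname)
print(after[-1])  # zzExtractDomain (runs after every other class)

# In code-point order, uppercase letters sort before lowercase ones,
# so an uppercase prefix would NOT move the class to the end.
print(sorted(["ZZExtractDomain", "question1"])[0])  # ZZExtractDomain
```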