
EXTRACT from a specific field (using 'in' syntax) doesn't work without forcing '| extract reload=T'

Adam_Sealey
Explorer

I've been trying to do a search-time field extraction using an EXTRACT- setting in props.conf.

From the props.conf docs (http://docs.splunk.com/Documentation/Splunk/5.0.2/Admin/Propsconf), it appears that there are two ways to perform a search-time extraction with EXTRACT: against the _raw field, or against a specific field.

When I run the extraction against a specific field (using the 'in' syntax), it doesn't run unless I append '| extract reload=T' to my search:

EXTRACT-extractDomain = (?<domain>(?:(?:(?:[^\.]+\.)?(?<tld>(?:[^\.\s]{2})(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$ in questionname

When I remove the 'in questionname' portion (so the extraction runs against _raw), it runs every time, without needing '| extract reload=T':

EXTRACT-extractDomain = (?<domain>(?:(?:(?:[^\.]+\.)?(?<tld>(?:[^\.\s]{2})(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$
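To make the difference between the two forms concrete, here is a small Python model (not Splunk code) of what 'in <field>' changes: the regex is searched against one field's value instead of _raw, and if that field doesn't exist yet, nothing is extracted. The function name apply_extract and the sample values are hypothetical; the regex is the one above, converted from PCRE's (?<name>...) named groups to Python's (?P<name>...) form.

```python
import re

# The EXTRACT-extractDomain regex from above, with PCRE (?<name>...) groups
# rewritten as Python (?P<name>...) groups. Everything else is unchanged.
DOMAIN_RE = re.compile(
    r"(?P<domain>(?:(?:(?:[^\.]+\.)?(?P<tld>(?:[^\.\s]{2})"
    r"(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$"
)


def apply_extract(event, regex, src_field="_raw"):
    """Hypothetical model of one EXTRACT rule applied to one event.

    With src_field="_raw" the regex scans the raw event text; with any other
    name it scans only that field's value. If the source field is absent,
    the rule produces nothing.
    """
    value = event.get(src_field)
    if value is None:
        return {}  # source field not present: nothing to extract from
    match = regex.search(value)
    if match is None:
        return {}
    # Keep only the named groups that actually participated in the match.
    return {k: v for k, v in match.groupdict().items() if v is not None}
```

For a hypothetical event whose questionname is "www.bbc.co.uk.", apply_extract(event, DOMAIN_RE, "questionname") returns domain "bbc.co.uk" and tld "co.uk", while an event with no questionname yields an empty dict. The sample value ends with a dot because the pattern's trailing .$ consumes one final character.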

Has anyone else run into this problem? In this case I can rewrite my extraction to work on _raw, but in other cases I'm working on it would be much more convenient to apply the regex to only one field.


Ayn
Legend

The problem is most likely that your first extraction (the one using 'in questionname') runs before the questionname field has been extracted, so there is nothing to extract from. When you run "| extract reload=T" separately, that happens after all automatic extractions have already been applied, so the questionname field exists by then.

Extractions are done in alphabetical order; it might be per-sourcetype or global, I forget which. Either way, EXTRACT-a runs before EXTRACT-b, so if you have, for instance, EXTRACT-extractDomain and EXTRACT-questionname, that will lead to the problem you're seeing.
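A rough Python model of that ordering problem, assuming (as described above) that EXTRACT classes run in plain lexicographic order of their names and that each one only sees fields produced by classes sorted before it. The function names, the sample event, and the dict-of-rules layout are hypothetical; this illustrates the ordering effect, not Splunk's implementation.

```python
import re


def to_python_regex(pcre):
    # Rewrite PCRE-style (?<name>...) named groups as Python (?P<name>...).
    # The lookahead requires a name character, so (?<= and (?<! are untouched.
    return re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", pcre)


def run_extractions(raw, extracts):
    """Apply EXTRACT-style rules to one event in sorted class-name order.

    `extracts` maps a class name (the part after "EXTRACT-") to a pair
    (pcre_regex, src_field); src_field is "_raw" for the plain form.
    Assumption: rules run in lexicographic order of class name.
    """
    event = {"_raw": raw}
    for name in sorted(extracts):
        pattern, src_field = extracts[name]
        value = event.get(src_field)
        if value is None:
            continue  # source field not extracted yet: rule yields nothing
        match = re.search(to_python_regex(pattern), value)
        if match:
            for field, val in match.groupdict().items():
                if val is not None:
                    event[field] = val
    return event
```

With a rule named extractDomain reading from questionname and a rule named question2 producing questionname from _raw, extractDomain sorts first and finds nothing to read, so no domain appears. Renamed to zzExtractDomain, it sorts after question2 and, for a hypothetical raw line ending in "] www.example.com.", extracts domain "example.com".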


Adam_Sealey
Explorer

Exactly correct!

Using btool, I was able to see the order in which the extractions are applied, and it confirmed what you said:

EXTRACT-extractDomain = (?<domain>(?:(?:(?:[^\.]+\.)?(?<tld>(?:[^\.\s]{2})(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$ in questionname
EXTRACT-opcode = (?<operation>[ R]) (?<opcode>.) \[(?<hexflags>[0-9A-Fa-f]+) (?<flags>....) (?<response>[^\]]+)\]
EXTRACT-protocol = (?<packetid>[0-9A-Fa-f]*) (?<protocol>UDP|TCP) (?<direction>\w+) (?<src_ip>[0-9A-Fa-f\.\:]+)\s+
EXTRACT-question1 = \] (?<questiontype>\w+)\s+(?<questionname>.*)
EXTRACT-question2 = \] (?<questionname>[^\s]*)$
EXTRACT-threadid = (?<threadid>[0-9A-Fa-f]+)\s+(?<context>PACKET)
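The same check could be automated. Below is a hypothetical Python helper (check_order is not a Splunk tool) that takes EXTRACT lines like the btool listing above, sorts them by class name, and flags any 'in <field>' rule whose source field isn't produced by a named group in an earlier-sorted rule. It assumes lexicographic ordering and only looks at EXTRACT rules in its input, so fields coming from index-time extraction or other sources would show up as false positives; treat a hit as something to verify.

```python
import re

# "EXTRACT-<class> = <body>", where <body> is "<regex>" or "<regex> in <field>".
LINE_RE = re.compile(r"^EXTRACT-(?P<cls>\S+)\s*=\s*(?P<body>.*)$")
IN_RE = re.compile(r"^(?P<regex>.*) in (?P<src>\w+)$")
# Named groups in either PCRE (?<name>) or Python (?P<name>) syntax.
GROUP_RE = re.compile(r"\(\?P?<([A-Za-z_]\w*)>")


def check_order(lines):
    """Return (class, src_field) pairs for 'in <field>' rules whose source
    field is not produced by any EXTRACT rule that sorts before them."""
    rules = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an EXTRACT line
        body = m.group("body")
        in_m = IN_RE.match(body)
        if in_m:
            rules.append((m.group("cls"), in_m.group("regex"), in_m.group("src")))
        else:
            rules.append((m.group("cls"), body, "_raw"))

    problems = []
    produced = {"_raw"}
    # Assumption: rules run in lexicographic order of class name.
    for cls, regex, src in sorted(rules):
        if src not in produced:
            problems.append((cls, src))
        produced.update(GROUP_RE.findall(regex))
    return problems
```

Run on the six lines above, it should report ("extractDomain", "questionname"), since question1 and question2 sort after extractDomain; with the class renamed to zzExtractDomain it returns an empty list.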

When I renamed it to EXTRACT-zzExtractDomain, it worked, because questionname has already been extracted by that point.

Thanks!
