
EXTRACT from a specific field (using the 'in' syntax) doesn't work without forcing '| extract reload=T'

Adam_Sealey
Explorer

I've been trying to do a search-time field extraction using an EXTRACT- setting in props.conf.

From the props.conf docs (http://docs.splunk.com/Documentation/Splunk/5.0.2/Admin/Propsconf), it appears that there are two ways to perform a search-time extraction using EXTRACT: either on the _raw field or on a specific field.

When I try to perform the field extraction on a specific field (using the 'in' syntax), the extraction doesn't run unless I add '| extract reload=T' to the search:

EXTRACT-extractDomain = (?<domain>(?:(?:(?:[^\.]+\.)?(?<tld>(?:[^\.\s]{2})(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$ in questionname

When I remove the 'in questionname' portion (so that the extraction runs on _raw), the extraction runs every time and doesn't require '| extract reload=T':

EXTRACT-extractDomain = (?<domain>(?:(?:(?:[^\.]+\.)?(?<tld>(?:[^\.\s]{2})(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$

Has anyone else run into this problem? In this case I can rewrite my extraction to work on _raw, but I'm working with other cases where it would be very convenient to apply the regex to a single field only.


Ayn
Legend

The problem is most likely that your first extraction runs before the questionname field has been extracted, so there's nothing to extract from. When you run "| extract reload=T" separately, that happens after all automatic extractions have already been applied, so the questionname field exists in that case.

Extractions are applied in alphabetical order of their names (it might be per-sourcetype or globally, I forget which). Either way, EXTRACT-a will run before EXTRACT-b, so if you have, for instance, EXTRACT-extractDomain and EXTRACT-questionname, the first one runs before questionname exists, which leads to the problem you're seeing.
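
To make the ordering issue concrete, here is a minimal Python sketch of the behaviour described above. It is not Splunk code: the apply_extractions helper, the simplified regexes, and the sample event are all made up for illustration, and it simply assumes that extractions run in sorted order of their names and that an extraction whose source field doesn't exist yet is skipped.

```python
import re

def apply_extractions(raw, extractions):
    """Apply named extractions in sorted order of their names.

    `extractions` maps a name to (pattern, source_field). An extraction
    whose source field has not been populated yet is skipped, which is
    the assumed behaviour behind the problem in this thread.
    """
    fields = {"_raw": raw}
    for name in sorted(extractions):
        pattern, source_field = extractions[name]
        if source_field not in fields:
            continue  # nothing to extract from yet
        match = re.search(pattern, fields[source_field])
        if match:
            fields.update(match.groupdict())
    return fields

# Made-up event and simplified patterns (Python needs (?P<name>...) groups).
RAW = "] A example.com"
QUESTION = (r"\] (?P<questiontype>\w+)\s+(?P<questionname>.*)", "_raw")
DOMAIN = (r"(?P<domain>[^.]+\.[^.]+)$", "questionname")

# "extractDomain" sorts before "question1", so questionname is not there yet.
early = apply_extractions(RAW, {"extractDomain": DOMAIN, "question1": QUESTION})

# "zzExtractDomain" sorts after "question1", so questionname is available.
late = apply_extractions(RAW, {"zzExtractDomain": DOMAIN, "question1": QUESTION})

print("domain" in early)   # False
print(late["domain"])      # example.com
```

In the first call the domain extraction is skipped silently, which matches the symptom of the field only showing up after a second extraction pass.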


Adam_Sealey
Explorer

Exactly correct!

Using btool, I was able to see the order in which the extractions are applied, and it confirmed what you said:

EXTRACT-extractDomain = (?<domain>(?:(?:(?:[^\.]+\.)?(?<tld>(?:[^\.\s]{2})(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$ in questionname
EXTRACT-opcode = (?<operation>[ R]) (?<opcode>.) \[(?<hexflags>[0-9A-Fa-f]+) (?<flags>....) (?<response>[^\]]+)\]
EXTRACT-protocol = (?<packetid>[0-9A-Fa-f]*) (?<protocol>UDP|TCP) (?<direction>\w+) (?<src_ip>[0-9A-Fa-f\.\:]+)\s+
EXTRACT-question1 = \] (?<questiontype>\w+)\s+(?<questionname>.*)
EXTRACT-question2 = \] (?<questionname>[^\s]*)$
EXTRACT-threadid = (?<threadid>[0-9A-Fa-f]+)\s+(?<context>PACKET)

After I renamed the extraction to zzExtractDomain, it works great, because questionname has already been populated by the time it runs.
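
As a quick sanity check on the rename, the sketch below sorts the extraction names from the btool listing before and after the change. It assumes the ordering is a plain byte-wise (ASCII) comparison of the names, which Python's sorted() reproduces for these strings; the variable names are just for illustration.

```python
# Extraction names taken from the btool listing above.
before = ["extractDomain", "opcode", "protocol",
          "question1", "question2", "threadid"]

# Same list with the domain extraction renamed.
after = ["zzExtractDomain" if name == "extractDomain" else name
         for name in before]

order_before = sorted(before)
order_after = sorted(after)

print(order_before[0])   # extractDomain (runs first, before question1/question2)
print(order_after[-1])   # zzExtractDomain (runs last, after question1/question2)

# Caveat under the same assumption: uppercase letters sort before lowercase
# ones in ASCII, so a prefix like "ZZ" would not push the name to the end.
print(sorted(["ZZExtractDomain", "question1"])[0])   # ZZExtractDomain
```

So a lowercase prefix such as zz is what moves the extraction behind the lowercase question1 and question2 names.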

Thanks!

