Can you help me with a problem extracting XML?

manderson7
Contributor

I've scoured Google and Answers, but my XML looks a little different than most I've seen so far:

 <Doc_OutPut XML_Version="1.0">
      <Doc_Field>
        <Field_Name>BatchName</Field_Name>
        <Field_Value>GOCLM36962920190214001_19045SCLM000018</Field_Value>
      </Doc_Field>
      <Doc_Field>
        <Field_Name>GUID</Field_Name>
        <Field_Value>
        </Field_Value>
      </Doc_Field>
      <Doc_Field>
        <Field_Name>ph_Template</Field_Name>
        <Field_Value>
        </Field_Value>
      </Doc_Field>
      <Doc_Field>
        <Field_Name>phEmp_Template</Field_Name>
        <Field_Value>-Initial – Company</Field_Value>
      </Doc_Field>
      <Doc_Field>
        <Field_Name>phPhy_Template</Field_Name>
        <Field_Value>
        </Field_Value>
      </Doc_Field>
  </Doc_OutPut>

I'd like Splunk to use each Field_Name as the name of a field and the matching Field_Value as its value. This is what I've tried in
props.conf:

DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE = /<Doc_Field/>

What am I doing wrong here?

chrisyounger
SplunkTrust

BREAK_ONLY_BEFORE is for splitting the data into multiple events, so I don't think it's what you are trying to do.

To get the fields extracted the way you want, you can use this (put it on your search head):

props.conf

[my_sourcetype]
REPORT-my_xml_pairs = my_xml_pairs

transforms.conf

[my_xml_pairs]
REGEX = <Field_Name>\s*(?<_KEY_1>.*?)\s*<\/Field_Name>.*?<Field_Value>\s*(?<_VAL_1>.*?)\s*<\/Field_Value>.*?
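
For anyone who wants to check the pairing outside Splunk, here is a minimal Python sketch (not Splunk itself) that runs the same pattern over part of the sample from the question. The names SAMPLE, PAIR and extract_pairs are made up for this illustration, and Python needs (?P<name>...) instead of (?<name>...). Note that in Python the dot only crosses line breaks when DOTALL is set. I have not verified how Splunk treats that for this transform, so if multi-line events do not match, prefixing the REGEX with (?s) may be worth trying.

```python
import re

# Part of the sample XML from the question, as one multi-line event.
SAMPLE = """<Doc_OutPut XML_Version="1.0">
  <Doc_Field>
    <Field_Name>BatchName</Field_Name>
    <Field_Value>GOCLM36962920190214001_19045SCLM000018</Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>GUID</Field_Name>
    <Field_Value>
    </Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>phEmp_Template</Field_Name>
    <Field_Value>-Initial – Company</Field_Value>
  </Doc_Field>
</Doc_OutPut>"""

# Same shape as the REGEX above, with Python-style named groups.
# re.DOTALL lets ".*?" span the newline between the two tags.
PAIR = re.compile(
    r"<Field_Name>\s*(?P<key>.*?)\s*</Field_Name>.*?"
    r"<Field_Value>\s*(?P<val>.*?)\s*</Field_Value>",
    re.DOTALL,
)


def extract_pairs(event):
    """Return {Field_Name: Field_Value} for every pair found in the event."""
    return {m.group("key"): m.group("val") for m in PAIR.finditer(event)}


fields = extract_pairs(SAMPLE)

# The same pattern without DOTALL finds nothing on this sample, because
# each Field_Name and Field_Value sit on different lines.
single_line_matches = re.findall(PAIR.pattern, SAMPLE)
```

Empty Field_Value elements (like GUID here) come out as empty strings, because the \s* around the value group swallows the whitespace.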

Good luck

manderson7
Contributor

Thanks very much, Chris. You're right, I believe I do want all the data in the text doc to show as 1 event.
Unfortunately, this did not extract the field names from the XML, and not all of the fields were in the 1 event. I ingested 1 file and got an event that was 257 lines long, and the rest of the lines were as their own event, and it didn't extract the field names.
I ingested another file of the same type, but I added a \n in between & , but this didn't help w/ the field name extraction. I again got 1 event w/ 257 lines, and the rest of the lines were in their own events.
It worked on regex101, so I'm not sure what happened.
Do you have any ideas what could be the problem?
I also tried adding LINEBREAKER = <\/Doc_OutPut> to the props, no go there either. The events still broke after 257 lines.

chrisyounger
SplunkTrust

LINE_BREAKER is the best way to do this. If the split works on Regex101 then it should work in Splunk. However, there are two things to be aware of:
1. Make sure you put the LINE_BREAKER setting where the parsing happens. This usually means the indexer, or the first heavy forwarder the data goes through.
2. Make sure your regular expression contains a capture group, otherwise it won't work, e.g. LINE_BREAKER = \<\/Doc_OutPut\>([\r\n]*)
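
To illustrate why the capture group matters, here is a simplified Python model of the breaking rule: the text matched by the first capture group is thrown away, everything before it ends the previous event, and everything after it starts the next one. The function name break_events and the RAW sample are invented for this sketch, and it ignores everything else Splunk does at parse time (line merging, truncation, timestamps).

```python
import re


def break_events(raw, line_breaker):
    """Split raw text on the first capture group of line_breaker.

    The group's text is discarded; text before it closes the current
    event and text after it opens the next one. A pattern with no
    capture group makes m.start(1) raise IndexError.
    """
    events = []
    start = 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    if start < len(raw):
        events.append(raw[start:])
    return events


# Two documents back to back, as they might arrive in one file.
RAW = (
    '<Doc_OutPut XML_Version="1.0">\n'
    "<Doc_Field><Field_Name>GUID</Field_Name></Doc_Field>\n"
    "</Doc_OutPut>\n"
    '<Doc_OutPut XML_Version="1.0">\n'
    "<Doc_Field><Field_Name>BatchName</Field_Name></Doc_Field>\n"
    "</Doc_OutPut>\n"
)

events = break_events(RAW, r"</Doc_OutPut>([\r\n]*)")
```

With this pattern each event keeps its closing </Doc_OutPut> tag, and only the newline between documents is dropped.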

manderson7
Contributor

LINE_BREAKER did the trick, with the capture group. Didn't know that was required.
Still not getting field names.
props.conf

[ocr_xml]
REPORT-ocr_xml_pairs = ocr_xml_pairs

transforms.conf

[ocr_xml_pairs]
REGEX = <Field_Name>\s*(?<Name>.*?)\s*<\/Field_Name>\n.*?<Field_Value>\s*(?<_Value>.*?)\s*<\/Field_Value>.*?
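
One difference from the earlier suggestion is the group names: that transform used _KEY_1 and _VAL_1, which transforms.conf treats specially (the captured key becomes the field name and the captured value becomes its value), while Name and _Value are ordinary group names. The Python sketch below is only a rough emulation of that convention, to show the difference. The helper dynamic_pairs and the one-pair EVENT are invented for the example, and it is not how Splunk is implemented.

```python
import re


def dynamic_pairs(pattern, event):
    """Pair groups named _KEY_<n> with _VAL_<n> into {key: value}.

    Groups with any other name are ignored, so a pattern without the
    special names yields no dynamically named fields.
    """
    fields = {}
    for m in re.finditer(pattern, event, re.DOTALL):
        groups = m.groupdict()
        for name, key in groups.items():
            if name.startswith("_KEY_"):
                val_name = "_VAL_" + name[len("_KEY_"):]
                if val_name in groups:
                    fields[key] = groups[val_name]
    return fields


EVENT = "<Field_Name>BatchName</Field_Name>\n<Field_Value>B1</Field_Value>"

# Group names as in the suggested transform.
WITH_SPECIAL = (
    r"<Field_Name>\s*(?P<_KEY_1>.*?)\s*</Field_Name>.*?"
    r"<Field_Value>\s*(?P<_VAL_1>.*?)\s*</Field_Value>"
)

# Group names as in the transform posted just above.
WITH_PLAIN = (
    r"<Field_Name>\s*(?P<Name>.*?)\s*</Field_Name>.*?"
    r"<Field_Value>\s*(?P<_Value>.*?)\s*</Field_Value>"
)

special_result = dynamic_pairs(WITH_SPECIAL, EVENT)
plain_result = dynamic_pairs(WITH_PLAIN, EVENT)
```

In this emulation only the _KEY_1/_VAL_1 version produces a field called BatchName, so switching the group names back may be the first thing to try.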