Can you help me with a problem extracting XML?

manderson7
Contributor

I've scoured Google and Answers, but my XML looks a little different than most I've seen so far:

 <Doc_OutPut XML_Version="1.0">
      <Doc_Field>
        <Field_Name>BatchName</Field_Name>
        <Field_Value>GOCLM36962920190214001_19045SCLM000018</Field_Value>
      </Doc_Field>
      <Doc_Field>
        <Field_Name>GUID</Field_Name>
        <Field_Value>
        </Field_Value>
      </Doc_Field>
      <Doc_Field>
        <Field_Name>ph_Template</Field_Name>
        <Field_Value>
        </Field_Value>
      </Doc_Field>
      <Doc_Field>
        <Field_Name>phEmp_Template</Field_Name>
        <Field_Value>-Initial – Company</Field_Value>
      </Doc_Field>
      <Doc_Field>
        <Field_Name>phPhy_Template</Field_Name>
        <Field_Value>
        </Field_Value>
      </Doc_Field>
  </Doc_OutPut>

I'd like Splunk to use each Field_Name as the name of a field and the matching Field_Value as that field's value. I've tried the following in
props.conf:

DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE = /<Doc_Field/>

What am I doing wrong here?


chrisyounger
SplunkTrust

BREAK_ONLY_BEFORE is for splitting the data into multiple events, so I don't think it's what you are trying to do.

To get the fields extracted the way you want, you can use this (put it on your search head):

props.conf

[my_sourcetype]
REPORT-my_xml_pairs = my_xml_pairs

transforms.conf

[my_xml_pairs]
REGEX = <Field_Name>\s*(?<_KEY_1>.*?)\s*<\/Field_Name>.*?<Field_Value>\s*(?<_VAL_1>.*?)\s*<\/Field_Value>.*?

Good luck
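
A quick way to see how the transform's regex pairs names with values is to run it outside Splunk. The sketch below is only an approximation: it uses Python's re module rather than the PCRE engine Splunk uses, so the named groups are written (?P<...>) instead of (?<...>), and the sample text is trimmed from the XML in the question. It also tries the same pattern with a leading (?s), which lets . match newlines; whether that flag is needed in a given Splunk setup is an assumption to verify, not something this thread confirms.

```python
import re

# Two Doc_Field entries trimmed from the XML in the question.
SAMPLE = """\
      <Doc_Field>
        <Field_Name>BatchName</Field_Name>
        <Field_Value>GOCLM36962920190214001_19045SCLM000018</Field_Value>
      </Doc_Field>
      <Doc_Field>
        <Field_Name>GUID</Field_Name>
        <Field_Value>
        </Field_Value>
      </Doc_Field>
"""

# The REGEX from transforms.conf, rewritten with Python's (?P<name>...) syntax.
PAIR_REGEX = (
    r"<Field_Name>\s*(?P<_KEY_1>.*?)\s*</Field_Name>.*?"
    r"<Field_Value>\s*(?P<_VAL_1>.*?)\s*</Field_Value>.*?"
)


def extract_pairs(pattern: str, text: str) -> list[tuple[str, str]]:
    """Return every (name, value) pair the pattern finds in text."""
    return [(m.group("_KEY_1"), m.group("_VAL_1")) for m in re.finditer(pattern, text)]


# Pattern as posted: "." does not cross line breaks.
pairs_plain = extract_pairs(PAIR_REGEX, SAMPLE)
# Same pattern with (?s) so "." also matches newlines.
pairs_dotall = extract_pairs("(?s)" + PAIR_REGEX, SAMPLE)

print(pairs_plain)
print(pairs_dotall)
```

In Python, the unmodified pattern finds no pairs in this multi-line sample, because . stops at the line break between </Field_Name> and <Field_Value>. With (?s) it returns both pairs, the second one with an empty value.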

manderson7
Contributor

Thanks very much, Chris. You're right, I believe I do want all the data in the text doc to show as 1 event.
Unfortunately, this did not extract the field names from the XML, and not all of the fields were in the 1 event. I ingested 1 file and got an event that was 257 lines long, and each of the remaining lines became its own event.
I ingested another file of the same type, but I added a \n in between </Field_Name> & <Field_Value>, and this didn't help w/ the field name extraction. I again got 1 event w/ 257 lines, and the rest of the lines were in their own events.
It worked on regex101, so I'm not sure what happened.
Do you have any ideas what could be the problem?
I also tried adding LINE_BREAKER = <\/Doc_OutPut> to the props, no go there either. The events still broke after 257 lines.


chrisyounger
SplunkTrust

Using LINE_BREAKER is the best thing to do. If the split works on Regex101 then it should work in Splunk. However, there are two things to be aware of:
1. Make sure you put the LINE_BREAKER where the parsing happens. This usually means the indexer or the first heavy forwarder the data goes through.
2. Make sure you have a capture group in your regular expression, otherwise it won't work, e.g. LINE_BREAKER = \<\/Doc_OutPut\>([\r\n]*)
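
To illustrate why the capture group matters: as documented for props.conf, the text up to the start of the first capture group ends one event, the captured text is discarded, and whatever follows it starts the next event. The sketch below imitates that rule in Python. It is an emulation for illustration, not Splunk code: break_events is a hypothetical helper, the two-document input is made up, and raising an error for a pattern with no capture group is this sketch's own choice.

```python
import re


def break_events(raw: str, line_breaker: str) -> list[str]:
    """Split raw text into events around capture group 1 of line_breaker."""
    pattern = re.compile(line_breaker)
    if pattern.groups < 1:
        # Sketch's own choice: refuse a pattern that has no capture group.
        raise ValueError("LINE_BREAKER needs at least one capture group")
    events, start = [], 0
    for m in pattern.finditer(raw):
        events.append(raw[start:m.start(1)])  # event ends where group 1 begins
        start = m.end(1)                      # next event starts after group 1
    if raw[start:]:
        events.append(raw[start:])            # keep any trailing text
    return events


# Hypothetical input: two small documents in one file.
RAW = (
    "<Doc_OutPut>\n<Doc_Field>a</Doc_Field>\n</Doc_OutPut>\n"
    "<Doc_OutPut>\n<Doc_Field>b</Doc_Field>\n</Doc_OutPut>\n"
)

# The pattern from the reply above: the closing tag stays with its event,
# and only the captured line breaks are dropped.
events = break_events(RAW, r"\<\/Doc_OutPut\>([\r\n]*)")
for event in events:
    print(repr(event))
```

Each resulting event runs from <Doc_OutPut> through </Doc_OutPut>, because the closing tag sits before the capture group and therefore stays with the event it closes.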


manderson7
Contributor

LINE_BREAKER did the trick, with the capture group. Didn't know that was required.
Still not getting field names.
props.conf

[ocr_xml]
REPORT-ocr_xml_pairs = ocr_xml_pairs

transforms.conf

[ocr_xml_pairs]
REGEX = `|<Field_Name>\s*(?<Name>.*?)\s*<\/Field_Name>\n.*?<Field_Value>\s*(?<_Value>.*?)\s*<\/Field_Value>.*?
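
One difference between this transform and the one suggested earlier is the group names. Per the transforms.conf spec, groups named _KEY_<string> and _VAL_<string> make the captured name become the field name, while any other named group is extracted as a field carrying the group's own name. The sketch below mimics that rule in Python to compare the two naming schemes. It is an emulation, not Splunk code: fields_from_match is a hypothetical helper, the patterns use Python's (?P<...>) syntax, and \s* is used between the tags for simplicity. Whether this is the whole reason the field names are missing is an assumption.

```python
import re


def fields_from_match(m: re.Match) -> dict[str, str]:
    """Turn named groups into fields, pairing _KEY_x with _VAL_x."""
    groups = m.groupdict()
    fields = {}
    for name, value in groups.items():
        if name.startswith("_KEY_"):
            suffix = name[len("_KEY_"):]
            # The captured text becomes the field name.
            fields[value] = groups.get("_VAL_" + suffix, "")
        elif not name.startswith("_VAL_"):
            # Any other named group is a field with the group's own name.
            fields[name] = value
    return fields


EVENT = (
    "<Field_Name>BatchName</Field_Name>\n"
    "        <Field_Value>GOCLM36962920190214001_19045SCLM000018</Field_Value>"
)

dynamic = re.search(
    r"<Field_Name>\s*(?P<_KEY_1>.*?)\s*</Field_Name>\s*"
    r"<Field_Value>\s*(?P<_VAL_1>.*?)\s*</Field_Value>",
    EVENT,
)
literal = re.search(
    r"<Field_Name>\s*(?P<Name>.*?)\s*</Field_Name>\s*"
    r"<Field_Value>\s*(?P<_Value>.*?)\s*</Field_Value>",
    EVENT,
)

dynamic_fields = fields_from_match(dynamic)
literal_fields = fields_from_match(literal)
print(dynamic_fields)
print(literal_fields)
```

With the _KEY_1/_VAL_1 names the sketch yields a field called BatchName. With Name/_Value it yields two fields literally called Name and _Value.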