Extracting XML log files

samiomer
Path Finder

Hello,

I'm trying to search XML log files stored on my local drive, but I'm having trouble setting up Splunk to extract each log entry as a separate event. My log file looks something like this:

<?xml version="1.0"?>
<Logs>
    <Log>
        <HOST>127.0.0.1</HOST>
        <DATE>Wed Jul 07 11:41:42 EDT 2011</DATE>
        <FACILITY>8</FACILITY>
        <LEVEL>6</LEVEL>
    </Log>
    <Log>
        <HOST>127.0.0.1</HOST>
        <DATE>Wed Jul 07 11:41:45 EDT 2011</DATE>
        <FACILITY>5</FACILITY>
        <LEVEL>6</LEVEL>
    </Log>
</Logs>

So each log entry is enclosed in a <Log> element. I've made the following changes to the configuration files under $SPLUNK_HOME\etc\system\local:

inputs.conf

[monitor://C:\data\]
sourcetype = log_xml
_whitelist = .*\.xml
crcSalt = <SOURCE>

All of my XML logs reside under C:\data\

props.conf

[log_xml]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^<Log>
MUST_BREAK_AFTER = </Log>
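To see what these two settings do to the sample file, here is a deliberately simplified Python model of line merging with SHOULD_LINEMERGE = true. It is an assumption-laden sketch, not Splunk's implementation: it splits on newlines, starts a new event before any line that BREAK_ONLY_BEFORE matches and after any line that MUST_BREAK_AFTER matches (Python's re.search standing in for Splunk's regex engine), and ignores other settings such as BREAK_ONLY_BEFORE_DATE and MAX_EVENTS. The function name merge_lines is hypothetical.

```python
import re
from typing import List, Optional

# The sample log from the question; note the indented <Log> lines.
SAMPLE = """<?xml version="1.0"?>
<Logs>
    <Log>
        <HOST>127.0.0.1</HOST>
        <DATE>Wed Jul 07 11:41:42 EDT 2011</DATE>
        <FACILITY>8</FACILITY>
        <LEVEL>6</LEVEL>
    </Log>
    <Log>
        <HOST>127.0.0.1</HOST>
        <DATE>Wed Jul 07 11:41:45 EDT 2011</DATE>
        <FACILITY>5</FACILITY>
        <LEVEL>6</LEVEL>
    </Log>
</Logs>"""


def merge_lines(
    text: str,
    break_only_before: Optional[str] = None,
    must_break_after: Optional[str] = None,
) -> List[str]:
    """Simplified, hypothetical model of line merging (not Splunk's code).

    A new event starts before a line matching break_only_before, or
    after a line matching must_break_after; other lines are merged.
    """
    before = re.compile(break_only_before) if break_only_before else None
    after = re.compile(must_break_after) if must_break_after else None
    events: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        # Close the open event before a line that should start a new one.
        if before and before.search(line) and current:
            events.append(current)
            current = []
        current.append(line)
        # Close the event right after a line that must end one.
        if after and after.search(line):
            events.append(current)
            current = []
    if current:
        events.append(current)
    return ["\n".join(event) for event in events]


# ^<Log> never matches the indented lines, so everything merges into one event.
print(len(merge_lines(SAMPLE, break_only_before=r"^<Log>")))  # 1
# The unanchored </Log> does match, so the combined settings split the file.
print(len(merge_lines(SAMPLE, r"^<Log>", r"</Log>")))  # 3
```

In this model the combined settings yield three events: the XML header lines glued to the first entry, the second entry, and a lone </Logs> line.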

These are the only changes I've made, hoping that the XML extraction would work. Are there any other configuration files I have to modify to make this work?

I've noticed that the changes I made to props.conf do not show up in the web interface under Manager->Fields->Field Extractions. Am I missing something here?

Thanks

Accepted solution

samiomer
Path Finder

I finally found out why my props.conf changes weren't being picked up: I had to clean my index. Line-breaking settings are applied at index time, so they only affect data indexed after the change.
I had restarted the Splunk server many times, but that didn't re-index my files. With Splunk stopped, I ran the following from the command line:
splunk clean eventdata

samiomer
Path Finder

Thanks for your help. I'm now using only BREAK_ONLY_BEFORE, and I modified my inputs.conf as you posted (I also removed crcSalt, because I'm not sure why I was using it either :). Unfortunately, this still isn't working for me: the regex-based event separation isn't happening. I've tried multiple values for BREAK_ONLY_BEFORE (including the one you suggested), and they all seem to be ignored; my search results look exactly as they did before I modified props.conf.

I think my problem is that the line breaking isn't happening for some reason. Is there anything I'm missing configuration-wise? The only changes I've made so far are the ones I described in inputs.conf and props.conf.

Thanks

lguinn2
Legend

First, don't use both BREAK_ONLY_BEFORE and MUST_BREAK_AFTER; choose one. If you choose BREAK_ONLY_BEFORE, you need to update your regular expression:

BREAK_ONLY_BEFORE = ^\s*<Log>

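because whitespace is significant in regular expressions: every <Log> line in your file is indented, so ^<Log> never matches it, whereas ^\s* allows for any leading spaces or tabs.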
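A quick way to check a pattern before touching props.conf is to run it against a few lines copied from the file. This Python sketch assumes Python's re behaves like Splunk's regex engine for patterns this simple; the helper name matching_lines is hypothetical.

```python
import re
from typing import List

# Lines as they appear in the sample file; <Log> is indented by four spaces.
LINES = ["<Logs>", "    <Log>", "        <HOST>127.0.0.1</HOST>", "    </Log>"]


def matching_lines(pattern: str, lines: List[str]) -> List[str]:
    """Return the (stripped) lines the pattern matches."""
    return [line.strip() for line in lines if re.search(pattern, line)]


for pattern in (r"^<Log>", r"^\s*<Log>"):
    print(pattern, matching_lines(pattern, LINES))
# ^<Log> []
# ^\s*<Log> ['<Log>']
```

The trailing > in the pattern also keeps it from matching the enclosing <Logs> line.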

The line-breaking rules are not field extractions, and they do not appear anywhere in the web interface under Manager.
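To make the distinction concrete: line breaking decides where one event ends and the next begins, and field extraction then pulls name/value pairs out of each finished event. The Python sketch below illustrates only that second step on one already-broken event; it is not how Splunk works internally, and the helper name xml_fields is hypothetical. Within Splunk itself, search-time XML extraction is commonly handled by, for example, the spath search command or KV_MODE = xml in props.conf.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# One event as it should look after correct line breaking.
EVENT = """    <Log>
        <HOST>127.0.0.1</HOST>
        <DATE>Wed Jul 07 11:41:42 EDT 2011</DATE>
        <FACILITY>8</FACILITY>
        <LEVEL>6</LEVEL>
    </Log>"""


def xml_fields(event: str) -> Dict[str, str]:
    """Hypothetical helper: map each child element of <Log> to a field."""
    root = ET.fromstring(event.strip())
    return {child.tag: (child.text or "").strip() for child in root}


print(xml_fields(EVENT))
# {'HOST': '127.0.0.1', 'DATE': 'Wed Jul 07 11:41:42 EDT 2011', 'FACILITY': '8', 'LEVEL': '6'}
```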

Here is how your stanza in inputs.conf might look:

[monitor://C:\data]
sourcetype=log_xml
whitelist=\.xml$

_whitelist is deprecated. Use whitelist instead. Also, I am not clear why you need crcSalt, but okay.
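whitelist is a regular expression tested against each file's path, so an unanchored pattern like the original .*\.xml also accepts paths that merely contain .xml somewhere, such as a renamed backup. A small Python comparison, assuming Python's re.search behaves like Splunk's matching for these simple patterns; the file paths are made up for illustration:

```python
import re
from typing import List

# Hypothetical file paths under the monitored directory.
PATHS = [
    r"C:\data\app.xml",
    r"C:\data\app.xml.bak",
    r"C:\data\notes.txt",
]


def accepted(pattern: str, paths: List[str]) -> List[str]:
    # re.search looks for a match anywhere in the path (unanchored).
    return [p for p in paths if re.search(pattern, p)]


print(accepted(r".*\.xml", PATHS))  # ['C:\\data\\app.xml', 'C:\\data\\app.xml.bak']
print(accepted(r"\.xml$", PATHS))   # ['C:\\data\\app.xml']
```

Anchoring with $ restricts the match to paths that actually end in .xml.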

This should get the events into Splunk. You may need to do some additional work to get the timestamps properly recognized, but Splunk may be able to pick them out correctly once you follow the steps above. Let us know how it works!
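If Splunk does not recognize the timestamps on its own, props.conf timestamp settings such as TIME_PREFIX (where the timestamp starts, here right after <DATE>) and TIME_FORMAT (its layout) are the usual place to help it. As an offline check only, here is a Python sketch that mirrors that idea on the sample format. The helper name event_timestamp and the zone-offset table are assumptions, the weekday is ignored rather than validated, and Python's strptime is not Splunk's timestamp parser.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed UTC offsets (in hours) for zone abbreviations seen in the logs.
ZONE_OFFSETS = {"EDT": -4, "EST": -5, "GMT": 0, "UTC": 0}

# Like TIME_PREFIX: the timestamp is the text right after "<DATE>".
DATE_RE = re.compile(r"<DATE>([^<]+)</DATE>")


def event_timestamp(event: str) -> datetime:
    """Hypothetical helper: parse e.g. 'Wed Jul 07 11:41:42 EDT 2011'."""
    match = DATE_RE.search(event)
    if match is None:
        raise ValueError("event has no <DATE> element")
    _weekday, month, day, clock, zone, year = match.group(1).split()
    naive = datetime.strptime(f"{month} {day} {year} {clock}", "%b %d %Y %H:%M:%S")
    # Attach the fixed offset looked up from the zone abbreviation.
    return naive.replace(tzinfo=timezone(timedelta(hours=ZONE_OFFSETS[zone])))


print(event_timestamp("<DATE>Wed Jul 07 11:41:42 EDT 2011</DATE>").isoformat())
# 2011-07-07T11:41:42-04:00
```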
