How to configure Splunk to parse and extract fields from my pseudo-XML sample data?

DMohn
Motivator

Hi Splunkers,

I have a question regarding the input extraction of XML fields (with inputs and transforms).
I have tried to follow the advice in this post:
https://answers.splunk.com/answers/683/xml-input-line-breaking-and-field-extraction-how.html
but have not been successful yet, since the XML structure of my data is somewhat different.

Here's the data:

<ClientStatistics refDate="2015-11-10T09:47:46.888+01:00"><RequestStatistics><Client created="2015-09-10T23:25:17.523+02:00" id="IDxxxx" lastPoll="2015-11-10T09:47:45.279+01:00" pollCount="3342838" pollThroughput="1563"/><Client created="2015-09-10T23:25:21.751+02:00" id="IDxxxx" lastPoll="2015-11-10T09:46:02.196+01:00" pollCount="45031" pollThroughput="116030"/><Client created="2015-09-10T23:25:30.007+02:00" id="IDxxxx" lastPoll="2015-11-10T09:47:46.850+01:00" pollCount="16640185" pollThroughput="314"/><Client created="2015-09-10T23:25:17.516+02:00" id="IDxxxx" lastPoll="2015-11-10T09:47:46.432+01:00" lastPush="2015-11-10T09:47:46.360+01:00" pollCount="40604184" pollThroughput="129" pushCount="11646891" pushThroughput="449"/><Client created="2015-09-17T11:13:03.268+02:00" id="IDxxxx" lastPoll="2015-09-17T11:29:03.415+02:00" pollCount="9" pollThroughput="120018"/><Client created="2015-09-17T11:16:03.552+02:00" id="IDxxxx" lastPoll="2015-11-09T08:02:02.497+01:00" pollCount="300" pollThroughput="15237597"/></RequestStatistics></ClientStatistics>

Yes, it's pretty unstructured, and it's not clean XML...

I have tried setting KV_MODE = xml in my inputs.conf, with no effect. Also, the other suggested settings, like BREAK_ONLY_BEFORE or LINE_BREAKER, did not split my events.

I understand that it should be possible to extract the KV pairs inside the <Client> tags somehow, maybe with an additional transform. I figured it should be REGEX = (\w+)="([^"]+)" and FORMAT = $1::$2 inside transforms.conf, but I am missing the connection.

Can somebody please enlighten me?

richgalloway
SplunkTrust

At the risk of duplicating what you've already tried, try these props.conf settings.

SHOULD_LINEMERGE=false
LINE_BREAKER=(><)
TIME_PREFIX=Client created=
---
If this reply helps you, Karma would be appreciated.

DMohn
Motivator

Thanks a ton - this was a setting I actually didn't try yet 🙂

With one small modification (stripping the closing slash as well) it works perfectly!

 SHOULD_LINEMERGE=false
 LINE_BREAKER=(/><)
 TIME_PREFIX=refDate=
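
On the "missing connection" from the original question: a transforms.conf stanza is normally linked to a sourcetype through a REPORT-<class> line in props.conf. A minimal sketch, assuming a sourcetype called my_pseudo_xml and a transform called client_kv_pairs (both names are placeholders, not taken from this thread):

```ini
# props.conf
# "my_pseudo_xml" is a placeholder sourcetype name (assumption)
[my_pseudo_xml]
# Line breaking as settled on in this thread
SHOULD_LINEMERGE = false
LINE_BREAKER = (/><)
TIME_PREFIX = refDate=
# Search-time link to the transforms.conf stanza below
REPORT-client_kv = client_kv_pairs

# transforms.conf
# "client_kv_pairs" is a placeholder stanza name (assumption)
[client_kv_pairs]
# $1 becomes the field name, $2 its value
REGEX = (\w+)="([^"]+)"
FORMAT = $1::$2
```

The line-breaking settings take effect at index time, while REPORT- applies at search time, so in a distributed setup the two parts may need to live on different instances. It is also worth checking whether the default automatic key-value extraction already picks up these attribute pairs once the events are split, in which case the explicit transform may be unnecessary.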

richgalloway
SplunkTrust

What values of BREAK_ONLY_BEFORE and LINE_BREAKER have you tried?

DMohn
Motivator

I have tried numerous versions of RegExes, started with a simple '<', '
