Getting Data In

Help indexing XML CDRs

jerrad
Path Finder

I have been struggling to get these XML CDRs to index correctly in Splunk without losing data from some of the events.

<record>
<recId>cdma_8461599e2356401240238057235696109</recId>
<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>
<startDate>2009/11/10 06:54:51</startDate>
<callingNumber>xxxxxxxxxxxxxx</callingNumber>
<calledNumber></calledNumber>
<radiusAttr>v3631:9216=4;v5535:44=xxxxxxxxxxxxxx;v5535:48=0;v5535:24=3;v5535:7=xxxxxxxxxxxxxx;</radiusAttr>
<startAttr></startAttr>
<auditID>xxxxxxxxxxxxxx:StdFile:flatfile-12549597153198</auditID>
<seqNum>0</seqNum>
<accountName></accountName>
</record><record>
<recId>cdma_8461599e2356401240238057235696109</recId>
<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>
<startDate>2009/11/10 06:54:51</startDate>
<callingNumber>xxxxxxxxxxxxxx</callingNumber>
<calledNumber></calledNumber>
<radiusAttr>v3631:9216=4;v5535:44=xxxxxxxxxxxxxx;v5535:48=0;v5535:24=3;v5535:7=xxxxxxxxxxxxxx;</radiusAttr>
<startAttr></startAttr>
<auditID>xxxxxxxxxxxxxx:StdFile:flatfile-12549597153198</auditID>
<seqNum>0</seqNum>
<accountName></accountName>
</record><record>

I would really like to create an event that contains everything from <record> through </record> and then move on to the next event. However, here and there I get events that break in the wrong place (some containing only two lines), so one event may show

<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>

then the next event will show

<record>
<recId>cdma_8461599e2356401240238057235696109</recId>

Instead of

<record>
<recId>cdma_8461599e2356401240238057235696109</recId>
..........
</record>

My props.conf

[aaaacct]
BREAK_ONLY_BEFORE=<recId>
MAX_EVENTS=200000
TIME_PREFIX = (?m)<startDate>

Does anyone have any suggestions on how to approach this problem?

Thanks

Jerrad


ziegfried
Influencer

I'd suggest using a custom LINE_BREAKER instead of the BREAK_ONLY_BEFORE option:

[aaaacct]
SHOULD_LINEMERGE = false
TRUNCATE = 500000
LINE_BREAKER = </record>(\s*)<record>
TIME_PREFIX = (?m)<startDate>
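
The same stanza again, commented, with an optional TIME_FORMAT added on the assumption that startDate always looks like 2009/11/10 06:54:51 (double-check against your real data before relying on it):

[aaaacct]
# Do not merge lines back together; each chunk produced by LINE_BREAKER is one event
SHOULD_LINEMERGE = false
# Raise the per-event byte limit so large records are not cut off
TRUNCATE = 500000
# Break between records; the whitespace captured by the first group is discarded,
# so </record> ends one event and <record> starts the next
LINE_BREAKER = </record>(\s*)<record>
# Locate the timestamp inside the startDate element
TIME_PREFIX = (?m)<startDate>
# Assumption: startDate is always in YYYY/MM/DD HH:MM:SS form
TIME_FORMAT = %Y/%m/%d %H:%M:%S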

ziegfried
Influencer

It matches because of the * quantifier: \s* matches zero or more whitespace characters, so the pattern still matches when there is nothing between the tags.
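
To make that concrete, a quick illustration with Python's re module (just demonstrating the regex itself, not anything Splunk-specific):

import re

# The LINE_BREAKER regex from the answer above
breaker = re.compile(r"</record>(\s*)<record>")

# Two records butted together, exactly as in the sample data
data = "<record>...first...</record><record>...second...</record>"

m = breaker.search(data)
print(m is not None)         # True: \s* matches zero characters, so the pattern
                             # matches even with nothing between the tags
print(repr(m.group(1)))      # '' - the captured "whitespace" is empty
print(m.start(1), m.end(1))  # both fall right after </record>; Splunk ends the
                             # previous event at the start of this group and starts
                             # the next event at its end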


jerrad
Path Finder

This worked perfectly. The way I read the LINE_BREAKER docs confused me, but I think the magic lies in:

Wherever the regex matches, the start of the first matching group is considered the end of the
previous event, and the end of the first matching group is considered the start of the next event.

However, I'm not sure how this group matched since there isn't a space, tab, or linebreak between </record> and <record>.


ziegfried
Influencer

Yes, you're right. Corrected it in the answer above.


gkanapathy
Splunk Employee

Instead of MAX_EVENTS: when you use SHOULD_LINEMERGE = false, you need to increase the event size limit by increasing TRUNCATE, e.g. TRUNCATE = 500000. MAX_EVENTS will have no effect, since when you don't merge lines the maximum number of lines merged is 1. (Or zero, or whatever.)
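
In props.conf terms, that amounts to something like the following (same numbers as the answer above; size TRUNCATE for your largest record):

[aaaacct]
SHOULD_LINEMERGE = false
# With line merging off, MAX_EVENTS (a cap on merged lines) does nothing,
# so leave it out. TRUNCATE caps the size of each event in bytes instead.
TRUNCATE = 500000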
