I have been struggling to get these XML CDRs to index correctly in Splunk without missing some data from the events.
<record>
<recId>cdma_8461599e2356401240238057235696109</recId>
<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>
<startDate>2009/11/10 06:54:51</startDate>
<callingNumber>xxxxxxxxxxxxxx</callingNumber>
<calledNumber></calledNumber>
<radiusAttr>v3631:9216=4;v5535:44=xxxxxxxxxxxxxx;v5535:48=0;v5535:24=3;v5535:7=xxxxxxxxxxxxxx;</radiusAttr>
<startAttr></startAttr>
<auditID>xxxxxxxxxxxxxx:StdFile:flatfile-12549597153198</auditID>
<seqNum>0</seqNum>
<accountName></accountName>
</record><record>
<record>
<recId>cdma_8461599e2356401240238057235696109</recId>
<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>
<startDate>2009/11/10 06:54:51</startDate>
<callingNumber>xxxxxxxxxxxxxx</callingNumber>
<calledNumber></calledNumber>
<radiusAttr>v3631:9216=4;v5535:44=xxxxxxxxxxxxxx;v5535:48=0;v5535:24=3;v5535:7=xxxxxxxxxxxxxx;</radiusAttr>
<startAttr></startAttr>
<auditID>xxxxxxxxxxxxxx:StdFile:flatfile-12549597153198</auditID>
<seqNum>0</seqNum>
<accountName></accountName>
</record><record>
I would really like to create an event that contains <record> thru to </record>
and move on to the next event, however I get events that only contain two lines here and there so one event may show
<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>
then the next event will show
<record>
<recId>cdma_8461599e2356401240238057235696109</recId>
Instead of
<record>
<recId>cdma_8461599e2356401240238057235696109</recId>
..........
</record>
My props.conf
[aaaacct]
BREAK_ONLY_BEFORE=<recId>
MAX_EVENTS=200000
TIME_PREFIX = (?m)<startDate>
Does anyone have any suggestions on how to approach this problem?
Thanks
Jerrad
I'd suggest using a custom LINE_BREAKER instead of the BREAK_ONLY_BEFORE option:
[aaaacct]
SHOULD_LINEMERGE = false
TRUNCATE = 500000
LINE_BREAKER = </record>(\s*)<record>
TIME_PREFIX = (?m)<startDate>
I'd suggest using a custom LINE_BREAKER instead of the BREAK_ONLY_BEFORE option:
[aaaacct]
SHOULD_LINEMERGE = false
TRUNCATE = 500000
LINE_BREAKER = </record>(\s*)<record>
TIME_PREFIX = (?m)<startDate>
It matches because of the use of the *-quantifier. It matches 0 or more whitespaces - hence it matches if there's nothing between the tags.
This worked perfectly, the way that I read the LINE_BREAKER docs confused me, but I think the magic lies in
Wherever the regex matches, the start of the first matching group is considered the end of the
previous event, and the end of the first matching group is considered the start of the next event.
However I'm not sure how this group matched since there isn't a space, tab or linebreak between and
Yes, your're right. Corrected it in the answer above.
Instead of MAX_EVENTS, when you use SHOULD_LINEMERGE = false
, you need to increase the event limit by increasing TRUNCATE
, e.g., TRUNCATE = 500000
. MAX_EVENTS
will have no effect since when you don't merge lines, the maximum number of lines merged is 1. (Or zero or whatever.)