Getting Data In

Splunk doesn't break events correctly for monitor input.

dilipvpatel
Explorer

I am struggling to break multi-line events correctly when the source is defined as a monitor input. Occasionally, Splunk breaks events incorrectly. If I clean the event index and the _thefishbucket index, the event that was previously broken incorrectly is broken correctly the second time, during reindexing.

My event log files are XML-formatted.

12010961392012-07-13T13:23:58.829881ZSCOTToraclehost.domain.com1461pts/01BMFACCOUNTS30151858181902324057select * from bmf.accounts

12010961182012-07-13T13:23:49.209880ZSCOTToraclehost.domain.com1461pts/01BMFACCOUNTS30151858151902324057select * from bmf.accounts

12010961072012-07-13T13:23:38.261471ZSCOTToraclehost.domain.com1461pts/01BMFACCOUNTS30151858121902324057select count(*) from bmf.accounts

1201096552012-07-13T13:23:30.117440ZSCOTToraclehost.domain.com1461pts/01SYSTEMPRODUCT_PRIVS30151857911902324057SELECT CHAR_VALUE FROM SYSTEM.PRODUCT_PRIVS WHERE (UPPER('SQL*Plus') LIKE UPPER(PRODUCT)) AND ((USER LIKE USERID) OR (USERID = 'PUBLIC')) AND (UPPER(ATTRIBUTE) = 'ROLES')

Here is my props.conf:

LINE_BREAKER=([\r\n]+)<\AuditRecord>

TIME_PREFIX=<\Extended_Timestamp>

TZ=UTC

MAX_TIMESTAMP_LOOKAHEAD=27

NO_BINARY_CHECK=1

SHOULD_LINEMERGE=true

MUST_BREAK_AFTER=<\/AuditRecord>

TRUNCATE=0

This is how one of the events was broken:

estamp>2012-07-13T13:23:30.118117ZSCOTToraclehost.domain.com1461pts/01SYSDUAL30151857911902324057SELECT DECODE('A','A','1','2') FROM DUAL

As you can see, the event should have been broken at the "<AuditRecord>" tag, but instead it was broken in the middle of the "<Extended_Timestamp>" tag.

I would appreciate a quick response.


rturk
Builder

Using your data, I used the following sourcetype configuration in props.conf:

[your_sourcetype]
BREAK_ONLY_BEFORE = <auditrecord>
SHOULD_LINEMERGE = true
TIME_PREFIX = <extended_timestamp>
pulldown_type = 1

This gave me clean extractions for what you provided (after applying "| xmlkv").

How does this fare with your larger data set?

EDIT: Also, looking at this:

LINE_BREAKER=([rn]+)<auditrecord>

Do you mean to look for end-of-line characters? As written, this matches one or more instances of the letter 'r' or 'n'.

LINE_BREAKER=([\r\n]+)<auditrecord>

I'm no regex expert, but I think that's right.
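To make the difference concrete, here is a minimal sketch in Python. Python's re module is only a stand-in for Splunk's PCRE-based engine, and the sample strings are made up for illustration, but both engines treat these two character classes the same way.

```python
import re

# Hypothetical sample: two records separated by CR LF (tag name as used above).
data = "first record\r\n<auditrecord>second record"

# Without backslashes the class contains the literal letters 'r' and 'n'.
unescaped = re.compile(r"([rn]+)<auditrecord>")
# With backslashes the class contains carriage return and line feed.
escaped = re.compile(r"([\r\n]+)<auditrecord>")

no_match = unescaped.search(data)            # the newline is not a letter
newline_match = escaped.search(data)         # matches the CR LF pair
letter_match = unescaped.search("return<auditrecord>")  # matches letters 'rn'

print(no_match)
print(repr(newline_match.group(1)))
print(repr(letter_match.group(1)))
```

So the unescaped form only fires when the tag happens to follow the letters r or n, which is not what you want for event breaking.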


rturk
Builder

Hi Dilip. A couple of things I can think of:

  • Your MAX_TIMESTAMP_LOOKAHEAD value of 27. It is unlikely to be the cause of the issue, but your timestamps sit far more than 27 characters into the event (I count 165), and the setting is unnecessary because you have consistently defined TIME_PREFIX.
  • Any reason why you're not using KV_MODE = xml?
  • What regex are you using for your disc_xml_header? (Unlikely to be related, but I like to cover all bases.)
  • Have you tried replacing your props/transforms with what I posted? I find that starting from scratch and building up from there often helps.

dilipvpatel
Explorer

Hi Turk, I do have REPORT-oracleaudit_xml defined in my props.conf. Here is my complete props.conf file.

[oracleaudit_xml]

LINE_BREAKER=([\r\n]+)<\AuditRecord>

TIME_PREFIX=<\Extended_Timestamp>

TZ=UTC

MAX_TIMESTAMP_LOOKAHEAD=27

NO_BINARY_CHECK=1

SHOULD_LINEMERGE=true

MUST_BREAK_AFTER=<\/AuditRecord>

TRUNCATE=0

TRANSFORMS-disc_xml_header=disc_xml_header

SEDCMD-disc_xml_end_tag=s/<\/Audit>//g

KV_MODE=none

REPORT-oracleaudit_xml=oracleaudit_xml_extractions


rturk
Builder

That was just something I put in there for testing purposes to manually select the sourcetype.


dilipvpatel
Explorer

Hi R. Turk, what does pulldown_type=1 do in props.conf?


rturk
Builder

Hmm... this gives me the event extractions & fields:

props.conf

[your_sourcetype]
LINE_BREAKER = ([\r\n]+)
MUST_BREAK_AFTER =
REPORT-field_extract = oracleaudit_xml_extractions
SHOULD_LINEMERGE = true
TIME_PREFIX =
pulldown_type = 1

transforms.conf

[oracleaudit_xml_extractions]
REGEX = <(\w+)>([^<]+?)</\1>
FORMAT = $1::$2

Hope this is some help 😛


dilipvpatel
Explorer

Yes, my XML records have a line break before the <AuditRecord> tag starts as well as a line break after the </AuditRecord> tag.

In short, my event starts with the <AuditRecord> tag and ends with the </AuditRecord> tag. A new-line character precedes the <AuditRecord> tag and also follows the </AuditRecord> tag. There are new-line characters between these tags as well, but I do not want to break my event there.
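The boundary rule described here can be sketched in Python as a rough model of what a LINE_BREAKER of ([\r\n]+)<AuditRecord> is meant to do: break only where a run of new-line characters is followed by the opening tag, and discard the text matched by the first capture group. This is an illustration under assumptions, not Splunk's actual pipeline (it ignores line merging and the timing of reads from a live file). The break_events helper and the Sql_Text tag name are hypothetical.

```python
import re

# Break only where new-lines are immediately followed by the opening tag.
LINE_BREAKER = re.compile(r"([\r\n]+)<AuditRecord>")

def break_events(raw):
    """Split raw text into events, dropping the captured new-line run."""
    events, start = [], 0
    for m in LINE_BREAKER.finditer(raw):
        events.append(raw[start:m.start(1)])  # event ends before the new-lines
        start = m.end(1)                      # next event starts at the tag
    events.append(raw[start:])
    return events

# Hypothetical two-record sample with new-lines inside each record.
raw = (
    "<AuditRecord>\n"
    "<Extended_Timestamp>2012-07-13T13:23:58.829881Z</Extended_Timestamp>\n"
    "<Sql_Text>select * from bmf.accounts</Sql_Text>\n"
    "</AuditRecord>\n"
    "<AuditRecord>\n"
    "<Extended_Timestamp>2012-07-13T13:23:49.209880Z</Extended_Timestamp>\n"
    "<Sql_Text>select * from bmf.accounts</Sql_Text>\n"
    "</AuditRecord>\n"
)

events = break_events(raw)
for event in events:
    print(repr(event))
```

The new-lines inside each record are left alone because none of them is followed by the opening tag.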


dilipvpatel
Explorer

Secondly, I do not use xmlkv but use the following field extraction stanza in my transforms.conf.

[oracleaudit_xml_extractions]

REGEX=<(\w+)>([^<]+?)</\1>

FORMAT=$1::$2
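As a quick check of what this REGEX/FORMAT pair extracts, here is a small Python sketch. Python's re module stands in for Splunk's PCRE-based engine, and the sample event is illustrative: DB_User and Sql_Text are assumed tag names, not taken from the log above. Group 1 plays the role of the field name ($1) and group 2 the field value ($2).

```python
import re

# Same pattern as the REGEX setting: tag name, lazy non-'<' text, matching close tag.
EXTRACT = re.compile(r"<(\w+)>([^<]+?)</\1>")

# Hypothetical single event; tag names other than AuditRecord and
# Extended_Timestamp are assumptions for illustration.
event = (
    "<AuditRecord>"
    "<Extended_Timestamp>2012-07-13T13:23:58.829881Z</Extended_Timestamp>"
    "<DB_User>SCOTT</DB_User>"
    "<Sql_Text>select * from bmf.accounts</Sql_Text>"
    "</AuditRecord>"
)

# Mirror FORMAT=$1::$2 by mapping group 1 to group 2.
fields = {name: value for name, value in EXTRACT.findall(event)}
for name, value in fields.items():
    print(f"{name}::{value}")
```

The wrapping AuditRecord element is skipped because its content starts with "<", which [^<]+? cannot match. The closing </\1> also matters: without it the lazy +? would stop after a single character of the value.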


dilipvpatel
Explorer

This happens occasionally, and especially for the active log file. For example, say I generate a new log file by logging in to my database and running some SQL statements, and then I log off from my database session. My DB generates a new XML log file for each session. If I go to Splunk and query the events for this newly generated log file, I see one of the events broken incorrectly. Now, if I stop Splunk, delete the event index including the _thefishbucket index, and restart Splunk, I find that the event that was broken incorrectly the first time is broken correctly.
