Multi-line events breaking at 257 lines despite MAX_EVENTS=40000

dbourke
Engager

I have some files I'm trying to index into Splunk, and I'm having trouble getting large multi-line events to parse properly.
The file format looks like this:

15-05-07 01:03:24.481936 
url=http://something something
content="""lots
and lots
of multiline
content
in completely
random
formats
"""
--- END PASTE RECORD ---

This works fine for most events, but some long events get split into 257-line chunks, and everything goes downhill from there.
The setup I'm using is universal forwarder -> indexers -> search head.

On the forwarder, there's a props.conf in etc/system/local, with this in it:

[source::///opt/path/*.log]
TRUNCATE = 0
MAX_EVENTS = 40000
LINE_BREAKER = (--- END PASTE RECORD ---)
EXTRACT-paste_content = content="""(?<paste_content>.*)[\n\r]"""[\n\r]

On the indexers, I have a stanza in props.conf (in a deployment app) like this:

[pastedata]
TRUNCATE = 0
MAX_EVENTS = 40000
LINE_BREAKER = (--- END PASTE RECORD ---)
EXTRACT-paste_content = content="""(?<paste_content>.*)[\n\r]"""[\n\r]

What am I missing? When events are short, everything works fine, but a long event can break in such a way that it gets turned into hundreds of individual events (for instance, when the event data has multiple lines that start with timestamps). This is weird, and sometimes results in events timestamped in the future.

(Side note: did you know that if you're running a real-time all-time search in Splunk on a data source that is not currently being populated, and you get to a timestamp that already existed in the data, it shows up as if it were an event that just happened?)

My specific questions are:
1. Why are my events being broken up early?
2. When my events are broken up, why do they sometimes get broken into chunks that don't match the line breaker settings?

I am more concerned about question 1, because if that stops happening, the other one will stop too.

Thanks, and let me know if you need anything else.

(Edit: the regexes are actually fine, but the < and > characters weren't displaying properly here. I do not actually have HTML escapes in my regexes.)

dbourke
Engager

Of course, once I posted the question, I managed to make it work.

The answer is:
If you're using LINE_BREAKER and nothing else to delimit events, you need to set SHOULD_LINEMERGE = false.

Also, make sure your deployment app is actually deploying things before you restart your indexers, but that is an entirely separate issue.
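
For anyone landing here later, this is a minimal sketch of the indexer stanza with that change applied, assuming the same [pastedata] sourcetype and settings as in the question. The comments reflect my understanding of props.conf (in particular, that SHOULD_LINEMERGE defaults to true and that MAX_EVENTS only applies while line merging is on, which is why it is left out here), so check the props.conf spec for your version before relying on them:

```ini
[pastedata]
# Turn off the line-merging pass (assumed to be on by default) so that
# events are delimited by LINE_BREAKER alone.
SHOULD_LINEMERGE = false
# Text matched by the first capture group marks the event boundary
# and is dropped from the indexed events.
LINE_BREAKER = (--- END PASTE RECORD ---)
# 0 disables truncation of long events.
TRUNCATE = 0
```

To confirm the deployment app actually landed, running `splunk btool props list pastedata --debug` on an indexer should list each effective setting along with the file it came from.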
