I have a really big file that I'm trying to subdivide. It has a lot of different subsections, one of which is called "logs" and contains any number of log messages, each starting with a timestamp (mm/dd/yyyy hh:mm:ss). Below I've listed my props and transforms stanzas.
Splunk is currently splitting the sections on "@@@" properly (I've left off the dozen or so other "TRANSFORMS-sourcetype" rules below), but when it gets to the logs section it breaks the logs off into one gigantic event rather than pulling each timestamped part out as its own event. I'd prefer the latter. I've been playing around with it for a while now and haven't made any headway. Anyone have any tips?
Props.conf
[source::sourcename]
BREAK_ONLY_BEFORE = @@@
SHOULD_LINEMERGE = true
TRANSFORMS-sourcetype = set_log
Transforms.conf
[set_log]
REGEX = \d{2}/\d{2}/\d{4}\s\d{2}:\d{2}
FORMAT = sourcetype::log
DEST_KEY = MetaData:Sourcetype
I tried to answer this a minute ago but it didn't take. I'll try again:
I eventually solved the problem by using the following three lines in my props.conf stanza:
BREAK_ONLY_BEFORE = \d{2}/\d{2}/\d{4}\s\d{2}:\d{2}:\d{2}.\d{2}\s<
MUST_BREAK_AFTER = @@@
SHOULD_LINEMERGE = true
The BREAK_ONLY_BEFORE regex uniquely matches the start of a log line, breaking each log into its own event, and MUST_BREAK_AFTER cuts the event off at the end of its section (delimited by the triple-@), which correctly sections the multi-line events. I checked the edges and it looks sound.
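Putting it together with the stanza from the question, the full props.conf entry would look something like the sketch below (combining these settings with the original `[source::sourcename]` stanza and its transform is my assumption, not something the answer spells out):

```
[source::sourcename]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = \d{2}/\d{2}/\d{4}\s\d{2}:\d{2}:\d{2}.\d{2}\s<
MUST_BREAK_AFTER = @@@
TRANSFORMS-sourcetype = set_log
```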
I should also note that where I said "Some Stuff", "More Stuff", and so on, each could be one line or dozens. The unmodified input is typically several hundred lines, and I can't change how it comes in; I can only try to deal with it at index time or search time.
The input that comes in is very long. A nonspecific abridged example would be:
@@@
Some Stuff
@@@
More Stuff
@@@
01/01/2000 06:06:06 Log message here
01/01/2000 05:05:05 Other log message here
@@@
Even More Stuff
And so on. In the above, "Some Stuff" would go to one sourcetype as one event, "More Stuff" would go to another sourcetype as one event, and then the two log messages would go to the "log" sourcetype as two events. Ideally.
Now, I've seen this work before, I just don't know how to get there. If I set SHOULD_LINEMERGE to false I get the behavior I want for the logs section but not for all the others.
We don't do two passes through the aggregator, so re-separating sections that you've already made into events isn't something that can be done. Somehow you're going to have to get them into the right events the first time around, or preprocess the data.
Transforms only operates on already established events, modifying them. It has no power to turn one event into multiple events.
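If you go the preprocessing route, a minimal sketch of what that could look like is below. All the names here (`split_sections`, `is_log_section`, the sample text) are my own illustration, not part of Splunk or the original post; the idea is just to split the raw export on the "@@@" separator before it reaches Splunk, so each section can be fed in as its own input:

```python
import re

# Sections in the raw export are delimited by "@@@" (from the question).
SECTION_SEP = "@@@"
# Log lines start with mm/dd/yyyy hh:mm:ss.
TIMESTAMP = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}")

def split_sections(text):
    """Return the non-empty sections between @@@ separators."""
    parts = [p.strip("\n") for p in text.split(SECTION_SEP)]
    return [p for p in parts if p.strip()]

def is_log_section(section):
    """A section is 'logs' if every non-blank line starts with a timestamp."""
    lines = [l for l in section.splitlines() if l.strip()]
    return bool(lines) and all(TIMESTAMP.match(l) for l in lines)

# Abridged sample shaped like the input described in the question.
raw = """@@@
Some Stuff
@@@
01/01/2000 06:06:06 Log message here
01/01/2000 05:05:05 Other log message here
@@@
More Stuff
"""

sections = split_sections(raw)
# Non-log sections stay as one chunk each; the log section's lines
# can then be written out (or forwarded) as one event per line.
```

From there each non-log section can be indexed as a single event, and each line of the log section as its own event, without fighting the line merger.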
Can you provide some sample log entries?
Does it help if you change SHOULD_LINEMERGE to false? It seems like setting it to true contributes to bundling lines into multi-line events, based on http://docs.splunk.com/Documentation/Splunk/6.1.3/Admin/Propsconf