I'm troubleshooting some issues with one sourcetype and realized that Splunk is not breaking its events correctly. The format for these events is a little different, but there are clear boundaries: each event is always prefixed by =LOGLEVEL REPORT====Date==== and ends with two line feeds. It would be nice if Splunk could split events on these boundaries.
Example events:
=TYPE REPORT==== 23-May-2016::16:19:05 ===
HTTP access requested:XXXXXX
How should I configure props.conf?
[sourcetypeName]
TIME_PREFIX=\w+\sREPORT====
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
LINE_BREAKER=(=)\w+\s\w+====
EXTRACT-loglevel=^(?<loglevel>\w+)
This method assumes "TYPE" in your example was the loglevel.
Works fine with sample data I created based on your examples:
=ERROR REPORT==== 23-May-2016::16:19:05 ===
HTTP access requested:XXXXXX
=WARN REPORT==== 23-May-2016::16:12:05 ===
HTTP access requested:XXXXXX
HTTP access requested:XXXXXX
HTTP access requested:XXXXXX
It also uses LINE_BREAKER with SHOULD_LINEMERGE=false instead of SHOULD_LINEMERGE=true, which means it doesn't need the line-merging part of the pipeline, so it speeds up data ingestion and reduces resource usage.
This also removes the beginning "=" sign from each event, because Splunk discards whatever the first capture group of LINE_BREAKER matches, but hey... that's what we call license optimization where I come from. 😉
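If you would rather keep the leading "=" in the raw event, one variant is to capture the line feeds in front of the header instead of the "=" itself. This is an untested sketch, not a verified config: it assumes every header has the form =WORD REPORT====, and the TIME_FORMAT and MAX_TIMESTAMP_LOOKAHEAD lines are my additions based on the timestamp in the sample events.

```
[sourcetypeName]
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
# Only the captured line feeds are discarded; the next event starts at "=".
LINE_BREAKER = ([\r\n]+)=\w+\sREPORT====
TIME_PREFIX = ^=\w+\sREPORT====\s*
# Assumed from the sample timestamp 23-May-2016::16:19:05
TIME_FORMAT = %d-%b-%Y::%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 25
# The event now begins with "=", so skip it before capturing the level.
EXTRACT-loglevel = ^=(?<loglevel>\w+)
```

The trade-off is one extra character per event in exchange for raw events that match the original log file exactly.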
Thanks guys 🙂
Well, new events do not always begin with "=LOGLEVEL REPORT====" as your example shows. (Unless "TYPE" is a log level, or maybe an abstract example.) But I would do this in props.conf
[yoursourcetypehere]
TIME_PREFIX = \=\w+ REPORT====
MAX_TIMESTAMP_LOOKAHEAD=35
TIME_FORMAT=%d-%b-%Y::%H:%M:%S
EXTRACT-e1 = \=(?<loglevel>\w+) REPORT====
MAX_EVENTS = 500
This should actually be enough to get the events broken out correctly and with the right timestamp on each event. While it would be more efficient to create a LINE_BREAKER to precisely identify the event boundary, I don't recommend that if you are new to Splunk or inexperienced with regular expressions.
By default, Splunk considers the line containing the timestamp to be the first line of the event. That default should work fine in your case.
# this is the default (comments must be on their own line in props.conf)
BREAK_ONLY_BEFORE_DATE = true
Note that I also included a setting for MAX_EVENTS. This controls the maximum number of lines per event (it isn't well named). The default is 256 lines per event; if Splunk is not separating events properly, this could also be the cause, since an event longer than the limit gets cut into pieces. I set the limit to 500 arbitrarily, but you should make sure that it is set to something reasonable for your data.
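If relying on date detection turns out to be fragile (for example, if a date ever appears inside the body of an event), the line-merging approach can also break on the header line explicitly. The stanza below is a sketch under that assumption and has not been tested against your data. BREAK_ONLY_BEFORE is a standard props.conf setting, but the regex is my guess at the header format from the sample events.

```
[yoursourcetypehere]
SHOULD_LINEMERGE = true
# Break on the header line itself rather than on any detected date.
BREAK_ONLY_BEFORE_DATE = false
BREAK_ONLY_BEFORE = ^=\w+ REPORT====
# Upper bound on lines per event; size this for your data.
MAX_EVENTS = 500
TIME_PREFIX = ^=\w+ REPORT====\s*
TIME_FORMAT = %d-%b-%Y::%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 35
```

This keeps the simpler line-merging pipeline while tying the event boundary to the =LOGLEVEL REPORT==== header instead of the timestamp.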