Event pattern for sourcetype

krishnani
New Member

I'm troubleshooting some issues with one sourcetype and realized that Splunk is not breaking its events correctly. The format of these events is a little different, but there are clear boundaries: each event is prefixed by =LOGLEVEL REPORT====Date==== and ends with two line feeds. It would be nice if Splunk could split events on these boundaries. Specifically, I'd like to:

  1. Break events based on these boundaries
  2. Define a logLevel field based on the text before "REPORT"

Example events:
=TYPE REPORT==== 23-May-2016::16:19:05 ===
HTTP access requested:XXXXXX

How should I configure props.conf to do this?


jkat54
SplunkTrust
[sourcetypeName]
TIME_PREFIX=\w+\sREPORT====
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
LINE_BREAKER=(=)\w+\s\w+====
EXTRACT-loglevel=^(?<loglevel>\w+)

This method assumes "TYPE" in your example was the loglevel.

Works fine with sample data I created based on your examples:

=ERROR REPORT==== 23-May-2016::16:19:05 ===
HTTP access requested:XXXXXX
=WARN REPORT==== 23-May-2016::16:12:05 ===
HTTP access requested:XXXXXX
HTTP access requested:XXXXXX
HTTP access requested:XXXXXX

And it uses a LINE_BREAKER instead of SHOULD_LINEMERGE=true, which means events skip the line-merging stage of the ingestion pipeline, so it speeds up data ingestion and reduces resource usage.

This also removes the beginning "=" sign on each event, but hey... that's what we call license optimization where I come from. 😉
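
If you did want to keep the leading "=", one possible variation (a sketch, untested against your real data) relies on the fact that the first capturing group in LINE_BREAKER is the text Splunk discards between events: capture only the newlines and use a lookahead for the next header, then adjust the extraction to skip past the "=":

LINE_BREAKER=([\r\n]+)(?==\w+\s\w+====)
EXTRACT-loglevel=^=(?<loglevel>\w+)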


krishnani
New Member

Thanks guys 🙂



lguinn2
Legend

Well, new events do not always begin with "=LOGLEVEL REPORT====" exactly as your example shows (unless "TYPE" is a log level, or the example is abstract). But I would do this in props.conf:

[yoursourcetypehere]
TIME_PREFIX = \=\w+ REPORT====
MAX_TIMESTAMP_LOOKAHEAD=35
TIME_FORMAT=%d-%b-%Y::%H:%M:%S
EXTRACT-e1 = \=(?<loglevel>\w+) REPORT====
MAX_EVENTS = 500

This should actually be enough to get the events broken out correctly, with the right timestamp on each event. While it would be more efficient to create a LINE_BREAKER to precisely identify the event boundary, I don't recommend that if you are new to Splunk or inexperienced with regular expressions.
By default, Splunk considers the line containing the timestamp to be the first line of the event. That default should work fine in your case:

BREAK_ONLY_BEFORE_DATE = true    (this is the default)

Note that I also included a setting for MAX_EVENTS. Despite its name, this controls the maximum number of lines per event. The default is 128 lines per event; if Splunk is not separating events properly, this could also be the cause. I set the limit to 500 arbitrarily, but you should make sure that it is set to something reasonable for your data.
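
Whichever stanza you use, a quick sanity check after restarting is a search along these lines (a sketch; the index name is a placeholder for your own):

index=your_index sourcetype=yoursourcetypehere earliest=-1h
| rex "^=?(?<loglevel>\w+)\sREPORT"
| stats count by loglevel

If events are breaking correctly, each result row is one log level, and the counts should match the number of "=LEVEL REPORT====" headers in the source file.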
