Splunk Search

Creating Custom Field Extractions

snix
Communicator

I am trying to add some field extractions for a log file created by the Entrust IdentityGuard authentication solution. Currently I ingest it with a sourcetype of log4j, since that is the format the application's documentation says it logs in. Things look okay, but the fields specific to this log are not being extracted. I am looking into how to build a custom extraction myself, because I have always wanted to learn how it works, but I figured I would also post the question here to get some tips and best practices.

Here is an example of one event in the log file:
[2020-03-29 18:37:51,020] [IG Audit Writer] [INFO ] [IG.AUDIT] [AUD6012] [UserNameHere] EventMessageHere

Basically, all the fields I want are wrapped in square brackets [], and the message itself is just appended at the end with no square brackets.

I think I will have to build my own custom sourcetype in SplunkHome\etc\system\local\props.conf that is just a copy of the log4j stanza, but with either a REPORT key that references a corresponding extraction in transforms.conf, or an EXTRACT key with the regex inline. Am I on the right path?
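A minimal sketch of that REPORT route, assuming placeholder stanza and field names (none of these come from Entrust's docs):

```
# props.conf -- sourcetype name is a placeholder
[entrust_identityguard]
REPORT-ig_audit = ig_audit_fields

# transforms.conf -- named capture groups become search-time fields
[ig_audit_fields]
REGEX = ^\[[^\]]+\]\s+\[(?<component>[^\]]+)\]\s+\[(?<level>[^\]]+)\]
```

The EXTRACT key skips transforms.conf entirely and puts the regex directly in props.conf, which is simpler when only one sourcetype uses the extraction.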

1 Solution

darrenfuller
Contributor

You are absolutely on the right path.

Your sourcetype definition in props.conf would look something like this:

[SOURCETYPENAME]
disabled = false
# Break on every newline; note LINE_BREAKER must contain a capturing group
LINE_BREAKER = ([\r\n]+)
# Use basic line breaking rather than line merging
SHOULD_LINEMERGE = false
# What comes immediately before the timestamp
TIME_PREFIX = ^\[
# strptime-style representation of the timestamp
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
# Stop looking for the timestamp after 25 characters
MAX_TIMESTAMP_LOOKAHEAD = 25

EXTRACT-01-Fields = ^\[[^\]]+\]\s+\[(?<firstfieldname>[^\]]+)\]\s+\[(?<secondfieldname>[^\]]+)\]\s+\[(?<thirdfieldname>[^\]]+)\]\s+\[(?<fourthfieldname>[^\]]+)\]\s+\[(?<username>[^\]]+)\]\s+(?<message>.+)$

Here is how the regex looks in regex101: https://regex101.com/r/LOwRwN/1
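If you want to sanity-check the extraction outside Splunk, here is the same regex in Python. One caveat: Python's re module requires (?P<name>...) for named groups, while Splunk's PCRE accepts (?<name>...), so the syntax below is adapted accordingly.

```python
import re

# The EXTRACT regex above, adapted to Python's named-group syntax
pattern = re.compile(
    r"^\[[^\]]+\]\s+"                       # timestamp, not captured
    r"\[(?P<firstfieldname>[^\]]+)\]\s+"
    r"\[(?P<secondfieldname>[^\]]+)\]\s+"
    r"\[(?P<thirdfieldname>[^\]]+)\]\s+"
    r"\[(?P<fourthfieldname>[^\]]+)\]\s+"
    r"\[(?P<username>[^\]]+)\]\s+"
    r"(?P<message>.+)$"
)

# Sample event from the question
event = ("[2020-03-29 18:37:51,020] [IG Audit Writer] [INFO ] "
         "[IG.AUDIT] [AUD6012] [UserNameHere] EventMessageHere")

m = pattern.match(event)
print(m.groupdict())  # prints each captured field by name
```

Note that [^\]]+ is greedy, so a field like "[INFO ]" captures "INFO " with its trailing space; you can trim that in Splunk later if it matters.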

Hope this helps..

./D


snix
Communicator

Holy you know what... That is exactly what I am looking for. Thank you for such a great and specific example! You even built out how to pull in the time from the logs which I had no idea how to do but was going to be the next part to figure out.

I was able to implement it and verify it works exactly how I wanted.
Thank you!!!!


snix
Communicator

After looking closer at it, I found most of the events were actually several log lines merged into a single event. Not sure why, because I would think what you have would work. I don't pretend to understand much about carriage returns and newlines from the little programming I have had to deal with, but it looked good.

I took some of the output from the log file, pasted it into Notepad++, and enabled Show All Characters; it showed CR LF at the end of each line, so that looks good to me.

That said, I commented out the LINE_BREAKER line and replaced it with "BREAK_ONLY_BEFORE = \d\d?:\d\d:\d\d", which I found under the log4j stanza, and it worked. Since I don't grasp 100% of what I am doing, I am sure this is not the best way to do it, but it did get the results I was looking for.
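For context, the props.conf spec says BREAK_ONLY_BEFORE is only consulted when SHOULD_LINEMERGE = true, so the working combination probably looked roughly like the sketch below (stanza name is a placeholder, and this is an assumption about the final config, not a copy of it):

```
[SOURCETYPENAME]
# Let Splunk merge lines, then re-split before any line that
# matches the pattern (here, a time-of-day like 18:37:51)
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = \d\d?:\d\d:\d\d
```

The usual tradeoff is that line merging is slower at index time than a LINE_BREAKER-only approach, which is why the earlier answer set SHOULD_LINEMERGE = false.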

If someone understands what is going on and would like to explain it, I am all ears. I think this will end up being a good post for others trying to do something similar who just need a useful example of what it would look like.
