I'm running into a problem where some events are parsed in the middle versus from the beginning of the string. For the below data, I received the following
Why is the second line starting in the middle of the event instead of the first character?
Log.txt:
dTrace DDCDI1MSVC201_DD.DTO.Users.GetUserInfoResponse GetOne(System.String) 9:00:17 AM.018 2016-4 -21 [124] w3wp 36020 DD.Common.Logging.Infrastructure.LogManager DD.Common.Logging.Infrastructure.ILogManager.Log 0 0.0.0.0 0.0.0.0 {TAG:DD1FE36020VINT>6A2A0A0A.443} {CTX:0} : N/A Exit returned GD.DTO.Users.GetUserInfoResponse at 9:00:17 AM
dTrace DDCDI1MSVC201_DD.DTO.Users.GetUserInfoResponse Execute[Nullable`1,GetUserInfoResponse](DD.DAL.DBContext.UserProfileEntities, System.Func`2[System.Nullable`1[System.Guid],DD.DTO.Users.GetUserInfoResponse], System.Nullable`1[System.Guid]) 9:00:17 AM.018 2016-4 -21
dTrace DDCDI1MSVC201_DD.DTO.Users.GetUserInfoResponse GetOne(System.String) 9:00:17 AM.018 2016-4 -21 [124] w3wp 36020 DD.Common.Logging.Infrastructure.LogManager DD.Common.Logging.Infrastructure.ILogManager.Log 0 0.0.0.0 0.0.0.0 {TAG:DD1FE36020VINT>6A2A0A0A.441} {CTX:0} : N/A Enter at 9:00:17 AM
props.conf:
[customparse]
DATETIME_CONFIG = /etc/apps/search/local/datetime.xml
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
KV_MODE = none
EXTRACT-m = ^(?<logMsgType>d[^ ]+) +(?<domain>.+?) (?<time>\d{1,2}:[0-5]\d:[0-5]\d (?:A|P)M)\.(?<time_milliseconds>\d{3}) (?<date>\d{4}-\d{1,2} ?-\d{1,2} ?) (?<threadId>\[[^\]]*\]) (?<processName>[^ ]+) +(?<processId>\d+) (?<moduleName>[^ ]+) +(?<methodName>[^ ]+) +(?<errorCode>\d+) +(?<clientIp>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) +(?<serverIp>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) +(?<message>.*)(?<exception>\n(\>[^\n]+\n)*)?
MAX_TIMESTAMP_LOOKAHEAD = 50
SHOULD_LINEMERGE = true
TIME_PREFIX = ^d[^ ]+ +.+?(?= \d{1,2}:[0-5]\d:[0-5]\d (?:A|P)M)
datetime.xml:
<datetime>
<define name="ddtimestamp" extract="hour, minute, second, ampm, subsecond, year, month, day">
<text>(\d{1,2}):([0-5]\d):([0-5]\d) (AM|PM)\.(\d{3}) (\d{4})-(\d{1,2}) ? -(\d{1,2}) ?</text>
</define>
<timePatterns>
<use name="ddtimestamp"/>
</timePatterns>
<datePatterns>
<use name="ddtimestamp"/>
</datePatterns>
</datetime>
If I'm correct in reading that each line is its own event, and if there are no multi-line events (read as: containing a new line character, \n or \r) within the log, the simplest fix is to add the setting SHOULD_LINEMERGE = false. You could also consider a custom LINE_BREAKER written as a regular expression to "capture the text which occurs between events, usually having a non-capturing lookahead". In this case, the LINE_BREAKER becomes an event breaker, and you don't have to engage the linemerging behavior. Its default heuristics can get confused by the placement of timestamps in the middle of events, in my experience.
Did the copy/paste of the EXTRACT-m line work properly? It doesn't look like it extracts what your log lines say they have in them.
As I look a bit more I can see why you might have issues. That is some nasty logfile formatting.
One technique I've used successfully, since no one's tried to answer yet, is to break this into smaller pieces, with the idea being to break off chunks at the front that work well and are consistent, then use the remaining "field" with all the rest of the text in a completely different regex to parse it. That way you can customize it a bit better for different log lines.
I've taken out the names, but this gets me all three lines properly done up to the date.
^\s+(?<logMsgType>d[^ ]+)\s+(?<domain>.*)\s+.*(\d{1,2}:\d{1,2}:\d{1,2}\s(AM|PM)\.\d{3}\s\d{4}-\d{1,2}\s?-\d{1,2}\s?)
But only if non-greedy. If you put a new field (?<blah>.*)$
at the end, you could then feed that into another regex using that field as the input and root it again at the front ^
and, oh, perhaps use another anchor like "w3wp" to get more consistent results.
Sorry, laptop at 5%, might check in on this tomorrow.