In the past we had a simple LINE_BREAKER regex that broke on newlines followed by an IPv4 address: ([\r\n]+)\d+.\d+.\d+.\d+
Now some of our logs contain IPv6 addresses in addition to IPv4, so I was hoping to handle both with alternate capture groups, depending on which kind of address matches:
([\r\n]+)(\d+.\d+.\d+.\d+|(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]).){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]).){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])))
Does Splunk expect only one capture group in the LINE_BREAKER regex? I'm wondering how to handle line breaking now that two different styles of IP address can come in.
Reading the LINE_BREAKER documentation, I wonder whether the problem is the parentheses around the part of the regex that follows ([\r\n]+).
The props.conf documentation says:
Example 1: LINE_BREAKER = end(\n)begin|end2(\n)begin2|begin3
* A line ending with 'end' followed a line beginning with 'begin' would
match the first branch, and the first capturing group would have a match
according to rule 1. That particular newline would become a break
between lines.
So I assume you don't want the various extra () groups after ([\r\n]+). You could also simplify the pattern so it matches only part of the IP address, unless you often have continuation lines that look similar; I normally match just the first few parts of the address.
Example:
([\r\n]+)\d+\.\d+\.\d+\.\d+|([\r\n]+)[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}
Perhaps?
If this works I'll convert it to an answer...
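To sanity-check the simplified pattern outside Splunk, here is a rough Python sketch. It is only an approximation: Python's re module is not the regex engine Splunk uses, and split_events is a hypothetical helper (not a Splunk API) that imitates the documented rule of breaking at the leftmost capturing group that took part in the match. The sample log lines are made up for illustration.

```python
import re

# The simplified pattern from the post: one branch per IP style, and each
# branch starts with its own ([\r\n]+) capturing group.
LINE_BREAKER = re.compile(
    r"([\r\n]+)\d+\.\d+\.\d+\.\d+"
    r"|([\r\n]+)[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}"
    r":[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}"
)


def split_events(raw):
    """Hypothetical helper: split raw text the way the LINE_BREAKER rule reads.

    The text captured by the leftmost participating group is treated as the
    break and discarded; everything between breaks becomes one event.
    """
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        # Find the leftmost capturing group that actually matched.
        group = next(
            i for i in range(1, LINE_BREAKER.groups + 1)
            if match.group(i) is not None
        )
        events.append(raw[start:match.start(group)])
        start = match.end(group)
    events.append(raw[start:])
    return events


# Made-up sample: an IPv4 event with a continuation line, an IPv6 event,
# then another IPv4 event after a CRLF.
sample = (
    "10.0.0.1 GET /a\n"
    "  continuation line\n"
    "2001:db8:0:0:0:0:0:1 GET /b\r\n"
    "192.168.1.5 GET /c"
)
events = split_events(sample)
for event in events:
    print(repr(event))
```

One thing this makes visible: the IPv6 branch needs five colon-separated hex groups in a row, so a compressed address such as 2001:db8::1 at the start of a line would not trigger a break. If your logs write compressed addresses, the branch would need loosening.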
Thanks gareth, feel free to convert to answer and I will mark it as solved!