Here is the data I am trying to parse. I actually want to extract a number of fields but cannot figure out how to parse through the {0d}{0a}{20}s. For this question, what regex will pull out "Microsoft-IIS/8.5"?
May 16 06:56:02 75-vw-win7ns.net-10-1-3.dhcp.company.org pvs: 10.x.x.x:46168|10.y.y.y:80|6|6852|HTTP 4xx Detection (Client)|4{00}E{00}2{00}6{00}8{00}C{00}6{00}F{00}<{00}/{00}P{00}r{00}o{00}p{00}e{00}r{00}t{00}y{00}>{00}<{00}/{00}H{00}o{00}o{00}k{00}2{00}>{00}<{00}/{00}H{00}o{00}o{00}k{00}s{00}>{00}<{00}P{00}a{00}y{00}l{00}o{00}a{00}d{00}{20}{00}T{00}y{00}p{00}e{00}={00}"{00}i{00}n{00}l{00}i{00}n{00}e{00}"{00}/{00}>{00}<{00}T{00}a{00}r{00}g{00}e{00}t{00}H{00}o{00}s{00}t{00}>{00}D{00}T{00}-{00}S{00}C{00}C{00}M{00}P{00}R{00}O{00}D{00}0{00}1{00}.{00}A{00}D{00}.{00}S{00}F{00}G{00}|HTTP/1.1{20}401{20}Unauthorized{0d}{0a}Content-Type:{20}text/html{0d}{0a}Server:{20}Microsoft-IIS/8.5{0d}{0a}WWW-Authenticate:{20}Negotiate{0d}{0a}WWW-Authenticate:{20}NTLM{0d}{0a}X-Powered-By:{20}ASP.NET{0d}{0a}Date:{20}Mon,{20}16{20}May{20}2016{20}13:55:59{20}GMT{0d}{0a}Content-Length:{20}1293{0d}{0a}{0d}{0a}<!DOCTYPE{20}html{20}PUBLIC{20}"-//W3C//DTD{20}XHTML{20}1.0{20}Strict//EN"{20}"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">{0d}{0a}<html{20|NONE
May 16 06:56:01 75-vw-win7ns.net-10-1-3.dhcp.company.org pvs: 10.x.x.x:46168|10.y.y.y:80|6|6852|HTTP 4xx Detection (Client)|--aAbBcCdDv1234567890VxXyYzZ{0d}{0a}content-type:{20}text/plain;{20}charset=UTF-16{0d}{0a}{0d}{0a}{ff}{fe}<{00}M{00}s{00}g{00}{20}{00}S{00}c{00}h{00}e{00}m{00}a{00}V{00}e{00}r{00}s{00}i{00}o{00}n{00}={00}"{00}1{00}.{00}1{00}"{00}>{00}{0d}{00}{0a}{00}{09}{00}<{00}I{00}D{00}>{00}{{00}4{00}2{00}7{00}D{00}C{00}1{00}C{00}C{00}-{00}1{00}0{00}7{00}4{00}-{00}4{00}7{00}C{00}3{00}-{00}9{00}0{00}0{00}6{00}-{00}D{00}0{00}4{00}7{00}8{00}C{00}3{00}E{00}E{00}0{00}6{00}|HTTP/1.1{20}401{20}Unauthorized{0d}{0a}Content-Type:{20}text/html{0d}{0a}Server:{20}Microsoft-IIS/8.5{0d}{0a}WWW-Authenticate:{20}Negotiate{0d}{0a}WWW-Authenticate:{20}NTLM{0d}{0a}X-Powered-By:{20}ASP.NET{0d}{0a}Date:{20}Mon,{20}16{20}May{20}2016{20}13:55:59{20}GMT{0d}{0a}Content-Length:{20}1293{0d}{0a}{0d}{0a}<!DOCTYPE{20}html{20}PUBLIC{20}"-//W3C//DTD{20}XHTML{20}1.0{20}Strict//EN"{20}"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">{0d}{0a}<html{20|NONE
This regex string will pull out the Server name.
"Server:\{20}(?<server>[^\{]+)"
@richgalloway
Thanks for the regex!
See my addendum above
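For anyone who wants to try the `Server:\{20}` pattern from the first answer outside Splunk, here is a minimal Python sketch. It assumes a shortened copy of the header portion of the sample event, and it uses Python's `(?P<server>...)` spelling for the named group, since Python's `re` module does not accept the `(?<server>...)` form that `rex` takes.

```python
import re

# Shortened copy of the header portion of the sample event (an assumption
# for illustration; the real event has more text before and after it).
raw = (
    "HTTP/1.1{20}401{20}Unauthorized{0d}{0a}"
    "Content-Type:{20}text/html{0d}{0a}"
    "Server:{20}Microsoft-IIS/8.5{0d}{0a}"
    "WWW-Authenticate:{20}Negotiate{0d}{0a}"
)

# Literal "Server:{20}", then capture everything up to the next "{".
pattern = re.compile(r"Server:\{20}(?P<server>[^{]+)")

match = pattern.search(raw)
server = match.group("server") if match else None
print(server)
```

The capture stops at the next `{` because `[^{]+` cannot cross an opening brace, so the trailing `{0d}{0a}` is never included.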
This regex will give the server details
.... | rex field=_raw "Server:\{\d+\}(?<server>[^\{]+)\{" | ...
Here's a link to a great tool to learn & test regex https://regex101.com/
@sundareshr
Thanks for the regex but I guess what I am really after is sort of counting {00}, {0d}{0a}, and {20} to find the right location. Your regex is dependent on "Server" being in there but when the app is Oracle Java SE, "Server" will not precede the desired extraction.
I do use regex101 but probably need a book or better tutorial because I can mash on it for a while and not make progress.
My regex was:
{0d}{0a}.+:{20}(?
But what I really need is the 2nd occurrence of the "{0d}{0a}", and then I need it to stop on time... the collection seems to go on too far.
What do you mean by "stop on time"? Are there any other requirements you haven't told us? What does the Oracle Java SE log entry look like?
Here's a regex string that skips 2 {0d}{0a} instances and then extracts the text between {20} and {0d}.
\{0d}\{0a}.*?\{0d}\{0a}.*?\{20}(?<server>[^\{]+)\{0d}
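To see how this pattern "skips" the first `{0d}{0a}`: it does not count anything, it simply consumes the first two `{0d}{0a}` tokens it can find, with a lazy `.*?` after each one, and only then looks for `{20}`. A minimal Python sketch, assuming a shortened copy of the response headers and Python's `(?P<server>...)` group syntax:

```python
import re

# Shortened header portion of the sample event (assumption for illustration).
raw = (
    "HTTP/1.1{20}401{20}Unauthorized{0d}{0a}"
    "Content-Type:{20}text/html{0d}{0a}"
    "Server:{20}Microsoft-IIS/8.5{0d}{0a}"
    "WWW-Authenticate:{20}Negotiate{0d}{0a}"
)

pattern = re.compile(
    r"\{0d}\{0a}.*?"            # first {0d}{0a}, then as little text as possible
    r"\{0d}\{0a}.*?"            # second {0d}{0a}, then as little text as possible
    r"\{20}(?P<server>[^{]+)"   # a {20} followed by brace-free text to capture
    r"\{0d}"                    # the capture must end right before a {0d}
)

match = pattern.search(raw)
server = match.group("server") if match else None
print(server)
```

Note that the search starts at the first `{0d}{0a}` anywhere in the event. In the second sample event the payload already contains `{0d}{0a}` tokens before the HTTP response, so on the full `_raw` the capture may land on a different piece of text than the Server value.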
oh my regex in regex101.com just goes until the last {0d} in the _raw. I need it to stop at the next {0d}.
BTW, how does {0d}{0a} skip the first instance of "{0d}{0a}" and start matching at the second?
Something happened to my last comment and the regex got left out. I've added it.
Your regex continues to the end of the event because of the greedy '+' quantifier. Using the non-greedy form '+?' is better, but still doesn't match what you want.
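The greedy versus non-greedy difference is easy to see on a two-header snippet. This is only a sketch: the sample string and the `Server:` anchor are assumptions chosen to isolate the quantifier behaviour, and the group uses Python's `(?P<v>...)` syntax.

```python
import re

# Two headers from the sample event (shortened for illustration).
raw = (
    "Server:{20}Microsoft-IIS/8.5{0d}{0a}"
    "X-Powered-By:{20}ASP.NET{0d}{0a}"
)

# Greedy: .+ grabs as much as it can, then backs up to the LAST {0d}.
greedy = re.search(r"Server:\{20}(?P<v>.+)\{0d}", raw).group("v")

# Non-greedy: .+? grabs as little as it can, stopping at the FIRST {0d}.
lazy = re.search(r"Server:\{20}(?P<v>.+?)\{0d}", raw).group("v")

print(greedy)
print(lazy)
```

The greedy version runs through to the last `{0d}` in the string, which is the "goes until the last {0d} in the _raw" behaviour seen in regex101.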
Here, this rex should give you from the second {0d}{0a} to the end of the time 🙂
.. | rex "\{0d}\{0a}.*?\{0d}\{0a}.*?\{20}(?<fields>[^\{]+.*\d{2}:\d{2})\{"
\{0d}\{0a}.*?\{0d}\{0a}.*?\{20}(?<fields>[^\{]+)\{0d}\{0a}
is what I used in this case, but I have found that what I am after shows up in different places depending on the application, so building a field is just not going to work for me. I'll keep thinking about it, or maybe abandon it.
Thanks!
Show us all your different data formats and maybe we can write a regex that extracts the value regardless of whether the app is Oracle or whatever. Otherwise, we've given correct answers for the data you supplied, and one of them should be marked as the answer.
You might also consider ingesting the data with the appropriate character set so it doesn't appear encoded in UTF-16, etc.
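As a way to explore what the payload looks like once the `{hh}` tokens are turned back into real bytes, here is a rough Python sketch that works outside Splunk. The helper name `decode_braced_hex` is hypothetical, the sample strings are shortened copies of the events above, and the sketch assumes every `{hh}` token is one hex-encoded byte and that the rest of the text is plain ASCII.

```python
import re

def decode_braced_hex(text: str) -> bytes:
    """Replace each {hh} token with the byte it names; keep other text as-is."""
    out = bytearray()
    pos = 0
    for m in re.finditer(r"\{([0-9a-fA-F]{2})\}", text):
        out += text[pos:m.start()].encode("latin-1")  # literal text before token
        out.append(int(m.group(1), 16))               # the escaped byte itself
        pos = m.end()
    out += text[pos:].encode("latin-1")               # trailing literal text
    return bytes(out)

# Shortened HTTP response part of the sample event.
response = (
    "HTTP/1.1{20}401{20}Unauthorized{0d}{0a}"
    "Content-Type:{20}text/html{0d}{0a}"
    "Server:{20}Microsoft-IIS/8.5{0d}{0a}"
    "{0d}{0a}"
)
head = decode_braced_hex(response).split(b"\r\n\r\n", 1)[0].decode("ascii")
status_line, *header_lines = head.split("\r\n")
headers = dict(line.split(": ", 1) for line in header_lines)
print(headers.get("Server"))

# The {00}-interleaved part is UTF-16 little-endian text.
msg = decode_braced_hex("<{00}M{00}s{00}g{00}").decode("utf-16-le")
print(msg)
```

Once the escapes are decoded, headers can be looked up by name instead of by position, which avoids counting `{0d}{0a}` tokens. It only helps for values that actually arrive as a named header.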