How do I write the regex to extract this field from my data?

ccsfdave
Builder

Here is the data I am trying to parse. I actually want to extract a number of fields but cannot figure out how to parse through the {0d}{0a}{20}s. For this question, what regex will pull out "Microsoft-IIS/8.5"?

May 16 06:56:02 75-vw-win7ns.net-10-1-3.dhcp.company.org pvs: 10.x.x.x:46168|10.y.y.y:80|6|6852|HTTP 4xx Detection (Client)|4{00}E{00}2{00}6{00}8{00}C{00}6{00}F{00}<{00}/{00}P{00}r{00}o{00}p{00}e{00}r{00}t{00}y{00}>{00}<{00}/{00}H{00}o{00}o{00}k{00}2{00}>{00}<{00}/{00}H{00}o{00}o{00}k{00}s{00}>{00}<{00}P{00}a{00}y{00}l{00}o{00}a{00}d{00}{20}{00}T{00}y{00}p{00}e{00}={00}"{00}i{00}n{00}l{00}i{00}n{00}e{00}"{00}/{00}>{00}<{00}T{00}a{00}r{00}g{00}e{00}t{00}H{00}o{00}s{00}t{00}>{00}D{00}T{00}-{00}S{00}C{00}C{00}M{00}P{00}R{00}O{00}D{00}0{00}1{00}.{00}A{00}D{00}.{00}S{00}F{00}G{00}|HTTP/1.1{20}401{20}Unauthorized{0d}{0a}Content-Type:{20}text/html{0d}{0a}Server:{20}Microsoft-IIS/8.5{0d}{0a}WWW-Authenticate:{20}Negotiate{0d}{0a}WWW-Authenticate:{20}NTLM{0d}{0a}X-Powered-By:{20}ASP.NET{0d}{0a}Date:{20}Mon,{20}16{20}May{20}2016{20}13:55:59{20}GMT{0d}{0a}Content-Length:{20}1293{0d}{0a}{0d}{0a}<!DOCTYPE{20}html{20}PUBLIC{20}"-//W3C//DTD{20}XHTML{20}1.0{20}Strict//EN"{20}"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">{0d}{0a}<html{20|NONE
May 16 06:56:01 75-vw-win7ns.net-10-1-3.dhcp.company.org pvs: 10.x.x.x:46168|10.y.y.y:80|6|6852|HTTP 4xx Detection (Client)|--aAbBcCdDv1234567890VxXyYzZ{0d}{0a}content-type:{20}text/plain;{20}charset=UTF-16{0d}{0a}{0d}{0a}{ff}{fe}<{00}M{00}s{00}g{00}{20}{00}S{00}c{00}h{00}e{00}m{00}a{00}V{00}e{00}r{00}s{00}i{00}o{00}n{00}={00}"{00}1{00}.{00}1{00}"{00}>{00}{0d}{00}{0a}{00}{09}{00}<{00}I{00}D{00}>{00}{{00}4{00}2{00}7{00}D{00}C{00}1{00}C{00}C{00}-{00}1{00}0{00}7{00}4{00}-{00}4{00}7{00}C{00}3{00}-{00}9{00}0{00}0{00}6{00}-{00}D{00}0{00}4{00}7{00}8{00}C{00}3{00}E{00}E{00}0{00}6{00}|HTTP/1.1{20}401{20}Unauthorized{0d}{0a}Content-Type:{20}text/html{0d}{0a}Server:{20}Microsoft-IIS/8.5{0d}{0a}WWW-Authenticate:{20}Negotiate{0d}{0a}WWW-Authenticate:{20}NTLM{0d}{0a}X-Powered-By:{20}ASP.NET{0d}{0a}Date:{20}Mon,{20}16{20}May{20}2016{20}13:55:59{20}GMT{0d}{0a}Content-Length:{20}1293{0d}{0a}{0d}{0a}<!DOCTYPE{20}html{20}PUBLIC{20}"-//W3C//DTD{20}XHTML{20}1.0{20}Strict//EN"{20}"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">{0d}{0a}<html{20|NONE

richgalloway
SplunkTrust

This regex string will pull out the Server name.

"Server:\{20}(?<server>[^\{]+)"
---
If this reply helps you, Karma would be appreciated.
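To check the pattern above outside Splunk, here is a minimal Python sketch. Two things are adaptations rather than part of the original answer: Python's `re` module spells named groups `(?P<server>...)` instead of the `(?<server>...)` form that `rex` accepts, and `raw` is an abbreviated copy of the header portion of the first sample event.

```python
import re

# Abbreviated header portion of the first sample event (hex escapes left as-is).
raw = (
    "HTTP/1.1{20}401{20}Unauthorized{0d}{0a}"
    "Content-Type:{20}text/html{0d}{0a}"
    "Server:{20}Microsoft-IIS/8.5{0d}{0a}"
    "WWW-Authenticate:{20}Negotiate{0d}{0a}"
)

# Literal "Server:" and "{20}", then capture everything up to the next "{".
SERVER_RE = re.compile(r"Server:\{20\}(?P<server>[^{]+)")

match = SERVER_RE.search(raw)
server = match.group("server") if match else None
print(server)  # -> Microsoft-IIS/8.5
```

The negated class `[^{]+` is what stops the capture: every escape in this data starts with `{`, so the value ends at the first escape that follows it.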

ccsfdave
Builder

@richgalloway

Thanks for the regex!

See my addendum above

sundareshr
Legend

This regex will give the server details

.... | rex field=_raw "Server:\{\d+\}(?<server>[^\{]+)\{" | ...

Here's a link to a great tool for learning and testing regex: https://regex101.com/

ccsfdave
Builder

@sundareshr

Thanks for the regex but I guess what I am really after is sort of counting {00}, {0d}{0a}, and {20} to find the right location. Your regex is dependent on "Server" being in there but when the app is Oracle Java SE, "Server" will not precede the desired extraction.

I do use regex101 but probably need a book or better tutorial because I can mash on it for a while and not make progress.

My regex was:
{0d}{0a}.+:{20}(?.+){0d}

But what I really need is the 2nd occurrence of the "{0d}{0a}" and then I need it to stop on time...the collection seems to go on for too far

richgalloway
SplunkTrust

What do you mean by "stop on time"? Are there any other requirements you haven't told us? What does the Oracle Java SE log entry look like?

Here's a regex string that skips 2 {0d}{0a} instances and then extracts the text between {20} and {0d}.

\{0d}\{0a}.*?\{0d}\{0a}.*?\{20}(?<server>[^\{]+)\{0d}
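As a sketch of how that pattern walks through the event, the same regex is written below in Python's verbose mode with one comment per piece. The `(?P<server>...)` spelling is Python's named-group syntax, and `raw` is an abbreviated copy of the header portion of the first sample event; both are adaptations for illustration.

```python
import re

# Abbreviated header portion of the first sample event.
raw = (
    "HTTP/1.1{20}401{20}Unauthorized{0d}{0a}"
    "Content-Type:{20}text/html{0d}{0a}"
    "Server:{20}Microsoft-IIS/8.5{0d}{0a}"
    "WWW-Authenticate:{20}Negotiate{0d}{0a}"
)

SKIP_TWO_RE = re.compile(
    r"""
    \{0d\}\{0a\}        # first CRLF marker (end of the status line)
    .*?                 # lazily skip the line that follows it
    \{0d\}\{0a\}        # second CRLF marker
    .*?\{20\}           # lazily skip the header name up to its space escape
    (?P<server>[^{]+)   # capture the value: everything before the next brace
    \{0d\}              # the value must be followed by a CR escape
    """,
    re.VERBOSE,
)

match = SKIP_TWO_RE.search(raw)
server = match.group("server") if match else None
print(server)  # -> Microsoft-IIS/8.5
```

The "skipping" is nothing more than spelling the marker out twice with a lazy `.*?` between the two copies, so the capture can only begin after the second one. That also makes the pattern positional: it returns whatever value sits on the line after the second marker, whether or not that line is the `Server` header.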

ccsfdave
Builder

Oh, my regex in regex101.com just goes until the last {0d} in the _raw. I need it to stop at the next {0d}.

BTW, how does {0d}{0a} skip the first instance of "{0d}{0a}" and start matching at the second?

richgalloway
SplunkTrust

Something happened to my last comment and the regex got left out. I've added it.

Your regex continues to the end of the event because of the greedy '.+' quantifiers. Using the non-greedy form '.+?' is better, but it still doesn't match what you want.
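The difference can be seen in a small Python sketch. The group name `value` is a placeholder (the name in the posted regex was lost), the braces are escaped for clarity, and `raw` is an abbreviated copy of the header portion of the sample event.

```python
import re

# Abbreviated header portion of the sample event, ending in the blank line.
raw = (
    "HTTP/1.1{20}401{20}Unauthorized{0d}{0a}"
    "Content-Type:{20}text/html{0d}{0a}"
    "Server:{20}Microsoft-IIS/8.5{0d}{0a}"
    "Content-Length:{20}1293{0d}{0a}{0d}{0a}"
)

# Greedy: each ".+" grabs as much as it can, then backs off only as far as needed.
greedy = re.search(r"\{0d\}\{0a\}.+:\{20\}(?P<value>.+)\{0d\}", raw)
# Lazy: each ".+?" grabs as little as it can.
lazy = re.search(r"\{0d\}\{0a\}.+?:\{20\}(?P<value>.+?)\{0d\}", raw)

greedy_value = greedy.group("value")
lazy_value = lazy.group("value")

print(greedy_value)  # runs past a CRLF marker near the end of the text
print(lazy_value)    # -> text/html
```

The greedy version overshoots into the last header, while the lazy version stops cleanly but at the first header value (`text/html`) rather than the `Server` value, which is why the pattern also has to account for the lines in between.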

sundareshr
Legend

Here, this rex should give you everything from the second {0d}{0a} through the end of the time 🙂

.. | rex "\{0d}\{0a}.*?\{0d}\{0a}.*?\{20}(?<fields>[^\{]+.*\d{2}:\d{2})\{"

ccsfdave
Builder

\{0d}\{0a}.*?\{0d}\{0a}.*?\{20}(?<fields>[^\{]+)\{0d}\{0a}

That is what I used in this case, but I have found that what I am after shows up in different places depending on the application, so building a field extraction is just not going to work for me. I'll keep thinking about it or maybe abandon it.

Thanks!

jkat54
SplunkTrust

Show us all your different data formats and maybe we can write a regex that extracts the field regardless of whether the source is Oracle or something else. Otherwise, we've given correct answers for the data you supplied, and one of them should be marked as the accepted answer.
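One position-independent approach, sketched here in Python, is to capture every `Name:{20}value` pair and then pick whichever header a given application uses. This assumes the wanted value always arrives as an HTTP-style header line ending in `{0d}{0a}`, which the thread does not confirm for the Oracle Java SE events; `HEADER_RE` and the abbreviated `raw` sample are illustrative names and data.

```python
import re

# Abbreviated header portion of the sample event.
raw = (
    "HTTP/1.1{20}401{20}Unauthorized{0d}{0a}"
    "Content-Type:{20}text/html{0d}{0a}"
    "Server:{20}Microsoft-IIS/8.5{0d}{0a}"
    "X-Powered-By:{20}ASP.NET{0d}{0a}"
    "Date:{20}Mon,{20}16{20}May{20}2016{20}13:55:59{20}GMT{0d}{0a}"
)

# A header name, ":{20}", then the value lazily up to the next CRLF marker.
HEADER_RE = re.compile(
    r"(?P<name>[A-Za-z0-9-]+):\{20\}(?P<value>.*?)\{0d\}\{0a\}"
)

headers = {m.group("name"): m.group("value") for m in HEADER_RE.finditer(raw)}
print(headers["Server"])        # -> Microsoft-IIS/8.5
print(headers["X-Powered-By"])  # -> ASP.NET
```

Values that contain spaces, such as `Date`, keep their `{20}` escapes in this sketch; they would need a separate replacement step to become readable.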

jkat54
SplunkTrust

You might also consider ingesting the data with the appropriate character set so it doesn't appear encoded in UTF-16, etc.
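To illustrate what decoded data would look like, here is a rough Python sketch that turns the escapes back into characters before any parsing. It assumes each `{hh}` token is a single byte written as two hex digits, which matches `{20}` (space) and `{0d}{0a}` (CRLF) in the samples but is not stated anywhere in the thread; `unescape_hex` is a hypothetical helper, and this runs outside Splunk rather than showing any Splunk setting.

```python
import re

# Assumption: each "{hh}" token is one byte written as two hex digits.
HEX_ESCAPE = re.compile(r"\{([0-9a-fA-F]{2})\}")

def unescape_hex(text: str) -> str:
    """Replace every {hh} token with the character for that byte value."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

# Abbreviated header portion of the sample event.
raw = (
    "HTTP/1.1{20}401{20}Unauthorized{0d}{0a}"
    "Content-Type:{20}text/html{0d}{0a}"
    "Server:{20}Microsoft-IIS/8.5{0d}{0a}"
)

decoded = unescape_hex(raw)
# Skip the status line, then split each "Name: value" header on the first ": ".
headers = dict(
    line.split(": ", 1) for line in decoded.split("\r\n")[1:] if ": " in line
)
print(headers["Server"])  # -> Microsoft-IIS/8.5

# The "<{00}M{00}s{00}g{00}" style fragments look like UTF-16LE:
# unescape to raw bytes, then decode them as UTF-16LE.
utf16_text = unescape_hex("<{00}M{00}s{00}g{00}").encode("latin-1").decode("utf-16-le")
print(utf16_text)  # -> <Msg
```

Once the escapes are gone, the headers can be split on ordinary line breaks and no regex has to count `{0d}{0a}` markers. A payload that legitimately contains text shaped like `{ab}` would be altered by this helper, so it is a starting point rather than a safe general decoder.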
