How to build a regex for a field extraction that will match a portion of the domain for the field?

matoch
New Member

I'm looking at sendmail logs and I'm trying to pull out a portion of the domain name based on the relay.

I've been testing with rex and have arrived at the following command.

index=mail stat relay | rex "relay=([a-zA-Z0-9-]+\.)*(?<test123>([a-zA-Z0-9-]+\.){1}((ab|bc|mb|nb|nf|nl|ns|nt|nu|on|pe|qc|sk|yk).)?([a-zA-Z0-9-]+))(s)?" | table test123

With log lines that look like this:
Nov 12 22:24:37 some.mail.host Nov 12 22:24:37 sendmail[9056]: sAD5OZKS011800: to=********@gov.ab.ca, delay=00:00:02, xdelay=00:00:01, mailer=smtp, pri=66484, relay=something.gov.ab.ca. [XXX.XXX.XXX.XXX], dsn=2.0.0, stat=Sent (ok: Message 54730621 accepted)

Nov 13 09:34:13 some.mail.host Nov 13 09:34:13 sendmail[30002]: sADGYCM5028904: to=something@example.com, ctladdr=somethingelse@example.com (999/25), delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=37906, relay=aspmx.l.google.com. [XXX.XXX.XXX.XXX], dsn=2.0.0, stat=Sent (OK 1415896453 63si40410316iol.79 - gsmtp)

Two sample relays are
something.gov.ab.ca
something.blah.google.com

In testing I end up with ab.ca and google.com.
What I'm trying to get is gov.ab.ca and google.com.

I've tried a number of online regex tools, and they seem to greedily match gov.ab.ca. In Splunk, the ? after ((ab|bc|mb|nb|nf|nl|ns|nt|nu|on|pe|qc|sk|yk).) seems to act more like the lazy +?, based on the regular expression documentation I've come across online.

Is there something I can do to get the behavior I'm looking for?
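
For what it's worth, the behaviour described above can be reproduced outside Splunk. The sketch below is a simplified Python transcription of the query, under a few assumptions: Python's re module stands in for Splunk's PCRE engine, the named group is spelled (?P<test123>...) because that is what Python accepts, the literal dots are escaped, and extract_greedy is just a helper name made up for the example. It suggests the ? is not acting lazily. The leading ([a-zA-Z0-9-]+\.)* is greedy, so it swallows as many labels as it can and gives back only the minimum the rest of the pattern needs, which means the optional province group never gets a chance to match.

```python
import re

# Building blocks transcribed from the rex query in the question.
PROVINCES = "ab|bc|mb|nb|nf|nl|ns|nt|nu|on|pe|qc|sk|yk"
LABEL = r"[a-zA-Z0-9-]+"

# Greedy prefix, as in the original query.
greedy = re.compile(
    rf"relay=(?:{LABEL}\.)*"
    rf"(?P<test123>{LABEL}\.(?:(?:{PROVINCES})\.)?{LABEL})"
)


def extract_greedy(line):
    """Return the captured domain portion, or None if there is no relay."""
    m = greedy.search(line)
    return m.group("test123") if m else None


# The prefix keeps "something.gov." and leaves only "ab.ca" for the capture.
print(extract_greedy("relay=something.gov.ab.ca. [XXX.XXX.XXX.XXX], dsn=2.0.0"))
print(extract_greedy("relay=aspmx.l.google.com. [XXX.XXX.XXX.XXX], dsn=2.0.0"))
```

The first call prints ab.ca and the second prints google.com, which matches what the search returns in Splunk.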

matoch
New Member

I've added another example. I can see where you are going with your regex, but it's not quite working the way I want. You are simply stripping off the first piece of the domain, and the number of pieces can vary. For example, suppose the following relays showed up in the log:
1.2.3.4.google.com
1.2.3.google.com
1.2.google.com
1.google.com
something.gov.ab.ca

I want google.com for the first four and, because ab appears right before .ca in the final one, I want gov.ab.ca rather than just ab.ca.

Is that a bit clearer?
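
One possible way to get the behaviour described above is to make the leading label group lazy (*?) and anchor the end of the relay, so the capture is forced to start as early as the pattern allows. This is only a sketch under assumptions: it is written for Python's re module rather than Splunk (hence (?P<test123>...)), extract_lazy is a made-up helper name, and the lookahead assumes the relay is followed by an optional trailing dot and then whitespace, a comma, or the end of the line, as in the sample events.

```python
import re

PROVINCES = "ab|bc|mb|nb|nf|nl|ns|nt|nu|on|pe|qc|sk|yk"
LABEL = r"[a-zA-Z0-9-]+"

# Lazy prefix (*?) plus an end anchor: the prefix grows one label at a time
# until the capture can reach the end of the relay value.
lazy = re.compile(
    rf"relay=(?:{LABEL}\.)*?"
    rf"(?P<test123>{LABEL}\.(?:(?:{PROVINCES})\.)?{LABEL})"
    r"\.?(?=[\s,]|$)"
)


def extract_lazy(line):
    """Return the captured domain portion, or None if nothing matches."""
    m = lazy.search(line)
    return m.group("test123") if m else None


relays = [
    "1.2.3.4.google.com",
    "1.2.3.google.com",
    "1.2.google.com",
    "1.google.com",
    "something.gov.ab.ca",
]
for relay in relays:
    # Mimic the sendmail layout: trailing dot, then the bracketed address.
    print(relay, "->", extract_lazy(f"relay={relay}. [XXX.XXX.XXX.XXX], dsn=2.0.0"))
```

The first four relays come out as google.com and the last as gov.ab.ca. The same idea should carry over to rex by writing the group as (?<test123>...), though that has not been verified against a Splunk instance here.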

matoch
New Member

I can't seem to comment on the provided answer. I've fixed the query in my question and clarified where I believe the problem is.

Raghav2384
Motivator

Not sure if I understood correctly. Let's say relay can hold
relay = something.gov.ab.ca as well as
relay = somewhere.google.com. If so, you can use

base search|rex field=_raw "relay=\w+\.(?<Domain>.*)\s"
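
As a rough check of what this pattern captures, here is a small Python sketch. The assumptions: Python's re module stands in for Splunk's regex engine, so the group is written (?P<Domain>...); the event string is a shortened, hypothetical version of the sample log line; and the tightened variant is an illustration, not part of the original suggestion.

```python
import re

# Posted pattern, with the named group in Python's spelling.
posted = re.compile(r"relay=\w+\.(?P<Domain>.*)\s")

# Illustrative variant: stop at the end of the relay value instead of
# letting .* run to the last whitespace in the event.
tightened = re.compile(r"relay=\w+\.(?P<Domain>\S+?)\.?\s")

# Shortened, hypothetical event containing only the relay and what follows.
event = "relay=1.2.3.4.google.com. [XXX.XXX.XXX.XXX], dsn=2.0.0"

# Greedy .* backs off only as far as the last whitespace in the event.
print(posted.search(event).group("Domain"))

# Even when tightened, only the first label ("1.") is stripped.
print(tightened.search(event).group("Domain"))
```

The first print shows the capture running past the relay into the bracketed address, and the second shows 2.3.4.google.com, so the approach gives the wanted result only when exactly one label precedes the part to keep.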

Pardon me if I am going off on a tangent. Can you post a sample for relay=xxx.google.com as well? I assumed the pattern was "something.google.com".
Hope this helps,
Thanks,
Raghav

Raghav2384
Motivator

The address you are trying to extract is part of a key-value pair.
Try this:

.....|rex field=_raw "relay=\S+(?<Domain>.*)\s"

Raghav2384
Motivator

Please add a backslash before S+ and s. It disappeared from my post.

MuS
Legend

If you mark some text and click the 101010 symbol, it will show up in the post ... magic all over the place 🙂
