A list of common regular expressions for field extractions?

stefanlasiewski
Contributor

Splunk isn't extracting certain fields from my logs. This includes basic things such as IP addresses.

It seems that I need to build regular expressions so that Splunk will recognize my data better. Here are some things which I need Splunk to recognize:

  1. 1.1.1.1 and 192.168.100.100 are IPv4 addresses. The regex is something like (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  2. IPv6 addresses. The regex for these is difficult, very difficult, which is why I was hoping that Splunk would do this for me and save me time.
  3. 1.1.1.1:8080 is an IP address with a port
  4. foo@example.gov is an email address.
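For reference, here is a minimal sketch of the three easier patterns from the list, written in Python's `re` syntax (the constructs used here also exist in PCRE, which Splunk's regex engine is based on). The pattern names are hypothetical, the email pattern is deliberately simplified rather than a full RFC 5322 grammar, and the port group accepts any 1 to 5 digits without range-checking. IPv6 is left out on purpose.

```python
import re

# Hypothetical pattern names; an illustrative sketch, not a Splunk-shipped list.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"   # 0-255
IPV4_CORE = rf"{OCTET}(?:\.{OCTET}){{3}}"              # four octets, dots escaped

# \b keeps the match from starting or ending in the middle of a longer number.
IPV4 = re.compile(rf"\b{IPV4_CORE}\b")

# IPv4 address followed by ":port"; the port is not range-checked (assumption).
IPV4_PORT = re.compile(rf"\b(?P<ip>{IPV4_CORE}):(?P<port>\d{{1,5}})\b")

# Deliberately simplified email pattern; NOT a full RFC 5322 address grammar.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

line = "conn from 192.168.100.100:8080 to 1.1.1.1 user=foo@example.gov"
print(IPV4.findall(line))               # ['192.168.100.100', '1.1.1.1']
m = IPV4_PORT.search(line)
print(m.group("ip"), m.group("port"))   # 192.168.100.100 8080
print(EMAIL.findall(line))              # ['foo@example.gov']
```

The `\b` boundaries mean an out-of-range value such as `999.1.1.1` or `1.1.1.256` produces no match at all instead of a partial one.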

The examples above are extremely common. Is there a list of common regular expressions which I can import into Splunk so that I don't need to experiment with dozens of regular expression strings?


gkanapathy
Splunk Employee

While there are plenty of regex sites that can provide these regexes, they aren't all that useful in most cases. A field extraction is usually defined by absolute position (e.g., the 5th word in the line) or by its location relative to fixed characters (e.g., the string after src_addr= up to the next space, or the string starting after <addr> until you see </addr>). So forcing the regex to match the exact thing you're looking for is rarely necessary.

Once you have located the value, it's usually enough to say "string of non-space characters" (\S*) or "sequence of hex digits and colons" ([0-9a-fA-F:]* or [[:xdigit:]:]*). In other words, matching or validating the data type itself matters less than locating the value within a log entry. Unfortunately, that depends more on your log format, so it's less likely to be found ready-made in the wild.
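To make this concrete, here is a Python sketch of the location-based approach: each pattern is anchored on position or on fixed surrounding text, then captures a loose run of characters. The sample log line and group names are made up for illustration, and Python's `re` has no POSIX `[[:xdigit:]]` class, so the hex-and-colons variant spells out `[0-9a-fA-F:]`.

```python
import re

# Made-up sample log line, for illustration only.
line = "Jan 12 10:00:01 fw1 DENY src_addr=2001:db8::1 dst=10.0.0.5 <addr>fe80::1</addr>"

# By absolute position: skip four whitespace-separated tokens, capture the fifth.
FIFTH_WORD = re.compile(r"^(?:\S+\s+){4}(?P<fifth>\S+)")

# Relative to fixed text: everything after "src_addr=" up to the next space.
SRC_ADDR = re.compile(r"src_addr=(?P<src_addr>\S*)")

# Relative to fixed text: everything between <addr> and </addr>.
ADDR_TAG = re.compile(r"<addr>(?P<addr>[^<]*)</addr>")

# Same anchor as SRC_ADDR, but only hex digits and colons are captured.
SRC_ADDR_HEX = re.compile(r"src_addr=(?P<src_addr>[0-9a-fA-F:]*)")

print(FIFTH_WORD.match(line).group("fifth"))        # DENY
print(SRC_ADDR.search(line).group("src_addr"))      # 2001:db8::1
print(ADDR_TAG.search(line).group("addr"))          # fe80::1
print(SRC_ADDR_HEX.search(line).group("src_addr"))  # 2001:db8::1
```

None of these patterns knows what an IPv6 address looks like; they rely entirely on where the value sits. Splunk examples usually write named groups as `(?<name>...)`, while Python requires the `(?P<name>...)` form.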

stefanlasiewski
Contributor

I was under the impression that fields are not position-based. For example, if I want Splunk to identify an IPv6 field anywhere on the line, I need to use the interactive field extractor to define the IPv6 field based on a regular expression.
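If the value really can appear anywhere on the line, one option (sketched here in Python; this is a swapped-in technique, not a Splunk feature) is to pair a loose regex with a real parser: collect every run of hex digits and colons, then keep only the runs that the standard-library `ipaddress` module accepts as IPv6. The function name `find_ipv6` and the sample line are hypothetical. Known gaps: IPv4-embedded forms such as `::ffff:192.0.2.1` are cut off at the first dot, and zone IDs such as `%eth0` are not captured.

```python
import ipaddress
import re

# Loose candidate pattern: any run of 2+ hex digits/colons. It over-matches on
# purpose (e.g. the timestamp "10:00:01"); real validation happens below.
CANDIDATE = re.compile(r"[0-9A-Fa-f:]{2,}")

def find_ipv6(text):
    """Hypothetical helper: return candidates that parse as IPv6 addresses.

    Limitation: IPv4-embedded forms like ::ffff:192.0.2.1 are truncated at
    the first dot, and zone IDs like %eth0 are not captured.
    """
    found = []
    for cand in CANDIDATE.findall(text):
        if cand.count(":") < 2:      # every IPv6 text form has at least two colons
            continue
        try:
            ipaddress.IPv6Address(cand)   # raises ValueError if not valid IPv6
        except ValueError:
            continue
        found.append(cand)
    return found

line = "Jan 12 10:00:01 fw1 DENY src_addr=2001:db8::1 dst=10.0.0.5 <addr>fe80::1</addr>"
print(find_ipv6(line))   # ['2001:db8::1', 'fe80::1']
```

The design choice is to keep the regex dumb and let a parser do the hard part, which sidesteps writing the full IPv6 grammar as a regex.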
