Splunk Search

Can I transform data and extract fields at once?

stefanlasiewski
Contributor

Our Splunk server receives data via syslog. As a result, I need to transform the syslog data using transforms.conf and props.conf (details in the question "Why does Splunk not recognize standard fields in my Apache data forwarded by syslog?").

My question: can I transform the data and still do some field extraction on it? I would like to preserve the process field. However, the default transform simply strips the syslog header out of the event. It doesn't save any of the header fields.

So, given the following transformation in local/props.conf:

[syslog]
TRANSFORMS-strip-syslog-header = syslog-header-stripper-ts-host-proc

And this default transform from default/transforms.conf:

# This will strip out date stamp, host, process with pid and just get the 
# actual message
[syslog-header-stripper-ts-host-proc]
REGEX         = ^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s.*?:\s(.*)$
FORMAT        = $1
DEST_KEY      = _raw

Can I somehow preserve one of the fields and save it as a field named process?

I have had some luck with the following pattern, saved at https://www.regex101.com/r/iK8iX5/1. However, I am uncertain how to use it in a Splunk transform.

^(?<SyslogPri><\d+>)(?<SyslogDate>[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+)\s(?<SyslogHost>.*)\s(?<process>.*):\s(?<SyslogMessage>.*)$

bmacias84
Champion

Hello @stefanlasiewski,
Your regex will work just fine if you simply add it to the REGEX setting in transforms.conf. I do what you are doing all the time.



[syslog-header-stripper-ts-host-proc]
REGEX = yourRegex statement

This will work for search-time extraction, but are you trying to create an index-time extraction? In your example it seems like you are trying to overwrite the _raw data.
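
As a minimal sketch of the search-time route, assuming the events are indexed with sourcetype syslog: the stanza name syslog-fields-named and the class name syslog-fields below are hypothetical, and the regex is the one from the question, unchanged. Named capture groups in REGEX become fields at search time, so no FORMAT or DEST_KEY is set here.

```ini
# local/props.conf
# REPORT-<class> attaches a search-time transform to the sourcetype.
[syslog]
REPORT-syslog-fields = syslog-fields-named

# local/transforms.conf
# Hypothetical stanza name. Each named group (SyslogPri, SyslogDate,
# SyslogHost, process, SyslogMessage) is extracted as a field of that name.
[syslog-fields-named]
REGEX = ^(?<SyslogPri><\d+>)(?<SyslogDate>[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+)\s(?<SyslogHost>.*)\s(?<process>.*):\s(?<SyslogMessage>.*)$
```

Search-time extractions run against _raw as it was indexed. If the index-time TRANSFORMS-strip-syslog-header setting has already removed the header from _raw, this regex has nothing left to match, so the sketch only applies to events whose header is still present.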


stefanlasiewski
Contributor

I don't really care where this extraction happens. Anywhere is fine, as long as it's fast and easy to do. I just want to use the fields. I'm only overwriting the _raw data because that's what the Splunk docs suggest, and I'm using the default transform named syslog-header-stripper-ts-host-proc from default/transforms.conf.


stefanlasiewski
Contributor

Can you show me an example that you use for the FORMAT and DEST_KEY? I'm confused by how those should be used.


jayannah
Builder

You can't do this at index time, but you can at search time, because at search time FORMAT accepts multiple name-value pairs.

Refer to: http://docs.splunk.com/Documentation/Splunk/6.2.1/Admin/Transformsconf

FORMAT = <string>
* NOTE: This option is valid for both index-time and search-time field extraction. However, FORMAT
  behaves differently depending on whether the extraction is performed at index time or
  search time.
* This attribute specifies the format of the event, including any field names or values you want
  to add.
* FORMAT for index-time extractions:
    * Use $n (for example $1, $2, etc) to specify the output of each REGEX match.
    * If REGEX does not have n groups, the matching fails.
    * The special identifier $0 represents what was in the DEST_KEY before the REGEX was performed.
    * At index time only, you can use FORMAT to create concatenated fields:
        * FORMAT = ipaddress::$1.$2.$3.$4
    * When you create concatenated fields with FORMAT, "$" is the only special character. It is
      treated as a prefix for regex-capturing groups only if it is followed by a number and only
      if the number applies to an existing capturing group. So if REGEX has only one capturing
      group and its value is "bar", then:
        * "FORMAT = foo$1" yields "foobar"
        * "FORMAT = foo$bar" yields "foo$bar"
        * "FORMAT = foo$1234" yields "foo$1234"
        * "FORMAT = foo$1\$2" yields "foobar\$2"
    * At index-time, FORMAT defaults to <stanza-name>::$1
* FORMAT for search-time extractions:
    * The format of this field as used during search time extractions is as follows:
        * FORMAT = <field-name>::<field-value>( <field-name>::<field-value>)*
        * where:
            * field-name  = [<string>|$<extracting-group-number>]
            * field-value = [<string>|$<extracting-group-number>]
    * Search-time extraction examples:
        * 1. FORMAT = first::$1 second::$2 third::other-value
        * 2. FORMAT = $1::$2
    * If the key-name of a FORMAT setting is varying, for example $1 in the
      example 2 just above, then the regex will continue to match against
      the source key to extract as many matches as are present in the text.
    * NOTE: You cannot create concatenated fields with FORMAT at search time. That
      functionality is only available at index time.
    * At search-time, FORMAT defaults to an empty string
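
To make the search-time FORMAT syntax above concrete, here is a rough sketch that names a numbered capture group. The stanza name syslog-process-numbered and the regex are hypothetical, written for a header shaped like "Mar  3 12:34:56 myhost sshd[1234]: message"; adjust both for your data.

```ini
# local/transforms.conf
# Hypothetical stanza. Group 1 captures the process name without the
# optional [pid] suffix; FORMAT assigns that group to the field "process".
# No DEST_KEY is set: the spec documents it for index-time extractions only.
[syslog-process-numbered]
REGEX  = ^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s\S+\s([^\s:\[]+)(?:\[\d+\])?:\s
FORMAT = process::$1
```

The stanza would be referenced from props.conf with a REPORT-<class> setting under the sourcetype. By contrast, the default syslog-header-stripper-ts-host-proc transform uses FORMAT = $1 with DEST_KEY = _raw, which rewrites the event text at index time instead of creating a field.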