Splunk Search

Trouble with regex in transforms.conf

sc0tt
Builder

I am filtering events in transforms.conf, but I cannot get the regex to match. The regex works as expected when I test it in Search, and also when I test it at http://gskinner.com/RegExr/.

I'm trying to match on the MsgType tag.

Sample event:

2013-10-28 4:36:38,322  <?xml version="1.0" encoding="UTF-8"?><INTERFACE><MsgType>SendMessage</MsgType><Emailaddress>user@example.com</Emailaddress><Userid>9999999999999</Userid><FolderName>inbox</FolderName><Alerts>false</Alerts><Ack>true</Ack><To>user@example.com</To></INTERFACE>

Below are the variations I tried. All of them match when tested, but none match when used in transforms.conf:

^(.*<MsgType>(SendMessage|ReplyMessage)\b<\/MsgType>).*$

^(.*<MsgType.(SendMessage|ReplyMessage)\b<\/).*$

^(.*<MsgType.(SendMessage|ReplyMessage)\b<.MsgType.).*$

^(.*MsgType.(SendMessage|ReplyMessage)\b..MsgType).*$

^(.*<[^<]*MsgType[^>]*>(SendMessage|ReplyMessage)\b<\/[^<\/]*MsgType[^>]*>).*$

This one works but isn't ideal:

^(.*MsgType.(SendMessage|ReplyMessage)\b).*$

What's the proper way to escape the opening/closing tags?

1 Solution

Ayn
Legend

First of all, there's no need to anchor your matches with ^.* and .*$. The regex engine will find what you're after anywhere in the event anyway. You also don't need to escape any of the characters you're escaping (the / in the closing tag, for instance).

<MsgType>(SendMessage|ReplyMessage)</MsgType>

should work just fine.
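
To check this outside Splunk, here is a minimal sketch that runs the unanchored, unescaped pattern against the sample event from the question. It assumes Python's `re` module is a close enough stand-in for Splunk's regex engine for a pattern this simple; the variable names are illustrative only.

```python
import re

# Sample event copied from the question above.
event = (
    '2013-10-28 4:36:38,322  <?xml version="1.0" encoding="UTF-8"?>'
    '<INTERFACE><MsgType>SendMessage</MsgType>'
    '<Emailaddress>user@example.com</Emailaddress>'
    '<Userid>9999999999999</Userid><FolderName>inbox</FolderName>'
    '<Alerts>false</Alerts><Ack>true</Ack><To>user@example.com</To></INTERFACE>'
)

# Pattern from the answer: no ^.* / .*$ wrappers, no backslash before "/".
pattern = re.compile(r'<MsgType>(SendMessage|ReplyMessage)</MsgType>')

# search() scans the whole string, so the match is found mid-event.
match = pattern.search(event)
msg_type = match.group(1) if match else None
print(msg_type)
```

The capture group holds the message type, so the same pattern should also work if the value is needed later rather than just the match itself.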

sc0tt
Builder

It looks like my issue was that SEDCMD-* entries are executed before TRANSFORMS-* entries.

sc0tt
Builder

As a follow-up: certain sed scripts seem to work without issue, while others cause the event never to be indexed. For example, SEDCMD-format= s/Emailaddress/Email/g placed after TRANSFORMS-set= setnull,keep in props.conf works, but SEDCMD-format= s/(.*)<MsgType>(.*)<\/MsgType>.*/\1 MsgType=\2/ does not, and the event is never indexed. Any ideas?
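
One plausible explanation can be sketched in Python. The sketch rests on several assumptions: that the sed script is applied to the event before the transforms, as reported elsewhere in this thread; that the keep transform uses the regex from the accepted answer (the actual transforms.conf is not shown in the thread); and that Python's `re.sub` behaves like the sed substitution for these two expressions. The names `FILTER`, `renamed`, and `flattened` are illustrative only.

```python
import re

# Sample event copied from the question above.
event = (
    '2013-10-28 4:36:38,322  <?xml version="1.0" encoding="UTF-8"?>'
    '<INTERFACE><MsgType>SendMessage</MsgType>'
    '<Emailaddress>user@example.com</Emailaddress>'
    '<Userid>9999999999999</Userid><FolderName>inbox</FolderName>'
    '<Alerts>false</Alerts><Ack>true</Ack><To>user@example.com</To></INTERFACE>'
)

# Assumed regex for the "keep" transform (taken from the accepted answer).
FILTER = re.compile(r'<MsgType>(SendMessage|ReplyMessage)</MsgType>')

# Equivalent of: s/Emailaddress/Email/g  (global replace, MsgType tags untouched)
renamed = re.sub(r'Emailaddress', 'Email', event)

# Equivalent of: s/(.*)<MsgType>(.*)<\/MsgType>.*/\1 MsgType=\2/
# (first match only; the <MsgType> tags are rewritten away)
flattened = re.sub(r'(.*)<MsgType>(.*)</MsgType>.*', r'\1 MsgType=\2', event, count=1)

# Would the filter regex still match the rewritten event?
renamed_kept = FILTER.search(renamed) is not None
flattened_kept = FILTER.search(flattened) is not None
print(renamed_kept, flattened_kept)
print(flattened)
```

Under these assumptions, the first script leaves the `<MsgType>` tags in place, so the filter still matches. The second replaces them with `MsgType=SendMessage`, so a filter that looks for the tags no longer matches, which would be consistent with the event being dropped instead of indexed.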

sc0tt
Builder

Thank you. You are correct and this does work just fine. It seems that a sed script running after the transforms was the issue. I thought it was the regex that was the problem.
