Splunk Search

Trouble with regex in transforms.conf

sc0tt
Builder

I am filtering events in transforms.conf, but I cannot get the regex to match. The regex works as expected when I test it in Search, and also when I test it at http://gskinner.com/RegExr/.

I'm trying to match on the MsgType tag.

Sample event:

2013-10-28 4:36:38,322  <?xml version="1.0" encoding="UTF-8"?><INTERFACE><MsgType>SendMessage</MsgType><Emailaddress>user@example.com</Emailaddress><Userid>9999999999999</Userid><FolderName>inbox</FolderName><Alerts>false</Alerts><Ack>true</Ack><To>user@example.com</To></INTERFACE>

Below are the variations I tried. They all seem to work when tested, but not when used in transforms.conf:

^(.*<MsgType>(SendMessage|ReplyMessage)\b<\/MsgType>).*$

^(.*<MsgType.(SendMessage|ReplyMessage)\b<\/).*$

^(.*<MsgType.(SendMessage|ReplyMessage)\b<.MsgType.).*$

^(.*MsgType.(SendMessage|ReplyMessage)\b..MsgType).*$

^(.*<[^<]*MsgType[^>]*>(SendMessage|ReplyMessage)\b<\/[^<\/]*MsgType[^>]*>).*$

This works but isn't ideal:

^(.*MsgType.(SendMessage|ReplyMessage)\b).*$

What's the proper way to escape the opening/closing tags?

1 Solution

Ayn
Legend

First of all, there's no need to anchor your matches with ^.* and .*$. The regex engine will find what you're after anywhere in the event. You also don't need to escape any of the characters you're escaping.

<MsgType>(SendMessage|ReplyMessage)</MsgType>

should work just fine.
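
As a quick sanity check of the suggestion above, here is a minimal sketch that runs both the unanchored pattern and the first anchored variant from the question against the sample event. It uses Python's `re` module as a stand-in for Splunk's regex engine, which is an assumption: the two engines are not identical, although they should agree on patterns this simple. The names `EVENT`, `UNANCHORED`, and `ANCHORED` are illustrative only.

```python
import re

# Sample event copied from the question.
EVENT = (
    '2013-10-28 4:36:38,322  <?xml version="1.0" encoding="UTF-8"?>'
    "<INTERFACE><MsgType>SendMessage</MsgType>"
    "<Emailaddress>user@example.com</Emailaddress>"
    "<Userid>9999999999999</Userid><FolderName>inbox</FolderName>"
    "<Alerts>false</Alerts><Ack>true</Ack><To>user@example.com</To>"
    "</INTERFACE>"
)

# Pattern from the accepted answer: no anchors, no escaping.
UNANCHORED = re.compile(r"<MsgType>(SendMessage|ReplyMessage)</MsgType>")

# First variant from the question: anchored, with an escaped slash.
ANCHORED = re.compile(
    r"^(.*<MsgType>(SendMessage|ReplyMessage)\b<\/MsgType>).*$"
)

# search() scans the whole event, so no leading or trailing .* is needed.
unanchored_hit = UNANCHORED.search(EVENT)
anchored_hit = ANCHORED.search(EVENT)

print(unanchored_hit.group(1))  # MsgType value captured by the short pattern
print(anchored_hit.group(2))    # same value, captured by the anchored pattern
```

Both patterns capture the same MsgType value here, which suggests the anchoring and escaping are harmless but unnecessary, and that a failure in transforms.conf is more likely to come from somewhere else in the configuration.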

sc0tt
Builder

It looks like my issue was due to the fact that SEDCMD-* entries are executed prior to TRANSFORMS-* entries.

sc0tt
Builder

As a follow-up, some sed scripts seem to work without issue, while others cause the event to never get indexed. For example, placing SEDCMD-format = s/Emailaddress/Email/g after TRANSFORMS-set = setnull,keep in props.conf works, but SEDCMD-format = s/(.*)<MsgType>(.*)<\/MsgType>.*/\1 MsgType=\2/ does not, and the event is never indexed. Any ideas?
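
One possible explanation, sketched in Python rather than in Splunk itself: if the sed scripts are applied before the transforms (as noted earlier in this thread), the second script removes the <MsgType> tags before the keep transform ever sees the event. The sketch assumes that setnull routes every event to nullQueue and that keep routes events matching the accepted regex back to indexQueue. Neither stanza is shown in the thread, so both are assumptions, as are the helper names `route`, `renamed`, and `flattened`.

```python
import re

# Sample event copied from the question.
EVENT = (
    '2013-10-28 4:36:38,322  <?xml version="1.0" encoding="UTF-8"?>'
    "<INTERFACE><MsgType>SendMessage</MsgType>"
    "<Emailaddress>user@example.com</Emailaddress>"
    "<Userid>9999999999999</Userid><FolderName>inbox</FolderName>"
    "<Alerts>false</Alerts><Ack>true</Ack><To>user@example.com</To>"
    "</INTERFACE>"
)

# Assumed REGEX of the keep stanza: the pattern from the accepted answer.
KEEP = re.compile(r"<MsgType>(SendMessage|ReplyMessage)</MsgType>")


def route(event: str) -> str:
    """Mimic TRANSFORMS-set = setnull,keep (stanza contents are assumed)."""
    queue = "indexQueue"        # default destination
    if re.search(r".", event):  # assumed setnull: matches every event
        queue = "nullQueue"
    if KEEP.search(event):      # assumed keep: rescue matching events
        queue = "indexQueue"
    return queue


# s/Emailaddress/Email/g : the g flag replaces every occurrence (count=0).
renamed = re.sub(r"Emailaddress", "Email", EVENT, count=0)

# s/(.*)<MsgType>(.*)<\/MsgType>.*/\1 MsgType=\2/ : first match only (count=1).
flattened = re.sub(
    r"(.*)<MsgType>(.*)</MsgType>.*", r"\1 MsgType=\2", EVENT, count=1
)

# The transforms see the already-rewritten event in both cases.
print(route(renamed))
print(route(flattened))
print(flattened)
```

Under these assumptions, the first script leaves the tags intact and the event is kept, while the second rewrites them so the keep regex no longer matches and the event falls through to nullQueue. Changing the keep regex to match MsgType=(SendMessage|ReplyMessage), or temporarily removing the tag rewrite, would be two ways to test this theory.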

sc0tt
Builder

Thank you. You are correct and this does work just fine. It seems that a sed script running after the transforms was the issue. I thought it was the regex that was the problem.
