Splunk Search

Regex help

sc0tt
Builder

I'm using a sed script to clean up some events before they are indexed by Splunk in order to reduce the license usage. My raw data has some XML tags. Prior to indexing, I reformat these tags as key=value pairs. The below sed script was working correctly. However, there has been a change to the log that introduces an angle bracket character (<) which is causing the data to not be indexed as desired.

Sed script in props.conf

s/<([^\s\>]*)[^\>]*\>([^<].*?)\<\/\1\>/ \1="\2"/g

Sample data

2014-03-20 09:35:46,193 Outgoing UserSessionLog <UserId>55555555555</UserId><MsgType>Menu</MsgType><Title>My Title</Title><MenuId>1</MenuId><Text>This is some text</Text><MenuId>2</MenuId><Text><This is text with an angle bracket</Text><Internal>User Menu</Internal><IsActive>true</IsActive><SessionID>1000</SessionID>

The above sample data is indexed as:

2014-03-20 09:35:46,193 Outgoing UserSessionLog UserId="55555555555" MsgType="Menu" Title="My Title" MenuId="1" Text="This is some text" MenuId="2"<Text><This is text with an angle bracket</Text> Internal="System Menu" IsActive="true" SessionID="1000"

As you can see, the regular expression is not matching the second Text key because of the angle bracket (<) so the value is not getting assigned properly. It should be Text="<This is text with an angle bracket". I have been unable to modify the regular expression to handle this scenario.

Any help or suggestions would be greatly appreciated!

0 Karma
1 Solution

somesoni2
Revered Legend

Try this SED

To remove "<" from value of Text:-

s/<([^\s\>]*)[^\>]*\>[<]*([^<].*?)\<\/\1\>/ \1=\"\2\"/g

To keep the "<" with value of Text:-

s/<([^\s\>]*)[^\>]*\>(.*?)\<\/\1\>/ \1=\"\2\"/g

View solution in original post

somesoni2
Revered Legend

Try this SED

To remove "<" from value of Text:-

s/<([^\s\>]*)[^\>]*\>[<]*([^<].*?)\<\/\1\>/ \1=\"\2\"/g

To keep the "<" with value of Text:-

s/<([^\s\>]*)[^\>]*\>(.*?)\<\/\1\>/ \1=\"\2\"/g

somesoni2
Revered Legend

Try second option.

0 Karma

sc0tt
Builder

Thanks! Is there a way to keep the "<" if it is part of the value? Other than that, this seems to work so I may use it anyways and just discard the bracket.

0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...