Splunk Search

Regex help

sc0tt
Builder

I'm using a sed script to clean up some events before they are indexed by Splunk, in order to reduce license usage. My raw data contains XML tags, and prior to indexing I reformat them as key=value pairs. The sed script below was working correctly. However, a change to the log has introduced an angle bracket character (<) inside a value, and the data is no longer indexed as desired.

Sed script in props.conf

s/<([^\s\>]*)[^\>]*\>([^<].*?)\<\/\1\>/ \1="\2"/g

Sample data

2014-03-20 09:35:46,193 Outgoing UserSessionLog <UserId>55555555555</UserId><MsgType>Menu</MsgType><Title>My Title</Title><MenuId>1</MenuId><Text>This is some text</Text><MenuId>2</MenuId><Text><This is text with an angle bracket</Text><Internal>User Menu</Internal><IsActive>true</IsActive><SessionID>1000</SessionID>

The above sample data is indexed as:

2014-03-20 09:35:46,193 Outgoing UserSessionLog UserId="55555555555" MsgType="Menu" Title="My Title" MenuId="1" Text="This is some text" MenuId="2"<Text><This is text with an angle bracket</Text> Internal="User Menu" IsActive="true" SessionID="1000"

As you can see, the regular expression does not match the second Text element because its value begins with an angle bracket (<), so the value is not assigned. It should be Text="<This is text with an angle bracket". I have been unable to modify the regular expression to handle this scenario.
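
The behaviour can be reproduced outside Splunk. The following is a minimal sketch, assuming Python's `re` module treats this pattern the same way as the regex engine behind the sed script; the names `ORIGINAL` and `sample` are illustrative, and the event is shortened to the three elements around the problem.

```python
import re

# Search half of the sed expression above, unchanged.
ORIGINAL = r'<([^\s\>]*)[^\>]*\>([^<].*?)\<\/\1\>'

# Shortened copy of the sample event (illustrative).
sample = ('<MenuId>2</MenuId>'
          '<Text><This is text with an angle bracket</Text>'
          '<Internal>User Menu</Internal>')

# Python writes the quotes in the replacement without backslashes.
result = re.sub(ORIGINAL, r' \1="\2"', sample)
print(result)
```

The second capture group, `([^<].*?)`, requires the first character of the value to be something other than `<`. The Text value here starts with `<`, so that element is left untouched while its neighbours are converted.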

Any help or suggestions would be greatly appreciated!

1 Solution

somesoni2
Revered Legend

Try one of these sed expressions.

To remove the "<" from the value of Text:

s/<([^\s\>]*)[^\>]*\>[<]*([^<].*?)\<\/\1\>/ \1=\"\2\"/g

To keep the "<" in the value of Text:

s/<([^\s\>]*)[^\>]*\>(.*?)\<\/\1\>/ \1=\"\2\"/g
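
One way to compare the two expressions before changing props.conf is to run them against a sample event. This is only a sketch, assuming Python's `re` module matches these patterns the same way Splunk does; `STRIP_BRACKET`, `KEEP_BRACKET`, and `to_kv` are hypothetical names, and the replacement drops the backslashes before the quotes because Python would otherwise emit them literally.

```python
import re

# Search halves of the two sed expressions above.
STRIP_BRACKET = r'<([^\s\>]*)[^\>]*\>[<]*([^<].*?)\<\/\1\>'
KEEP_BRACKET = r'<([^\s\>]*)[^\>]*\>(.*?)\<\/\1\>'


def to_kv(pattern: str, event: str) -> str:
    """Rewrite <Tag>value</Tag> pairs as Tag="value" (hypothetical helper)."""
    return re.sub(pattern, r' \1="\2"', event)


# Shortened copy of the sample event (illustrative).
event = ('<MenuId>2</MenuId>'
         '<Text><This is text with an angle bracket</Text>'
         '<Internal>User Menu</Internal>')

stripped = to_kv(STRIP_BRACKET, event)  # leading "<" dropped from the value
kept = to_kv(KEEP_BRACKET, event)       # leading "<" kept in the value
print(stripped)
print(kept)
```

Note that `(.*?)` in the second expression also accepts empty values and values that contain nested tags, both of which the original `[^<]` guard excluded. Whether that matters depends on the data.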

somesoni2
Revered Legend

Try the second option.

sc0tt
Builder

Thanks! Is there a way to keep the "<" if it is part of the value? Other than that, this seems to work, so I may use it anyway and just discard the bracket.
