Dashboards & Visualizations

Another regular expression question -- need help with "</"

sc0tt
Builder

This is similar to a question that I previously asked here

I am using a sed script to format a log at index time. The raw data contains XML tags whose values can include any type of character. With previous help from here and lots of trial and error, I am able to extract everything I need except when I encounter a </ inside a tag's value. I don't care about the outer tags (e.g. <Message>), just the opening and closing tags that immediately follow each other.

Sample data:

2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item>This is some more text &#13;</Item></Message><Action/><MsgType>Menu</MsgType>

Sed script:

s/<([^\s\>]*)[^\>]*\>(((?![<]\/).)+)\<\/\1\>/ \1="\2"/g

This almost works, but fails to capture the first Item value because of </. Is there a way that I can get this to work? The final pairs should be

UserId="1234567" Key="1" Item="Some stuff </\>`1234567890~!@#$%^&*()-=_+,." Key="2" Item="This is some more text &#13;" MsgType="Menu"

Any help would be greatly appreciated!
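
For anyone following along, the failure can be reproduced outside Splunk. The sketch below is only an illustration: it assumes Python's `re` module behaves like the regex engine behind the sed script for this pattern, and the names `TEMPERED_VALUE` and `consumed_value` are made up for the example. It isolates the value part of the expression, `((?![<]\/).)+`, and shows that it stops at the first `</`, so the closing `</Item>` is never reached for the first Item.

```python
import re

# Value of the first <Item> from the sample data, up to and including its closing tag.
body = r"Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item>"

# The value part of the original expression: consume one character at a time,
# but never start on a "</" sequence. (Inner group is non-capturing here.)
TEMPERED_VALUE = re.compile(r"((?:(?!</).)+)")


def consumed_value(text: str) -> str:
    """Return what the tempered group consumes from the start of text."""
    match = TEMPERED_VALUE.match(text)
    return match.group(1) if match else ""


value = consumed_value(body)
rest = body[len(value):]

print(repr(value))     # the text consumed before the first "</"
print(repr(rest[:4]))  # the text the pattern must then match against "</Item>"
```

Because the text that follows the consumed value is `</\>` rather than `</Item>`, the backreference to the tag name cannot match and the whole pair is skipped.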

UPDATE: Thanks to MuS I was able to get this working. There may be a better regex, but below is an example search that shows the sed script in action.

index=_internal | head 1 | eval _raw = "2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item><This is some more text &#13; that starts with an angle bracket</Item></Message><Action/><MsgType>Menu</MsgType>"
| rex mode=sed "s/<([^\s\>]*)[^\>]*\>(((?![<]\/\w).)+)\<\/\1\>/ \1=\"\2\"/g"
| rex mode=sed "s/<INTERFACE>|<Message>|<\/Message>|<Action\/>//g"
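
The same two steps can be tried outside Splunk. This is a minimal sketch, assuming Python's `re` module behaves like Splunk's regex engine for these patterns. The names `PAIR`, `WRAPPERS` and `to_key_value` are made up for the example, and the inner capture group is written as non-capturing, which does not change what is matched.

```python
import re

# Sample event from the search above (raw strings keep the backslash literal).
raw = (
    r"2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message>"
    r"<UserId>1234567</UserId><Key>1</Key>"
    r"<Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key>"
    r"<Item><This is some more text &#13; that starts with an angle bracket</Item>"
    r"</Message><Action/><MsgType>Menu</MsgType>"
)

# Step 1: <Tag>value</Tag> -> Tag="value". The lookahead only blocks "</"
# followed by a word character, so a stray "</\" inside a value is allowed.
PAIR = re.compile(r"<([^\s>]*)[^>]*>((?:(?!</\w).)+)</\1>")

# Step 2: drop the wrapper tags that carry no value of their own.
WRAPPERS = re.compile(r"<INTERFACE>|<Message>|</Message>|<Action/>")


def to_key_value(event: str) -> str:
    """Apply both substitutions in the same order as the two rex commands."""
    # A callable replacement inserts the captured text without escape processing.
    paired = PAIR.sub(lambda m: f' {m.group(1)}="{m.group(2)}"', event)
    return WRAPPERS.sub("", paired)


result = to_key_value(raw)
print(result)
```

One limitation to keep in mind: a value that contains `</` followed by a letter, digit or underscore would still end the value early, and that pair would be left unconverted.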

MuS
SplunkTrust

Hi sc0tt,

UPDATE: try this as matching regex in your sed command:

<\w+>(((?![</]\w).)+)\<\/\w+>

hope that helps ...

cheers, MuS

MuS
SplunkTrust

You're welcome and thx for accepting the answer 🙂

sc0tt
Builder

Many thanks! It wasn't exactly what I needed, but it helped me get it working for my needs. Another part of the issue was that the stuff between the tags could also start with a "<" character which caused more issues. I've updated the original question with a final working solution.

MuS
SplunkTrust

Now it's perfect, from my understanding ... some regex gurus would probably find some tuning possibilities 🙂

sc0tt
Builder

Thanks for your help, patiently waiting...
