Another regular expression question -- need help with "</"

sc0tt
Builder

This is similar to a question that I previously asked here

I am using a sed script to format a log at index time. The raw data contains some XML tags whose values can include any type of character. With previous help from here and lots of trial and error, I can extract everything I need except when I encounter a </ inside a tag's value. I don't care about the outer tags (e.g. <Message>), just the opening and closing tags that immediately follow each other.

Sample data:

2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item>This is some more text &#13;</Item></Message><Action/><MsgType>Menu</MsgType>

Sed script:

s/<([^\s\>]*)[^\>]*\>(((?![<]\/).)+)\<\/\1\>/ \1="\2"/g

This almost works, but it fails to capture the first Item value because of the </ inside it. Is there a way that I can get this to work? The final pairs should be:

UserId="1234567" Key="1" Item="Some stuff </\>`1234567890~!@#$%^&*()-=_+,." Key="2" Item="This is some more text &#13;" MsgType="Menu"
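
For anyone who wants to reproduce the failure outside Splunk, here is a minimal sketch in Python. It assumes Python's re module treats the lookahead and backreference the same way Splunk's regex engine does; the shortened sample string and the variable names are only for illustration. The negative lookahead (?!</) stops the value at the first </, and since what follows is </\> rather than </Item>, the first Item is skipped.

```python
import re

# The original expression from the sed script, ported to Python's re syntax
# (assumption: lookahead and \1 behave as they do in Splunk's regex engine).
ORIGINAL = re.compile(r'<([^\s>]*)[^>]*>(((?!</).)+)</\1>')

# Shortened, illustrative version of the sample event.
sample = r'<Key>1</Key><Item>Some stuff </\>`123</Item>'

# Collect (tag, value) for every pair the expression manages to match.
pairs = [(m.group(1), m.group(2)) for m in ORIGINAL.finditer(sample)]
print(pairs)  # only the Key pair is found; the Item pair is missing
```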

Any help would be greatly appreciated!

UPDATE: Thanks to MuS I was able to get this working. There may be a better regex, but below is an example search that shows the sed script in action.

index=_internal | head 1 | eval _raw = "2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item><This is some more text &#13; that starts with an angle bracket</Item></Message><Action/><MsgType>Menu</MsgType>"
| rex mode=sed "s/<([^\s\>]*)[^\>]*\>(((?![<]\/\w).)+)\<\/\1\>/ \1=\"\2\"/g"
| rex mode=sed "s/<INTERFACE>|<Message>|<\/Message>|<Action\/>//g"
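
The same two substitutions can be tried outside Splunk. The following is a rough Python port, not part of the Splunk configuration: it assumes Python's re module handles these constructs like Splunk's regex engine, and the names PAIR, WRAPPERS and to_key_value are made up for the sketch. The only change from the original expression is the \w after </ in the lookahead, so the value only ends where a real closing tag begins.

```python
import re

# Step 1: turn <Tag>value</Tag> into Tag="value". The lookahead only stops
# at "</" followed by a word character, i.e. at something that looks like a
# real closing tag.
PAIR = re.compile(r'<([^\s>]*)[^>]*>(((?!</\w).)+)</\1>')

# Step 2: drop the wrapper tags that carry no value of their own.
WRAPPERS = re.compile(r'<INTERFACE>|<Message>|</Message>|<Action/>')


def to_key_value(raw: str) -> str:
    """Apply both substitutions in the same order as the rex commands."""
    paired = PAIR.sub(r' \1="\2"', raw)
    return WRAPPERS.sub('', paired)


raw = (
    r'2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message>'
    r'<UserId>1234567</UserId><Key>1</Key>'
    r'<Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key>'
    r'<Item><This is some more text &#13; that starts with an angle bracket</Item>'
    r'</Message><Action/><MsgType>Menu</MsgType>'
)

result = to_key_value(raw)
print(result)
```

Note that the replacement adds a leading space before every pair, so the output has two spaces after IOLog.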

MuS
Legend

Hi sc0tt,

UPDATE: try this as the matching regex in your sed command:

<\w+>(((?![</]\w).)+)\<\/\w+>

hope that helps ...

cheers, MuS

MuS
Legend

You're welcome and thx for accepting the answer 🙂

sc0tt
Builder

Many thanks! It wasn't exactly what I needed, but it helped me get it working for my needs. Another part of the issue was that the stuff between the tags could also start with a "<" character which caused more issues. I've updated the original question with a final working solution.
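
To illustrate the leading "<" problem, here is a small Python comparison (a sketch only, assuming Python's re matches Splunk's behaviour for these patterns; the sample string is made up). In the suggested regex, [</] is a character class, so the lookahead rejects any "<" or "/" that is followed by a word character, including a value that starts with "<This". Requiring the literal sequence "</" followed by a word character avoids that.

```python
import re

# Suggested regex: the lookahead forbids "<" OR "/" followed by a word character.
SUGGESTED = re.compile(r'<\w+>(((?![</]\w).)+)</\w+>')

# Final regex: the lookahead forbids only "</" followed by a word character.
FINAL = re.compile(r'<([^\s>]*)[^>]*>(((?!</\w).)+)</\1>')

# Illustrative value that begins with an angle bracket.
sample = '<Item><This starts with an angle bracket</Item>'

suggested_match = SUGGESTED.search(sample)
final_match = FINAL.search(sample)

print(suggested_match)       # no match for this value
print(final_match.group(2))  # the value, including its leading "<"
```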

MuS
Legend

now it's perfect from my understanding...some regex gurus would probably find some tuning possibilities 🙂

sc0tt
Builder

Thanks for your help, patiently waiting...
