Another regular expression question -- need help with "</"

sc0tt
Builder

This is similar to a question that I previously asked here

I am using a sed script to format a log at index time. The raw data contains some XML tags whose contents can include any kind of character. Through previous help from here and lots of trial and error, I am able to extract everything I need except when I encounter a </ inside a tag. I don't care about the outer tags (e.g. <Message>), just the opening and closing tags that immediately follow each other.

Sample data:

2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item>This is some more text &#13;</Item></Message><Action/><MsgType>Menu</MsgType>

Sed script:

s/<([^\s\>]*)[^\>]*\>(((?![<]\/).)+)\<\/\1\>/ \1="\2"/g

This almost works, but it fails to capture the first Item value because of the </ inside it. Is there a way that I can get this to work? The final pairs should be:

UserId="1234567" Key="1" Item="Some stuff </\>`1234567890~!@#$%^&*()-=_+,." Key="2" Item="This is some more text &#13;" MsgType="Menu"

Any help would be greatly appreciated!

UPDATE: Thanks to MuS I was able to get this working. There may be a better regex, but below is an example search that shows the sed script in action.

index=_internal | head 1 | eval _raw = "2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item><This is some more text &#13; that starts with an angle bracket</Item></Message><Action/><MsgType>Menu</MsgType>"
| rex mode=sed "s/<([^\s\>]*)[^\>]*\>(((?![<]\/\w).)+)\<\/\1\>/ \1=\"\2\"/g"
| rex mode=sed "s/<INTERFACE>|<Message>|<\/Message>|<Action\/>//g"
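
For anyone who wants to try the idea outside Splunk, here is a minimal sketch in Python. It assumes Python's re module behaves like the PCRE engine behind rex mode=sed for these constructs, which may not hold in every edge case. The names to_key_value, ORIGINAL, FIXED and WRAPPERS are made up for illustration, and the input is the sample event from the question. The only difference between the two patterns is the lookahead: the original stops the value at any </, while the working one stops only at </ followed by a word character, so a stray </ inside the value no longer ends the match.

```python
import re

# Sample event from the question. Raw string so the backslash in "</\>" survives.
RAW = r'2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item>This is some more text &#13;</Item></Message><Action/><MsgType>Menu</MsgType>'

# Original attempt: the value may not contain "</" at all.
ORIGINAL = r'<([^\s>]*)[^>]*>(((?!</).)+)</\1>'

# Working version: the value may contain "</" unless a word character follows it.
FIXED = r'<([^\s>]*)[^>]*>(((?!</\w).)+)</\1>'

# Wrapper tags that are stripped in the second pass.
WRAPPERS = r'<INTERFACE>|<Message>|</Message>|<Action/>'


def to_key_value(raw, pattern):
    """Rewrite <Tag>value</Tag> pairs as Tag="value", then drop wrapper tags."""
    paired = re.sub(pattern, r' \1="\2"', raw)
    return re.sub(WRAPPERS, '', paired)


if __name__ == "__main__":
    print(to_key_value(RAW, ORIGINAL))  # first Item is left as raw XML
    print(to_key_value(RAW, FIXED))     # every pair is converted
```

With the original pattern the first Item element stays untouched in the output. With the fixed pattern it becomes a key/value pair like the others.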

MuS
Legend

Hi sc0tt,

UPDATE: try this as matching regex in your sed command:

<\w+>(((?![</]\w).)+)\<\/\w+>

hope that helps ...

cheers, MuS

MuS
Legend

You're welcome and thx for accepting the answer 🙂

sc0tt
Builder

Many thanks! It wasn't exactly what I needed, but it helped me get things working. Another part of the issue was that the text between the tags could also start with a "<" character, which caused more problems. I've updated the original question with a final working solution.

MuS
Legend

Now it's perfect from my understanding... some regex gurus would probably find some tuning possibilities 🙂

sc0tt
Builder

Thanks for your help, patiently waiting...
