This is similar to a question that I previously asked here.
I am using a sed script to format a log at index time. The raw data contains some XML tags whose content can include any type of character. With previous help from here and lots of trial and error, I am able to extract everything I need except when I encounter a </ inside a tag's content. I don't care about the outer tags (e.g. <Message>), just the opening and closing tags that immediately follow each other.
Sample data:
2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item>This is some more text </Item></Message><Action/><MsgType>Menu</MsgType>
Sed script:
s/<([^\s\>]*)[^\>]*\>(((?![<]\/).)+)\<\/\1\>/ \1="\2"/g
This almost works, but fails to capture the first Item value because of the </ inside its content. Is there a way that I can get this to work? The final pairs should be:
UserId="1234567" Key="1" Item="Some stuff </\>`1234567890~!@#$%^&*()-=_+,." Key="2" Item="This is some more text " MsgType="Menu"
Any help would be greatly appreciated!
UPDATE: Thanks to MuS I was able to get this working. There may be a better regex, but below is an example search that shows the sed script in action.
index=_internal | head 1 | eval _raw = "2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item><This is some more text that starts with an angle bracket</Item></Message><Action/><MsgType>Menu</MsgType>"
| rex mode=sed "s/<([^\s\>]*)[^\>]*\>(((?![<]\/\w).)+)\<\/\1\>/ \1=\"\2\"/g"
| rex mode=sed "s/<INTERFACE>|<Message>|<\/Message>|<Action\/>//g"
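For anyone who wants to try the patterns outside of Splunk, here is a rough sketch in Python's re module. It assumes Python's lookahead and backreference behaviour is close enough to the PCRE engine behind rex for this case. The names SAMPLE, ORIGINAL, UPDATED, WRAPPERS and to_pairs are made up for the illustration, and the inner capture group is written as non-capturing, which leaves \1 and \2 unchanged.

```python
import re

# Sample event from the search above (raw string so the backslash is literal).
SAMPLE = r'2014-04-09 06:20:04,519 Outgoing IOLog <INTERFACE><Message><UserId>1234567</UserId><Key>1</Key><Item>Some stuff </\>`1234567890~!@#$%^&*()-=_+,.</Item><Key>2</Key><Item><This is some more text that starts with an angle bracket</Item></Message><Action/><MsgType>Menu</MsgType>'

# Original lookahead: content stops at ANY "</", so "</\>" inside a value
# ends the content early and the closing tag is never found.
ORIGINAL = re.compile(r'<([^\s>]*)[^>]*>((?:(?!</).)+)</\1>')

# Updated lookahead: content stops only at "</" followed by a word character,
# i.e. something that looks like the start of a real closing tag.
UPDATED = re.compile(r'<([^\s>]*)[^>]*>((?:(?!</\w).)+)</\1>')

# Second sed step: drop the wrapper tags that are not wanted in the output.
WRAPPERS = re.compile(r'<INTERFACE>|<Message>|</Message>|<Action/>')


def to_pairs(raw, pattern):
    """Rewrite <Tag>value</Tag> as Tag="value", then strip the wrapper tags."""
    paired = pattern.sub(r' \1="\2"', raw)
    return WRAPPERS.sub('', paired)


if __name__ == "__main__":
    print(to_pairs(SAMPLE, ORIGINAL))  # first Item is left as raw XML
    print(to_pairs(SAMPLE, UPDATED))   # every innermost tag becomes a pair
```

The only difference between the two patterns is the \w inside the negative lookahead. A value that itself contains </ followed by a letter, digit or underscore would still end the match early, so this is a workaround for this data rather than a general XML parser.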
Hi sc0tt,
UPDATE: try this as matching regex in your sed command:
<\w+>(((?![</]\w).)+)\<\/\w+>
hope that helps ...
cheers, MuS
You're welcome and thx for accepting the answer 🙂
Many thanks! It wasn't exactly what I needed, but it helped me get it working for my needs. Another part of the issue was that the stuff between the tags could also start with a "<" character which caused more issues. I've updated the original question with a final working solution.
Now it's perfect, from my understanding... some regex gurus would probably find some tuning possibilities 🙂
Thanks for your help, patiently waiting...