How to use the rex command to extract fields that start with < and end with >

m7787580
Explorer
Hi Splunkers,

I would like to learn how to use rex to extract the field names and values below. I do not want to extract startTimestamp and endTimestamp.

<activityName>TubeSales<activityName>
<activityStatus>Play<activityStatus>
<startTimestamp>Do not want to extract<startTimestamp>
<endTimestamp>Do not want to extract<endTimestamp>
<JourneyID>3DF62A1191152ED064B039AFD2C6A81E.node-app-1<JourneyID>
<startID>C3FE7047-E9EA-78DE-D719-8D3D66EF4A1F<startID>
<JourneyOrderPointsByProductCode>
<ProductCode>16<ProductCode>
<JourneyOrderPoints>130<JourneyOrderPoints>
<JourneyOrderPointsByProductCode>
<success>
<GetRequiredJourneyOrderPointsend>
</S:Body>

Thanks in advance


DalJeanis
SplunkTrust

Two things -
1) To be proper XML or HTML, the closing tag (the second time the field is named) must have a slash in front of the name. Example:

 <activityName>TubeSales</activityName>

I'm going to assume that is the case, because otherwise you have much bigger problems than how to write the rex.

This one here will extract all the individual fields, including the two timestamps you don't want, but not including the multi-line JourneyOrderPointsByProductCode...

 \<(?<fieldname>\w+)\>(?<fieldvalue>[^\<]+)\<\/?\1\>

Here it is, built up with a negative lookahead assertion to ignore the two Timestamps...

\<(?!startTimestamp|endTimestamp)(?<fieldname>\w+)\>(?<fieldvalue>[^\<]+)\<\/?\1\> 

Both of those regexes will work for any tags that are opened and closed, even if they lack the slash in the end tag. If you verify that your markup language has the proper slashes on the close tags, then remove the very last question mark from both regexes.
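
As a rough way to check the second regex outside Splunk, here is a minimal Python sketch run against the sample event from the question. It is an illustration, not the rex command itself: Python's re module spells named groups as (?P<name>...) and named backreferences as (?P=name), whereas the regexes above use the (?<name>...) form, and the names EVENT, PATTERN and extract_fields are made up for this example.

```python
import re

# Sample event copied from the question; the closing tags lack the slash, as posted.
EVENT = """<activityName>TubeSales<activityName>
<activityStatus>Play<activityStatus>
<startTimestamp>Do not want to extract<startTimestamp>
<endTimestamp>Do not want to extract<endTimestamp>
<JourneyID>3DF62A1191152ED064B039AFD2C6A81E.node-app-1<JourneyID>
<startID>C3FE7047-E9EA-78DE-D719-8D3D66EF4A1F<startID>
<JourneyOrderPointsByProductCode>
<ProductCode>16<ProductCode>
<JourneyOrderPoints>130<JourneyOrderPoints>
<JourneyOrderPointsByProductCode>
<success>
<GetRequiredJourneyOrderPointsend>
</S:Body>"""

# Python translation of the second regex in the answer (assumption: same logic,
# only the named-group syntax differs).
PATTERN = re.compile(
    r"<(?!startTimestamp|endTimestamp)(?P<fieldname>\w+)>"  # opening tag, timestamps skipped
    r"(?P<fieldvalue>[^<]+)"                                # value: everything up to the next <
    r"</?(?P=fieldname)>"                                   # same tag name again, slash optional
)


def extract_fields(text):
    """Return a dict mapping each matched tag name to its value."""
    return {m.group("fieldname"): m.group("fieldvalue") for m in PATTERN.finditer(text)}


fields = extract_fields(EVENT)
for name, value in fields.items():
    print(f"{name} = {value}")
```

On this sample the sketch yields activityName, activityStatus, JourneyID, startID, ProductCode and JourneyOrderPoints. The two timestamps are skipped by the lookahead, and the wrapper tags are skipped because the text after them is not followed by a tag with the same name.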


Now, that all being said, you are much better off using @nikeynilay's advice and using the spath command.


m7787580
Explorer

Hi DalJeanis,

That was great stuff, the query worked absolutely fine.

I just wanted to ask one question about this regex:

<(?!startTimestamp|endTimestamp)(?<fieldname>\w+)>(?<fieldvalue>[^<]+)<\/?\1>

--> <\/?\1> <----

What does this last part actually do? I understand the rest of the query, but not this final piece or the \1 you wrote at the end.

Thanks again, it was really awesome stuff.

Regards,
Tarun Malhotra


DalJeanis
SplunkTrust

That whole thing is to find the closing tag for the same opening tag. That is how we avoid picking up the <success> keyword, because it is not followed by a close tag, so it is not calling out a field name and value.

\< means "match only the opening < of the next html/xml tag"
\/? means "match an optional slash \/ if it is there, but due to the ? if it is not there then that's okay too."
\1 means "match another copy of the first group that was previously matched... in this case that would be the group called fieldname"
\> means "match only the ending > of the html/xml tag"
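
To see the effect of each piece, here is a small Python sketch that compares the optional-slash ending with a strict one. It is only an illustration under assumptions: Python's re needs the (?P<name>...) spelling for named groups, \1 still refers to the first capturing group (fieldname), and the names SLASH_OPTIONAL, SLASH_REQUIRED and has_field are invented for the example.

```python
import re

# Closing part written as in the answer: "<", optional "/", a copy of group 1, ">".
SLASH_OPTIONAL = re.compile(r"<(?P<fieldname>\w+)>(?P<fieldvalue>[^<]+)</?\1>")

# Same regex with the last question mark removed: the slash is now mandatory.
SLASH_REQUIRED = re.compile(r"<(?P<fieldname>\w+)>(?P<fieldvalue>[^<]+)</\1>")


def has_field(pattern, text):
    """True if the pattern finds an opening tag, a value, and a matching closing tag."""
    return pattern.search(text) is not None


samples = [
    "<ProductCode>16</ProductCode>",                  # proper closing tag
    "<ProductCode>16<ProductCode>",                   # closing tag without the slash
    "<ProductCode>16</Other>",                        # closing name differs, so \1 fails
    "<success>\n<GetRequiredJourneyOrderPointsend>",  # no closing tag for <success>
]

for text in samples:
    print(repr(text), has_field(SLASH_OPTIONAL, text), has_field(SLASH_REQUIRED, text))
```

Only the first two samples match the optional-slash version, and only the first matches the strict version. The last two fail in both cases because \1 demands the same tag name that was captured in the opening tag.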


m7787580
Explorer

Hi DalJeanis,

Thanks for the explanation, it was really helpful. 🙂


niketn
Legend

@m7787580, you should use spath (which is meant to parse XML or JSON data) to output the fields you need. (http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath)

You should also check whether it is feasible to extract the XML data at search time by setting KV_MODE = xml when defining the sourcetype. (http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf)
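
To illustrate why a real XML parser is usually preferable to a regex here, this is a minimal Python sketch using the standard xml.etree.ElementTree module. It is only an analogy and not how spath or KV_MODE is implemented. It assumes the event is well-formed XML (closing tags with slashes), the <response> wrapper element is hypothetical, and the dotted field names are simply a convention chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of part of the sample event.
XML_EVENT = """<response>
<activityName>TubeSales</activityName>
<activityStatus>Play</activityStatus>
<JourneyOrderPointsByProductCode>
<ProductCode>16</ProductCode>
<JourneyOrderPoints>130</JourneyOrderPoints>
</JourneyOrderPointsByProductCode>
</response>"""


def flatten(element, prefix=""):
    """Walk the XML tree and return {dotted.path: text} for every leaf element."""
    path = f"{prefix}.{element.tag}" if prefix else element.tag
    children = list(element)
    if not children:
        return {path: (element.text or "").strip()}
    fields = {}
    for child in children:
        fields.update(flatten(child, path))
    return fields


fields = flatten(ET.fromstring(XML_EVENT))
for name, value in fields.items():
    print(f"{name} = {value}")
```

Unlike the regex, the parser keeps track of nesting, so ProductCode and JourneyOrderPoints stay associated with their parent JourneyOrderPointsByProductCode element.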

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

horsefez
SplunkTrust

Hi,

How about this one?

(?:\<activityName\>)(?<activityName>[^\<]+)(?:\<activityName\>)[\r\n](?:\<activityStatus\>)(?<activityStatus>[^\<]+)(?:\<activityStatus\>)[\r\n].+[\r\n].+[\r\n](?:\<JourneyID\>)(?<JourneyID>[^\<]+)(?:\<JourneyID\>)[\r\n](?:\<startID\>)(?<startID>[^\<]+)(?:\<startID\>)[\r\n].+[\r\n](?:\<ProductCode\>)(?<ProductCode>[^\<]+)(?:\<ProductCode\>)[\r\n](?:\<JourneyOrderPoints\>)(?<JourneyOrderPoints>[^\<]+)(?:\<JourneyOrderPoints\>)

https://regex101.com/r/AeXvXo/1


m7787580
Explorer
Hi Pyro_wood,

Thanks for the solution, I understood it.
But what if I don't want to write all the field names out again and again?
We can see that every field starts with < and its closing tag starts with </

Would it be possible to write a single rex command like
rex field=_raw starting from <(capturing Name)>(Capturing Value)</

As we can see, all the fields follow the same format shown below, starting with < and ending with </

<ProductCode>16</ProductCode>
<JourneyOrderPoints>130</JourneyOrderPoints>

If I can have a single standard rex query, then I can run it on any service, irrespective of the field names and values.

Thanks in advance