Splunk Search

REX SED Help, need to replace namespaces from xml field

somesoni2
Revered Legend

Hi,

I have a xml field which holds values like below. It contains namespaces for each element which I want to remove:

...message="<h:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<h:Header>
<h:creationTimestamp>2013-12-09T16:58:57.2450018+05:30</h:creationTimestamp>
<h:applicationId>XYZ</h:applicationId>
<h:hostName>Myhost</h:hostName>
</h:Header>
</h:Envelope>"

Obviously there could be more/different namespace values ("h:" prefix) in the the logs, so can't hardcode the value to replace it. I believe REX SED would be the appropriate method for my requirement.

I am very new to Regex so not able to start with REX SED command. Could anyone provide me some direction/example how to go about it?

Tags (2)
1 Solution

emechler_splunk
Splunk Employee
Splunk Employee

If you wanted to do this at index-time (i.e. remove the content from the event before it's indexed into Splunk), then you could use SEDCMD to remove the content via props.conf:

[XML_sourcetype]
...
SEDCMD-null = s/(<h:\S+)\s+(xmlns:\S+)(>)/\\1\3/g

If you're doing this inline, then the same regex should work with the rex command:

... | rex field=_raw mode=sed "s/(<h:\S+)\s+(xmlns:\S+)(>)/\1\3/g"

View solution in original post

Cuyose
Builder

Another way you could do this at search time is to simply eval the XML portion with a regex replace

|rex field=_raw "(?i)request=(?.+)$"

|eval requestXML=replace(XML,"\w+:","")

|xmlkv requestXML

0 Karma

emechler_splunk
Splunk Employee
Splunk Employee

If you wanted to do this at index-time (i.e. remove the content from the event before it's indexed into Splunk), then you could use SEDCMD to remove the content via props.conf:

[XML_sourcetype]
...
SEDCMD-null = s/(<h:\S+)\s+(xmlns:\S+)(>)/\\1\3/g

If you're doing this inline, then the same regex should work with the rex command:

... | rex field=_raw mode=sed "s/(<h:\S+)\s+(xmlns:\S+)(>)/\1\3/g"

emechler_splunk
Splunk Employee
Splunk Employee

Sure, simply remove the "h:" from the capture group as such:
"s/(<)h:(\S+)\s+(xmlns:\S+)(>)/\1\2\4/g"

0 Karma

somesoni2
Revered Legend

It worked fine and removing all occurrance of xmlns where "<h:" is present. Is is possible to remove the "h:" also so that I will be left with just "" instead of ""

0 Karma

emechler_splunk
Splunk Employee
Splunk Employee

Sorry, looked like there was an extra backslash in the regex. Try the updated version above.

0 Karma

somesoni2
Revered Legend

I would like to do this during search time. Unfortunately the above rex command doesn't work for me (event with test data I provided).

0 Karma
Get Updates on the Splunk Community!

Detecting Remote Code Executions With the Splunk Threat Research Team

REGISTER NOWRemote code execution (RCE) vulnerabilities pose a significant risk to organizations. If ...

Observability | Use Synthetic Monitoring for Website Metadata Verification

If you are on Splunk Observability Cloud, you may already have Synthetic Monitoringin your observability ...

More Ways To Control Your Costs With Archived Metrics | Register for Tech Talk

Tuesday, May 14, 2024  |  11AM PT / 2PM ET Register to Attend Join us for this Tech Talk and learn how to ...