Splunk Search

Why does my sed replace command replace too much?

DempseyWilliams
Explorer

I need some help figuring out why my sed replace command is replacing all of the text to the end of the event in Splunk rather than just the specific text I had it look for. As part of a GDPR-compliance project, I was tasked with anonymizing personal names that come through Splunk, which my solution does. But I'm finding that everything after the replaced text is being cut off as well.

In my props.conf file, I've added this section to do the replace.

[host::...*]
SEDCMD-GDPR-anonymize-firstname = s/\"FirstName\"[=:].*\".*?\"/"FirstName":"######"/g

These are JSON messages, so I have Splunk look for "FirstName":"Billy" and replace whatever is between the double quotes around the value with pound signs, which it does.

Here's a sample message that I want to anonymize:

"Beneficiary_LocalID":"TZ056500190","FirstName":"Billy","Location":"Tanzania"

Desired result:

"Beneficiary_LocalID":"TZ056500190","FirstName":"######","Location":"Tanzania"

Actual result:

"Beneficiary_LocalID":"TZ056500190","FirstName":"######"

Do I have something wrong in my regex statement that is causing the rest of the event to be included in the replacement? Any help would be greatly appreciated.
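The truncation can be reproduced outside Splunk. This sketch uses Python's re module as a stand-in for Splunk's regex engine. That is an assumption: the two agree on greedy .* and lazy .*? for a pattern this simple, but they are not the same engine. The \" escapes from the SEDCMD line are dropped because a raw, single-quoted Python string doesn't need them.

```python
import re

# Sample event from the question.
event = '"Beneficiary_LocalID":"TZ056500190","FirstName":"Billy","Location":"Tanzania"'

# The original SEDCMD pattern without its \" escapes.
# The .* right after [=:] is greedy.
greedy = r'"FirstName"[=:].*".*?"'

result = re.sub(greedy, '"FirstName":"######"', event)
print(result)
# "Beneficiary_LocalID":"TZ056500190","FirstName":"######"
```

The leading .* first runs to the end of the event. It then backs off only as far as needed for ".*?" to find a quoted value after it. The last such value is "Tanzania", so the match (and therefore the replacement) covers everything from "FirstName" to the final quote.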


dshpritz
SplunkTrust

Your regex is a little too greedy. Try

"FirstName"[=:]"[^"]+"

This is using something called a "negated character class".
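Here is a minimal sketch of the suggested pattern applied to the sample event from the question. Python's re module stands in for Splunk's regex engine, on the assumption that they behave the same for a negated character class. This is not the SEDCMD itself.

```python
import re

# Sample event from the question.
event = '"Beneficiary_LocalID":"TZ056500190","FirstName":"Billy","Location":"Tanzania"'

# [^"]+ means one or more characters that are NOT a double quote,
# so the match cannot run past the quote that closes the FirstName value.
fixed = r'"FirstName"[=:]"[^"]+"'

result = re.sub(fixed, '"FirstName":"######"', event)
print(result)
# "Beneficiary_LocalID":"TZ056500190","FirstName":"######","Location":"Tanzania"
```

In props.conf, the corresponding line would presumably be SEDCMD-GDPR-anonymize-firstname = s/"FirstName"[=:]"[^"]+"/"FirstName":"######"/g. The thread doesn't show the exact line that was deployed, though.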

View solution in original post

ballen1
Explorer

Is this still a valid fix?  I've tried something very similar and it didn't work for me.  Please see below:

rex mode=sed "s/\"name":\s\"[^\"]+\"/"name":"###############"/g"
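ballen1's raw events aren't shown, so the following is a guess at what to check rather than a diagnosis.

- The \s in that pattern matches exactly one whitespace character. It therefore won't match compact JSON such as "name":"Billy", where there is no space after the colon. Using \s* instead allows zero or more.
- Inside the double-quoted rex argument, the quote right after name and the quotes in the replacement aren't escaped. That may end the quoted string early.

The sketch below covers only the first point. It uses Python's re as a stand-in for Splunk's engine, with two hypothetical sample strings.

```python
import re

# Hypothetical events: compact JSON, and JSON with a space after the colon.
samples = ['"name":"Billy"', '"name": "Billy"']

strict = r'"name":\s"[^"]+"'    # \s  : exactly one whitespace character
relaxed = r'"name":\s*"[^"]+"'  # \s* : zero or more whitespace characters

for s in samples:
    print(s, bool(re.search(strict, s)), bool(re.search(relaxed, s)))
# "name":"Billy" False True
# "name": "Billy" True True

# Masking with the relaxed pattern handles both forms.
masked = [re.sub(relaxed, '"name":"###############"', s) for s in samples]
print(masked)
```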

 

DempseyWilliams
Explorer

That appears to have fixed it. I'm still learning regex. Could you give a brief explanation as to what your version is doing compared to what I had?


dshpritz
SplunkTrust
SplunkTrust

My version is saying "anything that isn't a quote character, repeated one or more times". As soon as it reaches the next quote, that part stops matching, and the closing quote in the pattern then matches that quote. This is stricter than your version, whose .* is greedy: it keeps matching as far as it can, so the match ran all the way to the last quote in the event. HTH!
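Your original pattern already ended in a lazy ".*?". The trouble was the greedy .* in front of it. Dropping that .* leaves a lazy pattern that behaves like the negated class on the sample event.

This sketch compares the two, using Python's re as a stand-in for Splunk's engine. It adds one hypothetical event with an empty FirstName value to show where they differ. The helper matched_text is illustrative only.

```python
import re

# The original pattern minus its leading greedy .* ; only the lazy part remains.
lazy = r'"FirstName"[=:]".*?"'
# The accepted answer: a negated character class.
negated = r'"FirstName"[=:]"[^"]+"'

events = [
    '"Beneficiary_LocalID":"TZ056500190","FirstName":"Billy","Location":"Tanzania"',
    # Hypothetical event with an empty FirstName value.
    '"FirstName":"","Location":"Tanzania"',
]

def matched_text(pattern, event):
    """Return the text the pattern matches in event, or None if no match."""
    m = re.search(pattern, event)
    return m.group(0) if m else None

results = [(matched_text(lazy, e), matched_text(negated, e)) for e in events]
for lazy_text, neg_text in results:
    print(lazy_text, "|", neg_text)
# "FirstName":"Billy" | "FirstName":"Billy"
# "FirstName":"" | None
```

Both stop at the first closing quote. The one difference shown here is that [^"]+ needs at least one character, so it skips an empty value, while .*? matches it and would replace it.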

DempseyWilliams
Explorer

Awesome! Thanks for the explanation!
