How to anonymize data using REGEX in transforms.conf for an undefined number of characters?

SirHill17
Communicator

Hi,

I would like to anonymize data (file system paths) using REGEX. I successfully managed to mask data such as IP addresses and credit card numbers, but I am not able to replicate the setup for a value with an undefined number of characters.

Could you please help review the configuration below:

props.conf:

[amit_anonymize_data]
TRANSFORMS-anonymize = filepath-anonymizer

transforms.conf:

[filepath-anonymizer]
REGEX = (?m)^(.*)filePath=\S+(.*)$
FORMAT = $1filePath=XXXX$2
DEST_KEY = _raw

Below is an example of a log line that must be transformed:

2016-02-25 14:40 GMT+1 this is only an example filePath="/tmp/file.log" error script 1

The log is indexed without any modification.

Thanks for your help.

Cyril

1 Solution

jkat54
SplunkTrust

Hi, please try this regex with positive lookahead and positive lookbehind.

props.conf:

[amit_anonymize_data]
TRANSFORMS-anonymize = filepath-anonymizer

transforms.conf:

[filepath-anonymizer]
REGEX = '(.*)(?<=filePath=").*(?=")(.*)'
FORMAT = $1XXXX$2
DEST_KEY = _raw
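
For anyone who wants to check the pattern offline before touching Splunk, here is a minimal sketch in Python. It is only an approximation: Python's re module is not the PCRE engine Splunk uses, the single quotes are left out because they belong to the conf-file setting discussed in this thread, and apply_transform is a hypothetical helper that mimics REGEX/FORMAT by writing $1 and $2 as Python backreferences.

```python
import re

# Pattern from the answer, without the surrounding single quotes.
REGEX = r'(.*)(?<=filePath=").*(?=")(.*)'
# Python spelling of FORMAT = $1XXXX$2
FORMAT = r'\g<1>XXXX\g<2>'


def apply_transform(raw: str) -> str:
    """Hypothetical helper: rewrite the event text the way REGEX/FORMAT would."""
    return re.sub(REGEX, FORMAT, raw, count=1)


# Sample log line from the question.
sample = ('2016-02-25 14:40 GMT+1 this is only an example '
          'filePath="/tmp/file.log" error script 1')
masked = apply_transform(sample)
print(masked)
```

The first capture group ends right after filePath=" because the lookbehind consumes nothing, so the double quotes around the path survive in the output and only the path itself is replaced.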

jkat54
SplunkTrust

OK, so what is the architecture here? Are there forwarders, etc.? You say you can mask credit cards, but did you do that in development on a single Splunk instance, and are you now trying this other redaction in production, where the architecture is different?

SirHill17
Communicator

I am working in a DEV environment (the same one as for the credit card masking). The props.conf and transforms.conf files have been updated on the indexer. And yes, the data is coming from a forwarder.

jkat54
SplunkTrust

Also, what if you put single quotes around the regex?

SirHill17
Communicator

Great, it's working with the single quotes. Thanks!!!

jkat54
SplunkTrust

Awesome! I edited the answer to add the single quotes for folks looking in the future.

Thanks for the follow-up and for marking the answer!

SirHill17
Communicator

In case it could help:

I have customized the REGEX to take into account the case where the path contains a space character (which can happen but should not 🙂):

'^(.*)(?<=filePath=").*?(?=")(.*)$'
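
As a rough illustration of how this pattern behaves, the Python sketch below runs it on a path containing a space and compares it with two other patterns from this thread: the greedy version from the accepted answer and the \S+ variant, which cannot cross whitespace. Assumptions: Python's re module rather than Splunk's PCRE engine, the single quotes dropped, and an invented sample event that carries a second quoted field.

```python
import re

LAZY = r'^(.*)(?<=filePath=").*?(?=")(.*)$'    # pattern above, quotes removed
GREEDY = r'(.*)(?<=filePath=").*(?=")(.*)'     # accepted answer
NO_SPACE = r'(?<=filePath=")\S+(?=")'          # \S+ attempt from this thread

FORMAT = r'\g<1>XXXX\g<2>'                     # Python spelling of $1XXXX$2

# Invented sample: a path with a space, followed by another quoted value.
event = 'example filePath="/tmp/my file.log" user="bob"'

lazy_result = re.sub(LAZY, FORMAT, event, count=1)
greedy_result = re.sub(GREEDY, FORMAT, event, count=1)
no_space_match = re.search(NO_SPACE, event)

print(lazy_result)      # lazy .*? stops at the first closing quote
print(greedy_result)    # greedy .* runs on to the last quote in the line
print(no_space_match)   # \S+ cannot span the space, so nothing matches
```

In this sketch the lazy quantifier matters mostly when more quoted text follows the path; both the lazy and greedy forms cope with the space itself.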

jkat54
SplunkTrust

Very nice, great follow-up! I didn't even think about spaces in file paths...

jkat54
SplunkTrust

This makes me think your first regex might have worked with single quotes too. It's hard to tell which regex is less resource-intensive without testing, but I assume my regex requires more effort from the CPU due to the lookahead and lookbehind.
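
One way to test that suspicion outside Splunk is the Python sketch below, which runs the original REGEX and FORMAT from the question against the sample log line. Treat it as a rough check only: Python's re module is not Splunk's PCRE engine, and a match here says nothing about how the conf file is parsed or where in the pipeline the transform runs.

```python
import re

# Original pattern from the question, unchanged.
ORIGINAL = r'(?m)^(.*)filePath=\S+(.*)$'

# Sample log line from the question.
sample = ('2016-02-25 14:40 GMT+1 this is only an example '
          'filePath="/tmp/file.log" error script 1')

# Python spelling of FORMAT = $1filePath=XXXX$2
result = re.sub(ORIGINAL, r'\g<1>filePath=XXXX\g<2>', sample)
print(result)
```

Here \S+ also consumes the double quotes around the path, so the masked value comes out unquoted.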

SirHill17
Communicator

Still no success. Based on your input I also tried

(?<=filePath=")\S+(?=")

but that did not work either.

Can anything else impact it?

jkat54
SplunkTrust

My apologies. I have corrected my answer.

SirHill17
Communicator

Unfortunately no change. I don't really know what's wrong...

Richfez
SplunkTrust

What happens when you do this? Does anything change, or is _raw unchanged?

And have you tried without multiline mode (the (?m) at the front)? That may also be making it behave slightly differently.

SirHill17
Communicator

Yes, _raw is unchanged. I just tried without (?m), but no success.

Is the FORMAT correct? My concern is the number of characters that XXXX replaces. If the filePath has 15 characters, will it be replaced by XXXX (4 X's)? Is that right?

Thanks.

richgalloway
SplunkTrust

The FORMAT string looks correct to me. Yes, the filepath will be replaced by 4 X's no matter how many characters are in the original path.
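
A quick way to see the fixed-length behaviour is the Python sketch below. It is an approximation that uses Python's re module instead of Splunk; the pattern is the one from the accepted answer without the single quotes, mask is a hypothetical helper, and both sample events are invented.

```python
import re

REGEX = r'(.*)(?<=filePath=").*(?=")(.*)'


def mask(raw: str) -> str:
    # FORMAT = $1XXXX$2 written with Python backreferences
    return re.sub(REGEX, r'\g<1>XXXX\g<2>', raw, count=1)


# Invented samples: one very short path, one long path.
short_event = 'filePath="/a" ok'
long_event = 'filePath="/var/log/app/2016/02/25/file.log" ok'

print(mask(short_event))
print(mask(long_event))
```

Both events come out identical because the replacement text is a literal, so the masked value gives no hint of the original path length.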

---
If this reply helps you, Karma would be appreciated.

richgalloway
SplunkTrust

Is the sourcetype on the input set correctly (amit_anonymize_data)?

SirHill17
Communicator

Yes the sourcetype is correct.
