Getting Data In

Why is my configuration to anonymize data not working for fields named by FIELD_NAMES in props.conf?

keywork
Explorer

Hello,
I need to anonymize the second column in a tab-separated log file.
I am using the method described in "Anonymize Data".

transforms.conf:

[abcdef]
REGEX = ^([^\t]*\t)[^\t]*somePatternToReplace[^\t]*(\t.*)$
FORMAT = $1TestReplacementString$2
DEST_KEY = _raw
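
To see what this stanza does to a single event, here is a minimal Python sketch that applies the same pattern outside Splunk. The function name `anonymize_raw` and the sample line are made up for illustration, and Python's `re` module stands in for Splunk's regex engine, so treat it as an approximation rather than a statement about Splunk's behavior.

```python
import re

# Same pattern as the [abcdef] stanza; Splunk's $1/$2 are written \1/\2 in Python.
PATTERN = re.compile(r"^([^\t]*\t)[^\t]*somePatternToReplace[^\t]*(\t.*)$")
REPLACEMENT = r"\1TestReplacementString\2"


def anonymize_raw(raw: str) -> str:
    """Replace the second tab-separated column if it contains the pattern.

    Lines that do not match are returned unchanged.
    """
    return PATTERN.sub(REPLACEMENT, raw)


if __name__ == "__main__":
    # Hypothetical sample event with four tab-separated columns.
    sample = "alpha\txx somePatternToReplace yy\tgamma\tdelta"
    print(anonymize_raw(sample))
```

Only the text between the first and second tab is replaced; the first column and everything from the second tab onward are carried over by the two capture groups.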

props.conf:

[someSourceType]
TRANSFORMS-xyz = abcdef
FIELD_DELIMITER = \t
FIELD_NAMES = "Field1", "Field2", "Field3", ...

The raw data is processed and indexed as expected, i.e. I see "TestReplacementString" in _raw when I search. However, the field "Field2" still holds the original, unanonymized value. Is there a way to have the anonymization apply to that value as well?

Adding this to transforms.conf (and referencing it in props.conf) does not make a difference either:

[fieldSpecific]
REGEX = (.*)something.*
FORMAT = $1TestReplaceField
DEST_KEY = Field2

[accepted_keys]
name = Field2

Thanks for your help in advance!

0 Karma
1 Solution

keywork
Explorer

I found out that the SED commands are processed sequentially, in the order I write them in props.conf. That way I can provide a default replacement as the last SED command.
My SEDCMD-xyz line in props.conf is about 35,000 characters long, since I am checking about 450 conditions. It works fine; indexing takes significantly longer, but that is still acceptable. If someone has a better idea, please let me know.
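
For readers who want to picture the ordering, here is a small Python sketch of sequential substitutions with a catch-all as the last rule. Everything in it is an assumption made for illustration: the rule patterns, the `ANON_` prefix on mapped values, and the negative lookahead that keeps the default from overwriting an earlier replacement. The post does not say how the actual SEDCMD expressions avoid that overwrite.

```python
import re

# Hypothetical ordered rules, each playing the role of one s/regex/replacement/
# expression. Mapped values start with "ANON_" (an assumption) so that the
# final default rule can recognize and skip them.
RULES = [
    (r"^([^\t]*\t)[^\t]*alice[^\t]*(\t.*)$", r"\1ANON_USER_A\2"),
    (r"^([^\t]*\t)[^\t]*bob[^\t]*(\t.*)$", r"\1ANON_USER_B\2"),
    # Default: rewrite the second column unless an earlier rule already did.
    (r"^([^\t]*\t)(?!ANON_)[^\t]*(\t.*)$", r"\1ANON_DEFAULT\2"),
]


def apply_in_order(raw: str) -> str:
    """Apply every rule to the line, strictly in list order."""
    for pattern, replacement in RULES:
        raw = re.sub(pattern, replacement, raw)
    return raw


if __name__ == "__main__":
    for line in ("t1\talice smith\tx", "t2\tbob\ty", "t3\tcarol\tz"):
        print(repr(apply_in_order(line)))
```

Because every rule runs on the output of the previous one, the default has to exclude values that were already mapped; the marker prefix is just one way to do that.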


serpin
Explorer

I have the same problem.
I am indexing CSV files and every field maintains its data even when the _raw field is getting anonymized.

Is there any way to extract CSV fields automatically (without explicit extractions) and still anonymize some of the fields?

0 Karma

keywork
Explorer

I found out that FIELD_NAMES applies at index time, so it is no longer surprising that the anonymization does not work.
I changed the field extraction to use an EXTRACT- setting. This allows me to use a SEDCMD- setting to modify the data at index time.
The problem now: I can provide several SED commands at once, but they appear to be executed in parallel somehow. What I need is a mapping, i.e. a kind of case statement: if the field_2 value matches regex_1, set field_2 to this value; if it matches regex_2, set it to that value, and so on. In particular, I need a default value for field_2 if none of the regular expressions in my SED commands match.
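
To make the intended semantics concrete, here is a minimal Python sketch of such a case statement: the first matching pattern wins, and a default applies when nothing matches. The patterns, replacement values, and the name `map_field` are hypothetical placeholders, not taken from the actual configuration, and this only illustrates the logic, not how Splunk evaluates SEDCMD.

```python
import re

# Hypothetical mapping: (pattern, replacement) pairs checked top to bottom.
CASES = [
    (re.compile(r"^10\.\d+\.\d+\.\d+$"), "internal-host"),
    (re.compile(r"@example\.com$"), "example-user"),
]
DEFAULT = "redacted"  # used when no pattern matches


def map_field(value: str) -> str:
    """Return the replacement of the first matching case, else the default."""
    for pattern, replacement in CASES:
        if pattern.search(value):
            return replacement
    return DEFAULT


if __name__ == "__main__":
    for value in ("10.1.2.3", "bob@example.com", "something else"):
        print(value, "->", map_field(value))
```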

0 Karma

keywork
Explorer

I read that FIELD_NAMES is used for search-time field extraction. I assume that search-time field extraction uses the _raw field as its source data. Since _raw has been successfully anonymized, I wonder why I still get the original, unanonymized value when searching.

0 Karma

keywork
Explorer

Keywords that point me in a direction for further investigation would also be appreciated.

0 Karma