Solved: Why is my configuration to anonymize data not working for fields named by FIELD_NAMES in props.conf?

keywork · 10-27-2014

Hello,
I need to anonymize the second column of a tab-separated log file.
I am using the method described in "Anonymize Data".

transforms.conf:

[abcdef]
REGEX = ^([^\t]*\t)[^\t]*somePatternToReplace[^\t]*(\t.*)$
FORMAT = $1TestReplacementString$2
DEST_KEY = _raw

props.conf:

[someSourceType]
TRANSFORMS-xyz = abcdef
FIELD_DELIMITER = \t
FIELD_NAMES = "Field1", "Field2", "Field3", ...

The raw data is processed and indexed as expected: searches show "TestReplacementString" in the _raw field. However, the field "Field2" still holds the original, unanonymized value. Is there a way to apply the anonymization to that value as well?
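To see why _raw and Field2 disagree, here is a minimal Python emulation of the [abcdef] stanza. It assumes, as keywork concludes later in the thread, that the FIELD_NAMES extraction captures the values before the transform rewrites _raw. The four-column layout, the sample line and the helper names are hypothetical, and Python's re module stands in for Splunk's regex engine (Splunk's $1/$2 are written \g<1>/\g<2>).

```python
import re

# Copied from the [abcdef] stanza; Splunk's $1/$2 are written \g<1>/\g<2> here.
ANON_REGEX = re.compile(r"^([^\t]*\t)[^\t]*somePatternToReplace[^\t]*(\t.*)$")
ANON_FORMAT = r"\g<1>TestReplacementString\g<2>"

# Hypothetical four-column layout standing in for FIELD_NAMES.
FIELD_NAMES = ["Field1", "Field2", "Field3", "Field4"]


def extract_fields(raw: str) -> dict:
    """Split a tab-separated event into named fields (FIELD_DELIMITER = tab)."""
    return dict(zip(FIELD_NAMES, raw.split("\t")))


def anonymize(raw: str) -> str:
    """Apply the transform to _raw; lines without the pattern are returned unchanged."""
    return ANON_REGEX.sub(ANON_FORMAT, raw, count=1)


def index_event(raw: str) -> dict:
    """Assumed order: fields are captured from the original text, then _raw is rewritten."""
    fields = extract_fields(raw)      # Field2 keeps the original value
    fields["_raw"] = anonymize(raw)   # only _raw sees the replacement
    return fields


event = index_event("2014-10-27\tuser somePatternToReplace x\tGET\t200")
print(repr(event["_raw"]))    # '2014-10-27\tTestReplacementString\tGET\t200'
print(repr(event["Field2"]))  # 'user somePatternToReplace x'
```

Under that assumed order, no change to the transform alone can reach Field2, because the field value was taken from the text before the rewrite.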

Adding the following to transforms.conf (and referencing it in props.conf) makes no difference either:

[fieldSpecific]
REGEX = (.*)something.*
FORMAT = $1TestReplaceField
DEST_KEY = Field2

[accepted_keys]
name = Field2

Thanks for your help in advance!

keywork · 10-29-2014

I found out that the SED commands are processed sequentially, in the order in which I write them in props.conf. That way I can provide a default replacement as the last SED command.
My SEDCMD-xyz line in props.conf is about 35,000 characters long, since I'm checking about 450 conditions. It works; indexing takes significantly longer, but that is still acceptable. If someone has a better idea, please let me know.
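A minimal Python sketch of the ordered-substitution idea: every command sees the output of the previous one, and a catch-all default runs last. The rules, values and helper names are hypothetical. The default's negative lookahead (which skips columns already holding a mapped value) is an assumption of this sketch, not something confirmed in the thread and not a statement about the regex flavor Splunk's SEDCMD accepts.

```python
import re

# Hypothetical mapping table: (pattern searched in column 2, replacement value).
MAPPINGS = [
    ("alice|bob", "USER_A"),
    ("carol", "USER_B"),
]
DEFAULT_VALUE = "USER_OTHER"


def build_commands(mappings, default):
    """Turn the table into an ordered list of (regex, replacement) commands."""
    commands = []
    for pattern, value in mappings:
        # Replace the whole second column when it contains the pattern.
        commands.append(
            (rf"^([^\t]*\t)[^\t]*(?:{pattern})[^\t]*(\t.*)$", rf"\g<1>{value}\g<2>")
        )
    # Catch-all default, last in the list: the lookahead skips columns that
    # already hold one of the mapped values (assumption of this sketch).
    mapped = "|".join(re.escape(value) for _, value in mappings)
    commands.append(
        (rf"^([^\t]*\t)(?!(?:{mapped})\t)[^\t]*(\t.*)$", rf"\g<1>{default}\g<2>")
    )
    return commands


def apply_sequentially(raw, commands):
    """Apply each substitution in order; each one sees the previous one's output."""
    for regex, replacement in commands:
        raw = re.sub(regex, replacement, raw, count=1)
    return raw


COMMANDS = build_commands(MAPPINGS, DEFAULT_VALUE)
for line in ["t1\thello alice\tGET", "t2\tcarol\tPOST", "t3\tdave\tGET"]:
    print(repr(apply_sequentially(line, COMMANDS)))
```

Because the commands are chained, a later pattern can also match an earlier replacement value (a rule for "USER" would rewrite "USER_A"), so the order of the rules matters.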

serpin · 11-13-2014

I have the same problem.
I am indexing CSV files, and every field keeps its original data even when the _raw field is anonymized.

Is there any way to extract CSV fields automatically (without explicit extraction) and still anonymize some of them?

keywork · 10-28-2014

I found out that FIELD_NAMES applies at index time, so it is no longer surprising that the anonymization doesn't work for those fields.
I switched the field extraction to an EXTRACT- setting. This lets me use a SEDCMD- setting to modify the data at index time.
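The EXTRACT-/SEDCMD- combination works because SEDCMD- rewrites _raw at index time, while EXTRACT- is a search-time extraction that reads the stored, already rewritten _raw. A small Python sketch of that ordering, with a hypothetical replacement value ("ANONYMIZED") and a hypothetical three-field regex that uses named groups, as EXTRACT- does:

```python
import re

# Index time (in the spirit of a SEDCMD- setting): overwrite the second column.
# The replacement value "ANONYMIZED" is hypothetical.
SED_REGEX = re.compile(r"^([^\t]*\t)[^\t]*(\t.*)$")
SED_REPLACEMENT = r"\g<1>ANONYMIZED\g<2>"

# Search time (in the spirit of an EXTRACT- setting): named groups become fields.
EXTRACT_REGEX = re.compile(
    r"^(?P<Field1>[^\t]*)\t(?P<Field2>[^\t]*)\t(?P<Field3>[^\t]*)"
)


def index_time(raw: str) -> str:
    """Return the _raw text that would be written to the index."""
    return SED_REGEX.sub(SED_REPLACEMENT, raw, count=1)


def search_time(stored_raw: str) -> dict:
    """Extract fields from the stored _raw, i.e. after the rewrite."""
    match = EXTRACT_REGEX.match(stored_raw)
    return match.groupdict() if match else {}


stored = index_time("2014-10-28\tsecret value\tGET\t200")
fields = search_time(stored)
print(fields["Field2"])  # ANONYMIZED
```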
The problem now is: I can provide several SED commands at once, but they apparently run somehow in parallel. What I need is a mapping, i.e. a kind of case statement: if the field_2 value matches regex_1, set this value for field_2; if it matches regex_2, set that value for field_2; and so on. In particular, I need a default value for field_2 when none of the regular expressions in my SED commands matches.

keywork · 10-28-2014

I read that FIELD_NAMES is used for search-time field extraction. I assume that search-time extraction uses the _raw field as the source data. Since _raw has been anonymized successfully, I wonder why search still returns the original, unanonymized value.

keywork · 10-28-2014

Some keywords pointing in a direction for further investigation would also be appreciated.
