Anonymize data from JSON File

AnujaJ
Path Finder

I have a JSON event with an id that I want to anonymize. However, I still need to run stats, counts, grouping, and other analytics on this id later. In short, I want to hide the id from users while keeping it usable internally by Splunk. Is this possible?

My event looks something like this:

{"duration":0.33,"a":"login","i":"50050","d":"2055502349","c":"LIVE","@timestamp":"2020-05-22T01:59:59.601Z"}

I want to anonymize the "d" field.

to4kawa
Ultra Champion

UPDATED:

props.conf

[anony_json]
INDEXED_EXTRACTIONS = json
KV_MODE = none
TRANSFORMS-anony = anony, anony_raw
TRUNCATE = 0
TIME_PREFIX = timestamp\":\"
SHOULD_LINEMERGE = false

transforms.conf

[anony]
INGEST_EVAL = d:=md5(d)
WRITE_META = true

[anony_raw]
REGEX = (?m)(.*\"d\":\s*\"\d{4})\d+\"(.*)
FORMAT = $1XXXXXX"$2
DEST_KEY = _raw

https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata
https://docs.splunk.com/Documentation/Splunk/latest/Data/IngestEval

This configuration works on my Splunk instance (version 8).
I made a few mistakes earlier and have corrected them.

How about this?
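To make the effect of the two transforms easier to see, here is a rough Python sketch that mimics them outside Splunk. It is only an illustration: it assumes `[anony]` replaces the indexed field `d` with its MD5 hex digest and `[anony_raw]` rewrites the raw text with the regex shown above. Splunk itself does not run this code.

```python
import hashlib
import json
import re

# Sample event from the question.
raw = ('{"duration":0.33,"a":"login","i":"50050","d":"2055502349",'
       '"c":"LIVE","@timestamp":"2020-05-22T01:59:59.601Z"}')

# Step 1 (mirrors [anony]): hash the extracted field value.
d_indexed = hashlib.md5(json.loads(raw)["d"].encode("utf-8")).hexdigest()

# Step 2 (mirrors [anony_raw]): keep the first four digits in the raw text
# and replace the remaining digits with a fixed mask.
masked_raw = re.sub(r'(.*"d":\s*"\d{4})\d+"(.*)', r'\1XXXXXX"\2', raw)

print(d_indexed)   # 32-character hex digest stored as the field value
print(masked_raw)  # raw event with "d":"2055XXXXXX"
```

The point of the split is that the searchable field carries the hash while the displayed raw text carries the mask.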

AnujaJ
Path Finder

Thank you for your answer.

d is single valued.

However, I cannot use this solution, because the indexed value of d would change and I would no longer be able to run commands like "| stats count by d". I want d to be anonymized for all users, but Splunk should still be able to use it internally.

to4kawa
Ultra Champion
[anony]
INGEST_EVAL = d=md5(d)
WRITE_META = true

[anony_raw]
REGEX = (\"d\":\s*\")(\d{4})\d+\"
FORMAT = $1$2XXXXXX"
DEST_KEY = _raw

Use a hash.
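The reason a hash still supports grouping is that it is deterministic: equal ids always produce equal digests. The Python sketch below illustrates this with a made-up list of ids (only the first value comes from the event in the question; the rest are hypothetical), counting groups before and after hashing.

```python
import hashlib
from collections import Counter

def anonymize(value: str) -> str:
    """Return the MD5 hex digest of a value; equal inputs give equal outputs."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()

# Hypothetical sample of "d" values.
raw_ids = ["2055502349", "2055502349", "1000000001"]

raw_counts = Counter(raw_ids)
hashed_counts = Counter(anonymize(v) for v in raw_ids)

# The group sizes are the same whether we count raw ids or hashed ids.
print(sorted(raw_counts.values()), sorted(hashed_counts.values()))
```

In other words, a count grouped by the hashed field yields the same group sizes as a count grouped by the original id, only the labels differ.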

AnujaJ
Path Finder

This is exactly what I want. I changed anony_raw to include the data before and after the field. However, the hash is not applied: the configuration only masks d with XXXXXX instead of calculating the hash.

props.conf

INDEXED_EXTRACTION = json 
KV_MODE = none 
TRANSFORMS-anony = anony, anony_raw

transforms.conf

[anony] 
INGEST_EVAL = d=md5(d)
WRITE_META = true

[anony_raw] 
REGEX = (?m)^(.*)(\"d\":\s*\")(\d{4})\d+\"(.*)
FORMAT = $1$2$3XXXXXX"$4 
DEST_KEY = _raw

to4kawa
Ultra Champion

INGEST_EVAL = d=substr(d,5,10).substr(d,1,6).(d%2).(d%3)
How's this?
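For readers who want to see what this expression produces, here is a small Python emulation. It assumes eval `substr()` is 1-based and returns at most the requested number of characters, and that Splunk coerces the digit string to a number for `%`; both helper names are hypothetical.

```python
def splunk_substr(s: str, start: int, length: int) -> str:
    """Emulate eval substr(): 1-based start, at most `length` characters."""
    return s[start - 1 : start - 1 + length]

def scramble(d: str) -> str:
    """Emulate substr(d,5,10).substr(d,1,6).(d%2).(d%3)."""
    n = int(d)  # assumption: the digit string is treated as a number for %
    return (splunk_substr(d, 5, 10) + splunk_substr(d, 1, 6)
            + str(n % 2) + str(n % 3))

print(scramble("2055502349"))
```

Note that this rearranges the digits rather than hashing them: the two substrings together contain every digit of the original id, so the value can be reconstructed by anyone who knows the expression.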

AnujaJ
Path Finder

This does not work. anony_raw overrides anony, so the end result is d: 2055XXXXXX. I want to use md5 so that I can still correlate the data.

In props.conf, even if I change the order of the two transforms, the end result stays the same. Removing anony_raw leaves the original information unchanged.

to4kawa
Ultra Champion

I have updated my answer. Please confirm.

mah
Builder

Hi,

I have the same issue, but a little more complex.

This is an example of a JSON event as returned in Splunk:

{ [-]
   CodeSha256: 2+1ndsvhz23R2VD42
   CodeSize: 1909
   Description: None
   Environment: { [-]
     Variables: { [-]
       CLUSTER_NAME: Cluster
       ENVIRONMENT: dev
       USER_NAME: tata
       PASSWD: toto!
     }
   }
   LastModified: 2019-12-05T10:58:05.308+0000
   MemorySize: 128
   RevisionId: f0d723sdf6-c000edfzf
   Runtime: python3.6
   Timeout: 180
   TracingConfig: { [+]
   }
   Version: $LATEST
   region: eu-east-1
}

The problem is that sensitive data appears in clear text, specifically under Environment > Variables.

The variables in this section change all the time, so we cannot write a regex that targets specific key names.

How can I mask all values under Environment > Variables WITHOUT masking the keys?

Example of the result I want:

{ [-]
   CodeSha256: 2+1ndsvhz23R2VD42
   CodeSize: 1909
   Description: None
   Environment: { [-]
     Variables: { [-]
       CLUSTER_NAME:
       ENVIRONMENT:
       USER_NAME:
       PASSWD:
     }
   }
   LastModified: 2019-12-05T10:58:05.308+0000
   MemorySize: 128
   RevisionId: f0d723sdf6-c000edfzf
   Runtime: python3.6
   Timeout: 180
   TracingConfig: { [+]
   }
   Version: $LATEST
   region: eu-east-1
}
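As an illustration of the desired transformation only (not a Splunk configuration, and not something the thread confirms), the Python sketch below blanks every value under Environment > Variables while leaving the keys intact. It assumes the event can be pre-processed as parsed JSON before it is sent to Splunk; the function name `mask_variables` is hypothetical.

```python
import json

def mask_variables(event: dict, mask: str = "") -> dict:
    """Return a copy of the event with every value under
    Environment.Variables replaced by `mask`; keys are kept."""
    masked = json.loads(json.dumps(event))  # deep copy of JSON-compatible data
    variables = masked.get("Environment", {}).get("Variables", {})
    for key in variables:
        variables[key] = mask
    return masked

# Abbreviated version of the event above.
event = {
    "CodeSize": 1909,
    "Environment": {
        "Variables": {
            "CLUSTER_NAME": "Cluster",
            "ENVIRONMENT": "dev",
            "USER_NAME": "tata",
            "PASSWD": "toto!",
        }
    },
    "Runtime": "python3.6",
}

print(json.dumps(mask_variables(event), indent=2))
```

Because the function iterates over whatever keys are present, it does not need to know the variable names in advance.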

lloydknight
Builder

Hello @AnujaJ

Though I haven't tried this yet, I think this can be achieved by forwarding the anonymized event at index time to the intended customer index, and forwarding a separate non-anonymized copy to an admin-only index.

The caveat is that this would double your license usage.

Please see the link below to check whether my answer is what you're aiming for:
https://answers.splunk.com/answers/690291/one-source-to-two-indexes.html

EDIT:

You can actually send one data source (anonymized and non-anonymized) to two indexes without doubling your license usage
(check woodcock's answer at the link below):
https://answers.splunk.com/answers/567223/how-to-send-same-data-source-to-two-or-multiple-in-1.html

Hope it helps!

AnujaJ
Path Finder

Since the actual data is only available to the admin, does that mean only the admin will create the dashboards, while other users use the customer index?