Hi Splunkers,
I am looking for some help modifying our current regex to meet our updated project criteria.
Link: https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Data/Anonymizedata
Current Log format: Value1 | Value2 | Value3 | Value4 | Value5 | Value6 | Value7 | Value8 | Value9 | Value10 | Value11 | Value12 | ClientIP|
LogEvent="Response",MethodName="get.complete",ActionResult="Success",ApplicationNumber="1234567890",ApplicationLanguage="1",Section="SUMMARY",FirstName="jhon",LastName="doe",Gender="M",DateOfBirth="7/19/1993",SocialSecurityNumber="123456789",MaritalStatus="0",RaceInformation="Item8",CitizenshipCode="1",County="20",AddressLine1="221 Street",City="Washington",State="USA"
I want to write a regular expression that masks all key-value pairs (basically PII data) that appear after ,MethodName="get.complete", (e.g. ApplicationNumber, FirstName, DateOfBirth, SocialSecurityNumber, MaritalStatus, etc.).
The order of the fields up to MethodName is constant and never changes. Every event has exactly the same order up to “MethodName”, with additional PII elements added after “MethodName”.
Note: The location of the fields to be masked may change at times, but they will always be in key-value pair format (e.g. ,ApplicationNumber="1234567890",ApplicationLanguage="1",Section="SUMMARY",FirstName="Sherlock",LastName="Holmes",Gender="M",DateOfBirth="7/19/1976").
The following are the solutions I was planning to use to mask the data at index time.
PROPS Example Using SEDCMD Regex:
[sourcetype]
SEDCMD-mask = regex to skip first three key-value pairs and mask the rest
OR
Transforms Example Using regex:
[ssn-anonymizer]
REGEX = regex to capture ssn
FORMAT = format to mask entire data
DEST_KEY = _raw
Our current approaches do not fulfill this requirement.
1. The expression below drops all values after MethodName instead of masking them.
SEDCMD-maskPHI = s/(MethodName=\"[^\"]+\",).*$/\1/g
2. The regex below masks all key-value pairs after the last |, but we need to mask only what comes after MethodName="get.complete".
SEDCMD-maskall = s/(\w+)="(?:(?:(?!\s*?\|).)*?)"(?!.*\|)/\1="########"/g
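To see why both attempts miss the goal, they can be replayed outside Splunk. This is only a rough sketch using Python's `re` module on a shortened copy of the sample event; Python's engine is not Splunk's, so treat it as an approximation of what SEDCMD would do, not a guarantee.

```python
import re

# Shortened version of the sample event from the question.
event = ('LogEvent="Response",MethodName="get.complete",ActionResult="Success",'
         'FirstName="jhon",SocialSecurityNumber="123456789"')

# Attempt 1: the replacement is only \1, so whatever .*$ matched is discarded.
attempt1 = re.sub(r'(MethodName="[^"]+",).*$', r'\1', event)
print(attempt1)

# Attempt 2: nothing in the pattern refers to MethodName, so every pair that
# has no "|" after it is masked, including LogEvent and MethodName.
attempt2 = re.sub(r'(\w+)="(?:(?:(?!\s*?\|).)*?)"(?!.*\|)',
                  r'\1="########"', event)
print(attempt2)
```

In attempt 1 the tail of the event is consumed by `.*$` and never written back, which is why the values disappear instead of being masked.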
Thank you for all of your help and advice.
[Edit: fixed formatting and used the code button so characters no longer are being eaten.]
Hi @smakwana,
If you would like to use props.conf and transforms.conf, please use the configuration below on the indexer or heavy forwarder, whichever comes first in the data path. You can test the regex below with your sample data here: https://regex101.com/r/F6zv8u/1
props.conf
[yoursourcetype]
TRANSFORMS-anonymize = PII-anonymizer
transforms.conf
[PII-anonymizer]
REGEX = (?m)^(.*MethodName=\"get\.complete\").*(.*)$
FORMAT = $1#######$2
DEST_KEY = _raw
EDIT1: Updated transforms.conf configuration.
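As a rough offline check of what this transform does to an event, here is a small Python sketch. It assumes that with DEST_KEY = _raw the event is rewritten to the FORMAT string built from the capture groups; the shortened `event` string and the variable names are mine, not part of the configuration.

```python
import re

# Shortened version of the sample event from the question.
event = ('LogEvent="Response",MethodName="get.complete",ActionResult="Success",'
         'FirstName="jhon",SocialSecurityNumber="123456789"')

# Same pattern as REGEX above (quotes need no escaping in a Python raw string).
pattern = re.compile(r'(?m)^(.*MethodName="get\.complete").*(.*)$')

match = pattern.search(event)
assert match is not None

# Mimic FORMAT = $1#######$2: group 1, a fixed mask, then group 2.
new_raw = match.group(1) + "#######" + match.group(2)
print(new_raw)
```

Because the first greedy `.*` after the MethodName pair consumes the rest of the line, group 2 is always empty here, so everything after MethodName="get.complete" collapses into one mask.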
EDIT2: If you want to use sed, you can use the regex below
\b(?:(?!LogEvent|MethodName)(\w+))\b="(?:(?:.)*?)"
So your SED configuration will be
SEDCMD-maskall = s/\b(?:(?!LogEvent|MethodName)(\w+))\b="(?:(?:.)*?)"/\1="########"/g
For testing purposes I built the query below from your sample data
| makeresults
| eval _raw="Current Log format: Value1 | Value2 | Value3 | Value4 | Value5 | Value6 | Value7 | Value8 | Value9 | Value10 | Value11 | Value12 | ClientIP|
LogEvent=\"Response\",MethodName=\"get.complete\",ActionResult=\"Success\",ApplicationNumber=\"1234567890\",ApplicationLanguage=\"1\",Section=\"SUMMARY\",FirstName=\"jhon\",LastName=\"doe\",Gender=\"M\",DateOfBirth=\"7/19/1993\",SocialSecurityNumber=\"123456789\",MaritalStatus=\"0\",RaceInformation=\"Item8\",CitizenshipCode=\"1\",County=\"20\",AddressLine1=\"221 Street\",City=\"Washington\",State=\"USA\""
| rex mode=sed "s/\b(?:(?!LogEvent|MethodName)(\w+))\b=\"(?:(?:.)*?)\"/\1="########"/g"
which gives the result below
Current Log format: Value1 | Value2 | Value3 | Value4 | Value5 | Value6 | Value7 | Value8 | Value9 | Value10 | Value11 | Value12 | ClientIP|
LogEvent="Response",MethodName="get.complete",ActionResult=########,ApplicationNumber=########,ApplicationLanguage=########,Section=########,FirstName=########,LastName=########,Gender=########,DateOfBirth=########,SocialSecurityNumber=########,MaritalStatus=########,RaceInformation=########,CitizenshipCode=########,County=########,AddressLine1=########,City=########,State=########
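The same sed expression can also be sanity-checked without Splunk. The sketch below uses Python's `re` module on a shortened sample event, so it only approximates SEDCMD behaviour. The second pattern, which adds ActionResult to the lookahead, is my own hypothetical variation for the case where the first three pairs should stay readable; it is not part of the configuration above.

```python
import re

# Shortened version of the sample event from the question.
event = ('LogEvent="Response",MethodName="get.complete",ActionResult="Success",'
         'FirstName="jhon",DateOfBirth="7/19/1993"')

# Pattern from the SEDCMD above: mask every key="value" pair whose key
# does not start with LogEvent or MethodName.
sed_pattern = r'\b(?:(?!LogEvent|MethodName)(\w+))\b="(?:(?:.)*?)"'
masked = re.sub(sed_pattern, r'\1="########"', event)
print(masked)

# Hypothetical variation: also leave ActionResult untouched.
keep_three = r'\b(?:(?!LogEvent|MethodName|ActionResult)(\w+))\b="(?:(?:.)*?)"'
masked_keep_three = re.sub(keep_three, r'\1="########"', event)
print(masked_keep_three)
```

Note that this replacement keeps the quotes around the mask, as the SEDCMD line does. The rex test output above shows them missing, presumably because the inner double quotes in the rex string are not escaped.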
Note that the transforms.conf example in this solution replaces everything after MethodName="get.complete" with a single mask, so please use the SEDCMD option, which masks each value individually and works regardless of the position of fields such as ApplicationNumber, FirstName, etc.
@harsmarvania57..thank you so much. It resolved our issue.
Feel free to upvote my answer if it really helps. 😛
@harsmarvania57 I had the same issue and this solved it. Thank You. 🙂