I would like to mask part of email addresses like the one below with *
Original :- john.trava@gmail.com
Expected:- Jo*.**va@gmail.com
The first two characters of the first name and the last two characters before the @ should stay visible, and the domain name (e.g. gmail.com) should also stay visible.
Thanks in advance
Hi @m7787580, as cpetterborg suggested, Regex101.com doesn't do sed mode regular expression substitutions. Just try it in Splunk in your search. I have tried it and it works fine, including for alphanumeric email IDs. Please check the screenshot:
| makeresults
| eval _raw="john2.trava5@gmail.com"
| rex field=_raw mode=sed "s#(\S\S)\S*(\S\S)@#\1**.**\2@#"
@m7787580 - does this need to be changed only in a particular field, or are you trying to mask the underlying data? If you want to change the underlying data, then the proper solution is going to require analysis of what the specific data really looks like. If so, please post two or three typical example events of this type, with any sensitive data anonymized.
Hi Inventsekar,
Thanks for your prompt response.
Apologies for not making my requirement clear.
When I try this string in the substitution mode on the Regex101 website, it does not work.
Regex101.com doesn't do sed mode regular expression substitutions. Just try it in Splunk in your search.
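If you want to sanity-check the substitution outside Splunk, a rough Python stand-in can help. This is only a sketch: it assumes Python's re module treats this simple pattern the same way Splunk's sed mode does, the function name mask_email is made up for illustration, and the @ is written back in the replacement so the address stays intact.

```python
import re

# Python stand-in for the sed expression: s#(\S\S)\S*(\S\S)@#\1**.**\2@#
# Assumption: Python's re behaves like Splunk's regex engine for this pattern.
PATTERN = re.compile(r"(\S\S)\S*(\S\S)@")


def mask_email(text: str) -> str:
    """Keep the first two and last two characters before '@'; mask the rest."""
    # count=1 mirrors a sed expression without the trailing g flag.
    return PATTERN.sub(r"\1**.**\2@", text, count=1)


print(mask_email("john.trava@gmail.com"))    # jo**.**va@gmail.com
print(mask_email("john2.trava5@gmail.com"))  # jo**.**a5@gmail.com
```

The final check should still be done in Splunk itself, since that is where the expression will actually run.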
Do you need to do this substitution at index time instead of search time?
Yes, it's at index time, not at search time.
So, using the rex that @inventsekar provided, you can use the props.conf file this way:
[your-source-type]
SEDCMD-mask_email = s#([a-zA-Z]{2})([a-zA-Z]+)\.([a-zA-Z]+)([a-zA-Z]{2})#\1**.**\4#
This should perform the substitution that you need. If you need it in a more general form for ALL sourcetypes, then just comment here.
If inventsekar's answer works for you, remember to Accept his solution.
@cpetterborg @m7787580 @inventsekar - Those regexes assume that all email addresses will be alphabetic, which is not a valid assumption.
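To make that concrete, here is a small Python sketch of what the alphabetic-only pattern does to an address that contains digits. It assumes Python's re module is a fair stand-in for Splunk's engine on this pattern, and mask_alpha_only is just an illustrative name.

```python
import re

# The alphabetic-only pattern from the props.conf suggestion in this thread.
ALPHA_ONLY = re.compile(r"([a-zA-Z]{2})([a-zA-Z]+)\.([a-zA-Z]+)([a-zA-Z]{2})")


def mask_alpha_only(text: str) -> str:
    """Apply the alphabetic-only substitution once (no g flag)."""
    return ALPHA_ONLY.sub(r"\1**.**\4", text, count=1)


# Purely alphabetic user name: masked as intended.
print(mask_alpha_only("john.trava@gmail.com"))    # jo**.**va@gmail.com

# Digits break the match on the user name, so the first match
# is the domain, and that is what gets masked instead.
print(mask_alpha_only("john2.trava5@gmail.com"))  # john2.trava5@gm**.**om
```

So with digits in the user name the address is not just left unmasked; the domain is mangled instead, because nothing in the pattern ties the match to the part before the @.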
Exactly, true, @DalJeanis. My thought was that once this first step was completed, I would handle the alphanumeric email addresses.
Hi @cpetterborg, thanks.. that "\S" is a good learning!
The following would probably be the most general case, but it would certainly capture things that might not be email addresses:
SEDCMD-mask_email = s#(\S\S)\S*(\S\S)@#\1**.**\2@#g
But you could make it more closely related to a standard form for email addresses and go with something like:
SEDCMD-mask_email = s#([a-zA-Z0-9_.-]{2})[a-zA-Z0-9_.-]*([a-zA-Z0-9_.-]{2}@[a-zA-Z0-9_.-]+)#\1**.**\2#g
as long as the character set for email addresses and domain names is [a-zA-Z0-9_.-] (the hyphen goes last in the class so that it is read as a literal character, not as a range). It would NOT work on an email address like abc@here.com, but the stipulation at the beginning was to keep the first two and last two username characters, so if the username doesn't have that many, it won't be masked.
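The difference between the two forms can be sketched in Python. This is an illustration under assumptions, not the Splunk configuration itself: Python's re stands in for Splunk's engine, the names mask_loose and mask_strict are made up, and the hyphen is placed last in the character class so it is treated as a literal.

```python
import re

# Loose form: anything non-whitespace in front of an '@'.
LOOSE = re.compile(r"(\S\S)\S*(\S\S)@")

# Stricter form: only characters assumed valid in addresses and domains.
CHARS = r"[a-zA-Z0-9_.-]"
STRICT = re.compile(rf"({CHARS}{{2}}){CHARS}*({CHARS}{{2}}@{CHARS}+)")


def mask_loose(text: str) -> str:
    return LOOSE.sub(r"\1**.**\2@", text)


def mask_strict(text: str) -> str:
    return STRICT.sub(r"\1**.**\2", text)


# Both forms mask an ordinary address the same way.
print(mask_loose("john2.trava5@gmail.com"))   # jo**.**a5@gmail.com
print(mask_strict("john2.trava5@gmail.com"))  # jo**.**a5@gmail.com

# The loose form also rewrites text that is not an email address.
print(mask_loose("checkout feature/login@{1}"))   # checkout fe**.**in@{1}
print(mask_strict("checkout feature/login@{1}"))  # checkout feature/login@{1}

# Fewer than four characters before '@': nothing is masked.
print(mask_strict("abc@here.com"))  # abc@here.com
```

Which form is appropriate depends on what else appears in the events; the stricter one trades some coverage for fewer accidental rewrites.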