How to rex out and substitute it with *

m7787580
Explorer

I would like to mask email addresses like the one below with *.

Original: john.trava@gmail.com

Expected: Jo*.**va@gmail.com

The first two characters of the first name and the last two characters before the @ should stay visible, and the domain name (for example gmail.com) should also stay visible.

Thanks in advance

inventsekar
SplunkTrust

Hi @m7787580, as cpetterborg suggested, Regex101.com doesn't do sed mode regular expression substitutions, so just try it in your Splunk search. I have tried the search below and it works fine, including for alphanumeric email IDs:

| makeresults
| eval _raw="john2.trava5@gmail.com"
| rex field=_raw mode=sed "s#(\S\S)\S*(\S\S)@#\1**.**\2@#"
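
For anyone who wants to sanity-check the substitution outside Splunk, here is a minimal Python sketch of the same idea using re.sub. It assumes Python's re module behaves like Splunk's sed mode for this simple pattern, which I have not verified against Splunk itself, and the function name mask_local_part is made up for illustration. Note that the pattern consumes the @, so the replacement has to write it back.

```python
import re

# Same idea as the sed expression: keep the first two and last two
# non-space characters before the "@" and replace the middle with "**.**".
PATTERN = re.compile(r"(\S\S)\S*(\S\S)@")

def mask_local_part(text: str) -> str:
    # count=1 mirrors a sed expression without the trailing "g" flag.
    # The "@" is matched by the pattern, so it is re-emitted here.
    return PATTERN.sub(r"\1**.**\2@", text, count=1)

masked = mask_local_part("john2.trava5@gmail.com")
print(masked)  # jo**.**a5@gmail.com
```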

DalJeanis
Legend

@m7787580 - does this need to be changed only in a particular field, or are you trying to mask the underlying data? If you want to change the underlying data, then the proper solution is going to require analysis of what the specific data really looks like. If so, please post two or three typical example events of this type, with any sensitive data anonymized.

inventsekar
SplunkTrust

| makeresults
| eval _raw="john.trava@gmail.com"
| rex field=_raw mode=sed "s#([a-zA-Z]{2})([a-zA-Z]+)\.([a-zA-Z]+)([a-zA-Z]{2})#\1**.**\4#"

m7787580
Explorer

Hi Inventsekar,

Thanks for your prompt response.
Apologies for not making my requirement clear.

When I paste this string into Regex101 and use its substitution feature, it does not work.

Link: https://regex101.com/r/YPzyE7/1

cpetterborg
SplunkTrust

Regex101.com doesn't do sed mode regular expression substitutions. Just try it in Splunk in your search.
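
If you still want to experiment in a tester that takes the pattern and the replacement as separate inputs, the sed expression has to be split into its parts first. Here is a rough Python sketch of that split. The helper names split_sed and apply_sed are hypothetical, and the sketch assumes the delimiter never appears escaped inside the pattern or the replacement. It also uses Python's re module, which may not match Splunk's engine in every detail.

```python
import re

def split_sed(expr: str):
    """Split 's<d>pattern<d>replacement<d>flags' into its three parts.

    Simplifying assumption: the delimiter never appears escaped inside
    the pattern or the replacement.
    """
    if len(expr) < 2 or expr[0] != "s":
        raise ValueError("only s-style substitution expressions are handled")
    delim = expr[1]
    parts = expr[2:].split(delim)
    if len(parts) != 3:
        raise ValueError("expected pattern, replacement and flags")
    pattern, replacement, flags = parts
    return pattern, replacement, flags

def apply_sed(expr: str, text: str) -> str:
    pattern, replacement, flags = split_sed(expr)
    # "g" means replace every match; otherwise only the first one.
    count = 0 if "g" in flags else 1
    return re.sub(pattern, replacement, text, count=count)

expr = r"s#([a-zA-Z]{2})([a-zA-Z]+)\.([a-zA-Z]+)([a-zA-Z]{2})#\1**.**\4#"
pattern, replacement, flags = split_sed(expr)
print(pattern)      # the part that goes in the regular-expression box
print(replacement)  # the part that goes in the substitution box
print(apply_sed(expr, "john.trava@gmail.com"))  # jo**.**va@gmail.com
```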

Do you need to do this substitution at index time instead of search time?

m7787580
Explorer

Yes, it's at index time, not at search time.

cpetterborg
SplunkTrust

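So using the rex @inventsekar provided, you can put it in the props.conf file this way (the suffix after SEDCMD- is an arbitrary class name; mask_email is just an example):

[your-source-type]
SEDCMD-mask_email = s#([a-zA-Z]{2})([a-zA-Z]+)\.([a-zA-Z]+)([a-zA-Z]{2})#\1**.**\4#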

This should perform the substitution that you need. If you need it in a more general form for ALL sourcetypes, then just comment here.

If inventsekar's answer works for you, remember to Accept his solution.

DalJeanis
Legend

@cpetterborg @m7787580 @inventsekar - Those regexes assume that all email addresses will be alphabetic, which is not a valid assumption.
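
To make the problem concrete, here is a small Python sketch that runs the alphabetic-only pattern against an address containing digits. It uses Python's re module as a stand-in for Splunk's sed mode, which is an assumption, and the function name mask_first is only illustrative. In this sketch the username is left untouched and the first thing that matches is the domain, so the wrong part of the address gets masked.

```python
import re

# The alphabetic-only pattern and replacement from the earlier answers.
ALPHA_ONLY = r"([a-zA-Z]{2})([a-zA-Z]+)\.([a-zA-Z]+)([a-zA-Z]{2})"
REPLACEMENT = r"\1**.**\4"

def mask_first(text: str) -> str:
    # count=1 mirrors a sed expression without the "g" flag.
    return re.sub(ALPHA_ONLY, REPLACEMENT, text, count=1)

plain = mask_first("john.trava@gmail.com")
mixed = mask_first("john2.trava5@gmail.com")
print(plain)  # jo**.**va@gmail.com
print(mixed)  # john2.trava5@gm**.**om
```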

inventsekar
SplunkTrust

Exactly, that's true @DalJeanis. My plan was to get this first step working and then handle the alphanumeric email addresses.

Hi @cpetterborg, thanks. That "\S" was a good thing to learn!

cpetterborg
SplunkTrust

The following would probably be the most general case, but it would certainly capture things that might not be email addresses:

SEDCMD-mask_email = s#(\S\S)\S*(\S\S)@#\1**.**\2@#g

But you could make it more closely related to a standard form for email addresses and go with something like:

SEDCMD-mask_email = s#([a-zA-Z0-9_.-]{2})[a-zA-Z0-9_.-]*([a-zA-Z0-9_.-]{2}@[a-zA-Z0-9_.-]+)#\1**.**\2#g

as long as the character set for email addresses and domain names is [a-zA-Z0-9_.-] (the hyphen goes last so that it is read as a literal character and not as a range). It would NOT work on an email address like abc@here.com, because the stipulation in the beginning was to keep the first two and last two username characters, so if the username has fewer than four characters, nothing is substituted.
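
The trade-off between the two expressions can be seen in a short Python sketch. It assumes Python's re module is close enough to Splunk's sed mode for these patterns, and the names BROAD, STRICT and mask_all are made up for this example. The character class is written with the hyphen last so that it is a literal hyphen.

```python
import re

# Broad rule: any run of non-space characters ending in "@".
BROAD = (r"(\S\S)\S*(\S\S)@", r"\1**.**\2@")

# Stricter rule: only typical address characters, hyphen last in the class.
CHARS = r"[a-zA-Z0-9_.-]"
STRICT = (rf"({CHARS}{{2}}){CHARS}*({CHARS}{{2}}@{CHARS}+)", r"\1**.**\2")

def mask_all(rule, text: str) -> str:
    pattern, replacement = rule
    # No count argument: replace every match, like the sed "g" flag.
    return re.sub(pattern, replacement, text)

samples = [
    "john.trava@gmail.com",       # both rules mask it the same way
    "abc@here.com",               # too short: neither rule changes it
    'to="john.trava@gmail.com"',  # broad rule also eats the key and quote
]
for s in samples:
    print(s, "|", mask_all(BROAD, s), "|", mask_all(STRICT, s))
```

In the third sample the broad rule keeps "to" as the first two characters and masks everything up to "va", while the stricter rule starts at the address itself.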
