Splunk Search

Regex to match email id anywhere in raw log

splunker12er
Motivator

I index an unstructured log file and want to extract the email_id from it.
Since the email IDs appear in different locations within the log entries, Splunk's field extraction method doesn't help me.

Is there any trick to resolve this?

0 Karma

MuS
Legend

Hi splunker12er,

Based on the few log events you posted, try something like this:

your base search | rex "(\'((\d{1,3}\.){3}\d{1,3})\'\,\s\')((?<sessionID>[\w\d]+)\'|(?<email>[\w\d\.\-]+\@[\w\d\.]+))" | table sessionID, email

This relies on the fact that either the session ID or an email address follows the IP address.
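One way to sanity-check this pattern outside Splunk is a small Python sketch run against two of the events posted in this thread. Python spells named groups `(?P<name>...)` where rex uses `(?<name>...)`. The hyphen added to the domain character class is an assumption of this sketch, not part of the search above, and `extract` and `SAMPLE_EVENTS` are illustrative names.

```python
import re

# Port of the rex pattern: IP address in quotes, then either a session ID
# (word characters followed by a closing quote) or an email address.
# Assumption: the domain part may also contain hyphens.
PATTERN = re.compile(
    r"'(?:\d{1,3}\.){3}\d{1,3}',\s'"
    r"(?:(?P<sessionID>\w+)'|(?P<email>[\w.\-]+@[\w.\-]+))"
)

# Two events copied from the log sample in this thread.
SAMPLE_EVENTS = [
    "2014-10-02 11:28:35,106 root : INFO [109] request established - method 'moreauthentication' - parameters '('10.250.200.142', 'LCshfkjsdhfsFds34Fsdfjsdlkl324K3OOybXKJOwh0ApFz5y03N02gRgW2')'",
    "2014-10-02 11:28:35,092 root : INFO [108] request established - method 'login_email' - parameters '('10.250.200.142', 'jason.jp@gmail.com')'",
]


def extract(event):
    """Return a dict with 'sessionID' and 'email'; unmatched fields are None."""
    match = PATTERN.search(event)
    if match is None:
        return {"sessionID": None, "email": None}
    return match.groupdict()


if __name__ == "__main__":
    for event in SAMPLE_EVENTS:
        print(extract(event))
```

The session ID branch fails on the email event because `\w+` stops at the first dot and no closing quote follows, so only the email branch can match there. That is why each event fills exactly one of the two fields.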

cheers, MuS

triest
Communicator

I hate to keep asking for more information, but I'm just not sure which is the session ID and which is the e-mail ID.

Even if I knew, it's hard to tell from such a small log sample things like the range of lengths. Typically you can use \w+, but one way to minimize false positives is to limit the length (e.g. \w{59} if the value always has a fixed length of 59, or \w{48,58} if it ranges from 48 to 58).
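As a rough illustration of the length-limiting idea, here is a Python sketch using a bounded quantifier. The bounds 48 to 64 are hypothetical, as are the names `SESSION_ID` and `find_session_ids`; the right range would have to come from the real data.

```python
import re

# Hypothetical bounds: only quoted runs of 48 to 64 word characters count
# as session IDs. Adjust to the lengths actually seen in the logs.
SESSION_ID = re.compile(r"'(\w{48,64})'")


def find_session_ids(event):
    """Return every quoted token in the event that fits the length bounds."""
    return SESSION_ID.findall(event)


if __name__ == "__main__":
    # Event copied from the log sample in this thread.
    event = (
        "2014-10-02 11:28:40,541 root : INFO [110] request established - "
        "method 'login_password' - parameters '('10.250.200.142', "
        "'LCshfkjsdhfsFds34Fsdfjsdlkl324K3OOybXKJOwh0ApFz5y03N02gRgW2'', '12345678')'"
    )
    print(find_session_ids(event))
```

Short quoted values such as the method name or the password are too short to satisfy the quantifier, and an email address is ruled out because `@` and `.` are not word characters.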

If you clarify what you want, I'd be glad to attempt to write a regex, but it might take a couple of iterations since I'm not real familiar with your data.

0 Karma

splunker12er
Motivator

The posted log sample shows the failed-password case; there is a similar case for a correct password.

Since the email_id appears inside the parentheses of a log entry, just like the session_id, the regex always matches both. That's the problem in my case: it either matches both or neither.

0 Karma

splunker12er
Motivator
 2014-10-02 11:28:40,545 root : INFO [110] request failed - method 'login_password' - method raised a controlled fault ()

 2014-10-02 11:28:40,541 root : INFO [110] request established - method 'login_password' - parameters '('10.250.200.142', 'LCshfkjsdhfsFds34Fsdfjsdlkl324K3OOybXKJOwh0ApFz5y03N02gRgW2'', '12345678')'

 2014-10-02 11:28:35,106 root : INFO [109] request established - method 'moreauthentication' - parameters '('10.250.200.142', 'LCshfkjsdhfsFds34Fsdfjsdlkl324K3OOybXKJOwh0ApFz5y03N02gRgW2')'

 2014-10-02 11:28:35,092 root : INFO [108] request established - method 'login_email' - parameters '('10.250.200.142', 'jason.jp@gmail.com')'
0 Karma

jrodman
Splunk Employee

You can find data-mining regex examples for email extraction in various places on the internet. www.regular-expressions.info has some possibilities.

However, it's rare that the pure data-mining approach makes sense in Splunk, where you can contextualize your regexes to the data fairly readily and avoid all the messy false positives. To get better guidance, as others have said, you really need to show the nature of the data.
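To make the contrast concrete, here is a Python sketch comparing a generic email pattern with one anchored to the `login_email` context seen in the log sample in this thread. Both patterns and the function names are illustrative assumptions, and the "mail relay" event is made up purely to show a false positive; it is not from the real logs.

```python
import re

# Generic "data mining" pattern: matches anything shaped like an email address.
GENERIC_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Contextual pattern: only accepts the address passed to 'login_email'.
CONTEXT_EMAIL = re.compile(
    r"method 'login_email' - parameters '\('[\d.]+', '(?P<email>[^'\s]+@[^'\s]+)'"
)


def email_generic(event):
    """Return the first email-shaped string anywhere in the event, or None."""
    match = GENERIC_EMAIL.search(event)
    return match.group(0) if match else None


def email_in_context(event):
    """Return the email only when it is the login_email parameter, else None."""
    match = CONTEXT_EMAIL.search(event)
    return match.group("email") if match else None


if __name__ == "__main__":
    # Event copied from the log sample in this thread.
    login = (
        "2014-10-02 11:28:35,092 root : INFO [108] request established - "
        "method 'login_email' - parameters '('10.250.200.142', 'jason.jp@gmail.com')'"
    )
    # HYPOTHETICAL event, invented for illustration only.
    other = "2014-10-02 11:29:01,000 root : INFO [111] mail relay noreply@example.com queued"
    for event in (login, other):
        print(email_generic(event), email_in_context(event))
```

Both functions agree on the login event, but only the generic pattern picks up the address in the unrelated event. In Splunk the same pattern could be used with rex after switching the named group to the `(?<email>...)` spelling.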

0 Karma

alacercogitatus
SplunkTrust

Must. Have. Sample. Log. Data. gasp hackhack cough

triest
Communicator

How familiar are you with regular expressions? There are cases where they are not powerful enough, but without seeing log messages it's impossible to tell whether they could be dialed in enough. Is this a general product or something specific to your company?

The other question is: even if there are some false positives, can you get it to the point where it's good enough? I support a very diverse Splunk installation, so we try to get 80% solved and then move on to the next issue. It's not perfect, and it means we spend more effort when we need to exhaustively rule things out, but 80% generates a whole lot of value.

0 Karma

splunker12er
Motivator

I tried the erex option too, but no luck. It still extracts unwanted values, just like the field extractor.

0 Karma