I am indexing an unstructured log file and want to extract the email_id from it.
Since email IDs appear at different locations in the log entry, Splunk's field extraction method doesn't help me.
Is there any trick to resolve this?
Hi splunker12er,
Based on the few log events provided, try something like this:
your base search | rex "(\'((\d{1,3}\.){3}\d{1,3})\'\,\s\')((?<sessionID>[\w\d]+)\'|(?<email>[\w\d\.\-]+\@[\w\d\.]+))" | table sessionID, email
This relies on the fact that either the session ID or an email address follows the IP address.
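For anyone who wants to check the pattern outside Splunk, here is a rough Python sketch that runs it against two of the sample lines from this thread. It is only an approximation of what rex does: Python's re module spells named groups (?P<name>...) rather than (?<name>...), and the helper name extract_fields is made up for illustration.

```python
import re

# The rex pattern from above, ported to Python. Only the named-group syntax
# changes: Python's re module wants (?P<name>...) instead of (?<name>...).
PATTERN = re.compile(
    r"(\'((\d{1,3}\.){3}\d{1,3})\'\,\s\')"
    r"((?P<sessionID>[\w\d]+)\'|(?P<email>[\w\d\.\-]+\@[\w\d\.]+))"
)

# Two sample events copied from this thread.
EMAIL_LINE = (
    "2014-10-02 11:28:35,092 root : INFO [108] request established - "
    "method 'login_email' - parameters '('10.250.200.142', 'jason.jp@gmail.com')'"
)
SESSION_LINE = (
    "2014-10-02 11:28:35,106 root : INFO [109] request established - "
    "method 'moreauthentication' - parameters '('10.250.200.142', "
    "'LCshfkjsdhfsFds34Fsdfjsdlkl324K3OOybXKJOwh0ApFz5y03N02gRgW2')'"
)


def extract_fields(line):
    """Hypothetical helper: return sessionID and email, None when absent."""
    match = PATTERN.search(line)
    if match is None:
        return {"sessionID": None, "email": None}
    return {
        "sessionID": match.group("sessionID"),
        "email": match.group("email"),
    }


for sample in (EMAIL_LINE, SESSION_LINE):
    print(extract_fields(sample))
```

On these two lines only one of the two fields is filled per event, which is the behaviour the alternation is meant to give.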
cheers, MuS
I hate to keep asking for more information, but I'm just not sure which is the session ID and which is the email ID.
Even if I knew, it's hard to tell from such a small log sample things like the range of lengths. Typically you can use \w+ but one way to minimize false positives is to limit the range, e.g. \w{59} if the value always has a set length of 59, or \w{48,58} if it ranges from 48 to 58.
If you clarify what you want, I'd be glad to attempt to write a regex, but it might take a couple of iterations since I'm not real familiar with your data.
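To illustrate the point about limiting the range, here is a small Python sketch comparing \w+ with \w{59} on one of the sample lines from this thread. The session ID in that sample happens to be 59 characters long; whether every session ID has that length is an assumption you would need to check against your own data.

```python
import re

# One sample event copied from this thread.
SESSION_LINE = (
    "2014-10-02 11:28:35,106 root : INFO [109] request established - "
    "method 'moreauthentication' - parameters '('10.250.200.142', "
    "'LCshfkjsdhfsFds34Fsdfjsdlkl324K3OOybXKJOwh0ApFz5y03N02gRgW2')'"
)

# Loose: any quoted run of word characters, whatever its length.
LOOSE = re.compile(r"'(\w+)'")
# Strict: only quoted runs of exactly 59 word characters
# (assumption: session IDs always have this length).
STRICT = re.compile(r"'(\w{59})'")

loose_hits = LOOSE.findall(SESSION_LINE)
strict_hits = STRICT.findall(SESSION_LINE)

print("loose: ", loose_hits)
print("strict:", strict_hits)
```

The loose pattern also picks up the quoted method name, while the bounded one keeps only the session ID.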
Here is the log sample (case: password failed); there is a similar case for a correct password.
Since the email_id appears inside the parentheses of a log entry, just as the session_id does, the regex always matches both. That's the problem in my case: it either matches both or none.
2014-10-02 11:28:40,545 root : INFO [110] request failed - method 'login_password' - method raised a controlled fault ()
2014-10-02 11:28:40,541 root : INFO [110] request established - method 'login_password' - parameters '('10.250.200.142', 'LCshfkjsdhfsFds34Fsdfjsdlkl324K3OOybXKJOwh0ApFz5y03N02gRgW2'', '12345678')'
2014-10-02 11:28:35,106 root : INFO [109] request established - method 'moreauthentication' - parameters '('10.250.200.142', 'LCshfkjsdhfsFds34Fsdfjsdlkl324K3OOybXKJOwh0ApFz5y03N02gRgW2')'
2014-10-02 11:28:35,092 root : INFO [108] request established - method 'login_email' - parameters '('10.250.200.142', 'jason.jp@gmail.com')'
You can find data-mining regex examples for email extraction in various places on the internet. www.regular-expressions.info has some possibilities.
However, it's rare that the pure data-mining approach makes sense in Splunk, where you can contextualize your regexes to the data fairly readily and avoid all the messy false positives. In order to provide better guidance, as others have said, you really want to show the nature of the data.
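As a rough illustration of what contextualizing can look like, the Python sketch below anchors on the method name instead of hunting for anything email-shaped. It assumes that email addresses only show up as the second parameter of 'login_email' events, which is all the four sample lines in this thread suggest; the helper name extract_login_email is made up for illustration.

```python
import re

# Anchor on the surrounding context (method name and parameter layout)
# rather than on the shape of an email address.
# Assumption: emails only appear as the second parameter of login_email events.
EMAIL_IN_LOGIN = re.compile(
    r"method 'login_email' - parameters '\('[^']*', '(?P<email>[^']+)'\)'"
)

# Two sample events copied from this thread.
EMAIL_LINE = (
    "2014-10-02 11:28:35,092 root : INFO [108] request established - "
    "method 'login_email' - parameters '('10.250.200.142', 'jason.jp@gmail.com')'"
)
SESSION_LINE = (
    "2014-10-02 11:28:35,106 root : INFO [109] request established - "
    "method 'moreauthentication' - parameters '('10.250.200.142', "
    "'LCshfkjsdhfsFds34Fsdfjsdlkl324K3OOybXKJOwh0ApFz5y03N02gRgW2')'"
)


def extract_login_email(line):
    """Hypothetical helper: return the email from a login_email event, else None."""
    match = EMAIL_IN_LOGIN.search(line)
    return match.group("email") if match else None


print(extract_login_email(EMAIL_LINE))
print(extract_login_email(SESSION_LINE))
```

Because the method name is part of the match, the session ID in the second event is never considered, so there is nothing to filter out afterwards.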
Must. Have. Sample. Log. Data. gasp hackhack cough
How familiar are you with regular expressions? There are cases where they are not powerful enough, but without seeing log messages it's impossible to tell whether they could be dialed in enough. Is this a general product or something specific to your company?
The other question is, even if there are some false positives, can you get it so that it's good enough? I support a very diverse Splunk installation, so we try to get 80% solved and then move on to the next issue. It's not perfect, and it requires that we spend more effort when we need to exhaustively rule things out, but 80% generates a whole lot of value.
I tried the erex option too, but no luck. It still extracts unwanted values, just like the field extractor.