Splunk Search

How to extract only unique email ID's from a detailed log

vasuit1242
New Member

Hi Team,

Here is a sample log:

2016-07-11 16:19:52,689 [9] INFO AdidasCoach.EmailProcessorConsumer.EmailProcessorConsumer - Request Id = 4bfb6032-a3bc-4a70-b4d4-5bc1aef91bdd,
Identity = NONE for message: Start processing a new mail request. -- Input mailConfig = [{"From":null,"ReplayToUserMailAddress":null,
"To":["salsakid1@gmail.com"],"Subject":null,"Body":null,"Attachments":null,"TemplateName":"BulkImportEmailTemplate","LocaleCode":"com",
"TemplateVars":{"TitleContent":"Bulk Import","visitmiCoach":"VISIT miCoach ","Url":"https://micoach.adidas.com","FirstName":"",
"LastName":"","ImportResults":"\u003ctable style=\u0027font-family=arial; font-size=11px;\u0027\u003e\u003cthead\u003e\u003ctr
style=\u0027background: #007FC5;\u0027\u003e\u003cth style=\u0027text-align: left; color: white; width: 64%; padding: 5px 5px;
font-family: arial;font-size:12px;\u0027\u003eFile name\u003c/th\u003e\u003cth style=\u0027text-align: left; color: white;
width: 36%; padding: 5px 5px; font-family: arial;font-size:12px;\u0027\u003eStatus\u003c/th\u003e\u003cth style=\u0027text-align:
left; color: white; font-family: arial;font-size:12px;\u0027\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody
class=\u0027bulk-import-items\u0027\u003e\u003ctr style=\u0027height: auto;\u0027\u003e\u003ctd style=\u0027width: 64%;
padding: 5px 5px; font-family: arial;font-size:12px;\u0027\u003e2016-04-23 538287665 Running Polar.tcx\u003c/td\u003e\u003ctd
style=\u0027width: 36%; padding: 5px 5px; font-family: arial;font-size:12px;\u0027\u003e\u003cspan style=\u0027color: red;;
padding: 5px 5px; font-family: arial;font-size:12px;\u0027*u003eWorkout already imported*\u003c/span\u003e\u003c/td\u003e\u003ctd

Each time user performs any duplicate workout upload, this kind of event gets generated and stored in our logs.

Here, we see nearly 10 events generating in the logs for each user.

Now the requirement is

1) to extract the unique list of email ID's from the log.
2) representing the data in a graphical format (currently i managed to create a dashboard with event data only visible on panel)

Can you please assist me.

Many Thanks,
Vasu

0 Karma
1 Solution

sundareshr
Legend

This should extract any email id.

.... | rex "\b(?<emailid>[A-Za-z0-9\._\-\%\+]+\@[A-Za-z0-9\.\-\_]+)\b" | dedup emailid | table emailid

For requirement number 2, what do you mean by graphical format?

View solution in original post

0 Karma

Richfez
SplunkTrust
SplunkTrust

Your request #2 is rather vague, perhaps a better description of what you want may help.

But, to your first question...

First, you'll probably want to return only the logs with the "Workout already imported" string in it. Unfortunately, because of the preceding and trailing characters (and possibly some copy-n-paste issues), you'll have to fiddle with this a bit. You can try adding the following pieces to your base search (the ... in the below, like index=blah sourcetype=blech 😞

... "Workout already imported" 

or

... "u003eWorkout already imported"

(I think the asterisk character will be a minor breaker, so it should "split" the big string up at those points to let you search on it).

If all else fails, for now you can use the very inefficient

... "*Workout already imported*"

In any case, you should have at the end of that just a pile of the duplicated records.

So, once you have that, let's rex that into a field so you can do things with it. If you append to the end of the search that returns the above logs the following, this should get you a new field called email_address. If you already have such a field, ignore this part and just use the fieldname you already have in the NEXT piece of code.

... | rex "\"To\":\[\"(?<email_address>[^\"]*)\"\]"

After that, you should have the email_address field, so let's now do what you need to do.

My first thought on getting your distinct addresses - and there's all sorts of options for this - is to just do something simple like

... | rex "\"To\":\[\"(?<email_address>[^\"]*)\"\]" | dedup email_address | table email_address

Let us know how it goes!

0 Karma

vasuit1242
New Member

I have enabled two panels in a single dashboard.

Panel 1:

index=project_micoach AND tag=micoach AND "u003eWorkout" NOT tag=stg NOT test1@gmail.com NOT test2@gmail.com NOT test3@gmail.com NOT test4@gmail.com NOT test5@gmail.com | rex "\b(?[A-Za-z0-9\._\-\%\+]+\@[A-Za-z0-9\.\-\_]+)\b" | table emailid | stats count by emailid

Result: Showing the list of email ID's along with the event count (removed dedup to see the original count per user)

Panel 2:

index = project_micoach AND tag = micoach AND "u003eWorkout" NOT tag = stg NOT test1@gmail.com NOT test2@gmail.com NOT test3@gmail.com NOT test4@gmail.com NOT test5@gmail.com | rex "\b(?[A-Za-z0-9\._\-\%\+]+\@[A-Za-z0-9\.\-\_]+)\b" | table emailid | stats count by emailid

Result: Showing the events in a graphical format. X-axis: Email ID's and Y-axis: count of events for each email ID.

Is there any way to get Date in the X-axis and count of events in the Y-axis?

Please suggest.

0 Karma

sundareshr
Legend

This should extract any email id.

.... | rex "\b(?<emailid>[A-Za-z0-9\._\-\%\+]+\@[A-Za-z0-9\.\-\_]+)\b" | dedup emailid | table emailid

For requirement number 2, what do you mean by graphical format?

0 Karma

PradnyaGosavi
New Member

It worked. 

Thanks you 🙂

0 Karma

inventsekar
Ultra Champion

Great works Sundaresh Sir.. one question..
may i know how this part works please.. (for grep'ing @gmail.com)
\@[A-Za-z0-9.-_] -.... particularly the .-_]

0 Karma

sundareshr
Legend

@inventsekar If you visit http://regex101.com/ you can paste the _raw event in the test string region and the regex I suggested in the regular expression text box. You will see on the right there is a detailed explanation on how the regex is applied.

Hopefully that helps

0 Karma

vasuit1242
New Member

i just copy pasted this line:

| rex "\b(?[A-Za-z0-9\._\-\%\+]+\@[A-Za-z0-9\.\-\_]+)\b" | dedup emailid | table emailid

to my existing search string. Wow Super it exactly extracted the last 24 hours data of unique email ID's.

Many Thanks for this quick help. Is there any way to represent the data in line format? alt text

Currently the data is being represented in just a raw format. We shall use this graphical data (X-axis - Date and Y-axis number of duplicate workout uploads count).

0 Karma

sundareshr
Legend

Replace the table command with timechart to get time on the x-axis and you can do a count to get Y-axis.

0 Karma

vasuit1242
New Member

Thanks Sundaresh.

I replaced the table command with timechart, but the results are not showing up.

0 Karma

ddrillic
Ultra Champion

Very similar case at Regex to match email id anywhere in raw log

MuS suggested -

 your base search | rex "(\'((\d{1,3}\.){3}\d{1,3})\'\,\s\')((?<sessionID>[\w\d]+)\'|(?<email>[\w\d\.\-]+\@[\w\d\.]+))" | table sessionID, email
0 Karma

Richfez
SplunkTrust
SplunkTrust

Does (in this case) "salsakid1@gmail.com" show up as a field, like "email_address" or "to" or something? Or does it only show up in the raw text as above?

0 Karma

vasuit1242
New Member

There is a filed already for this but it is not only representing this one, but a lengthy text is inside the field.

Is there any way that i can upload the exported data from Splunk for this?

Please suggest.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...