Hi Team,
Here is a sample log:
2016-07-11 16:19:52,689 [9] INFO AdidasCoach.EmailProcessorConsumer.EmailProcessorConsumer - Request Id = 4bfb6032-a3bc-4a70-b4d4-5bc1aef91bdd,
Identity = NONE for message: Start processing a new mail request. -- Input mailConfig = [{"From":null,"ReplayToUserMailAddress":null,
"To":["salsakid1@gmail.com"],"Subject":null,"Body":null,"Attachments":null,"TemplateName":"BulkImportEmailTemplate","LocaleCode":"com",
"TemplateVars":{"TitleContent":"Bulk Import","visitmiCoach":"VISIT miCoach ","Url":"https://micoach.adidas.com","FirstName":"",
"LastName":"","ImportResults":"\u003ctable style=\u0027font-family=arial; font-size=11px;\u0027\u003e\u003cthead\u003e\u003ctr
style=\u0027background: #007FC5;\u0027\u003e\u003cth style=\u0027text-align: left; color: white; width: 64%; padding: 5px 5px;
font-family: arial;font-size:12px;\u0027\u003eFile name\u003c/th\u003e\u003cth style=\u0027text-align: left; color: white;
width: 36%; padding: 5px 5px; font-family: arial;font-size:12px;\u0027\u003eStatus\u003c/th\u003e\u003cth style=\u0027text-align:
left; color: white; font-family: arial;font-size:12px;\u0027\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody
class=\u0027bulk-import-items\u0027\u003e\u003ctr style=\u0027height: auto;\u0027\u003e\u003ctd style=\u0027width: 64%;
padding: 5px 5px; font-family: arial;font-size:12px;\u0027\u003e2016-04-23 538287665 Running Polar.tcx\u003c/td\u003e\u003ctd
style=\u0027width: 36%; padding: 5px 5px; font-family: arial;font-size:12px;\u0027\u003e\u003cspan style=\u0027color: red;;
padding: 5px 5px; font-family: arial;font-size:12px;\u0027*u003eWorkout already imported*\u003c/span\u003e\u003c/td\u003e\u003ctd
Each time user performs any duplicate workout upload, this kind of event gets generated and stored in our logs.
Here, we see nearly 10 events generating in the logs for each user.
Now the requirement is
1) to extract the unique list of email ID's from the log.
2) representing the data in a graphical format (currently i managed to create a dashboard with event data only visible on panel)
Can you please assist me.
Many Thanks,
Vasu
This should extract any email id.
.... | rex "\b(?<emailid>[A-Za-z0-9\._\-\%\+]+\@[A-Za-z0-9\.\-\_]+)\b" | dedup emailid | table emailid
For requirement number 2, what do you mean by graphical format?
Your request #2 is rather vague, perhaps a better description of what you want may help.
But, to your first question...
First, you'll probably want to return only the logs with the "Workout already imported" string in it. Unfortunately, because of the preceding and trailing characters (and possibly some copy-n-paste issues), you'll have to fiddle with this a bit. You can try adding the following pieces to your base search (the ...
in the below, like index=blah sourcetype=blech
😞
... "Workout already imported"
or
... "u003eWorkout already imported"
(I think the asterisk character will be a minor breaker, so it should "split" the big string up at those points to let you search on it).
If all else fails, for now you can use the very inefficient
... "*Workout already imported*"
In any case, you should have at the end of that just a pile of the duplicated records.
So, once you have that, let's rex that into a field so you can do things with it. If you append to the end of the search that returns the above logs the following, this should get you a new field called email_address
. If you already have such a field, ignore this part and just use the fieldname you already have in the NEXT piece of code.
... | rex "\"To\":\[\"(?<email_address>[^\"]*)\"\]"
After that, you should have the email_address
field, so let's now do what you need to do.
My first thought on getting your distinct addresses - and there's all sorts of options for this - is to just do something simple like
... | rex "\"To\":\[\"(?<email_address>[^\"]*)\"\]" | dedup email_address | table email_address
Let us know how it goes!
I have enabled two panels in a single dashboard.
Panel 1:
index=project_micoach AND tag=micoach AND "u003eWorkout" NOT tag=stg NOT test1@gmail.com NOT test2@gmail.com NOT test3@gmail.com NOT test4@gmail.com NOT test5@gmail.com | rex "\b(?[A-Za-z0-9\._\-\%\+]+\@[A-Za-z0-9\.\-\_]+)\b" | table emailid | stats count by emailid
Result: Showing the list of email ID's along with the event count (removed dedup to see the original count per user)
Panel 2:
index = project_micoach AND tag = micoach AND "u003eWorkout" NOT tag = stg NOT test1@gmail.com NOT test2@gmail.com NOT test3@gmail.com NOT test4@gmail.com NOT test5@gmail.com | rex "\b(?[A-Za-z0-9\._\-\%\+]+\@[A-Za-z0-9\.\-\_]+)\b" | table emailid | stats count by emailid
Result: Showing the events in a graphical format. X-axis: Email ID's and Y-axis: count of events for each email ID.
Is there any way to get Date in the X-axis and count of events in the Y-axis?
Please suggest.
This should extract any email id.
.... | rex "\b(?<emailid>[A-Za-z0-9\._\-\%\+]+\@[A-Za-z0-9\.\-\_]+)\b" | dedup emailid | table emailid
For requirement number 2, what do you mean by graphical format?
It worked.
Thanks you 🙂
Great works Sundaresh Sir.. one question..
may i know how this part works please.. (for grep'ing @gmail.com)
\@[A-Za-z0-9.-_] -.... particularly the .-_]
@inventsekar If you visit http://regex101.com/ you can paste the _raw event in the test string region and the regex I suggested in the regular expression text box. You will see on the right there is a detailed explanation on how the regex is applied.
Hopefully that helps
i just copy pasted this line:
| rex "\b(?[A-Za-z0-9\._\-\%\+]+\@[A-Za-z0-9\.\-\_]+)\b" | dedup emailid | table emailid
to my existing search string. Wow Super it exactly extracted the last 24 hours data of unique email ID's.
Many Thanks for this quick help. Is there any way to represent the data in line format?
Currently the data is being represented in just a raw format. We shall use this graphical data (X-axis - Date and Y-axis number of duplicate workout uploads count).
Replace the table
command with timechart
to get time on the x-axis and you can do a count
to get Y-axis.
Thanks Sundaresh.
I replaced the table command with timechart, but the results are not showing up.
Very similar case at Regex to match email id anywhere in raw log
MuS suggested -
your base search | rex "(\'((\d{1,3}\.){3}\d{1,3})\'\,\s\')((?<sessionID>[\w\d]+)\'|(?<email>[\w\d\.\-]+\@[\w\d\.]+))" | table sessionID, email
Does (in this case) "salsakid1@gmail.com" show up as a field, like "email_address" or "to" or something? Or does it only show up in the raw text as above?
There is a filed already for this but it is not only representing this one, but a lengthy text is inside the field.
Is there any way that i can upload the exported data from Splunk for this?
Please suggest.