Reporting

In a Splunk search, how do you match "from address" and "to address" event IDS to obtain table of emails?

silverlink34
Explorer

I have a qmail type email server sending in SMTP logs. The log entries have separate events for the "from email address" and the "to email address" for an email transaction, but they share the same MSD number (message id). I've located these specific events and extracted the following fields:

Email_From_Address
Email_To_Address
Email_From_MSD_Number
Email_To_MSD_Number
Email_To_Domain

Also, I extracted a generic field "msd_number", which will match to both Email_From_MSD_Number and Email_To_MSD_Number

I was hoping that with these fields I could get the following data: (using different searches of course.)

-table with the top 10 senders with count of total emails sent (this one I can manage, simply count the Email_From_Address by Email_From Address)

-specified Email_From_Address or Email_To_Address and find all corresponding emails from or to based on which one is searched (for example if searching the Email From Address show all Email To Addresses that was sent from the specified Email From Address, or vice versa if Email To Address is specified)

-table with top 10 Email_From_Address count to Email_To_Domain (example: From column: [email@domain.com] To column: [gmail.com] # emails sent column: [25])

-setup an interactive search dashboard where you can specify the Email From, Email To, or MSD Number and get a list of events with the corresponding data, such as a search for the Email From will return the Emails To and MSD numbers.

The problem is that I don't know how to match up the corresponding To or From address fields by using the matching MSD number field. I started trying by using the search:

host = hostIp | stats list(Email_From_Address) list(Email_to_email_address) by msd_number 

This search DOES contain rows with a matching email from and email to by the MSD number, however it also has rows that don't have matching msd_numbers. I don't want those rows.

I appreciate any time and efforts you may put into helping me understand and find the solution. Thank you kindly.

Tags (2)
0 Karma
1 Solution

silverlink34
Explorer

Please close this question. The suggestions to use the transaction search command have prompted me to gather information in a different manner, which has spawned a new question. I will ask the new question separately. Thank you for all time involved.

View solution in original post

0 Karma

silverlink34
Explorer

Please close this question. The suggestions to use the transaction search command have prompted me to gather information in a different manner, which has spawned a new question. I will ask the new question separately. Thank you for all time involved.

0 Karma

renjith_nair
Legend

@silverlink34,
As mentioned in the comment, since you have the address in separate events use the count method,

Below is a run anywhere example to simulate,

|makeresults 
|eval Email_From_Address="from@first.com,from@second.com,from@third.com,from@fourth.com",msd_number="1,2,3,4"
|makemv Email_From_Address delim=","| makemv msd_number delim=","
|eval z=mvzip(Email_From_Address,msd_number)|table z|mvexpand z|eval s=split(z,",")
|eval Email_From_Address=mvindex(s,0),msd_number=mvindex(s,1)|fields - z,s
|append [|makeresults 
|eval Email_To_Address="to@first.com,to@second.com,to@third.com,to@fourth.com",msd_number="1,2,13,14"
|makemv Email_To_Address delim=","| makemv msd_number delim=","
|eval z=mvzip(Email_To_Address,msd_number)|table z|mvexpand z|eval s=split(z,",")
|eval Email_To_Address=mvindex(s,0),msd_number=mvindex(s,1)|fields - z,s]
|rename comment as "---------LINES ABOVE ARE FOR DUMMY DATA GENERATION AND HAS NOTHING TO DO WITH THE RESULT------"
|stats values(Email_From_Address) as From,values(Email_To_Address) as To , count by msd_number|where  count>1|fields - count
Happy Splunking!
0 Karma

silverlink34
Explorer

@renjith.nair, thank you for putting the time into creating dummy data generation to help me find a solution. Your example you provided with the results it yielded is exactly what I am looking for.

I have further fixed more of the issue by extracting a new field, called "To_or_From_MSD_Number" which uses specific regex to capture the MSD number ONLY if it is a To or From email event. This has given me near exactly the results I needed when using your new search query.

However, what I have found is that this email application will re-use the MSD number further in the same day, making results have multiple values in the From and To columns. Due to this unfortunate detail, I wonder if this is beyond the scope of Splunk's ability. The search query would have to somehow get the msd matched To and From events that are within the same hour. Or maybe I could throw the source ip that is in the To event into the query somewhere to isolate the results.

Here is an example, where I specified 1 MSD number because it has been used twice already today:
search query:

host = hostIp To_or_From_MSD_Number="10092" | stats values(Email_From_Address) as From,values(Email_To_Address) as To, count by To_or_From_MSD_Number|where count>1| fields - count

Results:

To_or_From_MSD_Number: From: To:
10092 from@first.com to@first.com
from@second.com to@second.com

And here would be the actual To and From events from the results:

Oct 8 06:27:19 mail msd[10092] MAIL command received, args: FROM: from@first.com
Oct 8 06:27:19 mail msd[10092] RCPT command received (1.1.1.1), args: TO: to@first.com
Oct 8 08:20:10 mail msd[10092] MAIL command received, args: FROM: from@second.com
Oct 8 08:20:11 mail msd[10092] RCPT command received (2.2.2.2), args: TO: to@second.com

After looking at the events here I noticed that the source ip in the To events differs, is it possible to separate the joined results from our search query by using that source ip? (meaning the 1.1.1.1 and the 2.2.2.2 ips in the To events)

The results I would hope could be achievable would look something like this:

Source IP: MSD Number: From: To:
1.1.1.1 10092 from@first.com to@first.com
2.2.2.2 10092 from@second.com to@second.com

I apologize for this difficult request. I will accept that what I ask for is over-complicated and not achievable if necessary.

0 Karma

renjith_nair
Legend

@silverlink34 , no problem! Are the IPs in both FROM & TO events or only in the TO?
If not and if the only way to find it based on time difference,then lets try transaction

My sample events

Oct 8 06:27:19 mail msd=10092 MAIL command received, args: FROM=from@first.com
Oct 8 06:27:19 mail msd=10092 RCPT command received (1.1.1.1), args: TO=to@first.com
Oct 8 06:40:19 mail msd=10098 MAIL command received, args: FROM=from@second.com
Oct 8 06:40:19 mail msd=10099 RCPT command received (1.1.1.1), args: TO=to@second.com
Oct 8 08:20:10 mail msd=10092 MAIL command received, args: FROM=from@third.com
Oct 8 08:20:11 mail msd=10092 RCPT command received (2.2.2.2), args: TO=to@third.com
Oct 8 09:20:10 mail msd=10019 MAIL command received, args: FROM=from@fourth.com
Oct 8 09:20:11 mail msd=10019 RCPT command received (2.2.2.2), args: TO=to@fourth.com

Search
"search terms and field extractions" |transaction MSD startswith="FROM" endswith="TO" keepevicted=false maxspan=1hr|table MSD,FROM,TO

Result

MSD FROM                         TO
10019   from@fourth.com   to@fourth.com
10092   from@third.com         to@third.com
10092   from@first.com         to@first.com
Happy Splunking!
0 Karma

renjith_nair
Legend

@silverlink34,
Are these fields in the same event or different events? If they are in the same event, can you try

     host = hostIp |where  Email_From_MSD_Number== Email_To_MSD_Number| stats list(Email_From_Address) list(Email_to_email_address) by msd_number 

And if they are in different events, you could add another field count to stats and then check if count>1 which will list only those events which has two msd numbers in the events which are same

Happy Splunking!
0 Karma

silverlink34
Explorer

The events are separate. I am trying to put together the search string with the count>1 suggestion you have proposed. Thank you for helping! I'll let you know where it gets me.

0 Karma

renjith_nair
Legend

If the events are separate, then it should work, added as answer with a dummy search. please try and let me know if it works for you

Happy Splunking!
0 Karma

VatsalJagani
SplunkTrust
SplunkTrust

Hi @silverlink34,
I'm assuming that msd_number is unique per email.
...<index-search-here> | stats list(Email_To_Address) as To, list(Email_From_Address) as From by msd_number
Hope this helps!

0 Karma

silverlink34
Explorer

Thank you for your suggestion, I do appreciate your input. However, this gives the same result as my query does. Each event that contains the Email_From_Address or Email_To_Address already contains the msd_number in it, so the results in the stats list does not change. The results from my query returns every msd_number found, but these results have several rows that are missing the Email_From_Address and/or the Email_To_Address. I am unsure of how to only show results that have the same msd_number that is in the Email_From_Address and the Email_To_Address. I will try @renjith.nair's approach and see what that does.

0 Karma

VatsalJagani
SplunkTrust
SplunkTrust

Hi @silverlink34, Can you please share some of the sample of your events and what do you want exactly as result?

0 Karma

silverlink34
Explorer

Yes sir! Here are the two events I am pulling information from, and the fields pulled from them:

"<22>Oct 3 15:00:17 mail msd[24056]: MAIL command received, args: FROM:"

"<22>Oct 3 15:00:17 mail msd[24056]: RCPT command received (0.0.0.0), args: TO:"

  1. Email_To_MSD_Number = 24056
  2. Email_To_Address = splunkhelp@domain.com
  3. msd_number = 24056

And the results I am trying to get are:

Column1:From Column2:To Column3:MSD Number
silverlink34@domain.com splunkhelp@domain.com 24056

... and the rest of the rows from the list of results, I want ALL of the Froms, Tos, and MSD numbers.
I ONLY want rows that have all three columns, do not want rows that have only From/To/ or MSD Number.
And of course once I can figure this out I can span off into my other desired results.

0 Karma

VatsalJagani
SplunkTrust
SplunkTrust

I've edited query in my answer, try that query. Hope that helps.

0 Karma

silverlink34
Explorer

I thank you all again for your continued support. I have found out why my results are not as expected, which is to two large factors:

  1. Even though the Email_To_MSD_Number and Email_From_MSD_Number fields are always present in ONLY an email To or From event, the generic msd_number field is found not only in Email To and From events, but other events as well. This results in our tested query to give rows without a To or From email address, because when using "by msd_number" it includes the rows that don't have an Email_From_Address or Email_To_Address.
  2. I have found that the msd_number is re-used later in the day, so it is not entirely unique per every message. I would almost have to make sure the Email_From_Address event and Email_To_Address events are within the same hour to ensure the msd_number is unique between them.
0 Karma
Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...