I'd like to come up with a search/report that can display the number of emails sent "from" a particular to unique "to" addresses over a period of time. The output come look something like:
user@domain1.com 15 somebody@another.com 35 test@someplace.com 50 Total Sent 100
So here is what I'm trying to figure out:
Find all results for "mysender@mydomain.com". Take the qid (example: p2IIboft020468) and do a subsearch to find the "to" address Format the results from the "to" search into the above format.
I've been using Ayn's method, myself, for some time, but transaction is painfully slow over a large log. In looking at this particular question, I actually found a vastly better method, courtesy of gkanapathy ( http://answers.splunk.com/questions/1478/sendmail-transactions ).
If you're looking for data about a particular from address, it's much faster to filter on those qids first and then look specifically at the To addresses, if the below assumptions are correct:
Using:
sourcetype=mail
[search sourcetype=mail sender=myuser@mycompany.com
| dedup qid
| fields qid
]
| stats count by recipient
I'm not sure how to do a totals row in the same table.. If you're going to put this in a dashboard, though, you could do the above (sans the stats command) as a hidden search, have one post process for the main table, and then another post process for a single value field with the total number of emails sent.
I went ahead and did a test in my environment, comparing
tag=mail
[search tag=ab_mail SenderAddress=me@me.com
| dedup ExchangeMSGID
| fields ExchangeMSGID
]
| stats count by RecipientAddress
to
tag=mail | transaction ExchangeMSGID maxspan=30s
| search SenderAddress=me@me.com
| stats count by RecipientAddress
over the last 30 days. The first one completed in 38 seconds; I killed the second one 4.8% in, after 227 seconds. Now I need to go re-write some of my own reports to use this better method...
I would do this:
sourcetype=sendmail | stats values(from) as from,values(to) as tolist by qid | mvexpand tolist | rename tolist as to| stats count by from,to
It's basically the same as transaction
, but if you're running with more than one indexer, it will scale up much better. That's because transaction
does a lot of work that we don't need here, that prevents the job to be more effectively distributed by map-reduce (e.g., keeping timestamps in order, calculating durations). The stats
command above doesn't do this as much, so should run much faster than transaction
if you have multiple indexers.
I would do this:
sourcetype=sendmail | stats values(from) as from,values(to) as tolist by qid | mvexpand tolist | rename tolist as to| stats count by from,to
It's basically the same as transaction
, but if you're running with more than one indexer, it will scale up much better. That's because transaction
does a lot of work that we don't need here, that prevents the job to be more effectively distributed by map-reduce (e.g., keeping timestamps in order, calculating durations). The stats
command above doesn't do this as much, so should run much faster than transaction
if you have multiple indexers.
Thanks. I did notice that using transaction would really chock the system pretty good. I had come up with something that was somewhat useful, but after seeing the responses in this thread, I still have a lot to learn!
I've been using Ayn's method, myself, for some time, but transaction is painfully slow over a large log. In looking at this particular question, I actually found a vastly better method, courtesy of gkanapathy ( http://answers.splunk.com/questions/1478/sendmail-transactions ).
If you're looking for data about a particular from address, it's much faster to filter on those qids first and then look specifically at the To addresses, if the below assumptions are correct:
Using:
sourcetype=mail
[search sourcetype=mail sender=myuser@mycompany.com
| dedup qid
| fields qid
]
| stats count by recipient
I'm not sure how to do a totals row in the same table.. If you're going to put this in a dashboard, though, you could do the above (sans the stats command) as a hidden search, have one post process for the main table, and then another post process for a single value field with the total number of emails sent.
I went ahead and did a test in my environment, comparing
tag=mail
[search tag=ab_mail SenderAddress=me@me.com
| dedup ExchangeMSGID
| fields ExchangeMSGID
]
| stats count by RecipientAddress
to
tag=mail | transaction ExchangeMSGID maxspan=30s
| search SenderAddress=me@me.com
| stats count by RecipientAddress
over the last 30 days. The first one completed in 38 seconds; I killed the second one 4.8% in, after 227 seconds. Now I need to go re-write some of my own reports to use this better method...
My take on this: first, create the needed field extractions for the logs. Let's call the sourcetype "sendmail" and the fields of primary interest "qid", "from" and "to". Now with these extractions defined, group events together with a transaction
based on qid
. Then simply make a toplist using top
, say for 100 entries.
sourcetype="sendmail" | transaction qid | top 100 from,to
That should give you something similar to what you seem to be looking for. The only thing missing would be the total count at the end, which you'll have to handle separately.