I have 3 mail servers like so, 2 postfix servers and the last one not important Exchange, like so:
Postfix1 -> Postfix2 -> Exchange
Both Postfix server logs are in splunk. I see that when postfix1 hands email to postfix2 it correlates mail with a queued_queue_id
How can I correlate the 2 sets of logs together in one transaction?
Sep 5 11:04:27 nagaviper postfix/smtpd[21218]: B570F448E65: client=unknown[192.168.1.65]
Sep 5 11:04:27 nagaviper postfix/cleanup[21216]: B570F448E65: message-id=<20140905150427.B570F448E65@nagaviper.123.com>
Sep 5 11:04:27 nagaviper postfix/qmgr[20283]: B570F448E65: from=<autonotify@test.com>, size=1771, nrcpt=1 (queue active)
Sep 5 11:04:27 nagaviper postfix/smtp[20729]: B570F448E65: to=<user@123.com>, relay=localhost[127.0.0.1]:10025, delay=0.18, delays=0.04/0/0.05/0.09, dsn=2.0.0, status=sent (250 OK: data received)
Sep 5 11:04:27 nagaviper postfix/qmgr[20283]: B570F448E65: removed
Sep 5 11:04:27 habanero postfix/smtpd[46622]: 8966F601B4: client=businessclass.comcast.net[11.22.33.44]
Sep 5 11:04:27 habanero postfix/cleanup[52613]: 8966F601B4: warning: header Subject: Speed Alert!!! from businessclass.comcast.net[11.22.33.44]; from=<autonotify@test.com> to=<user@123.com> proto=ESMTP helo=<businessclass.comcast.net>
Sep 5 11:04:27 habanero postfix/cleanup[52613]: 8966F601B4: message-id=<>
Sep 5 11:04:27 habanero postfix/qmgr[56439]: 8966F601B4: from=<autonotify@test.com>, size=1418, nrcpt=1 (queue active)
Sep 5 11:04:27 habanero postfix/smtp[52618]: 8966F601B4: to=<user@123.com>, relay=nagaviper.123.com[192.168.1.67]:25, delay=0.32, delays=0.27/0/0/0.04, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as B570F448E65)
Sep 5 11:04:27 habanero postfix/qmgr[56439]: 8966F601B4: removed
I initially had:
| transaction queue_id keepevicted=true maxspan=60s| transaction message_id keepevicted=true maxspan=120s
Until I found not all mail servers give a message_id and when it is blank as it was in this example it failed to correlate the entry.
Thank you,
Brian
In the events of the habanero instance there is a line with the queue id of the nagaviper instance:
Sep 5 11:04:27 habanero postfix/smtp[52618]: 8966F601B4: to=<user@123.com>, relay=nagaviper.123.com[192.168.1.67]:25, delay=0.32, delays=0.27/0/0/0.04, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as B570F448E65)
You could try the following:
| transaction queue_id keepevicted=true maxspan=60s| rex "queued\sas\s(?<matching_id>[^\)]+)" | eval matching_id=coalesce(matching_id,queue_id) | transaction matching_id keepevicted=true
In the events of the habanero instance there is a line with the queue id of the nagaviper instance:
Sep 5 11:04:27 habanero postfix/smtp[52618]: 8966F601B4: to=<user@123.com>, relay=nagaviper.123.com[192.168.1.67]:25, delay=0.32, delays=0.27/0/0/0.04, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as B570F448E65)
You could try the following:
| transaction queue_id keepevicted=true maxspan=60s| rex "queued\sas\s(?<matching_id>[^\)]+)" | eval matching_id=coalesce(matching_id,queue_id) | transaction matching_id keepevicted=true
coalesce is the way to go. If you have the proper sourcetype, you may have queued_as field, so "| eval matching_id=coalesce(queued_as,queueid) |" will do and you can drop the previous "rex". Be aware that you may get the same queue_id value on both servers within the maxspan timeframe, so consider differentiating your transactions adding the host field too: "| transaction queue_id, host ..."
Splunk can tame the "Postfix logs" beast! 🙂
You're welcome
Thank you!
It worked and thank you for the awesome explanation!