Too much, too late, but this works:
Spoof data:
| makeresults
| eval raw="[01:00:22.164297] x=ABC mod=mail cmd=msg rule=ruleQ subject=\"Test 123\" size=8583::[01:00:22.136496] x=ABC mod=spam cmd=run rule=notspam::[01:00:22.106325] x=ABC mod=spam cmd=run policy=outbound::[01:00:22.067675] x=ABC mod=mail cmd=attachment file=text.html size=3347::[01:00:22.039732] x=ABC mod=mail cmd=attachment file=text.txt size=2093::[01:00:22.010986] x=ABC mod=session cmd=data rcpt=personA@rec.org::[01:00:22.010986] x=ABC mod=session cmd=data rcpt=personB@rec.org::[01:00:22.000234] x=ABC mod=mail sender=noreply@sender.org"
| makemv delim="::" raw
| mvexpand raw
| rename raw AS _raw
| rex "\[(?<time>[^\]]+)\]\s+x=(?<x>\w+)\s+mod=(?<mod>\w+)\s+cmd=(?<cmd>\w+)\s+rule=(?<rule>\w+)(?:\s+subject=\"(?<subject>[^\"]+)\"\s+size=(?<size>\d+))?"
| rex "\[(?<time>[^\]]+)\]\s+x=(?<x>\w+)\s+mod=(?<mod>\w+)\s+cmd=(?<cmd>\w+)\s+policy=(?<policy>\w+)"
| rex "\[(?<time>[^\]]+)\]\s+x=(?<x>\w+)\s+mod=(?<mod>\w+)\s+cmd=(?<cmd>\w+)\s+file=(?<file>.*)\s+size=(?<size>\d+)"
| rex "\[(?<time>[^\]]+)\]\s+x=(?<x>\w+)\s+mod=(?<mod>\w+)\s+cmd=(?<cmd>\w+)\s+rcpt=(?<rcpt>.*)"
| rex "\[(?<time>[^\]]+)\]\s+x=(?<x>\w+)\s+mod=(?<mod>\w+)\s+sender=(?<sender>.*)"
Now the solution:
| eval spam_rule=if(mod="spam", rule, null())
| eventstats values(spam_rule) AS spam_rule by x
| eval file_detail = time . ":-:" . file . ":-:" . size . ":-:" . spam_rule
| fields - _raw _time time cmd mod policy file size rule spam_rule
| eventstats values(file_detail) AS file_detail BY x
| stats values(*) AS * BY file_detail rcpt x
| rex field=file_detail "^(?<time>.*):-:(?<file>.*):-:(?<size>.*):-:(?<rule>.*)$"
| table time x subject file sender rcpt size rule
Note that my solution preserves the time that the spam rule was executed on the file (which was not in the original ask, I know).
... View more