I am hoping to find a way to sift through a large volume of emails to find those with similar subjects or similar attachment names.
Currently I might search by subject or attachment name.
For example,
index=mail sourcetype="mail"
[search index=mail sourcetype="mail" message_subject = *<something>* |stats count by internal_message_id | fields internal_message_id]
|eval Time=strftime(_time, "%H:%M:%S") | eval Date=strftime(_time, "%A %F")
|stats list(message_subject) as subj list(sender) as sender list(recipient) as recp list(file_name) as AttachmentName list(attachment_type) as AttachmentType list(vendor_action) as status values(Time) as Time values(Date) as Date by internal_message_id
or
index=mail sourcetype="mail"
[search index=mail sourcetype="mail" file_name = *<something>* |stats count by internal_message_id | fields internal_message_id]
|eval Time=strftime(_time, "%H:%M:%S") | eval Date=strftime(_time, "%A %F")
|stats list(message_subject) as subj list(sender) as sender list(recipient) as recp list(file_name) as AttachmentName list(attachment_type) as AttachmentType list(vendor_action) as status values(Time) as Time values(Date) as Date by internal_message_id
I am looking to find all variations or patterns of similar emails...
For example:
subj = Order-008796, Order-008948, Order-009485, etc.
AttachmentName = Order#00879, Order-008948, Order#009485, etc. (extensions such as .doc are already parsed out natively in the log)
What's the best way to find similar patterns? Cluster? Any other ideas?
Thank you
There are a few ways to do that, depending on the patterns you want to match. One is to use wildcards in the base search
index=mail sourcetype="mail" message_subject ="Order-*" | ...
or use like
index=mail sourcetype="mail" | where like(message_subject,"Order-%") | ...
or use regex
index=mail sourcetype="mail" | regex message_subject = "Order-\d{6}" | ...
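If you don't know the patterns in advance, one possible approach is to collapse the variable part of each subject into a placeholder and count the resulting shapes. This is only a rough sketch: it reuses the message_subject field from your search, while subj_pattern, examples and the "<num>" placeholder are names I made up for illustration.

```
index=mail sourcetype="mail"
| eval subj_pattern=replace(message_subject, "\d+", "<num>")
| stats count values(message_subject) as examples by subj_pattern
| sort - count
```

With this, Order-008796 and Order-008948 should both fall under Order-<num>. The same eval could be applied to file_name, though names like Order#00879 and Order-008948 would still land in separate groups unless the separator is normalized as well.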
Thank you Rich. Before I accept your answer, just wanted to get your opinion on using cluster. When would you typically use cluster?
Thank you
I haven't used the cluster command, but it could apply in this case. I wonder what you'd get from index=mail sourcetype="mail" | cluster field=message_subject | ...
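Spelled out a little further, an untested sketch might look like the following. The t threshold (0.8 is the documented default) and showcount=true, which adds a cluster_count field, are standard options of the cluster command, but the values here are guesses you would need to tune against your own data.

```
index=mail sourcetype="mail"
| cluster field=message_subject t=0.8 showcount=true
| table cluster_count message_subject
| sort - cluster_count
```

Each row should be one representative subject per cluster along with how many events were grouped with it. Lowering t should merge more loosely related subjects together.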
Thanks for the reply. I was thinking of cluster as more of an automatic check that needs fewer manual changes to the query.
I will experiment a bit, and post a new question in a while.
Thank you