Scenario:
I want to search all email logs by a specific subject and identify if any recipient value (email address) matches any email address on my list.
Steps:
1) Search for all emails by subject:
index=email sourcetype=xemail |stats list(message_subject) as subj list(sender) as sender list(recipient) as recp list(vendor_action) as status by UID | search subj="Hello" (but then???)
2) With the results, I want to run a secondary search to find all the recp values that match any of the approximately 50 email addresses on my list. Ideally, each result would carry an extra field with either yes or no.
For example:
UID   subj   sender        recp             on-list
1234  Hello  sender@X.com  recipient@y.com  yes
4556  Hello  sender@z.com  recipient@w.com  no
Please advise what the most efficient way to do this is, with an example 🙂
Thank you
Store your list in Splunk as a static lookup table file, say myemaillist.csv, with the fields email,flag, where email is the address you want to match and flag=1 (a hard-coded value). Then run:
index=email sourcetype=xemail |stats list(message_subject) as subj list(sender) as sender list(recipient) as recp list(vendor_action) as status by UID | search subj="Hello" | lookup myemaillist.csv email as recp OUTPUT flag | eval On_list=if(flag=1,"yes","no") | fields - flag
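For illustration, the lookup file described above might look like this. The addresses are placeholders (the first is borrowed from the example in the question, the second is made up); only the header names email and flag matter, because the lookup command refers to them.

```csv
email,flag
recipient@y.com,1
someone@example.com,1
```

Once the file is uploaded, running | inputlookup myemaillist.csv on its own should list these rows, which is a quick check that Splunk can see the file. Lookup matching is case-sensitive by default as far as I know, so it may help to store the addresses in the same case as they appear in the recipient field.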
Generally, try to limit your results as soon as possible. In your case, move
| search subj="Hello"
to the beginning like so:
index=email sourcetype=xemail message_subject="Hello" |stats ...
This will decrease the time needed for your searches, because the number of events Splunk has to fetch from disk at the very beginning is likely to be much smaller.
awesome thanks!!!!
Yeah, it speeds things up big time, but unfortunately I need the "by UID" correlation, so if I make your suggested move I lose the sender, recipient, and status field values. (I posted this problem somewhere else.) Maybe I should use time ranges first?
My previous question [What is the best way to correlate events (from same source type) that share a common field value?] explains why I need to search after stats...
However if you have any other ideas to speed up the search, please add a comment. I am not finding a good way to reduce the cost of the search.
Ah, yeah, I understand the problem. I would recommend using a search like this:
index=email sourcetype=xemail [search index=email sourcetype=xemail message_subject="Hello" | stats count by UID | fields UID] | stats list(...
That way, the subsearch contains your search for the subject in question, returns the UIDs of the events with that subject, and feeds them into the main search as a filter.
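Putting the pieces of this thread together, the full search might look roughly like the sketch below. It is untested and assumes that the subject field is named message_subject, as in the original question, and that the lookup file myemaillist.csv from the answer is in place. The stats clause is copied from the original search.

```spl
index=email sourcetype=xemail
    [ search index=email sourcetype=xemail message_subject="Hello"
      | stats count by UID
      | fields UID ]
| stats list(message_subject) as subj list(sender) as sender list(recipient) as recp list(vendor_action) as status by UID
| lookup myemaillist.csv email as recp OUTPUT flag
| eval On_list=if(flag=1,"yes","no")
| fields - flag
```

The subsearch runs first, and Splunk expands its output into a filter along the lines of ( ( UID="1234" ) OR ( UID="4556" ) ). The outer search then only reads events that belong to matching UIDs, so the trailing | search subj="Hello" is no longer needed. Subsearches are subject to result-count and runtime limits, so this works best when the subject matches a modest number of UIDs.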
Thank you, I will try your suggestion and let you know.
You are providing an answer regarding Optimization, and deserve credit.
Should I create a new question so I can accept your answer if it works?
That's thoughtful, thank you, but it's fine. You could upvote the comment that helped you most, but you noticing and appreciating my effort is reward enough 🙂
Hey Jeff - you're the man!!! I have been going down the wrong (search) path, but now I see the light. Your example was perfect, it works great!!!! Thanks for the help!
Thank you for the suggestion. I tried a lookup but could not get it to work. I will follow your instructions and let you know. Thank you again.
This works great!!!