Hi,
We are using SharePoint ULS Viewer to watch the SharePoint logs for errors, warnings, and critical events, filtered by category.
I have a few questions about doing the same in Splunk.
Currently we are indexing SharePoint ULS logs. I am able to pull information from SharePoint. For example:
sourcetype=SharePoint | where Level="Unexpected" AND Category="Upgrade" | table Message
1) How can we trace SharePoint errors or issues in real time in Splunk, and how do we configure this? If there are any articles on the topic, could you please share them? (Example: the SharePoint server farm suddenly goes down because the database is down, the farm is shut down, or something similar.)
2) What is the best way to configure real-time alerts in Splunk for SharePoint?
Any suggestions will be appreciated.
Thanks,
Guru Prasad K
As there are many billions of different possible errors in SharePoint, I'm not sure anyone's article on the internet is really going to add value for your use cases and scenarios.
I would just search for "exception", "error", "critical", etc. and develop a baseline for the issues you experience in your farm. Got an issue where a systems engineer always deploys the wrong connection string? Look for the errors that causes. Got an issue where a developer always uses the wrong quotes on his MacBook? Look for the errors that causes, and so on.
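As a rough sketch of that baseline step, the search below counts keyword matches by category and level over the past week. It assumes the Level and Category fields are extracted the same way as in the question's search; the 7-day window is an arbitrary choice for illustration.

```
sourcetype=SharePoint (exception OR error OR critical) earliest=-7d
| stats count by Category, Level
| sort - count
```

The combinations at the top of the result are the likeliest candidates for a dedicated alert.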
Real time searches should generally be avoided in Splunk. Consider configuring a search that looks at data from 5 - 10 minutes ago and alerts based on that instead.
... _index_earliest=-10m@m _index_latest=-5m@m ...
Schedule the search to run every 5 minutes. You should find this to be the most reliable method unless your indexing latency is greater than 5 minutes.
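Put together, a scheduled alert search along these lines might look like the sketch below. The Level, Category, and Message fields are taken from the search in the question. The earliest=-24h bound is an assumption: the index-time modifiers filter within the search's normal event-time range, so that range has to be wide enough to include late-arriving events.

```
sourcetype=SharePoint earliest=-24h _index_earliest=-10m@m _index_latest=-5m@m Level="Unexpected"
| stats count AS error_count, latest(Message) AS sample_message by Category
```

Saved as an alert on a 5-minute schedule and set to trigger when the number of results is greater than zero, each run covers a 5-minute slice of newly indexed events without overlapping the previous run.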
The best advice that I can give you is to avoid real-time searches completely. Not only will they not do what you hope they will, they will crater the performance of your Splunk cluster. The second-best advice that I can give you is to research previous outages. There was probably an RCA, and that RCA probably contains sample logs. You can start there: set up alerts (NOT REAL-TIME) that look for problems you have had before, and respond accordingly.
Thanks for the prompt response.
Yes, as you said, SharePoint can produce a great many errors, and I agree. Can we configure a specific list of SharePoint-related errors in Splunk, so that I can set an alert for when errors of that kind occur? Is that possible? I have checked many articles about alerts. Can you point me to a simple article on configuring alerts in Splunk?
Thanks,
Guru Prasad K
You're basically looking for Level="Unexpected", Critical Error, Error, or High. See this article for a list of severities and the proper responses:
https://msdn.microsoft.com/en-us/library/office/ff604025(v=office.14).aspx
Creating an alert in Splunk:
http://www.splunk.com/view/SP-CAAAGYG
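As a starting point, a search restricted to those severities could look like the sketch below. The exact Level strings are an assumption and depend on how the ULS logs are parsed in your environment, so check the values that actually appear in your data before relying on them.

```
sourcetype=SharePoint (Level="Unexpected" OR Level="Critical" OR Level="Error" OR Level="High")
| table _time, host, Level, Category, Message
```

Once the list of results looks right, narrow it further with the Category or Message text of the specific errors you care about, and save it as an alert.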
Thanks for the reply. Yes, I watched the video on creating alerts. I will work on this.
Thank you very much for your suggestions on this.