I have a scheduled saved search that runs every 15 minutes. The search calls a custom Python external command that makes some HTTP calls.
index=main | head 100 | mypythoncommand | collect index=mysummaryindex
Sometimes the external command fails (say, because the service it's calling is down and I get an exception doing the request), in which case I log the error and exit with a failure code.
Imagine the Python external command looks something like this:
"""
mypythoncommand.py does an HTTP call which sometimes fails
"""
import sys

import requests
import splunk.Intersplunk


def main():
    # Answer Splunk's getinfo probe before doing any real work.
    (isgetinfo, sys.argv) = splunk.Intersplunk.isGetInfo(sys.argv)
    if isgetinfo:
        splunk.Intersplunk.outputInfo(False, False, True, False, None, False)
        return

    results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()
    try:
        response = requests.get("https://www.example.com",
                                timeout=30, verify=False)
    except Exception:
        # Report the failure to Splunk, then exit with a non-zero status.
        splunk.Intersplunk.parseError(sys.exc_info())
        sys.exit(-1)

    splunk.Intersplunk.outputResults(results)


if __name__ == "__main__":
    main()
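To make the "log the error and exit with a failure code" step easier to search for later, the failure handling could be pulled into a small helper. This is only a sketch under assumptions: the helper name `call_service`, the logger name, and the `status=failure` key=value message format are all hypothetical and not part of the original command or of any Splunk API.

```python
import logging
import sys

# Hypothetical logger name; the real command may log elsewhere.
logger = logging.getLogger("mypythoncommand")


def call_service(fetch):
    """Run fetch(); return 0 on success, or log the error and return 1."""
    try:
        fetch()
    except Exception as exc:
        # key=value pairs are an assumed format chosen to be easy to search.
        logger.error('status=failure command=mypythoncommand reason="%s"', exc)
        return 1
    return 0


def failing_fetch():
    # Stand-in for an HTTP call to a service that is down.
    raise RuntimeError("service unavailable")


if __name__ == "__main__":
    logging.basicConfig(stream=sys.stderr, level=logging.INFO)
    sys.exit(call_service(failing_fetch))
```

Returning the exit code from the helper, rather than exiting inside it, keeps the failure path testable without a running Splunk instance.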
When I run this search from Splunk Web, it fails with a descriptive error, and search.log has the details. I see an error like:
command="mypythoncommand", (<type 'exceptions.Exception'>, Exception(u'Non-OK response from external mypythoncommand service: {"message":"Something bad"}',), <traceback object at 0x7faa83763200>)
However, I can't find any log entries in _internal or _audit that show that the search failed. In fact, _internal shows that the search completed with status success (source=/opt/splunk/var/log/splunk/scheduler.log sourcetype=scheduler).
So far I've tried the _audit and _internal indexes, but no luck. Is there any other way to determine which scheduled saved search runs failed with an error?
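Give this a try. It should group each splunk_python event with the scheduler event for the same saved search, so the Python error and the status the scheduler reported show up in one transaction: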
index=_internal sourcetype=splunk_python OR sourcetype=scheduler | rex "(Saved Search \[|savedsearch_name=\")(?<SavedSearchName>[^\"\]]+)" | transaction SavedSearchName startswith=sourcetype=splunk_python endswith=sourcetype=scheduler
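To see what the rex extraction in that search captures, here is a minimal Python sketch of the same pattern. The two sample log lines are made up for illustration and are not real Splunk output; the only change to the pattern is Python's `(?P<name>...)` syntax for the named group.

```python
import re

# Same pattern as the rex above, with Python's named-group syntax.
PATTERN = re.compile(
    r'(Saved Search \[|savedsearch_name=")(?P<SavedSearchName>[^"\]]+)'
)


def saved_search_name(line):
    """Return the saved search name found in a log line, or None."""
    match = PATTERN.search(line)
    return match.group("SavedSearchName") if match else None


# Hypothetical sample lines, one per alternative in the pattern.
scheduler_line = 'savedsearch_name="My Search", status=success'
python_line = "Saved Search [My Search] reported an error"

if __name__ == "__main__":
    print(saved_search_name(scheduler_line))
    print(saved_search_name(python_line))
```

Both alternatives yield the same SavedSearchName value, which is what lets transaction tie the two sourcetypes together.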