We have some scheduled jobs that, as I recently noticed on the Jobs page, have error messages ("max_mem_usage_mb has been reached" in our case). I wasn't aware that these searches were producing incomplete results because they ran out of memory. Is there a way to set up an email alert so we are notified when scheduled jobs have error messages? I'm able to find the messages in var/run/splunk/dispatch, but that data doesn't appear to be searchable (the way _internal is, for instance); if it were, I could set up a scheduled search to detect these occurrences. If the error messages are not searchable, how can we be notified?
Also, I am able to find the job run in index=_internal (sourcetype=scheduler), but the entry says "status=success" even though the Jobs page lists an error.
I think this error usually shows up in splunkd.log:
02-15-2019 09:56:05.815 ERROR StatsProcessor - Reached limit max_mem_usage_mb (200 MB), results may be incomplete! Please increase the max_mem_usage_mb in limits.conf .
You may be able to build an alert using something like this as a base search:
index=_internal sourcetype=splunkd component=StatsProcessor log_level=ERROR max_mem_usage_mb
I haven't encountered this error, so this is just my best guess here.
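If that base search does return events in your environment, turning it into an email alert could look roughly like the savedsearches.conf stanza below. This is only a sketch: the stanza name, the 15-minute schedule and time window, and the recipient address are placeholders I made up, and it assumes the error actually gets indexed in _internal on your search head.

```
# savedsearches.conf (sketch; stanza name, schedule, and recipient are placeholders)
[Alert - max_mem_usage_mb reached]
search = index=_internal sourcetype=splunkd component=StatsProcessor log_level=ERROR max_mem_usage_mb
# Run every 15 minutes over the previous 15 minutes
enableSched = 1
cron_schedule = */15 * * * *
dispatch.earliest_time = -15m
dispatch.latest_time = now
# Trigger when at least one matching event is found
counttype = number of events
relation = greater than
quantity = 0
# Send an email when triggered
action.email = 1
action.email.to = admin@example.com
```

The same alert can be created from the search UI with Save As > Alert, which may be easier than editing the file by hand.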
Not even this basic search returns anything related to the job that failed:
index=_internal max_mem_usage_mb