On patch night some of my Splunk servers are not starting.
I can see the ones that are starting with this search:
host=*mysplunkservers* index=sos sourcetype=ps process="splunkd *-p_8089_*start" | stats count by host process
This does not tell me which ones are not running.
If I had a search that looked for all the servers and filled in null for the ones that are not reporting, I could run an alert and send out a pager notification.
How can I write the search to find the servers that are not reporting?
Have you looked at the platform alerts in the Distributed Management Console? There is an alert there for when an indexer is stopped. See "Platform alerts" in the Admin Manual for more information.
I'd suggest either maintaining a lookup table with the full list of servers and joining your search results to it, or extending your search window by a few hours or days and replacing stats count ... with stats latest(_time) as last_event_time ... so that you can see when the latest event for each server came in. You can then sort by that time to find the servers with the oldest events (and check whether those events are from before your patch).
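As a rough sketch of the second option, something like the following might work. The 24-hour window and the 15-minute threshold are assumptions to adjust to your patch schedule and to how often the ps input runs, and minutes_since_last_event is just an illustrative field name:

```spl
host=*mysplunkservers* index=sos sourcetype=ps process="splunkd *-p_8089_*start" earliest=-24h
| stats latest(_time) as last_event_time by host
| eval minutes_since_last_event = round((now() - last_event_time) / 60)
| where minutes_since_last_event > 15
| convert ctime(last_event_time)
| sort - minutes_since_last_event
```

This only lists hosts that reported at least once inside the search window, so a server that has been down longer than that will not show up. The lookup option avoids that limitation. A sketch, assuming a hypothetical lookup file splunk_servers.csv that you maintain with one host column listing every server:

```spl
| inputlookup splunk_servers.csv
| fields host
| join type=left host
    [ search host=*mysplunkservers* index=sos sourcetype=ps process="splunkd *-p_8089_*start"
      | stats count by host ]
| fillnull value=0 count
| where count = 0
```

Each row returned is a server from the lookup with no matching splunkd process event in the time range, so you could save this as an alert that triggers when the number of results is greater than zero. The host values in the lookup need to match the indexed host values for the join to pair them up.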