[Unit]
Description=Splunk
After=network.service
Wants=network.service
[Service]
Type=forking
User=splunk
Group=splunk
TimeoutSec=200
RemainAfterExit=yes
PIDFile=/opt/splunk/var/run/splunk/conf-mutator.pid
ExecStart=/opt/splunk/bin/splunk start --answer-yes --no-prompt --accept-license
ExecStop=/opt/splunk/bin/splunk stop
ExecReload=/opt/splunk/bin/splunk restart
StandardOutput=null
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EDIT:
At first I thought this unit file, with RemainAfterExit and PIDFile populated, resolved the problem. However, after further testing and reading the systemd documentation, I've found it to be ineffective.
Due to the way systemd handles process execution (systemctl->cgroup->process), restarting the splunk service without using systemctl commands drops the process out of systemd's management, whether or not you set the PID file.
Right now I see only two options when running Splunk through systemd unit files:
1) Run the unit file with RemainAfterExit=yes. This forces systemd to mark the service as active even after the tracked splunkd PID has exited. Unfortunately, this also means that if Splunk crashes, the service is still marked as healthy.
2) Run the unit file without RemainAfterExit=yes (it defaults to no). This means that if systemd sees the root splunkd process exit (even if it restarts soon after), it marks the service as down. This of course doesn't play nice with watchdog/puppet/chef etc.
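For anyone who wants to compare the two options without editing the unit file itself, a systemd drop-in can flip the single setting that differs. This is only a sketch: the drop-in path is an assumption (it presumes the unit above is installed as splunk.service), and it does not fix the underlying tracking problem.

```ini
# Hypothetical drop-in: /etc/systemd/system/splunk.service.d/override.conf
# Assumes the unit above is installed as splunk.service.
[Service]
# Option 2: let systemd mark the service as down when the tracked PID exits.
# Change this to "yes" to get the behaviour of option 1 instead.
RemainAfterExit=no
```

Run systemctl daemon-reload after creating or changing the drop-in so systemd picks it up.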
To my understanding, for this to be resolved either systemd or Splunk would have to make significant codebase changes.
Even using the sysvinit compat layer (the default on RHEL7 installs where splunk enable boot-start is run) causes the same issue: when the splunkd process restarts, stops, or crashes, systemd loses track of the process state and marks the service as "active (exited)" (it seems to be using RemainAfterExit=yes, like my unit file). I'm stumped.