Deployment Architecture

Restarting HOST(server) without stopping splunkd service

earakam
Path Finder

Hi,

I was wondering whether there will be any effect on Splunk when the host is restarted suddenly while the Splunk service is still running.
What could go wrong in this situation? Or shouldn't I worry at all?
I couldn't find any documentation about this issue, so I would appreciate it if anyone could point me to the docs as well, if they exist.

Thank you.


Richfez
SplunkTrust

I have not seen any UF have an issue during an unexpected restart (e.g. power outage, crash or what have you), AS LONG AS the system itself recovers well enough.

Given their purpose and what they do, a Universal Forwarder should be robust. In a situation like the one you describe, only processes that are writing something to disk (and something significant, to be honest) should ever have a problem. The UF doesn't really write much of anything important to disk, so there is little for a sudden restart to interrupt.

I understand this is not canon, but it makes sense. Of course, the more times you pull the plug from under a running system the higher the likelihood that the OS itself will not come back up. I would therefore heartily recommend doing your best to limit such unintended problems!

As two anecdotal pieces of evidence...

We accidentally tested this two weeks ago, with several dozen UFs (among a few other systems) getting their disks pulled from under them when one of our older SANs went offline. In all those cases, which consisted mostly of Microsoft Windows Server 2012 with some Server 2008R2 and a handful of *nix, the OS came back fine and the UF came back fine with no issues, picking up right where it left off. In one case an old ext2 filesystem had minor issues after coming back, but even so, after an fsck and a bit of tidying up, the OS was fine and the UF started pushing data just as it had before.

And, we seem to have one VMware host per year provide us with an unexpected test of High Availability. In other words, we have one die about once per year taking down 50-100 Virtual machines hard (the guests see this as a power loss event, unlike the previous example where the guest was still running - sort of - but just lost all its hard drives). They restart on other hosts in a few moments and I've never seen a UF not work fine afterwards.

earakam
Path Finder

Hi rick7177!
Thank you for the detailed response.
This is very useful information.

Thank you!


earakam
Path Finder

Sorry, one additional piece of information:
by Splunk, I meant the Splunk Universal Forwarder.

Thanks.


MuS
Legend

There shouldn't be any problem, since the UF is only reading logs. Also, the UF will resume reading each log file from its last known position in that file.
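To illustrate the general idea of resuming from a last known position, here is a minimal Python sketch of a checkpointed log reader. This is not how the UF is implemented; the function names, the JSON checkpoint format, and the file paths are all assumptions made up for the example. It only shows why a reader that saves its offset can pick up where it left off after a hard restart.

```python
import json
import os
import tempfile


def load_offset(checkpoint_path):
    """Return the last saved byte offset, or 0 if no usable checkpoint exists."""
    try:
        with open(checkpoint_path, "r", encoding="utf-8") as fh:
            return int(json.load(fh)["offset"])
    except (FileNotFoundError, ValueError, KeyError):
        return 0


def save_offset(checkpoint_path, offset):
    """Persist the offset atomically: write a temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(checkpoint_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"offset": offset}, fh)
        fh.flush()
        os.fsync(fh.fileno())
    # os.replace swaps the file in one step, so a crash leaves either
    # the old checkpoint or the new one, never a half-written file.
    os.replace(tmp_path, checkpoint_path)


def read_new_lines(log_path, checkpoint_path):
    """Return complete lines appended since the last checkpoint, then advance it."""
    offset = load_offset(checkpoint_path)
    lines = []
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        for raw in fh:
            if not raw.endswith(b"\n"):
                break  # partial line: leave it for the next pass
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\n"))
            offset += len(raw)
    save_offset(checkpoint_path, offset)
    return lines
```

If the process dies between reading and saving the offset, the worst case in this sketch is that some lines are read again on the next run; nothing in the log file itself is modified.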

cheers, MuS


earakam
Path Finder

Understood, thanks for the response!


masonmorales
Influencer

Windows or Linux? And is it a full version of Splunk or the Universal Forwarder?


earakam
Path Finder

Thanks for the response!
It's Linux, and it's the Universal Forwarder!
