Getting Data In

What is recommended to get a Splunk universal forwarder to pick up data after a SQL server cluster failover?

sat94541
Communicator

Customer has many SQL Server clusters that are using Windows Failover Clustering.
Splunk is installed at the node-level (so if there are 4 physical servers, Splunk is installed 4 times).

When an instance "fails over" from one node to another (say from node 1 to node 2), the Splunk agent on node 2 does not start picking up the log file.
A simple restart of the forwarding agent on node 2 "fixes" this problem.

We suspect that cause of this is the fact that the SQL Failover groups have a drive letter that moves from one host to another on failover. For example, when the SQL failover group is on node 1, the "E:\" drive will be physically mounted on node 1. When it "fails over" to node 2, the "E:\" drive is pulled away from node 1 and mounted on node 2. The Splunk forwarder on node 2, while configured to look on the E:\ drive has already "excluded" this search path sine the drive didn't exist on startup.

This is why restarting the forwarder allows it to "find" the E:\ drive and grab the log files.

What sort of solution does Splunk recommend we do for this?
Is there a Splunk configuration setting where we can have the forwarder continue to look for log files, even if the drive isn't there? Do we need to auto-restart the agent on failover?

0 Karma
1 Solution

rbal_splunk
Splunk Employee
Splunk Employee

Splunk isn't cluster-aware (MSCS, Microsoft cluster services), so the Splunk Universal Forwarder is behaving as expected.
The cluster admins are aware of these types of issues (this isn't a new or unique problem at all) and will work around them by adding a component to the cluster failover process.

Reference this article from 2009 about a "generic script resource": https://blogs.msdn.microsoft.com/clustering/2009/09/28/creating-and-configuring-a-generic-script-res...

So cluster admin should script the service restart.

View solution in original post

rbal_splunk
Splunk Employee
Splunk Employee

Splunk isn't cluster-aware (MSCS, Microsoft cluster services), so the Splunk Universal Forwarder is behaving as expected.
The cluster admins are aware of these types of issues (this isn't a new or unique problem at all) and will work around them by adding a component to the cluster failover process.

Reference this article from 2009 about a "generic script resource": https://blogs.msdn.microsoft.com/clustering/2009/09/28/creating-and-configuring-a-generic-script-res...

So cluster admin should script the service restart.

Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...