Forwarder Redundancy

melonman
Motivator

Hi,

I want to know the best practices and patterns for making Forwarders highly available and redundant. The other tiers already have redundancy options:
- SH pooling for Search Head redundancy,
- Indexer's Index&Forward (Replication) for Indexer redundancy,
- AutoLB provides HA for the connection between Forwarders and Indexers.

However, the Forwarder itself looks like a single point of failure.

How do people configure forwarders to eliminate this single point of failure?

Thanks,

rturk
Builder

I recently had to roll out a Distributed Deployment with HA & fault-tolerance considerations in mind, and this was one of the concerns that was raised. You can address it in a number of ways.

Intermediary Heavy Forwarders - In the design that I settled on, I essentially had two types of heavy forwarders.

  • Aggregating Heavy Forwarders - Placed in a POP or network segment, they receive "local" data and forward cooked events on to the Load Balancing Heavy Forwarders. They not only accept data from Universal Forwarders but also have network ports (such as UDP 514 for syslog) open for collection.
  • Load Balancing Heavy Forwarders - Essentially a gateway to your "farm" of indexers, with AutoLB configured to spread cooked events from the Aggregating Heavy Forwarders between indexers.
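To make the two tiers concrete, here is a minimal Python sketch that renders an outputs.conf for each tier: the Aggregating Heavy Forwarders list the Load Balancing Heavy Forwarders as their targets, and the Load Balancing Heavy Forwarders list the indexers. The host names, the group names `lb_forwarders` and `indexers`, and the values chosen for `autoLBFrequency` and `useACK` are assumptions for illustration only; check the outputs.conf spec for your Splunk version before using anything like this.

```python
import configparser
import io

# Hypothetical host names: replace with your own Load Balancing Heavy Forwarders
# and indexers. 9997 is the conventional Splunk receiving port, assumed here.
LB_FORWARDERS = ["lbhf1.example.com:9997", "lbhf2.example.com:9997"]
INDEXERS = ["idx1.example.com:9997", "idx2.example.com:9997", "idx3.example.com:9997"]


def render_outputs_conf(group, targets):
    """Render an outputs.conf that load-balances across all `targets`."""
    cp = configparser.ConfigParser()
    cp.optionxform = str  # keep setting names as written (defaultGroup, not defaultgroup)
    cp["tcpout"] = {"defaultGroup": group}
    cp[f"tcpout:{group}"] = {
        # Several receivers in one group let the forwarder spread events
        # across them and carry on if one of them is unreachable.
        "server": ",".join(targets),
        "autoLBFrequency": "30",  # seconds before switching receivers (assumed value)
        "useACK": "true",         # request acknowledgement from receivers (assumed choice)
    }
    buf = io.StringIO()
    cp.write(buf)
    return buf.getvalue()


# Aggregating Heavy Forwarders point at the Load Balancing tier...
aggregating_conf = render_outputs_conf("lb_forwarders", LB_FORWARDERS)
# ...and the Load Balancing Heavy Forwarders point at the indexers.
lb_conf = render_outputs_conf("indexers", INDEXERS)

if __name__ == "__main__":
    print(aggregating_conf)
    print(lb_conf)
```

Generating both files from one function keeps the two tiers consistent; only the target list changes between them.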

This approach was also attractive because it greatly simplified the network/firewall configuration needed to index data from many environments. Configuring "sufficient" buffering on all forwarders also guarded against outages (your environment and data volumes will determine what's appropriate for you here).
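Why buffering helps can be shown with a toy model: events queue up while the downstream receiver is unreachable and drain once it returns, and anything beyond the buffer's capacity is lost. This is a Python sketch under stated assumptions, not Splunk's queueing code; `max_events` is a hypothetical stand-in for a real size limit such as outputs.conf's `maxQueueSize`.

```python
from collections import deque


class BufferedOutput:
    """Toy model of a forwarder output queue (not Splunk's implementation)."""

    def __init__(self, max_events):
        self.max_events = max_events  # hypothetical capacity, counted in events
        self.queue = deque()
        self.delivered = []
        self.lost = 0  # events that arrived while the buffer was already full

    def send(self, event, receiver_up):
        if receiver_up:
            self.flush()  # drain the backlog first so events stay in order
        if len(self.queue) < self.max_events:
            self.queue.append(event)
        else:
            # Simply counted as lost here; a real forwarder's behaviour
            # depends on the input type and its settings.
            self.lost += 1
        if receiver_up:
            self.flush()

    def flush(self):
        while self.queue:
            self.delivered.append(self.queue.popleft())


# Receiver is down for four events, then comes back for the fifth.
buf = BufferedOutput(max_events=3)
for i in range(4):
    buf.send(f"event{i}", receiver_up=False)
buf.send("event4", receiver_up=True)
```

With room for three events, the buffer rides out most of the outage: three of the four events sent while the receiver was down are delivered once it returns, and only one is lost. A larger buffer would have covered the whole outage, which is the sizing trade-off mentioned above.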

Another benefit of this architecture was the ability to have intermediary deployment servers on the Aggregating Heavy Forwarders (this may not apply to your environment).

AutoLB to Multiple Forwarders - You can configure your Universal Forwarders to send data to one or more intermediary forwarders. With multiple forwarders as targets, the UF detects (via heartbeats) when one of them has failed and redirects data to the remaining active forwarders until the failed forwarder returns.
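The failover behaviour can be sketched as a rotation that skips unhealthy targets. In this hypothetical Python model, health is set explicitly with `mark_down`/`mark_up` rather than detected via heartbeats, and targets are chosen round-robin for simplicity. Splunk's own selection order and failure detection may differ, and the target names are made up.

```python
import itertools


class AutoLBOutput:
    """Toy model of load balancing with failover (not Splunk's code)."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)
        self._cycle = itertools.cycle(self.targets)

    def mark_down(self, target):
        self.healthy.discard(target)  # stands in for a failed heartbeat

    def mark_up(self, target):
        self.healthy.add(target)  # the target rejoins the rotation

    def next_target(self):
        # Try each target at most once per call, so a fully failed pool
        # is reported instead of looping forever.
        for _ in range(len(self.targets)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy receivers; data must stay buffered")


lb = AutoLBOutput(["fwd-a", "fwd-b", "fwd-c"])
lb.mark_down("fwd-b")
during_outage = [lb.next_target() for _ in range(4)]
lb.mark_up("fwd-b")
after_recovery = [lb.next_target() for _ in range(3)]
```

While `fwd-b` is down, traffic alternates between `fwd-a` and `fwd-c`; once it is marked healthy again it is back in the rotation.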

If you're referring to the redundancy of the Universal Forwarder on the servers themselves, most operating systems can restart failed processes automatically, and Splunk keeps track of how much of each log file has already been forwarded, so a restarted forwarder should pick up where it left off.
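The "pick up where it left off" part relies on persisting read positions. Here is a minimal Python sketch, assuming a single log file that is never rotated and a JSON checkpoint file. Both are hypothetical simplifications: Splunk keeps its own record of read positions (the fishbucket), and this is not that mechanism.

```python
import json
import os
import tempfile


def read_new_lines(log_path, checkpoint_path):
    """Return complete lines appended to log_path since the saved offset."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = json.load(f)["offset"]
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial trailing line waits for next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    # Persist the new offset so a restarted process does not re-send old data.
    with open(checkpoint_path, "w") as f:
        json.dump({"offset": offset + end}, f)
    return lines


with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "app.log")          # hypothetical monitored file
    ckpt = os.path.join(d, "app.checkpoint")  # hypothetical checkpoint file
    with open(log, "w") as f:
        f.write("first\nsecond\n")
    batch1 = read_new_lines(log, ckpt)
    # A "restart" happens here: nothing survives except the checkpoint file.
    with open(log, "a") as f:
        f.write("third\npart")
    batch2 = read_new_lines(log, ckpt)
```

The second call returns only `third`: the lines already read are skipped thanks to the checkpoint, and the unfinished `part` line is left for a later read. A real tailer would also have to detect rotation and truncation, which this sketch ignores.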

Hope this has helped 🙂

melonman
Motivator

Thanks a lot, R.Turk.
This helped a lot! I also found the following thread and am sharing it here for everyone else:

http://splunk-base.splunk.com/answers/39482/anycast-redundancy-with-syslog