Getting Data In

Forwarder Redundancy

melonman
Motivator

Hi,

I'd like to know the best practices and patterns for making forwarders highly available and redundant. As I understand it:
- SH pooling provides Search Head redundancy,
- Index & Forward (replication) on the indexers provides indexer redundancy,
- AutoLB provides HA for the connection between forwarders and indexers.

However, the forwarder itself looks like a single point of failure.

How do people configure forwarders to eliminate this single point of failure?

Thanks,

1 Solution

rturk
Builder

I have recently had to roll out a Distributed Deployment with HA & fault-tolerance considerations in mind, and this was one of the concerns that was raised. You can address this in a number of ways.

Intermediary Heavy Forwarders - In the design that I settled on, I essentially had two types of heavy forwarders:

  • Aggregating Heavy Forwarders - Placed in a POP or network segment, they receive "local" data before forwarding cooked events on to the Load Balancing Heavy Forwarders. They not only accept data from Universal Forwarders, but also have network ports (such as UDP 514 for syslog) open for collection.
  • Load Balancing Heavy Forwarders - Essentially a gateway to your "farm" of indexers, with AutoLB configured to spread the cooked events from the Aggregating Heavy Forwarders across the indexers (see the example configs after this list).
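
To give a rough idea of what each tier looks like, here's a minimal sketch - the hostnames, ports and group names are placeholders, not taken from my environment:

    # inputs.conf on an Aggregating Heavy Forwarder
    # Receive cooked data from Universal Forwarders
    [splunktcp://9997]

    # Collect syslog sent directly over the network
    [udp://514]
    sourcetype = syslog
    connection_host = ip

    # outputs.conf on a Load Balancing Heavy Forwarder
    # AutoLB spreads events across the indexer "farm"
    [tcpout]
    defaultGroup = indexers

    [tcpout:indexers]
    server = idx01.example.com:9997, idx02.example.com:9997, idx03.example.com:9997
    autoLBFrequency = 30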

This approach was also attractive as it greatly simplified the network/firewall configuration required to index data from many environments. Configuring "sufficient" buffering on all forwarders also guarded against outages (your environment and data volumes will determine what's appropriate for you here).
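
The buffering I'm referring to lives in outputs.conf - something along these lines, with the values below purely illustrative:

    # outputs.conf - forwarder-side buffering (sizes are placeholders, tune to your volumes)
    [tcpout]
    maxQueueSize = 100MB   # output queue used while indexers are unreachable
    useACK = true          # wait for indexer acknowledgement so unacknowledged data is re-sent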

Another benefit of this architecture was the ability to have intermediary deployment servers on the Aggregating Heavy Forwarders (this may not apply to your environment).
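
If you do use the aggregating HFs as intermediary deployment servers, the clients in that POP would simply point at the local HF in deploymentclient.conf - for example (hostname is a placeholder):

    # deploymentclient.conf on a Universal Forwarder
    [deployment-client]

    [target-broker:deploymentServer]
    targetUri = agg-hf01.example.com:8089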

AutoLB to Multiple Forwarders - You can configure your Universal Forwarders to send data to one or many intermediary forwarders. With multiple forwarders as targets, the UF will detect (via heartbeats) when one of them has failed and redirect data to the remaining active forwarders until the failed forwarder returns.
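
On the UF side that's just an outputs.conf target group with more than one server listed - a minimal sketch (hostnames/ports are placeholders):

    # outputs.conf on a Universal Forwarder - AutoLB across two intermediary HFs
    [tcpout]
    defaultGroup = intermediate_hfs

    [tcpout:intermediate_hfs]
    server = agg-hf01.example.com:9997, agg-hf02.example.com:9997
    autoLBFrequency = 30   # how often to switch targets; dead targets are skipped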

If you're referring to the redundancy of the Universal Forwarder on the servers themselves, most OSes can restart failed processes automatically, and Splunk keeps track of how much of each log file it has already forwarded, so a restart should pick up where it left off.
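
On a systemd-based Linux host, for example, you can let "splunk enable boot-start" create the service and then add a restart policy via a drop-in - a sketch only, as the unit name depends on how boot-start was enabled:

    # /etc/systemd/system/SplunkForwarder.service.d/restart.conf
    [Service]
    Restart=on-failure
    RestartSec=30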

Hope this has helped 🙂


melonman
Motivator

Thanks a lot, R.Turk.
This helped a lot! I also found the link below and am sharing it for everyone else.

http://splunk-base.splunk.com/answers/39482/anycast-redundancy-with-syslog
