Solved: Suggestions for distributing over geographic areas

robertlabrie · ‎06-27-2014

Good morning,

Today we have 3 separate sites (HQ US, PROD US, and UK) all tied together with WAN links. We're a Windows shop running AD. For the last few years, we've used a single Splunk installation at HQ as both indexer and search head. We've noticed some issues polling large log files with CIFS over the WAN link, and have decided to break things up. My question is:

Is it better to have one indexer at each location and let the search head query all 3 over the WAN, or should I use forwarders at the remote locations to send everything back to a single indexer/search head at HQ?

I swear I've googled for this with limited success. Any suggestions are welcome.

martin_mueller · ‎06-27-2014

Alright - if all the users are in HQ, you have hardware in HQ, and don't need cross-site redundancy... then I'd personally put the indexer(s) and search head(s) in HQ and universal forwarders wherever the data is produced/collected.

That's the generic answer; here are some thoughts/exceptions.
If you intend to heavily filter data before indexing, you should consider deploying heavy forwarders across your data centres to filter there and hence reduce the load on the WAN.
If you search very rarely, e.g. you use Splunk just for rainy-day storage and don't intend to look into the data unless something very unusual occurs, then placing indexers at each data centre might be better. That keeps the traffic local, and the impact on searches isn't a problem because you search so rarely in that scenario.
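
For the heavy-forwarder filtering case above, a minimal sketch of dropping events before they cross the WAN could look like the following. The sourcetype `app_log`, the transform name `drop_debug`, and the regex are hypothetical placeholders; the pattern itself (routing matching events to `nullQueue`) is the usual Splunk approach, but adapt it to your own data.

```ini
# props.conf on the heavy forwarder
# "app_log" is a hypothetical sourcetype used for illustration
[app_log]
TRANSFORMS-drop_debug = drop_debug

# transforms.conf on the heavy forwarder
# Events matching REGEX are sent to the nullQueue, i.e. discarded
# before being forwarded to the indexer at HQ
[drop_debug]
REGEX = \bDEBUG\b
DEST_KEY = queue
FORMAT = nullQueue
```

This only works on a heavy forwarder (or an indexer), since universal forwarders do not parse events and so cannot apply these transforms.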

martin_mueller · ‎07-02-2014

You're burdening your source systems with running the Splunk HTTP server at all times just in case someone may want to change its configuration. You also require a lot of manual interaction when such a change does happen, especially over a large number of systems.

martin_mueller · ‎07-02-2014

The default Splunk way would be to deploy Universal Forwarders connected to a central Deployment Server in your HQ. You'd build apps on the DS that get sent out to the UFs, containing for example the outputs.conf listing your central Indexer, or the various inputs.conf each UF needs to find the logs. Setting this up usually pays for itself in very short order, especially if you have several UFs with at least partially identical configuration. Heavy Forwarders can be configured in the same way.
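
As a rough sketch of that setup (all hostnames, the app names `hq_outputs` and `prod_inputs`, the server class name, and the log path are hypothetical placeholders, not taken from this thread), the pieces could look something like this:

```ini
# On each UF: $SPLUNK_HOME/etc/system/local/deploymentclient.conf
# Points the forwarder at the Deployment Server's management port
[deployment-client]

[target-broker:deploymentServer]
targetUri = splunk-ds.hq.example.com:8089

# On the DS: $SPLUNK_HOME/etc/system/local/serverclass.conf
# Sends the hq_outputs app to every forwarder that phones home
[serverClass:all_forwarders]
whitelist.0 = *

[serverClass:all_forwarders:app:hq_outputs]
restartSplunkd = true

# On the DS: $SPLUNK_HOME/etc/deployment-apps/hq_outputs/local/outputs.conf
# Tells forwarders where the central indexer is listening
[tcpout]
defaultGroup = hq_indexers

[tcpout:hq_indexers]
server = splunk-idx.hq.example.com:9997

# On the DS: $SPLUNK_HOME/etc/deployment-apps/prod_inputs/local/inputs.conf
# Example input for forwarders that should tail local log files
[monitor://D:\Logs\app\*.log]
sourcetype = app_log
index = main
disabled = false
```

The inputs app would normally get its own server class with a narrower whitelist, so that only the hosts holding those logs receive it. Because the UF reads the files locally, this also replaces polling them with CIFS over the WAN.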

Configuring Forwarders through the UI as a long-term solution feels weird. (cont.)

robertlabrie · ‎07-02-2014

Hi Martin,

Thanks. It sounds like searching over the WAN isn't such a hot idea. I was leaning the same way.

Do you know how to go about configuring inputs via the GUI when distributed? If I used a heavy forwarder, would I just connect to that instance and configure there?

robertlabrie · ‎06-27-2014

Hi Martin,

Thanks for your response. Fault tolerance isn't that important to us, and the only users are at HQ. The other locations are co-lo data centers with no staff. I probably should have said that sooner. I don't think I need the overhead of replicating everything between all nodes in a cluster.

martin_mueller · ‎06-27-2014

Best might be to have a distributed cluster with indexers at each site and replication between the indexers, if fault tolerance matters to you. With the new multisite clustering features you can tell Splunk which box sits in which location, so you can enforce that replicated copies exist in each location and let your search head(s) prefer replicated buckets on local indexers.

http://docs.splunk.com/Documentation/Splunk/6.1.1/Indexer/Multisiteclusters

Note, clustering - and especially multisite clustering - is moderately advanced Splunkfu.
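
For orientation only, a minimal multisite sketch in the 6.1-era syntax described in the linked docs might look like the following. The site assignments, hostname, replication port, and factor values are assumptions for illustration; newer Splunk versions rename some of these settings, so check the documentation for your version before using any of it.

```ini
# server.conf on the cluster master (assumed here to live at HQ = site1)
[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2,site3
# One copy must stay at the originating site, three copies overall
site_replication_factor = origin:1,total:3
site_search_factor = origin:1,total:2
pass4SymmKey = <shared secret>

# server.conf on a peer indexer at a remote site (e.g. UK = site3)
[general]
site = site3

[replication_port://9887]

[clustering]
mode = slave
master_uri = https://splunk-master.hq.example.com:8089
pass4SymmKey = <shared secret>
```

A search head given a `site` value in its own `[general]` stanza will then favour searchable copies held by indexers at the same site.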
