Universal Forwarder block stops all indexing completely

phoenixdigital
Builder

Hi All,

We have a customer who could not justify the cost of a clustered solution, so they went down the following route.

Basic System

2x Indexers with Splunk frontends
3x Universal Forwarders

Data from Forwarders

  1. One set of polling logs goes to Indexer-1
  2. A second set of logs goes to Indexer-2 (same data sent to Indexer-1 but less frequent polling)
  3. And the Unix TA logs go to both indexers
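
For illustration, the routing described above might look roughly like this on each forwarder. This is only a sketch and not the customer's actual configuration: the group names, host names, port 9997 and monitored paths are all hypothetical. It assumes the standard `_TCP_ROUTING` input setting and `[tcpout:<group>]` stanzas.

```
# outputs.conf on each Universal Forwarder (all names are hypothetical)
[tcpout]
# Inputs with no explicit routing are cloned to every group listed here
defaultGroup = indexer1_group, indexer2_group

[tcpout:indexer1_group]
server = indexer-1.example.com:9997

[tcpout:indexer2_group]
server = indexer-2.example.com:9997

# inputs.conf on each Universal Forwarder (paths are hypothetical)
# Frequently polled logs go to Indexer-1 only
[monitor:///var/log/polling_frequent]
_TCP_ROUTING = indexer1_group

# Less frequently polled logs go to Indexer-2 only
[monitor:///var/log/polling_infrequent]
_TCP_ROUTING = indexer2_group

# Unix TA inputs carry no _TCP_ROUTING, so they fall back to defaultGroup
# and are sent to both indexers
```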

It was envisioned that if Indexer-1 dies, Indexer-2 will still be chugging along with a similar data set that is polled less frequently.

This all currently works perfectly.

However, if you take one of the indexers offline, the universal forwarders' queues fill up because they cannot send data to the offline indexer. All indexing then grinds to a halt and no new data is sent to the indexer that is still online.

While I understand the system is protecting against data loss, the whole system grinding to a halt is actually much worse.

I thought blockOnCloning in outputs.conf might resolve this, as the Unix TA logs are cloned, but based on its default behaviour this does not appear to be what is causing the queue to fill up either.

dropEventsOnQueueFull does not appear to behave how I would expect. The documentation seems to indicate that it doesn't drop the queue contents it cannot deliver (due to the indexer outage); it just keeps the queue full and drops any new data. So instead of getting rid of the data that is causing the blockage and continuing, it just drops everything new? That seems a bit backwards to me.

Is there any way to resolve this?

I don't care if data is lost for the offline indexer; I just want my remaining online indexer to keep getting data.

phoenixdigital
Builder

dropEventsOnQueueFull (in outputs.conf) seems to have resolved it even though the manual seems to indicate it does the exact opposite and drops NEW events.
http://docs.splunk.com/Documentation/Splunk/latest/admin/outputsconf
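
As a rough sketch of the kind of change described here: the stanza names, host names and the value of 10 seconds below are assumptions for illustration, not the settings actually used. The default for dropEventsOnQueueFull is -1, which means events are never dropped and the forwarder blocks instead.

```
# outputs.conf on each Universal Forwarder (hypothetical names and values)
[tcpout:indexer1_group]
server = indexer-1.example.com:9997
# Wait up to 10 seconds on a full output queue, then start dropping events
# for this group instead of blocking the forwarder
dropEventsOnQueueFull = 10

[tcpout:indexer2_group]
server = indexer-2.example.com:9997
dropEventsOnQueueFull = 10
```

Setting it per target group, as assumed here, keeps the choice explicit for each indexer; any positive value trades data loss for the unreachable indexer against keeping the forwarder moving.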

May I recommend that a Splunk staff member reword the manual entry for this so that it is less ambiguous?

  • If set to a positive number, wait <integer> seconds before throwing out all new events until the output queue has space.

change to

  • If set to a positive number, wait <integer> seconds before throwing out all new events (already in the queue) until the output queue has space. New events arriving at the indexer will still be placed onto the queue.

The way it is currently worded seems to indicate that once the queue is full any new events arriving at the indexer will be dropped. It makes no mention of removing/dropping data from the queue itself.

Is there a better solution here?

I have also tried setting queues in inputs.conf, which had no effect.
