Getting Data In

Index and Forward Cloning Recovery

ephemeric
Contributor

Greetz,

When a heavy forwarder is both indexing and forwarding, does it keep track of what has been indexed up to what point, and of what has been forwarded and acknowledged?

If the output is blocked, say from a downed uplink, and local indexing continues, does it keep a marker for the last event that was forwarded from the output queue, and then forward the backlog of events from the local index once the uplink is back?

Thank you.

1 Solution

jrodman
Splunk Employee

The general forwarding protocol recovery design is as follows:

The forwarder (heavy or light, it doesn't actually matter) sends chunks of data to one or more indexers. It tracks which items it has sent to whom, and keeps them in what we can describe as a sliding window. I don't remember exactly how many items are tracked (I believe the window is actually bounded by the total size in KB of the data, because we have to keep it in memory for efficient resending).
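
To make the windowing idea concrete, here is a minimal sketch in Python (illustrative only, not Splunk's actual implementation; the 500 KB limit is an assumed placeholder) of a send window bounded by the total size of unacknowledged data:

    from collections import OrderedDict

    class SendWindow:
        """Tracks sent-but-unacknowledged chunks, bounded by total bytes in flight."""

        def __init__(self, max_unacked_bytes=500 * 1024):  # placeholder limit
            self.max_unacked_bytes = max_unacked_bytes
            self.unacked = OrderedDict()  # chunk_id -> payload kept for possible resend
            self.unacked_bytes = 0

        def can_send(self, payload):
            # Stop sending while too much data is still awaiting acknowledgement.
            return self.unacked_bytes + len(payload) <= self.max_unacked_bytes

        def record_sent(self, chunk_id, payload):
            self.unacked[chunk_id] = payload
            self.unacked_bytes += len(payload)

        def record_acked(self, chunk_id):
            payload = self.unacked.pop(chunk_id, b"")
            self.unacked_bytes -= len(payload)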

Anyway, we have our sliding window of sent items. The indexer or indexers are receiving these items, and processing them, eventually committing the data to the index. Through a reference-counting system, the receiving side knows when it has finished processing all of the data associated with a given chunk, and when this occurs, an acknowledgement is sent back to the forwarding system, indicating that the data has been handled. Usually for an indexer, this means it has all been written out to the operating system.
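
A rough sketch of the receiving side's reference counting, again illustrative rather than the real code: the ack goes back only once every piece derived from a chunk has been handled.

    class ChunkTracker:
        def __init__(self, send_ack):
            self.refcounts = {}       # chunk_id -> number of outstanding pieces
            self.send_ack = send_ack  # callback that acks the chunk to the forwarder

        def chunk_received(self, chunk_id, pieces):
            self.refcounts[chunk_id] = pieces

        def piece_handled(self, chunk_id):
            # Called when one piece (e.g. one event) has been fully processed.
            self.refcounts[chunk_id] -= 1
            if self.refcounts[chunk_id] == 0:
                del self.refcounts[chunk_id]
                self.send_ack(chunk_id)

    # Example: a 3-piece chunk is acked only after all three pieces finish.
    tracker = ChunkTracker(send_ack=lambda cid: print("ack", cid))
    tracker.chunk_received("chunk-1", pieces=3)
    for _ in range(3):
        tracker.piece_handled("chunk-1")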

When the forwarder gets the acknowledgement (ack) for a particular item, it checks whether it can advance its record of how much of the file has been committed. Typically each acknowledgement can advance the point-of-reading in the file, but acknowledgements can arrive out of order when the data is spread across multiple receivers. The forwarder can only commit the earliest complete dataset. Essentially all it stores is how many bytes into the file have been fully sent and acknowledged by the receiving side.
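
The "earliest complete dataset" rule can be sketched like this (illustrative Python, with byte ranges standing in for acknowledged chunks): the persisted offset only moves across a gap-free prefix of acknowledged data.

    class FileProgress:
        def __init__(self):
            self.committed = 0      # bytes of the file known to be fully handled
            self.acked_ranges = []  # (start, end) ranges acked out of order

        def ack_range(self, start, end):
            self.acked_ranges.append((start, end))
            self.acked_ranges.sort()
            # Advance the commit point only across a gap-free prefix.
            while self.acked_ranges and self.acked_ranges[0][0] <= self.committed:
                self.committed = max(self.committed, self.acked_ranges.pop(0)[1])
            return self.committed

    p = FileProgress()
    p.ack_range(1000, 2000)      # out of order: commit point stays at 0
    print(p.ack_range(0, 1000))  # gap filled: commit point jumps to 2000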

In more complex scenarios, this can chain: a forwarder f1 can send to a forwarder f2, which sends to an indexer. The intermediate node uses the same logic as the forwarder to wait for an acknowledgement, and the same logic as the indexer to send an acknowledgement back to forwarder f1. It does not store any data about file progression, since it does not own the tailing logic for the data.
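
Sketching the intermediate node's role (again an illustration, not the actual code): it passes chunks downstream and only acks upstream after the downstream ack arrives, holding no file-progress state of its own.

    class IntermediateNode:
        def __init__(self, send_downstream, ack_upstream):
            self.send_downstream = send_downstream
            self.ack_upstream = ack_upstream
            self.pending = set()  # chunk ids awaiting the downstream acknowledgement

        def on_chunk_from_upstream(self, chunk_id, payload):
            self.pending.add(chunk_id)
            self.send_downstream(chunk_id, payload)

        def on_ack_from_downstream(self, chunk_id):
            if chunk_id in self.pending:
                self.pending.remove(chunk_id)
                self.ack_upstream(chunk_id)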

There are other ways to "fully handle" data, such as intentionally routing it to the null queue (the equivalent of /dev/null), or choosing to send it exclusively via a network link that doesn't participate in our scheme (raw text over TCP). These also count as "fully handled" states.
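
For reference, routing data to the null queue is normally done with a props.conf/transforms.conf pair along these lines (the source path and transform name here are placeholders):

    # props.conf
    [source::/var/log/noise.log]
    TRANSFORMS-null = drop_noise

    # transforms.conf
    [drop_noise]
    REGEX = .
    DEST_KEY = queue
    FORMAT = nullQueue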

Ultimately, we only advance our disk-persisted idea of how far we have processed files when there is an end-to-end agreement that the data has been handled. This means that unplanned outages (crashes, networks breaking, etc.) can result in data being retransmitted. For example, if data is in the waiting-for-ack window and an indexer drops the socket without acknowledging data that was sent to it, the forwarder will resend this data to an indexer (typically a different one). So duplicated data does exist in our system, although we try to keep it to a minimum.
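
The retransmission path can be sketched the same way (illustrative Python): when a connection dies, everything still in the ack-wait window goes back onto the send queue, typically toward a different indexer, which is exactly how the occasional duplicate gets in.

    def handle_connection_lost(window, send_queue):
        # 'window' maps chunk_id -> payload for data sent but not yet acknowledged.
        for chunk_id, payload in list(window.items()):
            send_queue.append((chunk_id, payload))  # will be resent to another target
            del window[chunk_id]

    window = {"chunk-7": b"...", "chunk-8": b"..."}
    queue = []
    handle_connection_lost(window, queue)
    print(queue)  # both unacked chunks are queued for retransmission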



jrodman
Splunk Employee

Full disclosure: it may be possible to disable acks if you try really hard (outputs.conf settings), or if you use really old versions of Splunk.
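
For reference, indexer acknowledgement is controlled per output group in outputs.conf with the useACK setting; the group name and server list below are placeholders:

    # outputs.conf on the forwarder
    [tcpout:my_indexers]
    server = indexer1.example.com:9997, indexer2.example.com:9997
    useACK = true    # false disables indexer acknowledgement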


ephemeric
Contributor

Thank you! This is exactly the kind of technical detail I was looking for!
