Distributed Systems Lecture 8

Lecture Given by Lindsey Kuper on April 15^th, 2020 via YouTube

Previous	Next
Lecture 7	Lecture 9

Rules of the Chandy-Lamport Algorithm

This is an example of a decentralised¹ algorithm that allows you to take a global snapshot of a running distributed system, and its design gives us two key advantages:

Any participating process can initiate a snapshot.
The process initiating the snapshot is not required to occupy some elevated role (such as "supervisor") because this task not considered "special" or "privileged".
The process initiating the snapshot does not need to warn the other processes that this action is about to take place.

The act of initiating a snapshot creates a cascade of marker messages throughout the entire system. This message cascade then causes all the other processes to take a snapshot of themselves.

The Initiator Process

Records its own state
Sends a marker message out on all its outgoing channels
Starts recording messages arriving on all incoming channels

If process P1 decides to initiate a snapshot, then the following sequence of events takes place:

P1 records its own state as S1
Immediately after recording its own state, P1 sends out marker messages on all its outgoing channels (only one in this case: channel C₁₂)
P1 starts recording any messages that might arrive on its incoming channels (again, only one in this case: channel C₂₁)

Notice that at the time P1's snapshot happens, message m is currently in the channel from P2 to P1 (channel C₂₁).

Processes Receiving a Marker Message

IMPORTANT

Due the fact that all channels behave as FIFO queues, we do not need to be concerned about the possibility of FIFO anomalies. This system is designed such that marker messages cannot arrive before earlier message-send events in the originating process.

None of what follows would work if we had not first eliminated the possibility of FIFO anomalies!

When a process receives a marker message, it can react in one of two different ways. How it reacts depends on whether or not that process has already seen a marker message during this run of the global snapshot.

Scenario 1: Nope, I Haven't Seen a Marker Message Before...

If this is the first time this process has seen a marker message, the receiver:

Records its own state
Flags the channel on which the marker message was received as empty
Sends out a marker message on each of its outgoing channels
Starts recording incoming messages on all channels except the one on which it received the original marker message (now flagged as empty)

Q: During a snapshot, once a channel is marked as empty, what happens if you then receive a message on that channel?
A: Whilst the snapshot is running, messages received on channels marked as empty are ignored!

In the diagram below, since this is the first marker message P2 has seen, it does the following:

It records its own state as S2
Flags channel C₁₂ as empty
Sends out a marker message on all its outgoing channels (in this case, only channel C₂₁)
Normally, it would now start recording any messages that arrive on its other, incoming channels; however, in this case, since its only incoming channel (C₁₂) has already been marked as empty, there is nothing to record

Scenario 2: Yup, I've Already Seen a Marker Message...

If a process sends out a marker message, then we consider that process already to have "seen" a marker message (its own). So when a process that has already sent out its own marker message receives someone else's marker message, it:

Stops recording incoming messages on that channel
Sets that channel's final state to be the sequence of all messages received whilst recording was active

Message m from P2 (sent at event C) arrives on channel C₂₁ as event D in process P1. This message arrived before the marker message because channels always behave as FIFO queues.

Upon receiving this marker message, P1 then:

Stops recording on the marker message's channel (C₂₁ in this case)
The final state of channel C₂₁ is set to the sequence of messages that arrived whilst recording was active

So, we now have a consistent snapshot of our entire system, which in this simple case, consists of four things:

The state of our two processes:
- P1's state recorded as S1
- P2's state recorded as S2
The state of all channels between those processes:
- Channel C₁₂ recorded by P2 (Empty)
- Channel C₂₁ recorded by P1 (Message m)

The Chandy-Lamport Algorithm in a More Detailed Scenario

When a snapshot takes place, every process ends up sending out a marker message to every other process. So, for a system containing N participating processes, N * (N - 1) marker messages will be sent. This might seem inefficient as the number of messages rises quadratically with the number of participating processes, but unfortunately, there is no better approach.

As stated in the previous lecture, the success of the Chandy-Lamport algorithm relies entirely on the truth of the following assumptions:

Eventual message delivery is guaranteed, thus making delivery failure impossible
All channels act as FIFO queues, thus eliminating the possibility of messages being delivered out of order (FIFO anomalies)
Processes don't crash! (See lecture 10)

A Worked Example

In this example, we have three communicating processes P1, P2 and P3 in our system, and we want to take a snapshot.

Process P1 acts as the initiator; so it follows the above steps:

It records its own state as S₁
It sends out two marker messages; one to P2 and one to P3 - but notice that the arrival of the marker message at P2 is delayed. This turns out not to be a problem.
P1 starts recording on both its incoming channels C₂₁ and C₃₁

Next, P3 receives the marker message from P1. Since this is the first marker message it has received:

It records its own state as S3
Marks the channel on which it received the marker message (C₁₃) as empty
Sends out marker messages on all its outgoing channels
Starts recording on its other incoming channel (C₂₃)

Looking at P3's marker message that now arrives at P1, since P1 initiated the snapshot process, this is not the first marker it has seen, so P1:

Stops recording incoming messages on that channel (C₃₁)
Sets that channel's final state to be the sequence of all messages received whilst recording was active - which is none - so the channel state of C₃₁ is {}.

Now look at the other marker message from P3 to P2. This is the first marker P2 has seen, so it:

It records its own state as S2
Marks the channel on which it received the marker message (C₃₂) as empty
Sends out marker messages on all its outgoing channels
Starts recording on its other incoming channel (C₁₂)

Eventually, the initial marker message from P1 arrives at P2. This is the second marker P2 has seen, so it:

Stops recording incoming messages on that channel (C₁₂)
Sets that channel's final state to be the sequence of all messages received whilst recording was active - which is none - so the channel state of C₁₂ is {}.

P2's marker message now arrives at P1. This is not the first marker P1 has seen, so it:

Stops recording incoming messages on that channel (C₂₁)
Sets that channel's final state to be the sequence of all messages received whilst recording was active - which in this case is the message m3 sent at event H in P2 to event D in P1 - so the channel state of C₁₂ is {m3}.

Lastly, the marker message from P2 arrives at P3. Similarly, this is not the first marker P3 has seen, so it:

Stops recording incoming messages on that channel (C₂₃)
Sets that channel's final state to be the sequence of all messages received whilst recording was active - which is none - so the channel state of C₂₃ is {}.

We now have a consistent snapshot of the entire system composed of three process states:

P1 = S₁
P2 = S₂
P3 = S₃

And six channel states:

C₁₂ = {}
C₂₁ = {m3}
C₁₃ = {}
C₃₁ = {}
C₂₃ = {}
C₃₂ = {}

What About the Internal Events Not Recorded in the Process Snapshots?

In the above diagram, events C, D and E do not form part of P1's snapshot recorded in state S₁ because these events had not yet occurred at the time P1 decided to take its snapshot.

Similarly, events J and K do not form part of P3's snapshot recorded in state S₃ because these events had not yet occurred at the time the marker message arrived from P1.

These events will all be recorded the next time a snapshot is taken.

How Does the Entire System Know When the Snapshot Is Complete?

An individual process knows its local snapshot is complete when it has recorded:

Its own internal state, and
The state of all its incoming channels

If it can be shown that the snapshot process terminates for an individual process, and all individual processes use the same snapshot algorithm, then it follows that the snapshot will terminate for all participating processes in the system.

Now we can appreciate the importance of the assumptions listed at the start. The success of this entire algorithm rests on the fact that:

Eventual message delivery is guaranteed, and
Messages never arrive out of order (all channels are FIFO queues), and
Processes do not crash (yeah, right! Again, see lecture 10)

In Chandy & Lamport's original paper they provide a proof that the snapshot process does in fact terminate.

However, determining when the snapshot for the entire system is complete lies outside the rules of the Chandy-Lamport algorithm itself. Management of an entire system snapshot needs to be handled by some external coordinating process that:

Receives all the snapshot data from the individual processes, then
Collates that data to form an overall system snapshot.

Previous	Next
Lecture 7	Lecture 9

Endnotes

1 In this context, a "decentralised algorithm" is one that does not need to be invoked from a special coordinating process; any process in the system can act as the initiator. A beneficial side-effect of this is that if two processes simultaneously decide to initiate a snapshot, then nothing bad happens.

↩

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lecture 08.md

Lecture 08.md

Distributed Systems Lecture 8

Lecture Given by Lindsey Kuper on April 15^th, 2020 via YouTube

Rules of the Chandy-Lamport Algorithm

The Initiator Process

Processes Receiving a Marker Message

Scenario 1: Nope, I Haven't Seen a Marker Message Before...

Scenario 2: Yup, I've Already Seen a Marker Message...

The Chandy-Lamport Algorithm in a More Detailed Scenario

A Worked Example

What About the Internal Events Not Recorded in the Process Snapshots?

How Does the Entire System Know When the Snapshot Is Complete?

Files

Lecture 08.md

Latest commit

History

Lecture 08.md

File metadata and controls

Distributed Systems Lecture 8

Lecture Given by Lindsey Kuper on April 15th, 2020 via YouTube

Rules of the Chandy-Lamport Algorithm

The Initiator Process

Processes Receiving a Marker Message

Scenario 1: Nope, I Haven't Seen a Marker Message Before...

Scenario 2: Yup, I've Already Seen a Marker Message...

The Chandy-Lamport Algorithm in a More Detailed Scenario

A Worked Example

What About the Internal Events Not Recorded in the Process Snapshots?

How Does the Entire System Know When the Snapshot Is Complete?

Lecture Given by Lindsey Kuper on April 15^th, 2020 via YouTube