
Rollback propagation may extend back to the initial state of the computation, losing all the work performed before the failure, a situation known as the domino effect. In a distributed system, if each participating process takes its checkpoints independently, then the system is susceptible to the domino effect. Taking checkpoints in this way is called independent or uncoordinated checkpointing.

It is obviously desirable to avoid the domino effect and therefore several techniques have been developed to prevent it. One such technique is coordinated checkpointing where processes coordinate their checkpoints to form a system-wide consistent state. In case of a process failure, the system state can be restored to such a consistent set of checkpoints, preventing the rollback propagation.
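The difference between independent and coordinated checkpoints can be illustrated with a small sketch. It assumes the usual notion of consistency, which this passage does not spell out: a set of checkpoints is consistent if it contains no orphan message, that is, no message recorded as received whose send is not recorded. The data layout (event counts per checkpoint, message tuples) and the function names `orphan_messages` and `recovery_line` are hypothetical and chosen only for illustration. The sketch also assumes every process starts recovery from its latest checkpoint.

```python
def orphan_messages(cut, messages):
    """Return messages received inside the cut but sent outside it.

    cut:      dict mapping process -> number of events its checkpoint covers
    messages: list of (sender, send_index, receiver, recv_index), where the
              indices are 0-based event positions within each process
    """
    return [
        m for m in messages
        if m[3] < cut[m[2]] and m[1] >= cut[m[0]]
    ]


def recovery_line(checkpoints, messages):
    """Roll processes back until the chosen checkpoints are consistent.

    checkpoints: dict mapping process -> sorted list of event counts,
                 always starting with 0 (the initial checkpoint).
    """
    # Start from the most recent checkpoint of every process.
    cut = {p: cps[-1] for p, cps in checkpoints.items()}
    while True:
        orphans = orphan_messages(cut, messages)
        if not orphans:
            return cut
        for _, _, receiver, recv_idx in orphans:
            # Latest checkpoint of the receiver that excludes the receive event.
            target = max(c for c in checkpoints[receiver] if c <= recv_idx)
            cut[receiver] = min(cut[receiver], target)


# m1: P2's event 0 sends, P1's event 0 receives.
# m2: P1's event 1 sends, P2's event 1 receives.
messages = [("P2", 0, "P1", 0), ("P1", 1, "P2", 1)]

# Independent checkpoints: each rollback orphans another message.
independent = recovery_line({"P1": [0, 1], "P2": [0, 2]}, messages)

# Checkpoints placed so that the latest ones already form a consistent state.
coordinated = recovery_line({"P1": [0, 1], "P2": [0, 1]}, messages)

print(independent, coordinated)
```

In the first scenario the rollback cascades until both processes are back at their initial checkpoints, which is the domino effect. In the second, the latest checkpoints are already consistent and no further rollback is needed.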

Alternatively, communication-induced checkpointing forces each process to take checkpoints based on information piggybacked on the application messages it receives from other processes. Checkpoints are taken such that a system-wide consistent state always exists on stable storage, thereby avoiding the domino effect. The approaches discussed so far implement checkpoint-based rollback recovery, which relies only on checkpoints to achieve fault-tolerance.

Log-based rollback recovery combines checkpointing with logging of non-deterministic events. Log-based rollback recovery relies on the piecewise deterministic (PWD) assumption, which postulates that all non-deterministic events that a process executes can be identified and that the information necessary to replay each event during recovery can be logged in the event's determinant. By logging and replaying the non-deterministic events in their exact original order, a process can deterministically recreate its pre-failure state even if this state has not been checkpointed.
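The replay idea can be sketched in a few lines. This is a toy model, not a real protocol: it assumes message receipts are the only non-deterministic events, that the received value itself serves as the determinant, and that the checkpoint and the log survive a crash (as if held on stable storage). The class `LoggedProcess` and its methods are hypothetical names.

```python
class LoggedProcess:
    """Toy process whose only non-deterministic events are message receipts."""

    def __init__(self):
        self.state = 0
        # Assumption: these two fields stand in for stable storage
        # and are therefore not lost when the process crashes.
        self.checkpoint = 0
        self.determinants = []

    def take_checkpoint(self):
        self.checkpoint = self.state
        # Events before the checkpoint no longer need to be replayed.
        self.determinants.clear()

    def receive(self, value):
        # Log the determinant, then apply the deterministic state transition.
        self.determinants.append(value)
        self._apply(value)

    def _apply(self, value):
        self.state = self.state * 31 + value

    def crash(self):
        # Volatile state is lost.
        self.state = None

    def recover(self):
        # Restore the checkpoint, then replay logged events in original order.
        self.state = self.checkpoint
        for value in self.determinants:
            self._apply(value)


p = LoggedProcess()
p.receive(3)
p.take_checkpoint()
p.receive(5)
p.receive(7)
pre_failure_state = p.state  # never checkpointed

p.crash()
p.recover()
print(p.state == pre_failure_state)
```

Because the transition in `_apply` is deterministic, replaying the two logged receipts from the checkpoint reproduces the pre-failure state even though that state was never saved.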

Log-based rollback recovery in general enables a system to recover beyond the most recent set of consistent checkpoints. It is therefore particularly attractive for applications that frequently interact with the outside world, which consists of input and output devices that cannot roll back.

13.2 Background and definitions

13.2.1 System model

A distributed system consists of a fixed number of processes, $P_1, P_2, \ldots, P_N$, which communicate only through messages. Processes cooperate to execute a distributed application and interact with the outside world by receiving and sending input and output messages, respectively. Figure 13.1 shows a system consisting of three processes and interactions with the outside world.

[Figure 13.1: An example of a distributed system with three processes, P1, P2, and P3, showing messages m0 to m4 and the input and output messages exchanged with the outside world.]

Rollback-recovery protocols generally make assumptions about the reliability of the inter-process communication. Some protocols assume that the communication subsystem delivers messages reliably, in first-in-first-out (FIFO) order, while other protocols assume that the communication subsystem can lose, duplicate, or reorder messages. The choice between these two assumptions usually affects the complexity of checkpointing and failure recovery.

A generic correctness condition for rollback-recovery can be defined as follows [36]: a system recovers correctly if its internal state is consistent with the observable behavior of the system before the failure.

Rollback-recovery protocols therefore must maintain information about the internal interactions among processes and also the external interactions with the outside world.

13.2.2 A local checkpoint

In distributed systems, all processes save their local states at certain instants of time. This saved state is known as a local checkpoint. A local checkpoint is a snapshot of the state of the process at a given instant, and the event of recording the state of a process is called local checkpointing.

The contents of a checkpoint depend upon the application context and the checkpointing method being used. Depending upon the checkpointing method used, a process may keep several local checkpoints or just a single checkpoint at any time. We assume that a process stores all local checkpoints on stable storage so that they are available even if the process crashes.

We also assume that a process is able to roll back to any of its existing local checkpoints and thus restore to and restart from the corresponding state. Let $C_{i,k}$ denote the kth local checkpoint at process $P_i$. Generally, it is assumed that a process $P_i$ takes a checkpoint $C_{i,0}$ before it starts execution.
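These assumptions can be sketched as a small data structure. The sketch is illustrative only: the class `CheckpointedProcess`, the use of an in-memory list as a stand-in for stable storage, and the choice to discard checkpoints taken after the one rolled back to are all assumptions, not part of the model described above.

```python
import copy


class CheckpointedProcess:
    """Process P_i that keeps its local checkpoints C_{i,0}, C_{i,1}, ..."""

    def __init__(self, pid, initial_state):
        self.pid = pid
        self.state = initial_state
        # Stand-in for stable storage: index k holds checkpoint C_{i,k}.
        self.stable_storage = []
        # C_{i,0} is taken before execution starts.
        self.take_checkpoint()

    def take_checkpoint(self):
        """Save a snapshot of the current state and return its index k."""
        self.stable_storage.append(copy.deepcopy(self.state))
        return len(self.stable_storage) - 1

    def roll_back(self, k):
        """Restore the state saved in C_{i,k} and restart from it."""
        self.state = copy.deepcopy(self.stable_storage[k])
        # Assumption: checkpoints taken after C_{i,k} belong to the
        # abandoned execution and are discarded.
        del self.stable_storage[k + 1:]


proc = CheckpointedProcess(pid=1, initial_state={"x": 0})
proc.state["x"] = 5
k = proc.take_checkpoint()   # second checkpoint, index 1
proc.state["x"] = 9          # work that will be lost on rollback
proc.roll_back(k)
state_after_first_rollback = copy.deepcopy(proc.state)
proc.roll_back(0)
print(state_after_first_rollback, proc.state)
```

The deep copies keep each saved snapshot independent of later changes to the live state, which is what allows the process to restart from any stored checkpoint.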

A local checkpoint is shown in the process-line by the symbol ..