[SOLVED] C algorithm parallel concurrency operating system database graph software network theory Impossibility of Distributed Consensuswith One Faulty Process

$25

File Name: C_algorithm_parallel_concurrency_operating_system_database_graph_software_network_theory_Impossibility_of_Distributed_Consensuswith_One_Faulty_Process.zip
File Size: 1413 KB

5/5 - (1 vote)

Impossibility of Distributed Consensuswith One Faulty Process
MICHAEL J. FISCHER
Yale University, New Haven, Connecticut NANCY A. LYNCH
Massachusetts Institute of Technology, Cambridge, Massachusetts AND
MICHAEL S. PATERSON
University of Warwick, Coventry, England
Abstract. The consensusproblem involves an asynchronous system of processes,some of which may be unreliable. The problem is for the reliable processesto agree on a binary value. In this paper, it is shown that every protocol for this problem has the possibility of nontermination, even with only one faulty process. By way of contrast, solutions are known for the synchronous case, the Byzantine Generals problem.
Categories and Subject Descriptors: C.2.2 ComputerCommunication Networks: Network Protocols protocol architecture; C.2.4 ComputerCommunication Networks: Distributed Systemsdistributed applications; distributed databases; network operating systems; C.4 Performance of Systems: Reliabil ity, Availability, and Serviceability; F. 1.2 Computation by Abstract Devices: Modes of Computation parallelism; H.2.4 Database Management: Systemsdistributed systems; transaction processing
General Terms: Algorithms, Reliability, Theory
Additional Key Words and Phrases: Agreement problem, asynchronous system, Byzantine Generals problem, commit problem, consensusproblem, distributed computing, fault tolerance, impossibility proof, reliability
I. Introduction
The problemofreachingagreementamongremoteprocessesisoneofthemost fundamental problems in distributed computing and is at the core of many
Editing of this paper was performed by guest editor S. L. Graham. The EditorinChief of JACM did not participate in the processing of the paper.
This work was supported in part by the OBice of Naval Research under Contract NO001482KO154, by the Office of Army Research under Contract DAAG2979C0155, and by the National Science Foundation under Grants MCS7924370 and MCS8 116678.
This work was originally presented at the 2nd ACM Symposium on Principles of Database Systems, March 1983.
Authors present addresses:M. J. Fischer, Department of Computer Science, Yale University, P.O. Box 2158,Yale Station, New Haven, CT 06520; N. A. Lynch, Laboratory for Computer Science,Massachu setts Institute of Technology, 545 Technology Square, Cambridge, MA 02 139; M. S. Paterson, Depart ment of Computer Science, University of Warwick, Coventry CV4 7AL, England
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee andor specific permission.
0 1985 ACM 0004541 l8504000374 00.75
Journal of the Assccktion for Computing Machinery, Vol. 32, No. 2, April 1985, pp. 374382.

Impossibility of Distributed Consensuswith OneFaulty Process 375
algorithms for distributed data processing, distributed file management, and fault tolerant distributed applications.
A wellknown form of the problem is the transaction commit problem, which arises in distributed database systems 6, 13, 1517, 21241 see also G. LeLann, private communication, quoted in 151.The problem is for all the data manager processesthat have participated in the processing of a particular transaction to agree on whether to install the transactions results in the database or to discard them. The latter action might be necessary, for example, if some data managers were, for any reason, unable to carry out the required transaction processing. Whatever decision is made, all data managers must make the same decision in order to preserve the consistency of the database.
Reaching the type of agreement needed for the commit problem is straightfor ward if the participating processesand the network are completely reliable. How ever, real systemsare subject to a number of possible faults, such asprocesscrashes, network partitioning, and lost, distorted, or duplicated messages.One can even consider more Byzantine types of failure 5, 7, 8, 11, 14, 18, 191in which faulty processesmight go completely haywire, perhaps even sending messagesaccording to some malevolent plan. One therefore wants an agreement protocol that is as reliable as possible in the presence of such faults. Of course, any protocol can be overwhelmed by faults that are too frequent or too severe, so the best that one can hope for is a protocol that is tolerant to a prescribed number of expected faults.
In this paper, we show the surprising result that no completely asynchronous consensus protocol can tolerate even a single unannounced process death. We do not consider Byzantine failures, and we assumethat the messagesystemis reliable
it delivers all messagescorrectly and exactly once. Nevertheless, even with these assumptions,thestoppingofasingleprocessataninopportune timecancauseany distributed commit protocol to fail to reach agreement. Thus, this important problem has no robust solution without further assumptions about the computing environment or still greater restrictions on the kind of failures to be tolerated!
Crucial to our proof is that processing is completely asynchronous; that is, we make no assumptions about the relative speedsof processesor about the delay time in delivering a message.We also assumethat processesdo not have accessto synchronized clocks, so algorithms based on timeouts, for example, cannot be used. In particular, the solutions in 6 are not applicable. Finally, we do not postulate the ability to detect the death of a process, so it is impossible for one process to tell whether another has died stopped entirely or is just running very slowly.
Our impossibility result applies to even a very weak form of the consensus problem.Assumethateveryprocessstartswithaninitial valuein0,11.Anonfaulty process decides on a value in 0, 1 by entering an appropriate decision state. All nonfaulty processesthat make a decision are required to choose the same value. For the purpose of the impossibility proof, we require only that some process eventually make a decision. Of course, any algorithm of interest would require that all nonfaulty processesmake a decision. The trivial solution in which, say, 0 is always chosen is ruled out by stipulating that both 0 and 1 are possible decision values, although perhaps for different initial conligurations.
Oursystemmodelisratherstrongsoastomakeourimpossibility proofaswidely applicable as possible. Processesare modeled as automata with possibly infinitely many statesthat communicate by meansof messagesI.n oneatomic step,aprocess can attempt to receive a message,perform local computation on the basis of

376 M. J. FISCHER, N. A. LYNCH, AND M. S. PATERSON
whether or not a messagewas delivered to it and if so, which one, and send an arbitrary but finite set of messagesto other processes.In particular, an atomic broadcast capability is assumed,soa processcan sendthe samemessagein one stepto all other processeswith the knowledgethat if any nonfaulty processreceives the message,then all the nonfaulty processeswill. Every messageis eventually deliveredaslongasthedestinationprocessmakesinfinitely manyattemptsto receive,but messagescan be delayed, arbitrarily long, and delivered out of order.
The asynchronouscommit protocols in current useall seemto havea window of vulnerability an interval of time during the execution of the algorithm in which the delay or inaccessibility of a single processcan causethe entire algorithm to wait indefinitely. It follows from our impossibility result that every commit protocolhassuchawindow, confirmingawidelybelievedtenetinthefolklore.
2. ConsensusProtocols
A consensus protocol P is an asynchronous system of N processesN I 2. Eachprocessp hasa onebit input register x,, an output register ypwith valuesin b, 0, 11,and an unbounded amount of internal storage.The valuesin the input and output registers,together with the program counter and internal storage, comprisetheinternal state.Initial statesprescribefixedstartingvaluesforallbut the input register; in particular, the output register starts with value b. The states in which the output register has value 0 or 1 are distinguished as being decision states. p acts deterministically according to a transition function. The transition function cannot change the value of the output register once the process has reached a decision state; that is, the output register is writeonce. The entire systemP is specifiedby the transition functions associatedwith eachof the processes and the initial valuesof the input registers.
Processescommunicate by sending each other messagesA. message is a pair p, m, where p is the name of the destination processand m is a messagevalue from a fixed universe M. The message system maintains a multiset, called the messagebuffer, of messagesthat havebeensentbut not yet delivered.It supports two abstractoperations:
sendp, m: Placesp, m in the messagebuffer;
receivep: Deletessomemessagep, m from the buffer and returns m, in which
casewe say p, m is delivered, or returns the special null marker 0 and leavesthe buffer unchanged.
Thus, the messagesystemacts nondeterministically, subject only to the condition that if receivep is performed infinitely many times, then every messagep, m in the messagebuffer is eventually delivered. In particular, the messagesystem is allowed to return 0 a finite number of times in responseto receivep, eventhough a messagep, m is present in the buffer.
A configuration of the system consists of the internal state of each process, together with the contents of the messagebuffer. An initial configuration is one in which eachprocessstartsat an initial stateand the messagebuffer is empty.
A step takes one configuration to another and consistsof a primitive step by a single processp. Let C be a configuration. The step occurs in two phases.First, receivep is performed on the messagebuffer in C to obtain a value m E M u 0. Then, depending on ps internal statein C and on m, p entersa new internal state and sends a finite set of messagesto other processes.Since processesare deterministic, the step is completely determined by the pair ep, m, which we

Impossibility ofDistributed Consensuswith OneFaulty Process 377
FIGURE1
call an event.This event should be thought of as the receipt of m by p. eC denotes the resulting configuration, and we say that e can be applied to C. Note that the event p, 0 can always be applied to C, so it is always possible for a processto take another step.
A schedule from C is a finite or infinite sequence u of events that can be applied, in turn, starting from C. The associated sequence of steps is called a run. If u is finite, we let aC denote the resulting configuration, which is said to be reachable from C. A configuration reachable from some initial configuration is said to be accessible.Hereafter, all configurations mentioned are assumedto be accessible.
The following lemma expressesa commutativity property of schedules.
LEMMA 1. Suppose that from some configuration C, the schedules ul, 2lead to configurations C1, C2, respectively. If the sets of processestaking steps in c1 and 02, respectively, are disjoint, then u2 can be applied to Cl and 1can be applied to C2,and both lead to the same configuration C,. SeeFigure 1.
PROOF. The result follows at once from the system definition, since u1 and u2 do not interact. Cl
A configuration C has decision value v if some processp is in a decision state with y,,v. A consensus protocol is partially correct if it satisfies two conditions:
1 No accessibleconfiguration has more than one decision value.
2 For each v E 0, I, some accessibleconfiguration has decision value v.
A processp is nonfaulty in a run provided that it takes infinitely many steps,and it is faulty otherwise. A run is admissible provided that at most one process is faulty and that all messagessent to nonfaulty processesare eventually received.
A run is a deciding run provided that some process reaches a decision state in that run. A consensusprotocol P is totally correct in spite of onefault if it is partially correct, and every admissible run is a deciding run. Our main theorem shows that every partially correct protocol for the consensus problem has some admissible run that is not a deciding run.
3. Main Result
THEOREM 1. No consensusprotocol is totally correct in spite of onefault.
PROOF. Assume to the contrary that P is a consensus protocol that is totally correct in spite of one fault. We prove a sequence of lemmas which eventually lead to a contradiction.

378 M. J. FISCHER,N. A. LYNCH, AND M. S. PATERSON
The basic idea is to show circumstances under which the protocol remains forever indecisive. This involves two steps. First, we argue that there is some initial configuration in which the decision is not already predetermined. Second, we construct an admissible run that avoids ever taking a step that would commit the systemtoaparticular decision.
Let C be a configuration and let V be the set of decision values of configurations reachable from C. C is bivalent if 11l2. C is univalent if 1VI1, let us say 0valent or Ivalent according to the corresponding decision value. By the total correctness of P, and the fact that there are always admissible runs, Vf 0. Cl
LEMMA 2. P has a bivalent initial configuration.
PROOF. Assume not. Then P must have both 0valent and 1valent initial configurations by the assumed partial correctness. Let us call two initial configu rations adjacent if they differ only in the initial value x, of a single processp. Any two initial configurations are joined by a chain of initial configurations, each adjacent to the next. Hence, there must exist a 0valent initial configuration CO adjacent to a 1valent initial configuration Ci. Let p be the processin whose initial value they differ.
Now consider some admissible deciding run from COin which processp takes no steps, and let u be the associated schedule. Then u can be applied to C, also, and corresponding configurations in the two runs are identical except for the internal state of process p. It is easily shown that both runs eventually reach the same decision value. If the value is 1, then Co is bivalent; otherwise, Ci is bivalent. Either case contradicts the assumed nonexistence of a bivalent initial configura tion. Cl
LEMMA 3. Let C be a bivalent configuration of P, and let ep, m be an event that is applicable to C. Let ?be the set of configurations reachablefrom C without applying e, and let 9egeE I E E ?and e is applicable to E. Then, 9 contains a bivalent configuration.
PROOF. Since e is applicable to C, then by definition of Z and the fact that messagescan be delayed arbitrarily, e is applicable to every E E ?7.
Now assumethat 9 contains no bivalent configurations, soevery configuration D E 9 is univalent. We proceed to derive a contradiction.
Let Ei be an ivalent configuration reachable from C, i0, 1. Ei exists since C is bivalent. If Ei E Yle, t FieEi E g. Otherwise, e was applied in reaching Ei, and so there exists Fi E .9 from which Ei is reachable. In either case,Fi is ivalent since Fi is not bivalent since Fi E B and 5contains no bivalent configurations and one of Ei and Fi is reachable from the other. Since Fi E 9, i0, 1, B contains both 0valent and 1valent configurations.
Call two configurations neighborsif one results from the other in a single step. By an easy induction, there exist neighbors Co, Ci E 5such that DieCi is ivalent, i0, 1. Without loss of generality, C,eCo where ep, m.
Case 1. Ifpp, then D1e, by Lemma 1. This is impossible, since any successorof a 0valent configuration is 0valent. SeeFigure 2.
Case 2. If pp, then consider any finite deciding run from Co in which p takes no steps.
Let u be the corresponding schedule, and let AaC,,. By Lemma 1, u is applicable to Di, and it leads to an ivalent configuration EiaDi, i0, 1. Also

Impossibility of Distributed Consensus with One Faulty Process 379
FIGURE2
FIGURE3
by Lemma 1, eA and eeAE,. SeeFigure 3. Hence, A is bivalent. But this is impossible since the run to A is deciding by assumption, so A must be univalent.
In each case, we reached a contradiction, so .2contains a bivalent configura tion. 0
Any deciding run from a bivalent initial configuration goes to a univalent configuration, so there must be some single step that goes from a bivalent to a univalent configuration. Such a step determines the eventual decision value. We now show that it is always possible to run the system in a way that avoids such steps,leading to an admissible nondeciding run.
The run is constructed in stages,starting from an initial configuration. We ensure that the run is admissible in the following way. A queue of processesis maintained, initially in an arbitrary order, and the messagebuffer in a configuration is ordered according to the time the messageswere sent, earliest first. Each stageconsists of one or more process steps. The stage ends with the first process in the process queue taking a step in which, if its messagequeue was not empty at the start of the

380 M. J. FISCHER, N. A. LYNCH, AND M. S. PATERSON
stage,its earliest messageis received. This processis then moved to the back of the processqueue. In any infinite sequenceof such stagesevery processtakesinfinitely many steps and receives every messagesent to it. The run is therefore admissible. Our problem, of course, is to do this in such a way as to avoid a decision ever being reached.
Let Co be a bivalent initial configuration whose existence is assured by Lemma 2. Execution begins in CO,and we ensure that every stagebegins from a bivalent configuration. Suppose then that configuration C is bivalent and that process p heads the priority queue. Let m be the earliest messageto p in Cs messagebuffer, if any, and 0 otherwise. Let ep, m. By Lemma 3, there is a bivalent configuration C reachable from C by a schedule in which e is the last event applied. The corresponding sequenceof stepsdefines the stage.
Since each stageends in a bivalent configuration, every stagein the construction of the infinite schedule succeeds.The resulting run is admissible, and no decision is ever reached. It follows that P is not totally correct. Cl
4. Initially Dead Processes
In this section, we exhibit a protocol that solves the consensus problem for N processesas long as a majority of the processesare nonfaulty and no process dies during the execution of the protocol. No process knows in advance, however, which of the processesare initially dead and which are not.
The protocol works in two stages.During the first stage,the processesconstruct a directed graph G with a node corresponding to each process. Every process broadcasts a messagecontaining its process number and then listens for messages from L1 other processes,where LIN121. G has an edge from i to j iffj receives a messagefrom i. Thus, G has indegree L1.
In the second stage, the processesconstruct G the transitive closure of G in the sensethat upon completion of this stage,eachprocessk knows about all of the edgesj, k incident on k in G as well as the initial values of all suchj.
To carry out this stage,each processbroadcasts to all other processesits process number and initial value together with the names of the L1 processesit heard from during the first stage. It then waits until it has received a stage 2 message from every ancestor in G that it knows about. Initially, it knows only about the L 1 processesfrom which it heard directly during the first stage, but it learns about additional ancestors from the stage 2 messagesthat it receives. Waiting continues until such time as all currently knownabout processeshave been heard from.
At this point, each process knows all of its own ancestors and the edges of G incident on them. Using this information, it computes all of the edgesof G incident on each of its ancestors. It then determines which of its ancestors belong to an initial clique of G, that is, a clique with no incoming edges.To do this, it usesthe fact that a node k is in an initial clique iff k is itself an ancestor of every node j that is an ancestor of k. Since every node in G has at ieast L1 predecessors,there can be only one initial clique; it has cardinality at least L, and every process that completes the second stage knows exactly the set of processes comprising it.
Finally, eachprocessmakesadecisionbasedontheinitial valuesoftheprocesses in the initial clique using any agreedupon rule. Since all processesknow the initial values of all members of the initial clique, they all reach the same decision.
The correctness of this protocol proves the following theorem.

Impossibility of Distributed Consensus with One Faulty Process 381
THEOREM 2. There is a partially correct consensusprotocol in which all non faulty processes always reach a decision, provided no processes die during its
execution and a strict majority of the processesare alive initially.
5. Conclusion
We haveshown that a natural and important problem of faulttolerant cooperative computing cannot be solved in a totally asynchronous model of computation. These results do not show that such problems cannot be solved in practice; rather, they point up the need for more refined models of distributed computing that better reflect realistic assumptionsabout processorand communication tim ings, and for lessstringent requirements on the solution to such problems. For example, termination might be required only with probability 1. Subsequentto t h e o r i g i n a l a n n o u n c e m e n t o f t h e s e r e s u l t s1 2 1 ,p r o g r e s sh a s b e e n m a d e a l o n g b o t h of theselines14, 9, 10, 20, 251.
ACKNOWLEDGMENT. The authors would like to thank John Guttag for helpful discussionsduring the initial phaseof this work, and Gene Stark for discussionof the results and a careful reading of the text. They also thank the refereesfor pointing out severalplaceswhere the presentation neededimprovement.
REFERENCES
1. ATTIYA, C., DOLEV, D., AND GIL, J. Asynchronous Byzantine consensus. In Proceedings of the 3rd Annual ACM Symposium on Principles o!Distributed Computing Vancouver, B.C., Canada, Aug. 2729. ACM, New York, 1984, pp. I 19 133.
BENOR,M. Another advantage of free choice: Completely asynchronous agreement protocols. In Proceedings of the 2nd Annual ACM Symposium on Principles ofDistributed Computing Montreal, Quebec, Canada, Aug. 1719. ACM, New York, 1983, pp. 2730.
2.
3.
4.
5.
6.
7.
8. 9.
BRACHA,G. An asynchronous Ln13Jresilient consensusprotocol. In Proceedings ofthe 3rd Annual ACM Symposium on Principles ofDistributed Computing Vancouver, B.C., Canada, Aug. 2729. ACM, New York, 1984,pp. 154162.
BRACHA,G., AND TOUEG, S. Resilient consensus protocols. In Proceedings of the 2nd Annual ACM Symposium on Principles of Distributed Computing Montreal, Quebec, Canada, Aug. 17
19. ACM, New York, 1983, pp. 1226.
DEMILLO, R. A., LYNCH,N. A., ANDMERRITT,M. J. Cryptographic protocols. In Proceedingsof the 14th Annual ACM Symposium on Theory of Computing San Francisco, Calif., May 57. ACM, New York, 1982, pp. 383400.
DLEV, D., ANDSTRONG, H. R. Distributed commit with bounded waiting. In Proceedingsofthe 2nd Annual IEEE Symposium on Reliability in Distributed Software and Database Systems. IEEE, New York, 1982,pp. 5360.
DOLEV, D., AND STRONG,H. R. Polynomial algorithms for multiple processor agreement. In Proceedings of the 14th Annual ACM Symposium on Theory of Computing San Francisco, Calif., May 57. ACM, New York, 1982,pp. 401407.
DLEV, D., FISCHER, M., FOWLER, R., LYNCH,N., AND STRONG, H. R. An efficient algorithm for Byzantine agreement without authentication. InkyControl 52, 3 1983, 257274.
DLEV, D., LYNCH, N., PINTER, S., STARK, E., AND WEIHL, W. Reaching approximate agreement in the presence of faults. In Proceedings of the 3rd Annual IEEE Symposium on Reliability in Distributed Software and Database Systems. IEEE, New York, 1983, pp. 145154.
10. DWORK, C., LYNCH, N., AND STOCKMEYER, L. Consensusin the presenceof partial synchrony. In Proceedings of the 3rd Annual ACM Symposium on Principles ofDistributed Computing Vancou ver, B.C., Canada, Aug. 2729. ACM, New York, 1984, pp. 103l 18.
FISCHER, M., AND LYNCH. N. A lower bound for the time to assureinteractive consistency. In: Proc. Lett. 14, 4 1982, 183186.
FISCHER, M., LYNCH, N., AND PATERSON, M. Impossibility of distributed consensus with one faulty process.In Proceedingsofthe2ndAnnual ACM SIGACTSIGMOD Symposium onPrinciples of Database Systems Atlanta, Ga., Mar. 2123. ACM, New York, 1983, pp. l7.
11. 12.

382 M. J. FISCHER, N. A. LYNCH, AND M. S. PATERSON
13. GARCIAM LINA, H. Elections in a distributed computing system. IEEE Trans. Comput. C31, 1 1982, 4859.
14. LAMPORT, L., SHOSTAK, R., AND PEASE, M. The Byzantine Generals problem. ACM Trans. Prog. Lang. Syst. 4, 3 July 1982, 382401.
15. LAMPSON,B. Replicated Commit. CSL Notebook Entry, Xerox Palo Alto Research Center, Palo Alto, Calif., 1981.
16. LAMPSON, B., AND STURGIS, H. Crash recovery in a distributed data storage system. Manuscript, Xerox Palo Alto Research Center, Palo Alto, Calif., 1979.
17. LINDSAY, B. G., SELINGER, P. G., GALTIERI, C., GRAY, J. N., LORIE, R. A., PRICE, T. G., PUTZOLU, F., TRAIGER, I. L., AND WADE, B. W. Notes on distributed databases. IBM Res. Rep. RJ2571, IBM ResearchDivision, SanJose,Calif., 1979.
18. LYNCH, N., FISCHER, M., AND FOWLER, R. A simple and efficient Byzantine Generals algorithm.
In Proceedings of the 2nd Annual IEEE Symposium on Reliability in Distributed Software and DatabaseSystems.IEEE,NewYork, 1982,pp.4652.
19. PEASE, M., SHOSTAK, R., AND LAMPORT, L. Reaching agreement in the presence of faults. J. ACM 27,2 Apr. 1980,228234.
20. RABIN, M. Randomized Byzantine Generals. In Proceedings of the 24th Annual IEEE Symposium on Foundations of Computer Science. IEEE, New York, 1983, pp. 403409.
2 1. REED, D. Naming and synchronization in a decentralized computer system. Ph.D. dissertation, Technical Report MITLCSTR205, MassachusettsInstitute of Technology, Cambridge, Mass.,
1978.
22. ROSENKRANTZ, D. J., STEARNS, R. E., AND LEWIS, P. M., II. System level concurrency control for
distributed database systems. ACM Trans. Database Syst. 3, 2 June 1978, 178198.
23. SKEEN, D. A decentralized termination protocol. In Proceedings of the 2nd Annual IEEE Sym posium on Reliability in Distributed Software and Database Systems. IEEE, New York, 1982, pp.
2732.
24. SKEEN, D., AND STONEBRAKER, M. A formal model of crash recovery in a distributed system.
IEEE Trans. Softw. Engineering SE9, 3 May 1983, 2 19228.
25. TOUEG, S. Randomized Byzantine Agreements. In Proceedings ofthe 3rd Annual ACM Sympo
sium on Principles ofDistributed Computing Vancouver, B.C., Canada, Aug. 2729. ACM, New York, 1984,pp. 163178.
RECEIVED SEPTEMBER 1983; REVISED OCTOBER 1984; ACCEPTED OCTOBER 1984
Journal of the Association for Computing Machinery, Vol. 32, No. 2, April 1985.

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] C algorithm parallel concurrency operating system database graph software network theory Impossibility of Distributed Consensuswith One Faulty Process
$25