Saturday, December 31, 2016

RPCs, groups and multicast

This is first entry in the series I mentioned earlier. I've tried to replace the references with links to the actual papers or PhD theses where possible, but some are not available online.

Remote Procedure Call

The idea behind the Remote Procedure Call (RPC) [Birrell 84] is the fact that conventional procedure calls are well known and are a well understood mechanism for the transfer of data and control within a program running on a single processor. When a remote procedure is invoked, the calling process is suspended, any parameters are passed across the network to the node where the server resides, and then the desired procedure is executed. When the procedure finishes, any results are passed back to the calling process, where execution resumes as if returning from a local procedure call. Thus the RPC provides the system or application programmer a level of abstraction above the underlying message stream. Instead of sending and receiving messages, the programmer invokes remote procedures and receives return values.

The figure shows a client and server interacting via a Remote Procedure Call interface. When the client makes the call it is suspended until the server has sent a reply. To prevent the sender being suspended indefinitely the call can have a timeout value associated with it: after this time limit has elapsed the call could be retried or the sender could decide that the receiver has failed. Another method, which does not make use of timeouts in the manner described, instead relies on the sender and receiver transmitting additional probe messages which indicate that they are alive. As long as these messages are acknowledged then the original call can continue to be processed and the sender will continue to wait.

Groups

[ANSA 90][ANSA 91a][Liang 90][Olsen 91] describe the general role of groups in a distributed system. Groups provide a convenient and natural way to structure applications into a set of members cooperating to provide a service. They can be used as a transparent way of providing fault tolerance using replication, and also as a way of dividing up a task to exploit parallelism.

A group is a composite of objects sharing common application semantics as well as the same group identifier (address). Each group is viewed as a single logical entity, without exposing its internal structure and interactions to users. If a user cannot distinguish the interaction with a group from the interaction with a single member of that group, then the group is said to be fully transparent.

Objects are generally grouped together for several reasons: abstracting the common characteristics of group members and the services they provide; encapsulating the internal state and hiding interactions among group members from the clients so as to provide a uniform interface (group interface) to the external world; using groups as building blocks to construct larger system objects. A group may be composed of many objects (which may themselves be groups), but users of the group see only the single group interface. [ANSA 90] refers to such a group as an Interface Group.

An object group is defined to be a collection of objects which are grouped together to provide a service (the notion of an abstract component) and accessible only through the group interface. An object group is composed of one or more group members whose individual object interfaces must conform to that of the group.

Interfaces are types, so that if an interfacex has typeXand an interfacey has type Y, andX conforms to Y, thenx can be used wherey is used. This type conformance criteria is similar to that in Emerald [Black 86]. In the rest of this thesis, we shall assume for simplicity that a given object group is composed of objects which possess identical interfaces (although their internal implementations could be different).

The object group concept allows a service to be distributed transparently among a set of objects. Such a group could then be used to support replication to improve reliability of service (a replica group), or the objects could exploit parallelism by dividing tasks into parallel activities. Without the notion of the object group and the group interface through which all interactions take place, users of the group would have to implement their own protocols to ensure that interactions with the group members occur consistently e.g., to guarantee that each group member sees the same set of update requests.

By examining the different ways in which groups are required by different applications, it is possible to define certain requirements which are imposed on groups and the users of groups (e.g., whether collation of results is necessary from a group used for reliability purposes). [ANSA 9 la] discusses the logical components which constitute a generic group, some of which may not be required by every group for every application. These components are:
  • an arbiter, which controls the order in which messages are seen by group members.
  • a distributor/collator, which collates messages going out of the group, and distributes messages coming into the group.
  • member servers, which are the actual group members to which invocations are directed.
For some applications collation may not be necessary e.g., if it can be guaranteed that all members of a group will always respond with the same result. As we shall see later, if the communication primitives can guarantee certain delivery properties for messages, then arbitration may also not be necessary. In general, all of these components constitute a group. In the rest of this thesis the logical components will not be mentioned explicitly, and the term group member will be used to mean a combination of these components.

Multicast Communication

Conventional RPC communication is a unicast call since it involves one—to—one interaction between a single client and a single server. However, when considering replication it is more natural to consider interactions with replica groups. Group communication is an access transparent way to communicate with the members of such a group. Such group communication is termed multicasting [Cheriton 85][Hughes 86].

Multicast communication schemes allow a client to send a message to multiple receivers simultaneously. The receivers are members of a group which the sender specifies as the destination of the message. A broadcast is the general case of a multicast whereby instead of speci!ring a subset of the receivers in the system every receiver is sent a copy.
Most multicast communication mechanisms are unreliable as they do not guarantee that delivery of a given message will occur even if the receiver is functioning correctly (e.g., the underlying communication medium could lose a message). When considering the interaction of client and replica group (or even replica group to replica group communication) such unreliable delivery can cause problems in maintaining consistency of state between the individual replicas, complicating the replication control protocol (if one replica fails to receive a given state-modifying request but continues to receive and respond to other requests, this resulting state divergence could result in inconsistencies at the clients). Thus, it is natural to consider such group-to-group communication to be carried out using reliable multicasts, which give certain guarantees about delivery in the presence of failures. These can include the guarantee that if a receiver is operational then the message will be delivered even if the sender fails during transmission, and that the only reason a destination will not receive a message is because that destination has failed. By using a reliable multicast communication protocol many of the problems posed by replicating services can be handled at this low level, simplifying the higher level replica consistency protocol.

Some historical blogging

Over Christmas I was doing some cleaning up of my study and came across a copy of my PhD. Any excuse to stop cleaning, so I took some time to skim through it and thought some of the background research I had included might still be useful today for a new audience. Now although the PhD is available for download, it's not exactly easily searchable or referenced, so the next few entries will try to rectify some of that.