Post by James Bardin
Yes, it was just the amount of code, plus the zfl that confused me,
and being slow in C, it seemed a little daunting.
I'll explain in more detail using some of the in-progress docs, which
describe two prototype designs (proto2, proto3). Warning, long email
coming up...
--- snippet 1 ---
Here is a basic reliable RPC design for proto2:
[diagram]
+-----------+ +-----------+ +-----------+
| | | | | |
| Client | | Client | | Client |
| | | | | |
+-----------+ +-----------+ +-----------+
| REQ | | REQ | | REQ |
\-----------/ \-----------/ \-----------/
^ ^ ^
| | |
\---------------+---------------/
|
v
/-----------\
| XREP |
+-----------+
| |
| Queue |
| |
+-----------+
| XREP |
\-----------/
^
|
/---------------+---------------\
| | |
v v v
/-----------\ /-----------\ /-----------\
| XREQ | | XREQ | | XREQ |
+-----------+ +-----------+ +-----------+
| | | | | |
| Server | | Server | | Server |
| | | | | |
+-----------+ +-----------+ +-----------+
Figure # - Reliable Request-Reply
[/diagram]
## Client design
The client connects to the queue and sends requests in a synchronous
fashion. If it does not get a reply within a certain time (1 second),
it reports an error, and retries. It will retry three times, then
exit with a final message. It uses a REQ socket and zmq_poll.
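For illustration, here is a minimal sketch of that client loop against
the plain libzmq C API (the endpoint is an assumption, and proto2
itself exchanges zfl_msg multipart messages rather than single-frame
strings):

#include <zmq.h>
#include <stdio.h>

#define REQUEST_TIMEOUT 1000    //  msecs to wait for a reply
#define REQUEST_RETRIES 3       //  attempts before giving up

int main (void)
{
    void *context = zmq_ctx_new ();
    int linger = 0;             //  discard pending messages on close

    void *client = zmq_socket (context, ZMQ_REQ);
    zmq_setsockopt (client, ZMQ_LINGER, &linger, sizeof (linger));
    zmq_connect (client, "tcp://localhost:5555");

    int retries_left = REQUEST_RETRIES;
    while (retries_left) {
        zmq_send (client, "request", 7, 0);

        zmq_pollitem_t items [] = { { client, 0, ZMQ_POLLIN, 0 } };
        zmq_poll (items, 1, REQUEST_TIMEOUT);

        if (items [0].revents & ZMQ_POLLIN) {
            char reply [256];
            if (zmq_recv (client, reply, 255, 0) != -1) {
                printf ("Got reply\n");
                break;          //  Success, stop retrying
            }
        }
        if (--retries_left == 0) {
            printf ("Server seems to be offline, abandoning\n");
            break;
        }
        printf ("No response, retrying...\n");
        //  A REQ socket cannot send again before it has received a
        //  reply, so close and reopen it for the retry
        zmq_close (client);
        client = zmq_socket (context, ZMQ_REQ);
        zmq_setsockopt (client, ZMQ_LINGER, &linger, sizeof (linger));
        zmq_connect (client, "tcp://localhost:5555");
    }
    zmq_close (client);
    zmq_ctx_term (context);
    return 0;
}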
## Server design
The server connects to the queue and uses the [least-recently used
routing][lru] design, i.e. connects an XREQ socket to the queue's XREP
socket and signals when ready for a new task. The server will
randomly simulate two problems when it receives a task:
1. A crash and restart while processing a request, i.e. close its
socket, block for 5 seconds, reopen its socket and restart.
2. A temporary busy wait, i.e. sleep 1 second then continue as normal.
[lru]: http://zguide.zeromq.org/chapter:all#toc46
When waiting for work, the server will send a heartbeat message (which
is an empty message) to the queue each second.
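As a rough sketch of the server's ready/heartbeat behaviour (again
against plain libzmq with an assumed endpoint; real request processing
and the simulated crash and busy-wait cases are left out):

#include <zmq.h>

#define HEARTBEAT_INTERVAL 1000     //  msecs; one heartbeat per second

int main (void)
{
    void *context = zmq_ctx_new ();
    void *worker = zmq_socket (context, ZMQ_DEALER);    //  XREQ above
    zmq_connect (worker, "tcp://localhost:5556");

    //  Signal readiness to the queue (LRU routing)
    zmq_send (worker, "READY", 5, 0);

    while (1) {
        zmq_pollitem_t items [] = { { worker, 0, ZMQ_POLLIN, 0 } };
        if (zmq_poll (items, 1, HEARTBEAT_INTERVAL) == -1)
            break;                  //  Interrupted

        if (items [0].revents & ZMQ_POLLIN) {
            //  A request arrived. A real server would do its work
            //  here; this sketch just echoes every frame back, which
            //  preserves the reply envelope the queue added.
            while (1) {
                zmq_msg_t frame;
                zmq_msg_init (&frame);
                zmq_msg_recv (&frame, worker, 0);
                int more = zmq_msg_more (&frame);
                zmq_msg_send (&frame, worker, more ? ZMQ_SNDMORE : 0);
                if (!more)
                    break;
            }
        }
        else
            //  No work this interval: send the empty heartbeat message
            zmq_send (worker, "", 0, 0);
    }
    zmq_close (worker);
    zmq_ctx_term (context);
    return 0;
}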
## Queue design
The queue binds to a frontend and backend socket and handles requests
and replies asynchronously on these using the LRU design. It manages
two lists of servers:
* Servers that are ready for work.
* Servers that are disabled.
It also manages a task queue.
The queue polls all sockets for input and then processes all incoming
messages. It queues tasks and distributes them to workers that are
alive. Any replies from workers are sent back to their original
clients, unless the worker is disabled, in which case the reply is
dropped.
Idle workers must signal that they are alive with a ready message or a
heartbeat, or they will be marked as disabled until they send a
message again. This is to detect blocked and disconnected workers
(since 0MQ does not report disconnections).
The queue detects a disabled worker in two ways: heartbeating, as
explained, and timeouts on request processing. If a reply does not
come back within (e.g.) 10ms, the queue marks the worker as disabled,
and retries with the next worker.
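A minimal sketch of the bookkeeping this implies (the names and the
three-missed-heartbeats liveness value are assumptions, not the
prototype's actual figures):

#include <stdint.h>
#include <stddef.h>
#include <sys/time.h>

#define HEARTBEAT_INTERVAL  1000    //  msecs between worker heartbeats
#define HEARTBEAT_LIVENESS  3       //  missed heartbeats before "disabled"

typedef struct {
    char identity [256];            //  0MQ identity of the worker
    int64_t expiry;                 //  worker counts as disabled past this
} worker_t;

static int64_t
clock_msecs (void)
{
    struct timeval tv;
    gettimeofday (&tv, NULL);
    return (int64_t) tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

//  Called whenever the queue hears from a worker (ready message,
//  heartbeat, or reply): the worker counts as alive again
static void
worker_refresh (worker_t *worker)
{
    worker->expiry = clock_msecs ()
                   + HEARTBEAT_INTERVAL * HEARTBEAT_LIVENESS;
}

//  Checked while scanning the ready list: a silent worker moves to
//  the disabled list and gets no new tasks until it speaks again
static int
worker_is_disabled (worker_t *worker)
{
    return clock_msecs () > worker->expiry;
}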
--- end snippet ---
This first design delegates a lot of the work to a queue device, which
makes client apps simpler. However, it introduces an extra box on the
network, which we can eliminate if we are prepared to move more code
into the client edge.
--- snippet 2 ---
## Issues With this Design
* Why does the client RPC layer need a queue? It cannot, as designed,
have more than one request in flight at a time.
## Overview & Goals
This prototype continues the work done in proto2, turning it into a
reusable RPC layer built as two ZFL classes (zfl_rpcd, zfl_rpc). This
RPC layer exchanges zfl_msg structures (multipart messages) with no
attention to data encoding or decoding. It implements the retrying,
heartbeating, and failover mechanisms from proto2.
## RPC layer
Proto2 implements the RPC layer partly in the queue and partly in
the client, with complementary functionality in the server. In proto3
we move the queue's reliability logic into the client. This lets us
connect clients directly to servers:
[diagram]
+-----------+
| |
| Client |
| |
+-----------+
| RPC layer |
\-----+-----/
^
|
|
/---------------+---------------\
| | |
| | |
v v v
/-----------\ /-----------\ /-----------\
| | | | | |
| Server | | Server | | Server |
| | | | | |
+-----------+ +-----------+ +-----------+
Figure # - 1-hop RPC
[/diagram]
Or connect clients to queues (acting as servers), and queues (acting
as clients) to servers:
[diagram]
+-----------+
| |
| Client |
| |
+-----------+
| RPC layer |
\-----------/
^
|
|
/---------------+---------------\
| | |
| | |
v v v
/-----------\ /-----------\ /-----------\
| | | | | |
| Queue | | Queue | | Queue |
| | | | | |
+-----------+ +-----------+ +-----------+
| RPC layer | | RPC layer | | RPC layer |
\-----------/ \-----------/ \-----------/
: ^ :
: | :
|
/---------------+---------------\
| | |
| | |
v v v
/-----------\ /-----------\ /-----------\
| | | | | |
| Server | | Server | | Server |
| | | | | |
+-----------+ +-----------+ +-----------+
Figure # - N-hop RPC
[/diagram]
In fact there are two distinct RPC layers, one for clients and one for
servers. The RPC client layer would be an *internal queue device*, that
is, an application thread that uses the same model as the queue from
proto2. The RPC server layer would be an internal device that
implements the request-reply pattern and heartbeating.
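To make the "internal device" idea concrete, here is the plumbing only,
sketched as a classic ROUTER/DEALER queue device running in its own
thread (endpoints are assumptions; the real client layer would use REP
towards the client, XREP towards the servers, and its own poll loop
instead of zmq_proxy):

#include <zmq.h>
#include <pthread.h>

static void *
rpc_device_thread (void *context)
{
    //  Faces the application thread(s) inside this process
    void *frontend = zmq_socket (context, ZMQ_ROUTER);
    zmq_bind (frontend, "inproc://rpc-layer");

    //  Faces the servers on the network
    void *backend = zmq_socket (context, ZMQ_DEALER);
    zmq_connect (backend, "tcp://192.168.0.55:6061");

    //  Shuttle messages between the two sockets until the context
    //  is terminated
    zmq_proxy (frontend, backend, NULL);

    zmq_close (frontend);
    zmq_close (backend);
    return NULL;
}

//  In the application: share one context and start the device thread
//      void *context = zmq_ctx_new ();
//      pthread_t thread;
//      pthread_create (&thread, NULL, rpc_device_thread, context);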
## RPC Client Layer
### Overall Architecture
The RPC client layer handles a single client thread synchronously, so
it does no queuing. It manages a set of servers using a least-recently
used model that combines failover with load-balancing. This assumes
that all servers are equivalent and stateless, i.e. pure "workers" in
the workload distribution sense.
The RPC client layer *could* be tuned or adapted to always use a
specific server first, allowing a primary/secondary backup scheme.
However, proto3 does not do this.
We connect the single client application thread to the RPC client
thread using an inproc:// socket, and we connect the RPC client thread
to the servers over TCP:
[diagram]
+-----------+
| |
| Client |
| |
+-----------+
| REQ |
\-----------/
^
| inproc://something
v
/-----------\
| REP |
+-----------+
| |
| LRU & RPC |
| |
+-----------+
| XREP |
\-----------/
^
| tcp://something
v
Figure # - RPC Client Layer
[/diagram]
Notes:
* The RPC client thread uses a REP socket to talk to the client thread,
since it talks to exactly one client in this prototype.
* The RPC client thread would connect outwards to servers, to keep the
'client server' semantics clear (clients connect to servers). This is
different from proto2, where servers connect to clients.
### Technical Implementation
As a technical implementation we'd use a zfl_rpc class that works as follows:
// Create thread and bind to inproc endpoint
zfl_rpc_t *rpc;
rpc = zfl_rpc_new (context);
// Connect to three servers
zfl_rpc_connect (rpc, "tcp://192.168.0.55:6061");
zfl_rpc_connect (rpc, "tcp://192.168.0.56:6061");
zfl_rpc_connect (rpc, "tcp://192.168.0.57:6061");
// Format request message as zfl_msg object
zfl_msg_t *request = zfl_msg_new ();
...
// Send message (destroy after sending) and return reply
zfl_msg_t *reply = zfl_rpc_send (rpc, &request);
if (!reply)
printf ("No service available\n");
// End RPC client thread
zfl_rpc_destroy (&rpc);
Notes:
* The server names would in practice be resolved via a name service (proto4).
* The client application talks to the RPC client thread using inproc
sockets whose names are generated automatically (one way to do this is
sketched below).
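One plausible way to generate those names (a guess at the mechanism,
not the actual zfl_rpc code) is to derive them from the instance
pointer, which is unique within the process:

#include <stdio.h>

//  'self' stands for the zfl_rpc instance inside its constructor
char endpoint [64];
snprintf (endpoint, sizeof (endpoint), "inproc://zfl_rpc-%p", (void *) self);
//  The RPC client thread binds this endpoint; the caller connects to it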
### Configuration
We will use the zfl_config layer, as used in proto1, for configuring
the RPC client layer:
// Load configuration from somewhere
zfl_config_t *config = zfl_config_new (...);
// Configure RPC layer
zfl_rpc_configure (rpc, config);
## RPC Server Layer
### Overall Architecture
The RPC server layer would work much like the client layer except that
it implements the server side of proto2 (heartbeating and LRU ready
signaling):
[diagram]
^
| tcp://something
v
/-----------\
| XREP |
+-----------+
| |
| Queue |
| |
+-----------+
| REQ |
\-----------/
^
| inproc://something
v
/-----------\
| REP |
+-----------+
| |
| Server |
| |
+-----------+
Figure # - RPC Server Layer
[/diagram]
Note that clients and servers connect N-to-N, without (needing)
intermediate devices. The server queue stores requests from multiple
clients, if necessary, feeding them to the server application
one-by-one. Each client can have at most one queued request.
### Technical Implementation
Here is a mockup of the zfl_rpcd class interface:
// Create thread and bind to inproc endpoint
zfl_rpcd_t *rpcd;
rpcd = zfl_rpcd_new (context);
// Wait for request
zfl_msg_t *request = zfl_rpcd_recv (rpcd);
// Process request and prepare reply
...
zfl_msg_destroy (&request);
// Send reply (destroy after sending)
zfl_rpcd_send (rpcd, &reply);
// End RPC server thread
zfl_rpcd_destroy (&rpcd);
### Handshaking at Startup
We must use XREP-to-XREP sockets because we want to connect N clients
to N servers without (necessarily) an intermediary queue device.
In an XREP-to-XREP socket connection, one side of the connection must
know the identity of the other. You cannot do XREP-to-XREP flows
between two anonymous sockets, since an XREP socket requires an
explicit identity. In practice this means we will need a name service
to share the identities of the servers. The client will connect to the
server, then send it a message using the server's known identity as
address, and then the server can respond to the client.
In this prototype we'll use fixed, hardcoded identities for the
servers. We'll develop the name service in a later prototype.
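Here is a self-contained sketch of that handshake, using a hardcoded
identity and endpoint (both sockets live in one process purely to keep
the example short, and the sleep stands in for a proper name-service
handshake):

#include <zmq.h>
#include <stdio.h>
#include <unistd.h>

int main (void)
{
    void *context = zmq_ctx_new ();

    //  Server side: XREP (ROUTER) with a fixed identity, set before
    //  binding, so clients can address it by name
    void *server = zmq_socket (context, ZMQ_ROUTER);
    zmq_setsockopt (server, ZMQ_IDENTITY, "server-1", 8);
    zmq_bind (server, "tcp://127.0.0.1:6061");

    //  Client side: also XREP (ROUTER); connects, then uses the
    //  server's known identity as the address frame
    void *client = zmq_socket (context, ZMQ_ROUTER);
    zmq_connect (client, "tcp://127.0.0.1:6061");
    sleep (1);              //  crude: wait for the connection to settle

    zmq_send (client, "server-1", 8, ZMQ_SNDMORE);  //  address frame
    zmq_send (client, "HELLO", 5, 0);               //  request body

    //  The server receives [client identity][body]; it keeps the
    //  identity so it can route its reply back the same way
    char identity [256], body [256];
    int id_size = zmq_recv (server, identity, 255, 0);
    int size = zmq_recv (server, body, 255, 0);
    printf ("server got '%.*s' (%d-byte client identity kept for reply)\n",
        size, body, id_size);

    zmq_close (client);
    zmq_close (server);
    zmq_ctx_term (context);
    return 0;
}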
## General Retry Mechanism
The retry mechanism is as follows (a code sketch follows the list):
* If there is a single server, the client will wait a certain period
(timeout) for the server to respond.
* If the server does not respond within the timeout, the client will
retry a number of times (retries).
* If there are multiple servers, the client will wait on each server
with the timeout, but will not retry the same server twice.
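A compact sketch of the multi-server case, rendered here with one REQ
socket per attempt rather than proto3's single XREP socket with LRU
(endpoints are the same illustrative ones as above; the single-server
case looks like the proto2 client loop sketched earlier):

#include <zmq.h>
#include <stdio.h>

#define REQUEST_TIMEOUT 1000    //  msecs to wait on each server

int main (void)
{
    void *context = zmq_ctx_new ();
    int linger = 0;             //  discard pending messages on close
    const char *servers [] = {
        "tcp://192.168.0.55:6061",
        "tcp://192.168.0.56:6061",
        "tcp://192.168.0.57:6061"
    };
    size_t nbr_servers = sizeof (servers) / sizeof (servers [0]);
    int got_reply = 0;

    size_t index;
    for (index = 0; index < nbr_servers && !got_reply; index++) {
        //  Fresh socket per attempt; we never retry the same server
        void *socket = zmq_socket (context, ZMQ_REQ);
        zmq_setsockopt (socket, ZMQ_LINGER, &linger, sizeof (linger));
        zmq_connect (socket, servers [index]);
        zmq_send (socket, "request", 7, 0);

        zmq_pollitem_t items [] = { { socket, 0, ZMQ_POLLIN, 0 } };
        zmq_poll (items, 1, REQUEST_TIMEOUT);
        if (items [0].revents & ZMQ_POLLIN) {
            char reply [256];
            if (zmq_recv (socket, reply, 255, 0) != -1) {
                printf ("Reply from %s\n", servers [index]);
                got_reply = 1;
            }
        }
        zmq_close (socket);
    }
    if (!got_reply)
        printf ("No server responded\n");

    zmq_ctx_term (context);
    return 0;
}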
--- end snippet ---
And this is basically what turned into zfl_rpc and zfl_rpcd.
-Pieter