SimGrid
3.7
Scalable simulation of distributed systems
|
This is the end of the first big part of this tutorial. At this point, you know pretty much everything about message passing in GRAS. In second big part, you will learn how to describe data to the middleware in order to convey your own structures in messages instead of the predefined scalar types. But for now, it is time to recap what we have seen so far.
In GRAS, you pass message to get remote component to achieve some work for you. In some sense, this is very similar to the good old procedure abstraction: you call some procedure to get it doing some work for you. Looking closer at this good old abstraction, we notice 4 call semantics:
void procedure(void)
this is a procedure accepting no argument and returning no result (type 1).void procedure2(int i)
this is a procedure accepting an argument, but returning no result (type 2).int function(void)
this is a function accepting no argument, but returning a result (type 3).int function(int i)
this is a function accepting an argument, and returning a result (type 4).The whole story of the GRAS message passing subsystem is to allow to reproduce these call semantics in a distributed setting. That being said, We must also note that some messages exchanged using GRAS do not intend to mimic these semantics, but instead to help the syncronisation between distributed processes. When exchanged from peer A to peer B, they don't mean that A requests a service from B, but rather that A gives an information (a signal) to B. It could be for example that A reached a specific point of its computation, and that B can proceed with its own (this syncronisation schema being a simple rendez-vous). These messages are covered in the next part of this recapping (Syncronization messages in GRAS (TODO)).
To return on the call semantics described above, there is a big difference between the types T1 and T2 on one side and the types T3 and T4 on the other side. In the second case, the caller do wait for an answer from the callee side. In a distributed setting, you have to exchange one extra message in that case. That is why T1;T2 (sometimes refered as one-way messages) are treated quite differently in GRAS from T3;T4.
Mimicing the same call semantic in a distributed setting is the goal of the RPC systems. To do so in GRAS, you must do the following actions:
If you want that your messages convey complex datatypes and not only scalar, you have to do it first. Learning how to do so is the subject of the second part of this tutorial. For now, simply observe that this is the same thing than doing a typedef
in your code before using this type:
typedef struct { int a; char b; } *mytype;
This is very similar to forward procedure declarations in your sequential code:
int myfunction(mytype myarg);
More formally it comes down to associating a data type to a given symbol name. For example, in Lesson 9: Exchanging simple data, we specified that the message "kill"
conveyed a double as payload.
Doing so depends on whether you have a one-way message (ie, type 1 or 2) or not. One-way messages are declared with gras_msgtype_declare() while RPC messages are declared with gras_msgtype_declare_rpc()
If the message is intended to be a work request one (and not a syncronization one as detailed below), you then want to attach some code to execute when a process receives the given request. In other words, you want to attach a callback to the message. Of course, you usualy don't want to do so on every nodes, but only on "servers" or "workers" or such. First of all, you need to declare the callback itself. This function that will be executed on request incomming must follow a very specific prototype (the same regardless of the call semantic):
int callback_name(gras_msg_cb_ctx_t context, void *payload)
The first argument (of type gras_msg_cb_ctx_t) is an opaque structure describing the callback context: who sent you the message (which you can find back using gras_msg_cb_ctx_from()), whether it's a one-way call or not, and so on. GRAS functions altering the call (such as gras_msg_rpcreturn(), used to return the result to the caller) require this context as argument.
The second argument is a pointer to where the message payload is stored. In the T1 and T3 semantics (ie, when the message had no payload), this second argument is NULL. If not, the first line of your callback will probably retrieve the payload and store it in a variable of yours. The semantic for this is very systematic, if not elegant: If your payload is of type TOTO, payload
is a pointer to a TOTO variable. So, cast it properly (add (TOTO*)
in front of it), and dereference it (add a star right before the cast). For example:
TOTO myvariable = *(TOTO*) payload;
This becomes even uglier if the conveyed type is a pointer itself, but you must stick to the rule anyway:
int **myvariable = *(int ** *) payload;
If your message is of semantic T3 or T4 (ie, it returns a value to the caller), then you must use the function gras_msg_rpcreturn() to do so. It takes three arguments:
The callback is expected to return 0 if ok, as detailed in Aside: stacking callbacks.
To attach a given callback to a given message, you simply use gras_cb_register(). If you even want to de-register a callback, use gras_cb_unregister().
Again, sending messages depend on the semantic call. If you have a one-way message, you should call gras_msg_send() while RPC calls are sent with gras_msg_rpccall(). The main difference between them is that you have an extra argument for RPC, to specify where to store the result.
It is also possible to call RPC asyncronously: you send the request (using gras_msg_rpc_async_call()), do some other computation. And then later, you wait for the answer of your RPC (using gras_msg_rpc_async_wait()).
The callback is expected to return 0 if everything went well, which is the same semantic than the "main()" function. You can also build stacks of callbacks. It is perfectly valid to register several callbacks to a given message. When a message arrives, it is passed to the firstly-registered callback. If the callback returns 0, it means that it consumed the message, which is discarded. If the callback returns 1 (or any other "true" value), the message is passed to the next callback of the stack, and so on. At the end, if no callback returned 0, an exception is raised.
This mecanism can for example be used to introduce dupplication and replay. You add a callback simply in charge of storing the message in a database, and since it returns 1, the message is then passed to the "real" callback. To be perfectly honest, I never had use of this functionnality myself, but I feel it could be useful in some cases...
One of the parts I'm the most proud of in GRAS is this one: imagine you have a rpc callback executing on a remote server. If this callback raises an exception, it will be propagated on the network back to the client, and revived there. So, the client will get automatically any exception raised by the server. Cool, isn't it? Afterward, simply refer to the host
field of the xbt_ex_t to retrieve the machine on which it was initialy raised.
In case you wonder about the exceptions I'm speaking about (after all, SimGrid is in C ANSI, and there is usually no exception mecanism in C ANSI), you may want to refer to the section Exception support. Note that exception can be troublesome to use (mainly because the compiler won't catch your mistakes on this).
This section covers a point not explicited elsewhere in the documentation. It may be seen as a bit hardcore, and you should probably skip it the first time you read the tutorial.
Back to the main Simgrid Documentation page |
The version of SimGrid documented here is v3.7. Documentation of other versions can be found in their respective archive files (directory doc/html). |
Generated by ![]() |