SimGrid  3.7
Scalable simulation of distributed systems
HOWTO design a GRAS application
HOWTOs

This page tries to give some hints on how to design a GRAS application. The provided model and functionnalities are somehow different from the other existing solutions (nammly, MPI), and this page tries to give you the feeling of the GRAS philosophy. You may also want to (re)read the What is GRAS.

As an example, you may have a look at Example: mutual exclusion with centralized coordinator, which somehow follows the guidelines given here.

Table of content

What is a GRAS application

As explained in What is GRAS, you should see a GRAS application as a distributed service. There is two main parts:

It is naturally possible to not follow this split, and let the users of your service directly sending messages by themselves. Nevertheless, this encapsulation is a good thing because distributed computing is a bit hard to achieve. You may thus not want to force your users to go into this tricky part. Instead, shield them with a regular C API.

If you are the only user of the code you develop, I'd advice you to still follow this approach. But do as you prefer, of course.

The design phase

Specify the user API

These will be the entry points of your system, and you should think twice about their syntax and semantic.

Identify the types of processes in your application

Note that we are not speaking about the amount of processes in your application, but rather on the type of processes. The amount of distinct roles.

Depending on your application, there may be only one type (for example in a peer-to-peer system), two types of processes (for example in a client/server system), three types of processes (for example if you add a forwarder to a client/server system), or maybe much more.

Design an application-level protocol

During this phase, you should try to sketch your application before implementing it. You should sketch temporal execution of the application. I personnaly prefer to do so graphically on a blackboard, drawing comic strips of the several steps of the algorithm under specific conditions. Ask yourself questions as "What gets exchanged between who when the user request this?", "How to react when this condition occures?" and "What is the initial state of the system?"

Here are some important points to identify in your future application during the design phase:

Message types

The most important point to design a GRAS protocol is to identify the different type of messages that can be exchanged in your system, and the datatype of their payload. Doing so, remember that GRAS messages are strongly typed. They are formed by a so-called message type (identified by its name) and a type description describing what kind of data you can include in the message. The message type is a qualitative information "someone asked you to run that function" while the payload is the quantitative information (the arguments to the function).

A rule of thumb: A given message type have only one semantic meaning, and one syntaxic payload datatype. If you don't do so, you have a problem.

Message callbacks

Also sketch what your processes should do when they get the given message. These action will constitute the message callbacks during the implementation. But you should first go through a design phase of the application level-protocol, were these action remain simplistic. Also identify the process-wide globals needed to write the callbacks (even if it can be later).

The implementation phase

Write your callbacks

Most of the time, you will find tons of specific cases you didn't thought about in the design phase, and going from a code sketch to an actual implementation is often quite difficult.

You will probably need some process-wide globals to store some state of your processes. See Lesson 5: Using globals in processes on need.

Implement your user-API

If your protocol was wisely designed, writting the user API should be quite easy.

Implement the initialization code

You must write some initialization code for all the process kinds of your application.

You may want to turn your code into a clean GRAS library by using the adhoc mecanism, but unfortunately, it's not documented yet. Worse, there is two such systems for now, one of them being deprecated. Check the Peer Management implementation to see how to use the new system. Naturally, this is only optional.

The test phase

Testing software is important to check that it works. But in a distributed setting, this is mandatory.

Test it on the simulator

In addition to all the good old bugs you find in a classical C program (memory corruption, logical errors, etc), distributed computing introduce subtil ways to get things wrong. Actually, this is why I wrote GRAS: to get able to test my code on simulator before using it in distributed settings.

The simulator is a nicely controled environment. You can rerun the same code in the exact same condition as often as you want to reproduce a bug. You can run the simulator in a debugger such as valgrind or gdb to debug all processes at once. You can cheat and have global variables (to check global invariants or such). You can test your code on platform you don't have access to in the real life, or in condition which almost never occur (such as having 80% of the nodes stopping at the same time).

Use all these possibilities, and test your code throughfully in the simulator. You may find it boring, but when you'll go to the next steps, you'll dream of going back into the comfort of the simulator.

Test it locally

Once it works in the simulator, test it in real-life, but will all processes on the same host. Even if the GRAS API is made to make the simulator as close to the real life as possible, you often discover new and exciting bugs when going outside of the simulator. Fix'em all before going further.

If you find you've found a discrepency between both implementation of GRAS, please drop us a mail. That would be a bug.

Test it in a distributed setting

You are now ready for the big jump...


Back to the main Simgrid Documentation page The version of SimGrid documented here is v3.7.
Documentation of other versions can be found in their respective archive files (directory doc/html).
Generated by doxygen