SimGrid
3.9.90
Versatile Simulation of Distributed Systems
You are at the right place... Having a look at these slides of the HPCS'10 tutorial (or at these ancient slides, or at these "obsolete" slides) may give you some insight into what SimGrid can help you do and what its limitations are. Then you should definitely read the MSG examples.
If you are stuck at any point and if this FAQ cannot help you, please drop us a mail on the user mailing list: simgrid-user@lists.gforge.inria.fr
It depends on how you define "purpose", I guess ;)
They all allow you to build a prototype of your application, which you can then run within the simulator. They all share the same simulation kernel, which is the core of the SimGrid project. They differ in the way you express your application.
With SimDag, you express your code as a collection of interdependent parallel tasks. So, in this model, applications can be seen as a DAG of tasks. This is the interface of choice for people wanting to port old code designed for SimGrid v1 or v2 to the current version of the framework.
With MSG, your application is seen as a set of communicating processes, exchanging data by way of messages and performing computations on their own.
It is sometimes convenient to "see" how the agents are behaving. If you like colors, you can use tools/MSG_visualization/colorize.pl as a filter for your MSG outputs. It works directly with INFO. Beware, INFO() prints on stderr. Do not forget to redirect if you want to filter (e.g. with bash):
./msg_test small_platform.xml small_deployment.xml 2>&1 | ../../tools/MSG_visualization/colorize.pl
We also have a more graphical output. Have a look at section Configuring the tracing subsystem.
Currently, bindings on top of MSG are supported for Java, Ruby and Lua. You can find some documentation about them on the doc page. Note that bindings are released separately from the main distribution and thus have their own version numbers.
Moreover, if you use C++, you should be able to use the SimGrid library as a standard C library and everything should work fine (simply link against this library; recompiling SimGrid with a C++ compiler won't work, and it wouldn't help if you could).
For now, we do not feel a real demand for any other language. But if you think there is one, please speak up!
Here is the deal. The whole SimGrid project (MSG, SURF, ...) is meant to be kept as simple and generic as possible. We cannot add functions for everybody's needs when these functions can easily be built from the ones already in the API. Most of the time it is possible, and when it was not, we have always upgraded the API accordingly. When somebody asks us a question like "How do I do that? Is there a function in the API to simply do this?", we're always glad to answer and help. However, if we don't need this code for our own work, there is no chance we're going to write it... it's your job! :) The counterpart to our answers is that once you come up with a neat implementation of this feature (task duplication, RPC, thread synchronization, ...), you should send it to us and we will be glad to add it to the distribution. Thus, other people will take advantage of it (and we don't have to answer this question again and again ;).
You'll find in this section a few "Missing In Action" features. Many people have asked about them and we have given hints on how to simply do it with MSG. Feel free to contribute...
Many people have come to ask for a more complex example and, each time, they have realized afterward that the basics were in the previous three examples.
Of course, they have often needed more complex functions like MSG_process_suspend(), MSG_process_resume() and MSG_process_isSuspended() (to perform synchronization), or MSG_task_Iprobe() and MSG_process_sleep() (to avoid blocking receptions), or even MSG_process_create() (to design asynchronous communications or computations). But the examples are sufficient to start.
We know. We should add some more examples, but not really more complex ones... We should add examples that illustrate other functionalities (like how to simply encode asynchronous communications, RPC, process migrations, thread synchronization, ...) and we will do it when we have a little more time. We have tried to document the examples so that they are understandable. Tell us if something is not clear and, once again, feel free to participate! :)
There is no task duplication in MSG. When you create a task, you can process it or send it somewhere else. As soon as a process has sent the task, it doesn't have it anymore. It's gone. The receiver process has got the task. However, you could decide, upon reception, to create a "copy" of the task, but you have to handle yourself the semantics associated with this "duplication".
As we already said, we prefer keeping the API as simple as possible. This kind of feature is rather easy to implement by users, and the semantics you associate with it really depend on your application. Having a generic task duplication mechanism is not that trivial (in particular because of the data field). That is why I would recommend that you write it yourself, even if I can give you advice on how to do it.
You have the following functions to get information about a task: MSG_task_get_name(), MSG_task_get_compute_duration(), MSG_task_get_remaining_computation(), MSG_task_get_data_size(), and MSG_task_get_data().
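For example, a receiver could rebuild an equivalent task from these getters. Here is a minimal sketch (note that the data pointer is shared, not deep-copied; deciding what "duplicating" the data means is exactly the semantic question mentioned above, and my_task_copy() is just a name used for illustration):

static m_task_t my_task_copy(m_task_t original) {
  /* Build a new task with the same name, computation amount and data size.
     The data pointer is shared as-is: adapt this to your own duplication semantics. */
  return MSG_task_create(MSG_task_get_name(original),
                         MSG_task_get_compute_duration(original),
                         MSG_task_get_data_size(original),
                         MSG_task_get_data(original));
}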
You could use a dictionary (xbt_dict_t) of dynars (xbt_dynar_t). If you still don't see how to do it, please come back to us...
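A minimal sketch of such a structure, assuming the standard xbt_dict/xbt_dynar API from xbt/dict.h and xbt/dynar.h (the choice of the host name as key is only an example):

#include "xbt/dict.h"
#include "xbt/dynar.h"
#include "msg/msg.h"

/* Store 'task' in the dynar associated with 'host_name', creating it on first use.
   'tasks_by_host' is a dictionary created beforehand with xbt_dict_new(). */
static void store_task(xbt_dict_t tasks_by_host, const char *host_name, m_task_t task) {
  xbt_dynar_t dynar = xbt_dict_get_or_null(tasks_by_host, host_name);
  if (!dynar) {
    dynar = xbt_dynar_new(sizeof(m_task_t), NULL);
    xbt_dict_set(tasks_by_host, host_name, dynar, NULL);
  }
  xbt_dynar_push(dynar, &task);
}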
In the past (version <= 3.4), there was no function to perform asynchronous communications. It could easily be implemented by creating new processes when needed, though. Since version 3.5, we have introduced the following functions:
We refer you to the description of these functions for more details on their usage as well as to the example section on Asynchronous communications.
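As a rough illustration, here is a minimal sketch assuming the asynchronous primitives introduced around version 3.5 (MSG_task_isend(), MSG_task_irecv(), MSG_comm_wait() and MSG_comm_destroy(); please check the reference documentation for the exact signatures in your version):

#include "msg/msg.h"

/* Sender side: start the communication, overlap some work, then wait for completion. */
static int async_sender(int argc, char *argv[]) {
  m_task_t payload = MSG_task_create("payload", 0, 1e6, NULL);
  msg_comm_t comm = MSG_task_isend(payload, "my_mailbox");  /* non-blocking send */
  /* ... do some computation here while the message flies ... */
  MSG_comm_wait(comm, -1);                                  /* -1: no timeout */
  MSG_comm_destroy(comm);
  return 0;
}

/* Receiver side. */
static int async_receiver(int argc, char *argv[]) {
  m_task_t task = NULL;
  msg_comm_t comm = MSG_task_irecv(&task, "my_mailbox");    /* non-blocking receive */
  MSG_comm_wait(comm, -1);
  MSG_comm_destroy(comm);
  MSG_task_destroy(task);
  return 0;
}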
You obviously cannot use pthread_mutexes or pthread_conds, since we handle every scheduling-related decision within SimGrid.
In the past (version <=3.3.4) you could do it by playing with MSG_process_suspend() and MSG_process_resume() or with fake communications (using MSG_task_get(), MSG_task_put() and MSG_task_Iprobe()).
Since version 3.4, you can use classical synchronization structures. See page Synchro stuff or simply check include/xbt/synchro_core.h.
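For instance, here is a minimal sketch of a critical section protected by an xbt mutex (assuming the xbt_mutex_* functions declared in include/xbt/synchro_core.h; double-check the header for the exact names available in your version):

#include "xbt/synchro_core.h"

static xbt_mutex_t mutex;   /* create it once, e.g. in main(), with mutex = xbt_mutex_init(); */

static void update_shared_counter(int *counter) {
  xbt_mutex_acquire(mutex);  /* only one simulated process at a time from here... */
  (*counter)++;
  xbt_mutex_release(mutex);  /* ...to there */
}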
There is no such thing because its semantics wouldn't be really clear. Of course, it is something about the amount of host throughput, but there are as many definitions of "host load" as people asking for this function. First, you have to remember that resource availability may vary over time, which makes any notion of load harder to define.
It may be an instantaneous value or an average one. Moreover, it may be only the power of the computer, or it may take the background load into account, or it may even take the currently running tasks into account. In some SURF models, communications have an influence on computational power. Should that be taken into account too?
First of all, it's nearly impossible to predict the load beforehand in the simulator, since it depends on too many parameters (background load variation, bandwidth sharing algorithmic complexity), some of which are not even known beforehand (other tasks starting at the same time). So getting this information is really hard (just like in real life). It's not that we want MSG to be as painful as real life, but since it is in some way realistic, we face some of the same problems we would face in real life.
How would you do it for real? The most common option is to use something like NWS, which performs active probes. The best solution is probably to do the same within MSG, as in the next code snippet. It is very close to what you would have to do outside of the simulator, and thus gives you information that you could also get in real settings, so it does not hinder the realism of your simulation.
double get_host_load() {
  m_task_t task = MSG_task_create("test", 0.001, 0, NULL);
  double date = MSG_get_clock();

  MSG_task_execute(task);
  date = MSG_get_clock() - date;
  MSG_task_destroy(task);
  return (0.001 / date);
}
Of course, it may not match your personal definition of "host load". In this case, please detail what you mean on the mailing list, and we will extend this FAQ section to fit your taste if possible.
Communications are synchronous, so if you simply get the time before and after a communication, you will not only get the time spent really communicating but also the time spent waiting for the other party to be ready. However, getting the real communication time is not really hard either. The following solution is a good starting point.
int sender() {
  m_task_t task = MSG_task_create("Task", task_comp_size, task_comm_size, calloc(1, sizeof(double)));
  *((double *) task->data) = MSG_get_clock();
  MSG_task_put(task, slaves[i % slaves_count], PORT_22);
  XBT_INFO("Send completed");
  return 0;
}

int receiver() {
  m_task_t task = NULL;
  double time1, time2;

  time1 = MSG_get_clock();
  MSG_task_get(&task, PORT_22);
  time2 = MSG_get_clock();
  if (time1 < *((double *) task->data))
    time1 = *((double *) task->data);
  XBT_INFO("Communication time: %f", time2 - time1);
  free(task->data);
  MSG_task_destroy(task);
  return 0;
}
A classic question of SimDag newcomers is about how to express a communication delay between tasks. The thing is that in SimDag, both computation and communication are seen as tasks. So, if you want to model a data dependency between two DAG tasks t1 and t2, you have to create 3 SD_tasks: t1, t2 and c and add dependencies in the following way:
SD_task_dependency_add(NULL, NULL, t1, c);
SD_task_dependency_add(NULL, NULL, c, t2);
This way, task t2 cannot start before the termination of communication c, which in turn cannot start before t1 ends.
When creating task c, you have to associate an amount of data (in bytes) corresponding to what has to be sent by t1 to t2.
Finally, to schedule the communication task c, you have to build a list comprising the workstations on which t1 and t2 are scheduled (say w1 and w2) and build a communication matrix that should look like [0; amount; 0; 0], as in the sketch below.
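Putting it all together, a sketch could look as follows (this assumes the usual SD_task_create()/SD_task_schedule() interface, with w1 and w2 being the workstations on which t1 and t2 are scheduled; the exact signatures may differ slightly in your version, so check the SimDag reference):

#include "simdag/simdag.h"

/* Create a pure communication task moving 'amount' bytes from w1 (where t1 runs)
   to w2 (where t2 runs), and wire it between t1 and t2. */
static SD_task_t add_transfer(SD_task_t t1, SD_task_t t2,
                              SD_workstation_t w1, SD_workstation_t w2, double amount) {
  SD_task_t c = SD_task_create("t1->t2", NULL, amount);
  SD_task_dependency_add(NULL, NULL, t1, c);
  SD_task_dependency_add(NULL, NULL, c, t2);

  SD_workstation_t ws_list[2] = { w1, w2 };
  double comp_amount[2] = { 0, 0 };      /* c performs no computation */
  double comm_amount[4] = { 0, amount,   /* w1 -> w2: 'amount' bytes */
                            0, 0 };
  SD_task_schedule(c, 2, ws_list, comp_amount, comm_amount, -1.0);
  return c;
}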
Distributed is somehow "contagious". If you start making distributed decisions, there is no way to handle DAGs directly anymore (unless we are missing something). You have to encode your DAGs in terms of communicating processes to make the whole scheduling process distributed. Here is an example of how you could do that, assuming that T1 has to be done before T2.
int your_agent(int argc, char *argv[]) {
  ...
  T1 = MSG_task_create(...);
  T2 = MSG_task_create(...);
  ...
  while (1) {
    ...
    if (cond)
      MSG_task_execute(T1);
    ...
    if ((MSG_task_get_remaining_computation(T1) == 0.0) && (you_re_in_a_good_mood))
      MSG_task_execute(T2);
    else {
      /* do something else */
    }
  }
}
If you decide that the distributed part is not that important and that a DAG is really the level of abstraction you want to work with, then you should give SimDag a try.
Here are a few tricks you can apply if you want to increase the amount of processes in your simulations.
- The --with-context option of the ./configure script allows you to choose between UNIX98 contexts (--with-context=ucontext) and the pthread version (--with-context=pthread). The default value is ucontext when the script detects a working UNIX98 context implementation. On Windows boxes, the provided value is discarded and an adapted version is picked instead.
- Generate your deployment file with a script instead of writing it by hand, for instance examples/gras/mutual_exclusion/simple_token/make_deployment.pl, which you may want to adapt to your case. You could also think about hijacking the SURFXML parser (have a look at Bypassing the XML parser with your own C functions).
- Reduce the stack size of the simulated processes. It is set by the STACK_SIZE define in src/xbt/xbt_context_sysv.c, which is 128kb by default. Reduce this as much as you can, but be warned that if this value is too low, you'll get a segfault. The token ring example, which is quite simple, runs with 40kb stacks.

No, there is no native support for batch schedulers and none is planned, because this is a very specific need (and doing it in a generic way is thus very hard). However, some people have implemented their own batch schedulers. Vincent Garonne wrote one during his PhD and put his code in the contrib directory of our SVN so that others can keep working on it. You may find inspiring ideas in it.
Actually, it depends on whether you want to checkpoint the simulation, or to simulate checkpoints.
The first one could help if your simulation is a long-standing process that you want to keep running even across hardware issues. It could also help to rewind the simulation by jumping back to an old checkpoint to cancel recent computations.
Unfortunately, such a thing will probably never exist in SimGrid. One would have to duplicate all data structures, because doing a rewind at the simulator level is very, very hard (not to mention the malloc/free operations that might have been done in between). Instead, you may be interested in the libckpt library (http://www.cs.utk.edu/~plank/plank/www/libckpt.html). This is the checkpointing solution used in the Condor project, for example. It makes it easy to create checkpoints (at the OS level, creating something like core files) and to rerun them on need.
If you want to simulate checkpoints instead, it means that you want the state of an executing task (in particular, the progress made towards completion) to be saved somewhere. So if a host (and the task executing on it) fails (cf. MSG_HOST_FAILURE), then the task can be restarted from the last checkpoint.
Actually, such a thing does not exist in SimGrid either, but only because we don't think it is fundamental; it can be done in user code at relatively low cost. You could for example use a watcher that periodically gets the remaining amount of work (using MSG_task_get_remaining_computation()), or fragment the task into smaller subtasks, as in the sketch below.
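A minimal sketch of the fragmentation approach, using plain MSG calls (the checkpoint_progress() helper is hypothetical; replace it with whatever persistence mechanism suits you):

#include "msg/msg.h"

/* Execute 'total_flops' of work as 'nb_fragments' subtasks, recording progress
   after each fragment so that a restarted process can resume from there. */
static void run_with_checkpoints(double total_flops, int nb_fragments) {
  double done = 0.0;
  int i;
  for (i = 0; i < nb_fragments; i++) {
    m_task_t fragment = MSG_task_create("fragment", total_flops / nb_fragments, 0, NULL);
    MSG_task_execute(fragment);          /* check the return value if the host may fail */
    MSG_task_destroy(fragment);
    done += total_flops / nb_fragments;
    /* checkpoint_progress(done);           <- your own persistence code goes here */
  }
}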
There are several little examples in the archive, in the examples/msg directory. From time to time, we are asked for other files, but we don't have much at hand right now.
You should refer to the Platform Description Archive (http://pda.gforge.inria.fr) project to see the other platform files we have available, as well as to the Simulacrum simulator, which is meant to generate SimGrid platforms using all the classical generation algorithms.
We are working on a project called ALNeM (Application-Level Network Mapper), whose goal is to automatically discover the topology of an existing network. Its output will be a platform description file following the SimGrid syntax, so everybody will get the ability to map their own lab network (and contribute it to the catalog project). This tool is not ready yet, but it is moving forward quite fast. Just stay tuned.
The third possibility to get a platform file (after manual or automatic mapping of real platforms) is to generate synthetic platforms. Getting a realistic result is not a trivial task, and moreover, nobody is really able to define what "realistic" means when speaking of topology files. You can find some more thoughts on this topic in these slides.
If you are looking for an actual tool, we have a little tool to annotate Tiers-generated topologies. This Perl script is in the tools/platform_generation/ directory of the SVN. Dinda et al. released a very comparable tool called GridG.
The specified computing power will be available to up to 6 sequential tasks without sharing. If more tasks are placed on this host, the resource will be shared accordingly. For example, if you schedule 12 tasks on the host, each will get half of the computing power. Please note that, although sound, this model was never scientifically assessed. Please keep this fact in mind when using it.
The best way to model the resource power using a random variable is to use an availability trace that is driven by a probability distribution. This can be done using the function tmgr_trace_generator_value(). The date and value generators are created with one of tmgr_event_generator_new_uniform(), tmgr_event_generator_new_exponential() or tmgr_event_generator_new_weibull() (if you need other generators, adding them to src/surf/trace_mgr.c should be quite trivial and your patch will be welcome). Once your trace is created, you have to connect it to the resource with the function sg_platf_new_trace_connect().
The process is very similar if you want to model the resource availability with a random variable (deciding whether it is on or off instead of deciding its speed): use the function tmgr_trace_generator_state() or tmgr_trace_generator_avail_unavail() instead of tmgr_trace_generator_value().
Unfortunately, all this is currently lacking proper documentation, and there is not even a proper example of use. You'll thus have to check the header file include/simgrid/platf.h and experiment a bit by yourself. Contributing a little clean example would be a good way to help the SimGrid project.
I guess that you want to read the ChangeLog file, which always contains all the information that could be important to users during an upgrade. Actually, you may want to read it (along with the NEWS file, which highlights the most important changes) even before you upgrade your copy of SimGrid.
Backward compatibility is very important to us, as we want to provide a scientific tool that still allows you to evaluate the code you write several years from now. That being said, we sometimes change the interfaces to make them more usable. When we do so, we always keep the old interface, marked as DEPRECATED. If you still want to use it, you need to define the SIMGRID_DEPRECATED preprocessor symbol before including the SimGrid headers:
#define SIMGRID_DEPRECATED
#include <msg/msg.h>
We know of only one reason for the configure to fail:
If you experience another kind of issue, please get in touch with us. We are always interested in improving our portability to new systems.
Don't assume we never run this target, because we do. Check http://cdash.inria.fr/CDash/index.php?project=Simgrid (click on previous if there is no result for today: results are produced only by 11am, French time) and https://buildd.debian.org/status/logs.php?pkg=simgrid if you don't believe us.
If it's failing on your machine in a way not experienced by the autobuilders above, please drop us a mail on the mailing list so that we can check it out. Make sure to read So I've found a bug in SimGrid. How to report it? before you do so.
This is because you are using the log mechanism, but you didn't create any default category in this file. You should refer to Logging support for all the details, but you simply forgot to call one of XBT_LOG_NEW_DEFAULT_CATEGORY() or XBT_LOG_NEW_DEFAULT_SUBCATEGORY().
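A minimal sketch of the fix (the category name and description are of course up to you):

#include "xbt/log.h"

/* Declare the default category used by XBT_INFO/XBT_DEBUG/... in this file. */
XBT_LOG_NEW_DEFAULT_CATEGORY(my_simulator, "Messages specific to my simulator");

static void say_hello(void) {
  XBT_INFO("The logging macros now know which category to log into");
}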
This indicates that one of the libraries SimGrid depends on (libpthread here) was missing on the linking command line. Dependencies of libsimgrid are expressed directly in the dynamic library, so you should not see this message when linking dynamically.
If you compile your code statically (and if you use a pthread version of SimGrid; see Increasing the amount of simulated processes), you must absolutely specify -lpthread on the linker command line. As usual, this should come after -lsimgrid on that command line.
Since version 3.7, the whole m_channel_t mechanism is deprecated, so functions related to this mechanism may get removed in future releases.
List of functions:
If you still want them, you have to compile SimGrid v3.7 with the option "-Denable_msg_deprecated=ON". Using them should print a warning telling you which new function to use instead.
This is because your platform file is too big for the parser.
Actually, the message comes directly from FleXML, the technology on top of which the parser is built. FleXML has the bad idea of fetching the whole document in memory before parsing it. And moreover, the memory buffer size must be determined at compilation time.
We use a value which seems big enough for our needs without bloating the simulator's footprint. But of course your mileage may vary. In that case, just edit src/surf/surfxml.l and modify the definition of FLEXML_BUFFERSTACKSIZE. E.g.
#define FLEXML_BUFFERSTACKSIZE 1000000000
Then recompile and everything should be fine, provided that your version of Flex is recent enough (>= 2.5.31). If not, the compilation process should warn you.
A while ago, we worked on FleXML to reduce its memory consumption a bit, but these issues remain. There are two things we should do:
These are changes to FleXML itself, not SimGrid. But since we kind of hijacked the development of FleXML, I can assure you that any patches would be really welcome and quickly integrated.
Update: A new version of FleXML (1.7) was released. Most of the work was done by William Dowling, who uses it in his own work. The good point is that it now uses a dynamic buffer, and the memory usage was greatly improved. The downside is that William also changed some things internally, which breaks the hack we devised to bypass the parser, as explained in Bypassing the XML parser with your own C functions. Indeed, this is not a classical usage of the parser, and Will didn't imagine that we might have used (and even documented) such a crude usage of FleXML. So, we now have to repair the bypassing functionality to use the latest FleXML version and fix the memory usage in SimGrid.
The format of the XML platform description files is sometimes improved. For example, we decided to change the units used in SimGrid from MBytes, MFlops and seconds to Bytes, Flops and seconds to ease people exchanging small messages. We also reworked the route descriptions to allow more compact descriptions.
That is why the XML files are versioned using the 'version' attribute of the root tag. Currently, it should read:
<platform version="2">
If your files are too old, you can use the simgrid_update_xml.pl script which can be found in the tools directory of the archive.
If you don't, you really should use valgrind to debug your code; it's almost magic.
This is when valgrind starts complaining about longjmp things, just like:
==21434== Conditional jump or move depends on uninitialised value(s)
==21434==    at 0x420DBE5: longjmp (longjmp.c:33)
==21434==
==21434== Use of uninitialised value of size 4
==21434==    at 0x420DC3A: __longjmp (__longjmp.S:48)
This is the sign that you didn't use the exception mechanism correctly. Most probably, you have a return; somewhere within a TRY{} block. This is evil, and you must not do this. Did you read the section about Exception support?
It may happen that valgrind, the memory debugger beloved by any decent C programmer, spits tons of warnings like the following:
==8414== Conditional jump or move depends on uninitialised value(s)
==8414==    at 0x400882D: (within /lib/ld-2.3.6.so)
==8414==    by 0x414EDE9: (within /lib/tls/i686/cmov/libc-2.3.6.so)
==8414==    by 0x400B105: (within /lib/ld-2.3.6.so)
==8414==    by 0x414F937: _dl_open (in /lib/tls/i686/cmov/libc-2.3.6.so)
==8414==    by 0x4150F4C: (within /lib/tls/i686/cmov/libc-2.3.6.so)
==8414==    by 0x400B105: (within /lib/ld-2.3.6.so)
==8414==    by 0x415102D: __libc_dlopen_mode (in /lib/tls/i686/cmov/libc-2.3.6.so)
==8414==    by 0x412D6B9: backtrace (in /lib/tls/i686/cmov/libc-2.3.6.so)
==8414==    by 0x8076446: xbt_dictelm_get_ext (dict_elm.c:714)
==8414==    by 0x80764C1: xbt_dictelm_get (dict_elm.c:732)
==8414==    by 0x8079010: xbt_cfg_register (config.c:208)
==8414==    by 0x806821B: MSG_config (msg_config.c:42)
This problem lies somewhere in the libc, when using backtraces, and there is very little we can do ourselves to fix it. Instead, here is how to tell valgrind to ignore the error. Add the following to your ~/.valgrind.supp (or create this file if needed). Make sure to adapt the obj line to your own setup (change 2.3.6 to the actual version you are using, which you can retrieve with a simple "ls /lib/ld*.so").
{
   name: Backtrace madness
   Memcheck:Cond
   obj:/lib/ld-2.3.6.so
   fun:dl_open_worker
   fun:_dl_open
   fun:do_dlopen
   fun:dlerror_run
   fun:__libc_dlopen_mode
}
Then, you have to tell valgrind to use this suppression file by passing the --suppressions=$HOME/.valgrind.supp option on the command line. You can also add the following to your ~/.bashrc so that it gets passed automatically. Actually, it passes a few more options to valgrind; these happen to be my personal settings. Check the valgrind documentation for more information.
export VALGRIND_OPTS="--leak-check=yes --leak-resolution=high --num-callers=40 --tool=memcheck --suppressions=$HOME/.valgrind.supp"
When debugging SimGrid, it's easier to pass the --disable-compiler-optimization flag to the configure if valgrind or gdb gets fooled by the optimizations done by the compiler. But you should remove this flag once everything works, before going into production (before launching your 1252135 experiments), or everything will run at only half of SimGrid's true potential.
Unfortunately, we cannot debug every code written with SimGrid. We furthermore believe that the framework provides enough information for you to debug such problems yourself. If the textual output is not enough, make sure to check the Visualizing and analyzing the results FAQ entry to see how to get a graphical one.
Now, if you come up with a really simple example that deadlocks and you're absolutely convinced that it should not, you can ask on the list. Just be aware that you'll be severely punished if the mistake is on your side... We have plenty of FAQ entries to write and new features to implement for the impenitent! ;)
OK, first of all, remember that units should be Bytes, Flops and Seconds. If you don't use such units, some SimGrid constants (e.g. the SG_TCP_CTE_GAMMA constant used in most network models) won't have the right unit and you'll end up with weird results.
Here is what happens with a single transfer of size L on a link (bw,lat) when nothing else happens.
0-----lat--------------------------------------------------t
|-----|**** real_bw = min(bw, SG_TCP_CTE_GAMMA/(2*lat)) ****|
In more complex situations, this min is the solution of a complex max-min linear system. Have a look here and read the two threads "Bug in SURF?" and "Surf bug not fixed?": you'll find a few other examples of such computations there. You can also read "A Network Model for Simulation of Grid Application" by Henri Casanova and Loris Marchal to get all the details. The fact that real_bw is smaller than bw is easy to understand. The fact that real_bw is smaller than SG_TCP_CTE_GAMMA/(2*lat) is due to the window-based congestion mechanism of TCP: with TCP, you can't exploit your huge network capacity if you don't have a good round-trip time, because of the acks...
Anyway, what you get is t=lat + L/min(bw,SG_TCP_CTE_GAMMA/(2*lat)).
If you set (bw,lat) = (100 000 000, 0.00001), you get t = 1.00001 (you fully use your link).
If you set (bw,lat) = (100 000 000, 0.0001), you get t = 1.0001 (you're on the limit).
If you set (bw,lat) = (100 000 000, 0.001), you get t = 10.001 (ouch!)
(These numbers correspond to L = 100 000 000 and SG_TCP_CTE_GAMMA = 20 000 in the formula above.)
This bound on the effective bandwidth of a flow is not the only thing that may make your result be unexpected. For example, two flows competing on a saturated link receive an amount of bandwidth inversely proportional to their round trip time.
We do our best to hammer away any bugs in SimGrid, but this is still an academic project, so please be patient if/when you find bugs in it. If you do, the best solution is to drop an email on either the simgrid-user or the simgrid-devel mailing list and explain the issue to us. You can also decide to open a formal bug report using the relevant interface. You need to log in on the server to get the ability to submit bugs.
We will do our best to solve any problem reported, but you need to help us find the issue. Just saying "it segfaults" isn't enough. Saying "it segfaults when running the attached simulator" doesn't really help either. You may find the following article interesting to learn how to write informative bug reports: http://www.chiark.greenend.org.uk/~sgtatham/bugs.html (it is not SimGrid specific at all, but it's full of good advice).