Hardware Locality (hwloc)
2.0.0rc2
See "How do I handle ABI breaks and API upgrades?" for detecting the hwloc version that you are compiling and/or running against.
In hwloc v1.x, NUMA nodes were inside the tree. For instance, Packages contained 2 NUMA nodes, which contained an L3 and several caches.
Starting with hwloc v2.0, NUMA nodes are not in the main tree anymore. They are attached under objects as Memory Children on the side of normal children. This memory children list starts at obj->memory_first_child and its size is obj->memory_arity. Hence there can now exist two local NUMA nodes, for instance on KNL.
The normal list of children (starting at obj->first_child, ending at obj->last_child, of size obj->arity, and available as the array obj->children) now only contains CPU-side objects: PUs, Cores, Packages, Caches, Groups, Machine and System. hwloc_get_next_child() may still be used to iterate over all children of all lists.
Hence the CPU-side hierarchy is built using normal children, while memory is attached to that hierarchy depending on its affinity.
For instance:
a machine with 2 packages but a single NUMA node is now modeled as a "Machine" object with two "Package" children and one "NUMANode" memory child (displayed first in lstopo below):

    Machine (1024MB total)
      NUMANode L#0 (P#0 1024MB)
      Package L#0
        Core L#0 + PU L#0 (P#0)
        Core L#1 + PU L#1 (P#1)
      Package L#1
        Core L#2 + PU L#2 (P#2)
        Core L#3 + PU L#3 (P#3)
a machine with 2 packages with one NUMA node and 2 cores in each is now:

    Machine (2048MB total)
      Package L#0
        NUMANode L#0 (P#0 1024MB)
        Core L#0 + PU L#0 (P#0)
        Core L#1 + PU L#1 (P#1)
      Package L#1
        NUMANode L#1 (P#1 1024MB)
        Core L#2 + PU L#2 (P#2)
        Core L#3 + PU L#3 (P#3)
if there are two NUMA nodes per package, a Group object is used to keep cores together with their local NUMA node:

    Machine (4096MB total)
      Package L#0
        Group0 L#0
          NUMANode L#0 (P#0 1024MB)
          Core L#0 + PU L#0 (P#0)
          Core L#1 + PU L#1 (P#1)
        Group0 L#1
          NUMANode L#1 (P#1 1024MB)
          Core L#2 + PU L#2 (P#2)
          Core L#3 + PU L#3 (P#3)
      Package L#1
        [...]
the same kind of machine with one L3 cache per NUMA node, where each NUMA node is attached to its L3 object:

    Machine (4096MB total)
      Package L#0
        L3 L#0 (16MB)
          NUMANode L#0 (P#0 1024MB)
          Core L#0 + PU L#0 (P#0)
          Core L#1 + PU L#1 (P#1)
        L3 L#1 (16MB)
          NUMANode L#1 (P#1 1024MB)
          Core L#2 + PU L#2 (P#2)
          Core L#3 + PU L#3 (P#3)
      Package L#1
        [...]
NUMA nodes are not in the "main" tree of normal objects anymore. Hence, they don't have a meaningful depth anymore. They have a virtual (negative) depth (HWLOC_TYPE_DEPTH_NUMANODE) so that functions manipulating depths still work, and so that we can still iterate over the level of NUMA nodes just like for any other level. For instance we can still use lines such as
    int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE);
    hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, 4);
    hwloc_obj_t node = hwloc_get_next_obj_by_depth(topology, HWLOC_TYPE_DEPTH_NUMANODE, prev);
It is also still possible to look at a nodeset and then iterate over NUMA nodes whose nodeset is included.
However, applications that ever walked up/down to find NUMANode parent/children must now be updated. For instance, finding a NUMANode parent should be replaced with finding a parent that has a memory child, and using that child.
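The walk described above can be sketched without hwloc as follows. This is only an illustration: struct toy_obj and toy_find_local_numanode() are hypothetical names, and the struct is a stand-in that mirrors just the parent, memory_first_child and memory_arity fields mentioned in this document. Real code would operate on hwloc_obj_t from hwloc.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the few object fields used here; real code
 * would use hwloc_obj_t from hwloc.h instead of this toy structure. */
struct toy_obj {
    struct toy_obj *parent;             /* NULL for the root object */
    struct toy_obj *memory_first_child; /* head of the memory children list */
    unsigned memory_arity;              /* number of memory children */
};

/* Walk up from obj (included) until an object with at least one memory
 * child is found. Return the first memory child of that object, or NULL
 * if no object on the way to the root has any memory child. */
static struct toy_obj *toy_find_local_numanode(struct toy_obj *obj)
{
    while (obj != NULL) {
        if (obj->memory_arity > 0)
            return obj->memory_first_child;
        obj = obj->parent;
    }
    return NULL;
}
```

Since a parent may have more than one memory child, code that needs every local NUMA node would have to consider the whole memory children list rather than only its first entry.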
Also, the NUMA depth should not be compared with other depths. Unmodified code that still compares NUMA and Package depths (to find out whether Packages contain NUMA or the contrary) would now always assume Packages contain NUMA (because the NUMA depth is negative). If all NUMA nodes are attached to Normal parents at the same depth, the depth of these parents may be used instead. It may be retrieved with hwloc_get_memory_parents_depth(). However, this function may return HWLOC_TYPE_DEPTH_MULTIPLE on future platforms with NUMA nodes attached to different levels.
I/O children are not in the main object children list anymore either. They are in the list starting at obj->io_first_child and whose size is obj->io_arity.
Misc children are not in the main object children list anymore. They are in the list starting at obj->misc_first_child and whose size is obj->misc_arity.
hwloc_get_next_child() may still be used to iterate over all children of all lists.
Given the above, objects may now be of 4 kinds: normal, memory, I/O and Misc.
For a given object type, the kind may be found with hwloc_obj_type_is_normal(), hwloc_obj_type_is_memory(), hwloc_obj_type_is_io(), or by comparing with HWLOC_OBJ_MISC.
Instead of a single HWLOC_OBJ_CACHE, there are now 8 types HWLOC_OBJ_L1CACHE, ..., HWLOC_OBJ_L5CACHE, HWLOC_OBJ_L1ICACHE, ..., HWLOC_OBJ_L3ICACHE.
Cache object attributes are unchanged.
hwloc_get_cache_type_depth() is not needed to disambiguate cache types anymore since new types can be passed to hwloc_get_type_depth() without ever getting HWLOC_TYPE_DEPTH_MULTIPLE anymore.
hwloc_obj_type_is_cache(), hwloc_obj_type_is_dcache() and hwloc_obj_type_is_icache() may be used to check whether a given type is a cache, data/unified cache or instruction cache.
Objects do not have allowed_cpuset and allowed_nodeset anymore. They are only available for the entire topology using hwloc_topology_get_allowed_cpuset() and hwloc_topology_get_allowed_nodeset().
As usual, those are only needed when the WHOLE_SYSTEM topology flag is given, which means disallowed objects are kept in the topology. If so, one may find out whether some PUs inside an object are allowed by checking
hwloc_bitmap_intersects(obj->cpuset, hwloc_topology_get_allowed_cpuset(topology))
Replace cpusets with nodesets for NUMA nodes. To find out which ones are allowed, replace hwloc_bitmap_intersects() with hwloc_bitmap_and() to get the actual intersection.
obj->depth as well as depths given to functions such as hwloc_get_obj_by_depth() or returned by hwloc_topology_get_depth() are now signed int.
Other depths, such as the cache-specific depth attribute, are still unsigned.
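One practical consequence is that code which still stores depths in unsigned variables can misbehave as soon as a virtual (negative) depth shows up. The self-contained sketch below does not call hwloc. TOY_NUMA_DEPTH is a hypothetical stand-in for a negative virtual depth such as HWLOC_TYPE_DEPTH_NUMANODE (the actual value should be taken from the hwloc headers), and both helper functions are illustrative names only.

```c
#include <assert.h>

/* Hypothetical stand-in for a virtual (negative) depth such as
 * HWLOC_TYPE_DEPTH_NUMANODE; the real value comes from hwloc.h. */
#define TOY_NUMA_DEPTH (-3)

/* Helper written with unsigned depths: a negative depth converted to
 * unsigned wraps around to a huge value, so it always looks deeper
 * than any normal depth. */
static int toy_is_deeper_unsigned(unsigned a, unsigned b)
{
    return a > b;
}

/* Helper written with signed depths: negative (virtual) depths are
 * reported as not comparable instead of being ordered.
 * Returns 1 if a is deeper than b, 0 if it is not, -1 if either depth
 * is virtual and the comparison is meaningless. */
static int toy_is_deeper_signed(int a, int b)
{
    if (a < 0 || b < 0)
        return -1;
    return a > b;
}
```

With these helpers, toy_is_deeper_unsigned((unsigned) TOY_NUMA_DEPTH, 2u) reports the virtual depth as deeper than depth 2, while toy_is_deeper_signed(TOY_NUMA_DEPTH, 2) flags the comparison as meaningless.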
Memory attributes such as obj->memory.local_memory are now only available in NUMANode-specific attributes in obj->attr->numanode.local_memory.
obj->memory.total_memory is available in all objects as obj->total_memory.
hwloc_topology_ignore_type(), hwloc_topology_ignore_type_keep_structure() and hwloc_topology_ignore_all_keep_structure() are respectively superseded by
    hwloc_topology_set_type_filter(topology, type, HWLOC_TYPE_FILTER_KEEP_NONE);
    hwloc_topology_set_type_filter(topology, type, HWLOC_TYPE_FILTER_KEEP_STRUCTURE);
    hwloc_topology_set_all_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_STRUCTURE);
Also, the meaning of KEEP_STRUCTURE has changed: only entire levels may be ignored, instead of single objects. The old behavior is not available anymore.
HWLOC_TOPOLOGY_FLAG_ICACHES is superseded by
hwloc_topology_set_icache_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_ALL);
HWLOC_TOPOLOGY_FLAG_WHOLE_IO, HWLOC_TOPOLOGY_FLAG_IO_DEVICES and HWLOC_TOPOLOGY_FLAG_IO_BRIDGES are replaced by I/O type filters.
To keep all I/O devices (PCI, Bridges, and OS devices), use:
hwloc_topology_set_io_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_ALL);
To only keep important devices (Bridges with children, common PCI devices and OS devices):
hwloc_topology_set_io_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_IMPORTANT);
2.0 XML files are not compatible with 1.x.
2.0 can load 1.x files, but only NUMA distances are imported. Other distance matrices are ignored (they were never used by default anyway).
2.0 can export 1.x-compatible files, but only distances attached to the root object are exported (i.e. distances that cover the entire machine). Other distance matrices are dropped (they were never used by default anyway).
Users are advised to negotiate hwloc versions between exporter and importer: if the importer isn't 2.x, the exporter should export to 1.x. Otherwise, things should work by default.
Hence hwloc_topology_export_xml() and hwloc_topology_export_xmlbuffer() have a new flags argument, which can be used to force a hwloc-1.x-compatible XML export.
    #if HWLOC_API_VERSION >= 0x20000
      if (need 1.x compatible XML export)
        hwloc_topology_export_xml(...., HWLOC_TOPOLOGY_EXPORT_XML_FLAG_V1);
      else /* need 2.x compatible XML export */
        hwloc_topology_export_xml(...., 0);
    #else
      hwloc_topology_export_xml(....);
    #endif
Additionally, hwloc_topology_diff_load_xml(), hwloc_topology_diff_load_xmlbuffer(), hwloc_topology_diff_export_xml(), hwloc_topology_diff_export_xmlbuffer() and hwloc_topology_diff_destroy() lost the topology argument: The first argument (topology) isn't needed anymore.
The new distances API is in hwloc/distances.h.
Distances are not accessible directly from objects anymore. One should first call hwloc_distances_get() (or a variant) to retrieve distances (possibly with one call to get the number of available distances structures, and another call to actually get them). The caller may then consult these structures, and finally release them.
The set of objects involved in a distances structure is specified by an array of objects; it may not always cover the entire machine.
Bitmap functions (and a couple other functions) can return errors (in theory).
Most bitmap functions may have to reallocate the internal bitmap storage. In v1.x, they would silently crash if realloc failed. In v2.0, they now return an int that can be negative on error. However, the preallocated storage is 512 bits, hence realloc will not even be used unless you run hwloc on machines with larger PU or NUMA node indexes.
hwloc_obj_add_info(), hwloc_cpuset_from_nodeset() and hwloc_nodeset_to_cpuset() also return an int, which would be -1 in case of allocation errors.
hwloc_type_sscanf() extends hwloc_obj_type_sscanf() by passing a union hwloc_obj_attr_u which may receive Cache, Group, Bridge or OS device attributes.
hwloc_type_sscanf_as_depth() is also added to directly return the corresponding level depth within a topology.
hwloc_topology_insert_misc_object_by_cpuset() is replaced with hwloc_topology_alloc_group_object() and hwloc_topology_insert_group_object().
hwloc_topology_insert_misc_object_by_parent() is replaced with hwloc_topology_insert_misc_object().
HWLOC_OBJ_SYSTEM removed: The root object is always HWLOC_OBJ_MACHINE.
_membind_nodeset() memory binding interfaces deprecated: One should use the variant without the _nodeset suffix and pass the new HWLOC_MEMBIND_BYNODESET flag.
HWLOC_MEMBIND_REPLICATE removed: no supported operating system supports it anymore.
hwloc_obj_snprintf() removed because it was long-deprecated by hwloc_obj_type_snprintf() and hwloc_obj_attr_snprintf().
hwloc_obj_type_sscanf() deprecated, hwloc_obj_type_of_string() removed.
hwloc_cpuset_from/to_nodeset_strict() deprecated: Now useless since all topologies are NUMA. Use the variant without the _strict suffix.
hwloc_distribute() and hwloc_distributev() removed, deprecated by hwloc_distrib().
The Custom interface (hwloc_topology_set_custom(), etc.) was removed, as well as the corresponding command-line tools (hwloc-assembler, etc.). Topologies always start with an object with valid cpusets and nodesets.
obj->online_cpuset removed: Offline PUs are simply listed in the complete_cpuset as previously.
obj->os_level removed.