Data Objects

What are Data Objects in yt?

Data objects (also called Data Containers) are used in yt as convenience structures for grouping data in logical ways that make sense in the context of the dataset as a whole. Some of the data objects are geometrical groupings of data (e.g. sphere, box, cylinder, etc.). Others represent data products derived from your dataset (e.g. slices, streamlines, surfaces). Still other data objects group multiple objects together or filter them (e.g. data collection, cut region).

To generate standard plots, objects rarely need to be directly constructed. However, for detailed data inspection as well as hand-crafted derived data, objects can be exceptionally useful and even necessary.

How to Create and Use an Object

To create an object, you usually only need a loaded dataset, the name of the object type, and the relevant parameters for your object. Here is a common example for creating a Region object that covers all of your data volume.

import yt
ds = yt.load("RedshiftOutput0005")
ad = ds.all_data()

Alternatively, we could create a sphere object of radius 1 kpc centered at [0.5, 0.5, 0.5]:

import yt
ds = yt.load("RedshiftOutput0005")
sp = ds.sphere([0.5, 0.5, 0.5], (1, 'kpc'))

After an object has been created, it can be used as a data_source for certain tasks like ProjectionPlot (see ProjectionPlot), you can compute the bulk quantities associated with it (see Processing Objects: Derived Quantities), or you can examine the data directly. For example, to find the temperature at all indexed locations in the central sphere of your dataset:

import yt
ds = yt.load("RedshiftOutput0005")
sp = ds.sphere([0.5, 0.5, 0.5], (1, 'kpc'))

# Show all temperature values
print(sp["temperature"])

# Print things in a more human-friendly manner: one temperature at a time
print("(x,  y,  z) Temperature")
print("-----------------------")
for i in range(sp["temperature"].size):
    print("(%f,  %f,  %f)    %f" % (sp["x"][i], sp["y"][i], sp["z"][i], sp["temperature"][i]))

Available Objects

As noted above, there are numerous types of objects. Here we group them into:

  • Geometric Objects: Data is selected based on spatial shapes in the dataset
  • Filtering Objects: Data is selected based on other field criteria
  • Collection Objects: Multiple objects grouped together
  • Construction Objects: Objects that represent some sort of data product constructed by additional analysis

If you want to create your own custom data object type, see Creating Data Objects.

Geometric Objects

For 0D, 1D, and 2D geometric objects, if the extent of the object intersects a grid cell, then the cell is included in the object; however, for 3D objects the center of the cell must be within the object in order for the grid cell to be incorporated.
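
The 3D selection rule can be illustrated without yt. The following is a minimal sketch, assuming a uniform grid covering the unit cube; the helper name cells_in_sphere is hypothetical, and this is not yt's actual selection code.

```python
import itertools
import math


def cells_in_sphere(n, center, radius):
    """Return (i, j, k) indices of cells whose centers lie inside a sphere.

    Assumes a uniform n x n x n grid covering the unit cube [0, 1]^3.
    """
    dx = 1.0 / n
    selected = []
    for idx in itertools.product(range(n), repeat=3):
        # A cell center sits half a cell width past the cell's left edge.
        cell_center = [(i + 0.5) * dx for i in idx]
        if math.dist(cell_center, center) <= radius:
            selected.append(idx)
    return selected


cells = cells_in_sphere(4, [0.5, 0.5, 0.5], 0.3)
print(len(cells))
```

In this sketch, cell (0, 1, 1) overlaps the sphere (its nearest corner region is 0.25 from the sphere center) but is not selected, because its center lies outside the radius.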

0D Objects

Point
Usage: point(coord, ds=None, field_parameters=None, data_source=None)
A point defined by a single cell at specified coordinates.

1D Objects

Ray (Axis-Aligned)
Usage: ortho_ray(axis, coord, ds=None, field_parameters=None, data_source=None)
A line (of data cells) stretching through the full domain aligned with one of the x,y,z axes. Defined by an axis and a point to be intersected. Please see this note about ray data value ordering.
Ray (Arbitrarily-Aligned)
Class YTRayBase
Usage: ray(start_coord, end_coord, ds=None, field_parameters=None, data_source=None)
A line (of data cells) defined by arbitrary start and end coordinates. Please see this note about ray data value ordering.

2D Objects

Slice (Axis-Aligned)
Usage: slice(axis, coord, center=None, ds=None, field_parameters=None, data_source=None)
A plane normal to one of the axes and intersecting a particular coordinate.
Slice (Arbitrarily-Aligned)
Usage: cutting(normal, coord, north_vector=None, ds=None, field_parameters=None, data_source=None)
A plane normal to a specified vector and intersecting a particular coordinate.

3D Objects

All Data
Function all_data()
Usage: all_data(find_max=False)
all_data() is a wrapper on the Box Region class which defaults to creating a Region covering the entire dataset domain. It is effectively ds.region(ds.domain_center, ds.domain_left_edge, ds.domain_right_edge).
Box Region
Usage: region(center, left_edge, right_edge, fields=None, ds=None, field_parameters=None, data_source=None)
Alternatively: box(left_edge, right_edge, fields=None, ds=None, field_parameters=None, data_source=None)
A box-like region aligned with the grid axis orientation. It is defined by a left_edge, a right_edge, and a center. The left_edge and right_edge are the minimum and maximum bounds in the three axes respectively. The center is arbitrary and must only be contained within the left_edge and right_edge. By using the box wrapper, the center is assumed to be the midpoint between the left and right edges.
Disk/Cylinder
Class: YTDiskBase
Usage: disk(center, normal, radius, height, fields=None, ds=None, field_parameters=None, data_source=None)
A cylinder defined by a point at the center of one of the circular bases, a normal vector to it defining the orientation of the length of the cylinder, and radius and height values for the cylinder’s dimensions.
Ellipsoid
Usage: ellipsoid(center, semi_major_axis_length, semi_medium_axis_length, semi_minor_axis_length, semi_major_vector, tilt, fields=None, ds=None, field_parameters=None, data_source=None)
An ellipsoid with axis magnitudes set by semi_major_axis_length, semi_medium_axis_length, and semi_minor_axis_length. semi_major_vector sets the direction of the semi_major_axis. tilt defines the orientation of the semi-medium and semi_minor axes.
Sphere
Usage: sphere(center, radius, ds=None, field_parameters=None, data_source=None)
A sphere defined by a central coordinate and a radius.
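
The relationship between a box's edges and its center can be sketched in a few lines. This is an illustration only: the helper names box_midpoint and center_is_inside are hypothetical, and whether yt itself enforces the containment check is not stated here.

```python
def box_midpoint(left_edge, right_edge):
    """Midpoint of an axis-aligned box, the center the box wrapper assumes."""
    return [0.5 * (lo + hi) for lo, hi in zip(left_edge, right_edge)]


def center_is_inside(center, left_edge, right_edge):
    """Check that a user-supplied center lies within the box bounds."""
    return all(lo <= c <= hi for c, lo, hi in zip(center, left_edge, right_edge))


left, right = [0.25, 0.25, 0.25], [0.75, 0.5, 1.0]
center = box_midpoint(left, right)
print(center)
print(center_is_inside(center, left, right))
```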

Filtering and Collection Objects

See also the section on Filtering your Dataset.

Intersecting Regions
Most Region objects provide a data_source parameter, which allows you to subselect one region from another (in the coordinate system of the dataset). Note that this can easily lead to empty data for non-intersecting regions.
Usage: slice(axis, coord, ds, data_source=sph)
Boolean Regions
Note: not yet implemented in yt 3.0
Usage: boolean()
Filter
Usage: cut_region(base_object, conditionals, ds=None, field_parameters=None)
A cut_region is a filter which can be applied to any other data object. The filter is defined by the conditionals present, which apply cuts to the data in the object. A cut_region will work for either particle fields or mesh fields, but not on both simultaneously. For more detailed information and examples, see Cut Regions.
Collection of Data Objects
Usage: data_collection(center, obj_list, ds=None, field_parameters=None)
A data_collection is a list of data objects that can be sampled and processed as a whole in a single data object.
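
The idea behind a filter can be sketched with plain Python data. This is a conceptual illustration, not yt's cut_region syntax (see Cut Regions for that): the function apply_cuts and the cell dictionaries are hypothetical, and combining several conditionals with a logical "and" is an assumption of this sketch.

```python
def apply_cuts(cells, conditionals):
    """Keep only the cells for which every conditional is true."""
    return [cell for cell in cells if all(cond(cell) for cond in conditionals)]


# Hypothetical cells, each carrying two field values.
cells = [
    {"temperature": 5.0e3, "density": 1.0e-26},
    {"temperature": 2.0e6, "density": 3.0e-25},
    {"temperature": 8.0e6, "density": 1.0e-28},
]

hot_and_dense = apply_cuts(
    cells,
    [
        lambda c: c["temperature"] > 1.0e6,
        lambda c: c["density"] > 1.0e-27,
    ],
)
print(hot_and_dense)
```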

Construction Objects

Fixed-Resolution Region
Usage: covering_grid(level, left_edge, dimensions, fields=None, ds=None, num_ghost_zones=0, use_pbar=True, field_parameters=None)
A 3D region with all data extracted to a single, specified resolution. See Examining Grid Data in a Fixed Resolution Array.
Fixed-Resolution Region with Smoothing
Usage: smoothed_covering_grid(level, left_edge, dimensions, fields=None, ds=None, num_ghost_zones=0, use_pbar=True, field_parameters=None)
A 3D region with all data extracted and interpolated to a single, specified resolution. Identical to covering_grid, except that it interpolates as necessary from coarse regions to fine. See Examining Grid Data in a Fixed Resolution Array.
Fixed-Resolution Region for Particle Deposition
Usage: arbitrary_grid(left_edge, right_edge, dimensions, ds=None, field_parameters=None)
When particles are deposited on to mesh fields, they use the existing mesh structure, but this may have too much or too little resolution relative to the particle locations (or it may not exist at all!). An arbitrary_grid provides a means for generating a new independent mesh structure for particle deposition. See Arbitrary Grids Objects for Particle Deposition for more information.
Projection
Usage: proj(field, axis, weight_field=None, center=None, ds=None, data_source=None, method="integrate", field_parameters=None)
A 2D projection of a 3D volume along one of the axis directions. By default, this is a line integral through the entire simulation volume (although it can be a subset of that volume specified by a data object with the data_source keyword). Alternatively, one can specify a weight_field and different method values to change the nature of the projection outcome. See Types of Projections for more information.
Streamline
Usage: streamline(coord_list, length, fields=None, ds=None, field_parameters=None)
A streamline can be traced out by identifying a starting coordinate (or list of coordinates) and allowing it to trace a vector field, like gas velocity. See Streamlines: Tracking the Trajectories of Tracers in your Data for more information.
Surface
Usage: surface(data_source, field, field_value)
The surface defined by an isocontour in any mesh field. An existing data object must be provided as the source, as well as a mesh field and the field value at which you want the isocontour. See 3D Surfaces and Sketchfab.
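
The difference between an unweighted and a weighted projection can be sketched in plain Python. This is a minimal illustration, assuming a uniform cell width dx and assuming that a weight field yields the weighted average along each line of sight; the function name project is hypothetical and is not yt's proj implementation.

```python
def project(field, dx, weight=None):
    """Project a 3D field stored as field[i][j][k] along the last axis.

    Without a weight, each output value is the line integral sum(f * dx).
    With a weight, it is sum(f * w * dx) / sum(w * dx).
    """
    result = []
    for i, plane in enumerate(field):
        row = []
        for j, line in enumerate(plane):
            if weight is None:
                row.append(sum(f * dx for f in line))
            else:
                w_line = weight[i][j]
                numerator = sum(f * w * dx for f, w in zip(line, w_line))
                denominator = sum(w * dx for w in w_line)
                row.append(numerator / denominator)
        result.append(row)
    return result


# Hypothetical 2 x 2 x 2 fields.
density = [[[1.0, 3.0], [2.0, 2.0]],
           [[4.0, 0.0], [1.0, 1.0]]]
temperature = [[[10.0, 20.0], [30.0, 30.0]],
               [[5.0, 7.0], [1.0, 3.0]]]

column_density = project(density, dx=0.5)
weighted_temperature = project(temperature, dx=0.5, weight=density)
print(column_density)
print(weighted_temperature)
```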

Processing Objects: Derived Quantities

Derived quantities are a way of calculating some bulk quantities associated with all of the grid cells contained in a data object. Derived quantities can be accessed via the quantities interface. Here is an example of how to get the angular momentum vector calculated from all the cells contained in a sphere at the center of our dataset.

import yt
ds = yt.load("my_data")
sp = ds.sphere('c', (10, 'kpc'))
print(sp.quantities.angular_momentum_vector())

Available Derived Quantities

Angular Momentum Vector
Usage: angular_momentum_vector(use_gas=True, use_particles=True)
The mass-weighted average angular momentum vector of the particles, gas, or both.
Bulk Velocity
Usage: bulk_velocity(use_gas=True, use_particles=True)
The mass-weighted average velocity of the particles, gas, or both.
Center of Mass
Usage: center_of_mass(use_cells=True, use_particles=False)
The location of the center of mass. By default, it is computed from the non-particle data in the object, but it can be computed from particles, gas, or both.
Extrema
Class Extrema
Usage: extrema(fields, non_zero=False)
The extrema of a field or list of fields.
Maximum Location
Usage: max_location(fields)
The maximum of a field or list of fields as well as the x,y,z location of that maximum.
Minimum Location
Usage: min_location(fields)
The minimum of a field or list of fields as well as the x,y,z location of that minimum.
Spin Parameter
Usage: spin_parameter(use_gas=True, use_particles=True)
The spin parameter for the baryons using the particles, gas, or both.
Total Mass
Class TotalMass
Usage: total_mass()
The total mass of the object as a tuple of (total gas, total particle) mass.
Total of a Field
Usage: total_quantity(fields)
The sum of a given field (or list of fields) over the entire object.
Weighted Average of a Field
Usage: weighted_average_quantity(fields, weight)
The weighted average of a field (or list of fields) over an entire data object. If you want an unweighted average, then set your weight to be the field: ones.
Weighted Variance of a Field
Usage: weighted_variance(fields, weight)
The weighted variance of a field (or list of fields) over an entire data object and the weighted mean. If you want an unweighted variance, then set your weight to be the field: ones.
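
The arithmetic behind the weighted average and weighted variance can be written out by hand. This is a minimal sketch with hypothetical function names and made-up values; it shows the standard definitions and is not a claim about how yt computes or returns these quantities internally.

```python
def weighted_average(values, weights):
    """Sum of value * weight divided by the total weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_variance(values, weights):
    """Return (variance, mean), both weighted by the given weights."""
    mean = weighted_average(values, weights)
    variance = sum(
        w * (v - mean) ** 2 for v, w in zip(values, weights)
    ) / sum(weights)
    return variance, mean


# Hypothetical per-cell velocities and cell masses.
velocity = [1.0, 2.0, 4.0]
mass = [1.0, 1.0, 2.0]

variance, mean = weighted_variance(velocity, mass)
print(mean, variance)

# A weight of all ones reduces to the unweighted average.
ones = [1.0] * len(velocity)
print(weighted_average(velocity, ones))
```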

Arbitrary Grids Objects for Particle Deposition

The covering grid and smoothed covering grid objects mandate that they be exactly aligned with the mesh. This is a holdover from the time when yt was used exclusively for data that came in regularly structured grid patches, and does not necessarily work as well for data that is composed of discrete objects like particles. To augment this, the YTArbitraryGridBase object was created, which enables construction of meshes (onto which particles can be deposited or smoothed) in arbitrary regions. This eliminates any assumptions on yt’s part about how the data is organized, and will allow for more fine-grained control over visualizations.

An example of creating an arbitrary grid would be to construct one, then query the deposited particle density, like so:

import yt
ds = yt.load("snapshot_010.hdf5")

obj = ds.arbitrary_grid([0.0, 0.0, 0.0], [0.99, 0.99, 0.99],
                       dims=[128, 128, 128])
print(obj["deposit", "all_density"])

While these cannot yet be used as input to projections or slices, slices and projections can be taken of the data in them and visualized by hand.
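
To make the idea of depositing particles onto an independent mesh concrete, here is a minimal sketch in plain Python. It assumes a simple nearest-grid-point scheme, which is only one possible deposition method and is not necessarily what yt uses; the function name deposit_density and the particle data are hypothetical.

```python
import math


def deposit_density(positions, masses, left_edge, right_edge, dims):
    """Nearest-grid-point deposition onto a uniform mesh.

    Returns a sparse dict mapping (i, j, k) cell indices to mass density.
    Particles that fall outside the mesh are ignored.
    """
    widths = [(r - l) / n for l, r, n in zip(left_edge, right_edge, dims)]
    cell_volume = widths[0] * widths[1] * widths[2]
    density = {}
    for pos, mass in zip(positions, masses):
        index = tuple(
            math.floor((p - l) / w) for p, l, w in zip(pos, left_edge, widths)
        )
        if all(0 <= i < n for i, n in zip(index, dims)):
            density[index] = density.get(index, 0.0) + mass / cell_volume
    return density


positions = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (0.9, 0.9, 0.9), (1.5, 0.5, 0.5)]
masses = [1.0, 2.0, 1.0, 5.0]
grid = deposit_density(positions, masses, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2, 2, 2])
print(grid)
```

The mesh here is defined only by its edges and dimensions, independent of any existing grid structure, which is the point of an arbitrary grid.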

Combining Objects: Boolean Data Objects

Note

Boolean Data Objects have not yet been ported to yt 3.0 from yt 2.x. If you are interested in aiding in this port, please contact the yt-dev mailing list. Until they are ported, the functionality described below will not work.

A special type of data object is the boolean data object. It works only on three-dimensional objects. It is built by relating already existing data objects with boolean operators. The boolean logic may be nested using parentheses, and it supports the standard “AND”, “OR”, and “NOT” operators:

  • “AND”: When two data objects are related with an “AND”, the combined data object is the volume of the simulation covered by both objects, and not by just a single object.
  • “OR”: When two data objects are related with an “OR”, the combined data object is the volume(s) of the simulation covered by either of the objects. For example, this may be used to combine disjoint objects into one.
  • “NOT”: When two data objects are related with a “NOT”, the combined data object is the volume of the first object that the second does not cover. For example, this may be used to cut out part(s) of the first data object utilizing the second data object.
  • “(” or “)”: Nested logic is surrounded by parentheses. The order of operations is such that the boolean logic is evaluated inside the inner-most parentheses first, then goes upwards. The logic is read left-to-right at all levels (crucial for the “NOT” operator).

Please see The Cookbook for some examples of how to use the boolean data object.
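
The semantics of the three operators can be illustrated with Python sets. This is a conceptual sketch only: each set holds hypothetical cell IDs standing in for the volume an object covers, and none of this is yt API.

```python
# Hypothetical cell IDs covered by three data objects.
sphere_cells = {1, 2, 3, 4}
box_cells = {3, 4, 5, 6}
disk_cells = {6, 7}

both = sphere_cells & box_cells              # "AND": covered by both
either = sphere_cells | disk_cells           # "OR": covered by either
sphere_not_box = sphere_cells - box_cells    # "NOT": first minus second
box_not_sphere = box_cells - sphere_cells    # operand order matters for "NOT"

# Nested logic: the parenthesized part is evaluated first.
nested = (sphere_cells | disk_cells) - box_cells

print(both, either, sphere_not_box, box_not_sphere, nested)
```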

Connected Sets and Clump Finding

The underlying machinery used in Clump Finding is accessible from any data object. This includes the ability to obtain and examine topologically connected sets. These sets are identified by examining cells between two threshold values and connecting them. What is returned to the user is a list of the intervals of values found, and extracted regions that contain only those cells that are connected.

To use this, call extract_connected_sets() on any 3D data object. This requires a field, the number of level sets to extract, the minimum and maximum values between which sets will be identified, and whether or not to identify them in log space.

sp = ds.sphere("max", (1.0, 'pc'))
contour_values, connected_sets = sp.extract_connected_sets(
     "density", 3, 1e-30, 1e-20)

The first item, contour_values, will be an array of the min value for each set of level sets. The second (connected_sets) will be a dict of dicts. The key for the first (outer) dict is the level of the contour, corresponding to contour_values. The inner dict returned is keyed by the contour ID. It contains YTCutRegionBase objects. These can be queried just as any other data object. The clump finder (Clump Finding) differs from the above method in that the contour identification is performed recursively within each individual structure, and structures can be kept or remerged later based on additional criteria, such as gravitational boundedness.
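
The two ingredients described above, threshold values spaced between a minimum and a maximum and grouping of connected cells, can be sketched on a small 2D grid. This is an illustration under stated assumptions, not yt's algorithm: the function names are hypothetical, cells are connected through shared faces only, and using the overall maximum as the upper bound of every level is an assumption of this sketch.

```python
import math


def contour_levels(min_val, max_val, num_levels, log_space=True):
    """Lower bounds of num_levels intervals spanning [min_val, max_val]."""
    if log_space:
        lo, hi = math.log10(min_val), math.log10(max_val)
        step = (hi - lo) / num_levels
        return [10.0 ** (lo + i * step) for i in range(num_levels)]
    step = (max_val - min_val) / num_levels
    return [min_val + i * step for i in range(num_levels)]


def connected_sets(grid, lower, upper):
    """Group face-connected cells with lower <= value <= upper (flood fill)."""
    rows, cols = len(grid), len(grid[0])
    seen, sets = set(), []
    for start in ((r, c) for r in range(rows) for c in range(cols)):
        if start in seen or not lower <= grid[start[0]][start[1]] <= upper:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            r, c = stack.pop()
            members.append((r, c))
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in seen
                        and lower <= grid[nr][nc] <= upper):
                    seen.add((nr, nc))
                    stack.append((nr, nc))
        sets.append(sorted(members))
    return sets


# Hypothetical density values on a 3 x 4 grid.
density = [
    [5.0, 5.0, 0.1, 50.0],
    [0.1, 0.1, 0.1, 50.0],
    [5.0, 0.1, 0.1, 0.1],
]
levels = contour_levels(1.0, 100.0, 2)
sets_by_level = {i: connected_sets(density, lv, 100.0) for i, lv in enumerate(levels)}
print(levels)
print(sets_by_level)
```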

Storing and Loading Objects

Often, when operating interactively or via the scripting interface, it is convenient to save an object or multiple objects out to disk and then restart the calculation later. For example, this is useful after clump finding (Clump Finding), which can be very time consuming. Typically, the save and load operations are used on 3D data objects. yt has a separate set of serialization operations for 2D objects such as projections.

yt will save out objects to disk under the presupposition that the construction of the objects is the difficult part, rather than the generation of the data – this means that you can save out an object as a description of how to recreate it in space, but not the actual data arrays affiliated with that object. The information that is saved includes the dataset off of which the object “hangs.” It is this piece of information that is the most difficult; the object, when reloaded, must be able to reconstruct a dataset from whatever limited information it has in the save file.

You can save objects to an output file using the function save_object():

import yt
ds = yt.load("my_data")
sp = ds.sphere([0.5, 0.5, 0.5], (10.0, 'kpc'))
sp.save_object("sphere_name", "save_file.cpkl")

This will store the object as sphere_name in the file save_file.cpkl, which will be created or accessed using the standard Python module shelve.

To re-load an object saved this way, you can use the shelve module directly:

import yt
import shelve
ds = yt.load("my_data")
saved_fn = shelve.open("save_file.cpkl")
ds, sp = saved_fn["sphere_name"]

Because a single shelve file can store multiple objects, each object has to be retrieved by the name it was saved under, which is why the sphere is requested by name above.
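
Storing several named entries in one shelve file can be demonstrated with the standard library alone. In this sketch, plain dictionaries stand in for saved yt objects (they are hypothetical placeholders describing how to recreate each object), and the file lives in a temporary directory.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "save_file")

    # Save two descriptions under different names in the same file.
    with shelve.open(path) as store:
        store["inner_sphere"] = {"center": [0.5, 0.5, 0.5], "radius_kpc": 1.0}
        store["outer_sphere"] = {"center": [0.5, 0.5, 0.5], "radius_kpc": 10.0}

    # Reopen the file and retrieve one entry by name.
    with shelve.open(path) as store:
        names = sorted(store.keys())
        outer = store["outer_sphere"]

print(names)
print(outer)
```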

For certain data objects such as projections, serialization can be performed automatically if the serialize option is set to True in the configuration file or set directly in the script:

from yt.config import ytcfg
ytcfg["yt", "serialize"] = "True"

Note

Use serialization with caution. Enabling serialization means that once a projection of a dataset has been created (and stored in the .yt file in the same directory), any subsequent changes to that dataset will be ignored when attempting to create the same projection. So if you take a density projection of your dataset in the ‘x’ direction, then somehow tweak that dataset significantly, and take the density projection again, yt will default to finding the original projection and not your new one.

Note

It’s also possible to use the standard cPickle module (pickle in Python 3) for loading and storing objects, so in theory you could even save a list of objects!

This method works for clumps, as well, and the entire clump index will be stored and restored upon load.