How to Develop yt

Note

If you already know how to use version control and are comfortable with handling it yourself, the quickest way to contribute to yt is to fork us on BitBucket, make your changes, and issue a pull request. The rest of this document is just an explanation of how to do that.

yt is a community project!

We are very happy to accept patches, features, and bugfixes from any member of the community! yt is developed using mercurial, primarily because it enables very easy and straightforward submission of changesets. We’re eager to hear from you, and if you are developing yt, we encourage you to subscribe to the developer mailing list

Please feel free to hack around, commit changes, and send them upstream. If you’re new to Mercurial, these three resources are pretty great for learning the ins and outs:

The commands that are essential for using mercurial include:

  • hg commit which commits changes in the working directory to the repository, creating a new “changeset object.”
  • hg add which adds a new file to be tracked by mercurial. This does not change the working directory.
  • hg pull which pulls (from an optional path specifier) changeset objects from a remote source. The working directory is not modified.
  • hg push which sends (to an optional path specifier) changeset objects to a remote source. The working directory is not modified.
  • hg log which shows a log of all changeset objects in the current repository. Use -g to show a graph of changeset objects and their relationship.
  • hg update which (with an optional “revision” specifier) updates the state of the working directory to match a changeset object in the repository.
  • hg merge which combines two changesets to make a union of their lines of development. This updates the working directory.

Keep in touch, and happy hacking! We also provide doc/coding_styleguide.txt and an example of a fiducial docstring in doc/docstring_example.txt. Please read them before hacking on the codebase, and feel free to email any of the mailing lists for help with the codebase.

Submitting Changes

We provide a brief introduction to submitting changes here. yt thrives on the strength of its communities ( http://arxiv.org/abs/1301.7064 has further discussion) and we encourage contributions from any user. While we do not discuss in detail version control, mercurial or the advanced usage of BitBucket, we do provide an outline of how to submit changes and we are happy to provide further assistance or guidance.

Licensing

yt has, with the 2.6 release, been relicensed under the BSD 3-clause license. Previously versions were released under the GPLv3.

All contributed code must be BSD-compatible. If you’d rather not license in this manner, but still want to contribute, please consider creating an external package, which we’ll happily link to.

Requirements for Code Submission

Modifications to the code typically fall into one of three categories, each of which have different requirements for acceptance into the code base. These requirements are in place for a few reasons – to make sure that the code is maintainable, testable, and that we can easily include information about changes in changelogs during the release procedure. (See YTEP-0008 for more detail.)

  • New Features
    • New unit tests (possibly new answer tests) (See Testing)
    • Docstrings in the source code for the public API
    • Addition of new feature to the narrative documentation (See How to Write Documentation)
    • Addition of cookbook recipe (See How to Write Documentation)
    • Issue created on issue tracker, to ensure this is added to the changelog
  • Extension or Breakage of API in Existing Features
  • Bug fixes
    • Unit test is encouraged, to ensure breakage does not happen again in the future. (See Testing)
    • Issue created on issue tracker, to ensure this is added to the changelog

When submitting, you will be asked to make sure that your changes meet all of these requirements. They are pretty easy to meet, and we’re also happy to help out with them. In Code Style Guide there is a list of handy tips for how to structure and write your code.

How to Use Mercurial with yt

This document doesn’t cover detailed mercurial use, but on IRC we are happy to walk you through any troubles you might have. Here are some suggestions for using mercurial with yt:

  • Named branches are to be avoided. Try using bookmarks (hg bookmark) to track work. (More)
  • Make sure you set a username in your ~/.hgrc before you commit any changes! All of the tutorials above will describe how to do this as one of the very first steps.
  • When contributing changes, you might be asked to make a handful of modifications to your source code. We’ll work through how to do this with you, and try to make it as painless as possible.
  • Please avoid deleting your yt forks, as that eliminates the code review process from BitBucket’s website.
  • In all likelihood, you only need one fork. To keep it in sync, you can sync from the website. (See Bitbucket’s Blog Post about this.)
  • If you run into any troubles, stop by IRC (see Go on IRC to ask a question) or the mailing list.

Building yt

If you have made changes to any C or Cython (.pyx) modules, you have to rebuild yt. If your changes have exclusively been to Python modules, you will not need to re-build, but (see below) you may need to re-install.

If you are running from a clone that is executable in-place (i.e., has been installed via the installation script or you have run setup.py develop) you can rebuild these modules by executing:

$ python2.7 setup.py develop

If you have previously “installed” via setup.py install you have to re-install:

$ python2.7 setup.py install

Only one of these two options is needed.

Developing yt on Windows

If you plan to develop yt on Windows, it is necessary to use the MinGW gcc compiler that can be installed using the Anaconda Python Distribution. The libpython package must be installed from Anaconda as well. These can both be installed with a single command:

$ conda install libpython mingw

Additionally, the syntax for the setup command is slightly different; you must type:

$ python2.7 setup.py build --compiler=mingw32 develop

or

$ python2.7 setup.py build --compiler=mingw32 install

Making and Sharing Changes

The simplest way to submit changes to yt is to do the following:

  • Build yt from the mercurial repository
  • Navigate to the root of the yt repository
  • Make some changes and commit them
  • Fork the yt repository on BitBucket
  • Push the changesets to your fork
  • Issue a pull request.

Here’s a more detailed flowchart of how to submit changes.

  1. If you have used the installation script, the source code for yt can be found in $YT_DEST/src/yt-hg. Alternatively see Installing yt Using pip or from Source for instructions on how to build yt from the mercurial repository. (Below, in How To Read The Source Code, we describe how to find items of interest.)

  2. Edit the source file you are interested in and test your changes. (See Testing for more information.)

  3. Fork yt on BitBucket. (This step only has to be done once.) You can do this at: https://bitbucket.org/yt_analysis/yt/fork. Call this repository yt.

  4. Commit these changes, using hg commit. This can take an argument which is a series of filenames, if you have some changes you do not want to commit.

  5. Remember that this is a large development effort and to keep the code accessible to everyone, good documentation is a must. Add in source code comments for what you are doing. Add in docstrings if you are adding a new function or class or keyword to a function. Add documentation to the appropriate section of the online docs so that people other than yourself know how to use your new code.

  6. If your changes include new functionality or cover an untested area of the code, add a test. (See Testing for more information.) Commit these changes as well.

  7. Push your changes to your new fork using the command:

    hg push -r . https://bitbucket.org/YourUsername/yt/
    

    If you end up doing considerable development, you can set an alias in the file .hg/hgrc to point to this path.

    Note

    Note that the above approach uses HTTPS as the transfer protocol between your machine and BitBucket. If you prefer to use SSH - or perhaps you’re behind a proxy that doesn’t play well with SSL via HTTPS - you may want to set up an SSH key on BitBucket. Then, you use the syntax ssh://hg@bitbucket.org/YourUsername/yt, or equivalent, in place of https://bitbucket.org/YourUsername/yt in Mercurial commands. For consistency, all commands we list in this document will use the HTTPS protocol.

  8. Issue a pull request at https://bitbucket.org/YourUsername/yt/pull-request/new A pull request is essentially just asking people to review and accept the modifications you have made to your personal version of the code.

During the course of your pull request you may be asked to make changes. These changes may be related to style issues, correctness issues, or even requesting tests. The process for responding to pull request code review is relatively straightforward.

  1. Make requested changes, or leave a comment indicating why you don’t think they should be made.

  2. Commit those changes to your local repository.

  3. Push the changes to your fork:

  4. Your pull request will be automatically updated.

Working with Multiple BitBucket Pull Requests

Once you become active developing for yt, you may be working on various aspects of the code or bugfixes at the same time. Currently, BitBucket’s modus operandi for pull requests automatically updates your active pull request with every hg push of commits that are a descendant of the head of your pull request. In a normal workflow, this means that if you have an active pull request, make some changes locally for, say, an unrelated bugfix, then push those changes back to your fork in the hopes of creating a new pull request, you’ll actually end up updating your current pull request!

There are a few ways around this feature of BitBucket that will allow for multiple pull requests to coexist; we outline one such method below. We assume that you have a fork of yt at http://bitbucket.org/YourUsername/Your_yt (see Making and Sharing Changes for instructions on creating a fork) and that you have an active pull request to the main repository.

The main issue with starting another pull request is to make sure that your push to BitBucket doesn’t go to the same head as your existing pull request and trigger BitBucket’s auto-update feature. Here’s how to get your local repository away from your current pull request head using revsets and your hgrc file:

  1. Set up a Mercurial path for the main yt repository (note this is a convenience step and only needs to be done once). Add the following to your Your_yt/.hg/hgrc:

    [paths]
    upstream = https://bitbucket.org/yt_analysis/yt
    

    This will create a path called upstream that is aliased to the URL of the main yt repository.

  2. Now we’ll use revsets to update your local repository to the tip of the upstream path:

    $ hg pull upstream
    $ hg update -r "remote(tip,'upstream')"
    

After the above steps, your local repository should be at the tip of the main yt repository. If you find yourself doing this a lot, it may be worth aliasing this task in your hgrc file by adding something like:

[alias]
myupdate = update -r "remote(tip,'upstream')"

And then you can just issue hg myupdate to get at the tip of the main yt repository.

Make sure you are on the branch you want to be on, and then you can make changes and hg commit them. If you prefer working with bookmarks, you may want to make a bookmark before committing your changes, such as hg bookmark mybookmark.

To push to your fork on BitBucket if you didn’t use a bookmark, you issue the following:

$ hg push -r . -f https://bitbucket.org/YourUsername/Your_yt

The -r . means “push only the commit I’m standing on and any ancestors.” The -f is to force Mecurial to do the push since we are creating a new remote head.

Note that if you did use a bookmark, you don’t have to force the push, but you do need to push the bookmark; in other words do the following instead of the above:

$ hg push -B mybookmark https://bitbucket.org/YourUsername/Your_yt

The -B means “publish my bookmark and any relevant changesets to the remote server.”

You can then go to the BitBucket interface and issue a new pull request based on your last changes, as usual.

How To Get The Source Code For Editing

yt is hosted on BitBucket, and you can see all of the yt repositories at http://bitbucket.org/yt_analysis/. With the yt installation script you should have a copy of Mercurial for checking out pieces of code. Make sure you have followed the steps above for bootstrapping your development (to assure you have a bitbucket account, etc.)

In order to modify the source code for yt, we ask that you make a “fork” of the main yt repository on bitbucket. A fork is simply an exact copy of the main repository (along with its history) that you will now own and can make modifications as you please. You can create a personal fork by visiting the yt bitbucket webpage at https://bitbucket.org/yt_analysis/yt/ . After logging in, you should see an option near the top right labeled “fork”. Click this option, and then click the fork repository button on the subsequent page. You now have a forked copy of the yt repository for your own personal modification.

This forked copy exists on the bitbucket repository, so in order to access it locally, follow the instructions at the top of that webpage for that forked repository, namely run at a local command line:

$ hg clone http://bitbucket.org/<USER>/<REPOSITORY_NAME>

This downloads that new forked repository to your local machine, so that you can access it, read it, make modifications, etc. It will put the repository in a local directory of the same name as the repository in the current working directory. You can see any past state of the code by using the hg log command. For example, the following command would show you the last 5 changesets (modifications to the code) that were submitted to that repository.

$ cd <REPOSITORY_NAME>
$ hg log -l 5

Using the revision specifier (the number or hash identifier next to each changeset), you can update the local repository to any past state of the code (a previous changeset or version) by executing the command:

$ hg up revision_specifier

Lastly, if you want to use this new downloaded version of your yt repository as the active version of yt on your computer (i.e. the one which is executed when you run yt from the command line or the one that is loaded when you do import yt), then you must “activate” it using the following commands from within the repository directory.

$ cd <REPOSITORY_NAME>
$ python2.7 setup.py develop

This will rebuild all C modules as well.

How To Read The Source Code

If you just want to look at the source code, you may already have it on your computer. If you build yt using the install script, the source is available at $YT_DEST/src/yt-hg. See Installing yt Using pip or from Source for more details about to obtain the yt source code if you did not build yt using the install script.

The root directory of the yt mercurial repository contains a number of subdirectories with different components of the code. Most of the yt source code is contained in the yt subdirectory. This directory its self contains the following subdirectories:

frontends

This is where interfaces to codes are created. Within each subdirectory of yt/frontends/ there must exist the following files, even if empty:

  • data_structures.py, where subclasses of AMRGridPatch, Dataset and AMRHierarchy are defined.
  • io.py, where a subclass of IOHandler is defined.
  • fields.py, where fields we expect to find in datasets are defined
  • misc.py, where any miscellaneous functions or classes are defined.
  • definitions.py, where any definitions specific to the frontend are defined. (i.e., header formats, etc.)
fields
This is where all of the derived fields that ship with yt are defined.
geometry
This is where geometric helpler routines are defined. Handlers for grid and oct data, as well as helpers for coordinate transformations can be found here.
visualization
This is where all visualization modules are stored. This includes plot collections, the volume rendering interface, and pixelization frontends.
data_objects
All objects that handle data, processed or unprocessed, not explicitly defined as visualization are located in here. This includes the base classes for data regions, covering grids, time series, and so on. This also includes derived fields and derived quantities.
analysis_modules
This is where all mechanisms for processing data live. This includes things like clump finding, halo profiling, halo finding, and so on. This is something of a catchall, but it serves as a level of greater abstraction that simply data selection and modification.
gui
This is where all GUI components go. Typically this will be some small tool used for one or two things, which contains a launching mechanism on the command line.
utilities
All broadly useful code that doesn’t clearly fit in one of the other categories goes here.
extern
Bundled external modules (i.e. code that was not written by one of the yt authors but that yt depends on) lives here.

If you’re looking for a specific file or function in the yt source code, use the unix find command:

$ find <DIRECTORY_TREE_TO_SEARCH> -name '<FILENAME>'

The above command will find the FILENAME in any subdirectory in the DIRECTORY_TREE_TO_SEARCH. Alternatively, if you’re looking for a function call or a keyword in an unknown file in a directory tree, try:

$ grep -R <KEYWORD_TO_FIND> <DIRECTORY_TREE_TO_SEARCH>

This can be very useful for tracking down functions in the yt source.

Code Style Guide

To keep things tidy, we try to stick with a couple simple guidelines.

General Guidelines

  • In general, follow PEP-8 guidelines.
  • Classes are ConjoinedCapitals, methods and functions are lowercase_with_underscores.
  • Use 4 spaces, not tabs, to represent indentation.
  • Line widths should not be more than 80 characters.
  • Do not use nested classes unless you have a very good reason to, such as requiring a namespace or class-definition modification. Classes should live at the top level. __metaclass__ is exempt from this.
  • Do not use unnecessary parentheses in conditionals. if((something) and (something_else)) should be rewritten as if something and something_else. Python is more forgiving than C.
  • Avoid copying memory when possible. For example, don’t do a = a.reshape(3,4) when a.shape = (3,4) will do, and a = a * 3 should be np.multiply(a, 3, a).
  • In general, avoid all double-underscore method names: __something is usually unnecessary.
  • Doc strings should describe input, output, behavior, and any state changes that occur on an object. See the file doc/docstring_example.txt for a fiducial example of a docstring.

API Guide

  • Do not import “*” from anything other than yt.funcs.
  • Internally, only import from source files directly; instead of: from yt.visualization.api import SlicePlot do from yt.visualization.plot_window import SlicePlot.
  • Numpy is to be imported as np.
  • Do not use too many keyword arguments. If you have a lot of keyword arguments, then you are doing too much in __init__ and not enough via parameter setting.
  • In function arguments, place spaces before commas. def something(a,b,c) should be def something(a, b, c).
  • Don’t create a new class to replicate the functionality of an old class – replace the old class. Too many options makes for a confusing user experience.
  • Parameter files external to yt are a last resort.
  • The usage of the **kwargs construction should be avoided. If they cannot be avoided, they must be explained, even if they are only to be passed on to a nested function.
  • Constructor APIs should be kept as simple as possible.
  • Variable names should be short but descriptive.
  • No global variables!

Variable Names and Enzo-isms

  • Avoid Enzo-isms. This includes but is not limited to:
    • Hard-coding parameter names that are the same as those in Enzo. The following translation table should be of some help. Note that the parameters are now properties on a Dataset subclass: you access them like ds.refine_by .
      • RefineBy `` => `` refine_by
      • TopGridRank `` => `` dimensionality
      • TopGridDimensions `` => `` domain_dimensions
      • InitialTime `` => `` current_time
      • DomainLeftEdge `` => `` domain_left_edge
      • DomainRightEdge `` => `` domain_right_edge
      • CurrentTimeIdentifier `` => `` unique_identifier
      • CosmologyCurrentRedshift `` => `` current_redshift
      • ComovingCoordinates `` => `` cosmological_simulation
      • CosmologyOmegaMatterNow `` => `` omega_matter
      • CosmologyOmegaLambdaNow `` => `` omega_lambda
      • CosmologyHubbleConstantNow `` => `` hubble_constant
    • Do not assume that the domain runs from 0 to 1. This is not true for many codes and datasets.