Contributing#

Before you start#

Before starting work on a contribution, please check the issue tracker to see if there’s already an issue describing what you have in mind.

  • If there is, add a comment to let others know you’re willing to work on it.

  • If there isn’t, please create a new issue to describe your idea.

We strongly encourage discussing your plans before you start coding—either in the issue itself or on our Zulip chat. This helps avoid duplicated effort and ensures your work aligns with the project’s scope and roadmap.

Keep in mind that we use issues liberally to track development. Some may be vague or aspirational, serving as reminders for future work rather than tasks ready to be tackled. There are a few reasons an issue might not be actionable yet:

  • It depends on other issues being resolved first.

  • It hasn’t been clearly scoped. In such cases, helping to clarify the scope or breaking the issue into smaller parts can be a valuable contribution. Maintainers typically lead this process, but you’re welcome to participate in the discussion.

  • It doesn’t currently fit into the roadmap or the maintainers’ priorities, meaning we may be unable to commit to timely guidance and prompt code reviews.

If you’re unsure whether an issue is ready to work on, just ask!

Some issues are labelled as good first issue. These are especially suitable if you’re new to the project, and we recommend starting there.

Contribution workflow#

If you want to contribute to movement and don’t have permission to make changes directly, you can create your own copy of the project, make updates, and then suggest those updates for inclusion in the main project. This process is often called a “fork and pull request” workflow.

When you create your own copy (or “fork”) of a project, it’s like making a new workspace that shares code with the original project. Once you’ve made your changes in your copy, you can submit them as a pull request, which is a way to propose changes back to the main project.

If you are not familiar with git, we recommend reading this guide first.

Forking the repository#

  1. Fork the repository on GitHub. You can read more about forking in the GitHub docs.

  2. Clone your fork to your local machine and navigate to the repository folder:

    git clone https://github.com/<your-github-username>/movement.git
    cd movement
    
  3. Set the upstream remote to the base movement repository. This links your local copy to the original project so you can pull the latest changes.

    git remote add upstream https://github.com/neuroinformatics-unit/movement.git
    

    Note

    Your repository now has two remotes: origin (your fork, where you push changes) and upstream (the main repository, where you pull updates from).

Creating a development environment#

Now that you have the repository locally, you need to set up a Python environment and install the project dependencies.

  1. Create an environment using conda or uv and install movement in editable mode, including development dependencies.

    If you are using conda, first create and activate a conda environment:

    conda create -n movement-dev -c conda-forge python=3.13
    conda activate movement-dev
    

    Then, install the package in editable mode with development dependencies:

    pip install -e ".[dev]"
    

    If you are using uv instead, first create and activate a virtual environment:

    uv venv --python=3.13
    source .venv/bin/activate  # On macOS and Linux
    .venv\Scripts\activate     # On Windows PowerShell
    

    Then, install the package in editable mode with development dependencies:

    uv pip install -e ".[dev]"
    

    If you also want to edit the documentation and preview the changes locally, you will additionally need the docs extra dependencies. See Editing the documentation for more details.

  2. Finally, initialise the pre-commit hooks:

    pre-commit install
    

Pull requests#

In all cases, please submit code to the main repository via a pull request (PR). We recommend, and adhere to, the following conventions:

  • Please submit draft PRs as early as possible to allow for discussion.

  • The PR title should be descriptive e.g. “Add new function to do X” or “Fix bug in Y”.

  • The PR description should be used to provide context and motivation for the changes.

    • If the PR is solving an issue, please add the issue number to the PR description, e.g. “Fixes #123” or “Closes #123”.

    • Make sure to include cross-links to other relevant issues, PRs and Zulip threads, for context.

  • The maintainers triage PRs and assign suitable reviewers using the GitHub review system.

  • One approval of a PR (by a maintainer) is enough for it to be merged.

  • Unless someone approves the PR with optional comments, the PR is immediately merged by the approving reviewer.

  • PRs are preferably merged via the “squash and merge” option, to keep a clean commit history on the main branch.

A typical PR workflow would be:

  • Create a new branch, make your changes, and stage them.

  • When you try to commit, the pre-commit hooks will be triggered.

  • Stage any changes made by the hooks, and commit.

  • You may also run the pre-commit hooks manually, at any time, with pre-commit run -a.

  • Make sure to write tests for any new features or bug fixes. See testing below.

  • Don’t forget to update the documentation, if necessary. See contributing documentation below.

  • Push your changes to your fork on GitHub (git push origin <branch-name>).

  • Open a draft pull request from your fork to the upstream movement repository, with a meaningful title and a thorough description of the changes.

    Note

    When creating the PR, ensure the base repository is neuroinformatics-unit/movement (the upstream) and the head repository is your fork. GitHub sometimes defaults to comparing against your own fork. Also make sure to tick the “Allow edits by maintainers” checkbox, so that maintainers can make small fixes directly to your branch.

  • If all checks (e.g. linting, type checking, testing) run successfully, you may mark the pull request as ready for review.

  • Respond to review comments and implement any requested changes.

  • One of the maintainers will approve the PR and add it to the merge queue.

  • Success 🎉 !! Your PR will be (squash-)merged into the main branch.

Development guidelines#

Formatting and pre-commit hooks#

Running pre-commit install will set up pre-commit hooks to ensure a consistent formatting style. Currently, these include:

  • ruff does a number of jobs, including code linting and auto-formatting.

  • mypy as a static type checker.

  • check-manifest to ensure that the right files are included in the pip package.

  • codespell to check for common misspellings.

These will prevent code from being committed if any of these hooks fail. To run all the hooks before committing:

pre-commit run  # for staged files
pre-commit run -a  # for all files in the repository

Some problems will be automatically fixed by the hooks. In this case, you should stage the auto-fixed changes and run the hooks again:

git add .
pre-commit run

If a problem cannot be auto-fixed, the corresponding tool will provide information on what the issue is and how to fix it. For example, ruff might output something like:

movement/io/load_poses.py:551:80: E501 Line too long (90 > 79)

This pinpoints the problem to a single code line and a specific ruff rule violation. Sometimes you may have good reasons to ignore a particular rule for a specific line of code. You can do this by adding an inline comment, e.g. # noqa: E501. Replace E501 with the code of the rule you want to ignore.
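As an illustration, a hypothetical over-long line could carry such an inline suppression (the variable and URL below are made up for the example):

```python
# The trailing comment tells ruff to skip rule E501 (line too long)
# for this one line only.
LONG_URL = "https://example.com/a/deliberately/long/path/used/only/to/illustrate/suppressing/a/rule"  # noqa: E501
```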

Docstrings#

We adhere to the numpydoc style. All public functions, classes, and methods must include docstrings, as these enable automatic generation of the API reference.
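As an illustration, a minimal numpydoc-style docstring for a hypothetical function might look like:

```python
def compute_sum(a, b):
    """Compute the sum of two numbers.

    Parameters
    ----------
    a : float
        The first number.
    b : float
        The second number.

    Returns
    -------
    float
        The sum of ``a`` and ``b``.
    """
    return a + b
```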

To document module‑level variables or class attributes, place a string literal immediately after the definition—a convention recognised by sphinx-autodoc; see also PEP 257:

class MyClass:
    x: int = 42
    """Description of x."""

Testing#

We use pytest for testing, aiming for ~100% test coverage where feasible. All new features should be accompanied by tests.

Tests are stored in the tests directory, structured as follows:

  • test_unit/: Contains unit tests that closely follow the movement package structure.

  • test_integration/: Includes tests for interactions between different modules.

  • fixtures/: Holds reusable test data fixtures, automatically imported via conftest.py. Check for existing fixtures before adding new ones, to avoid duplication.

For tests requiring experimental data, you can use sample data from our external data repository. These datasets are accessible through the pytest.DATA_PATHS dictionary, populated in conftest.py. Avoid including large data files directly in the GitHub repository.
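To illustrate the layout, a unit test under test_unit/ might look like the following sketch, where the helper function, its behaviour, and the file path are all hypothetical:

```python
# Hypothetical contents of tests/test_unit/test_speed.py
import math


def compute_speeds(positions, fps):
    """Hypothetical helper: speed between consecutive (x, y) positions."""
    return [
        math.hypot(x2 - x1, y2 - y1) * fps
        for (x1, y1), (x2, y2) in zip(positions, positions[1:])
    ]


def test_compute_speeds():
    positions = [(0.0, 0.0), (3.0, 4.0)]
    assert compute_speeds(positions, fps=1) == [5.0]
```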

Running benchmark tests#

Some tests are marked as benchmark because we use them with pytest-benchmark to measure the performance of specific sections of the code. These tests are excluded from the default test run to keep CI and local test runs fast. This applies to all ways of running pytest (via the command line, an IDE, tox, or CI).

To run only the benchmark tests locally:

pytest -m benchmark

To run all tests, including those marked as benchmark:

pytest -m ""

Comparing benchmark runs across branches#

To compare performance between branches (e.g., main and a PR branch), we use pytest-benchmark’s save and compare functionality:

  1. Run benchmarks on the main branch and save the results:

    git checkout main
    pytest -m benchmark --benchmark-save=main
    

    By default the results are saved to .benchmarks/ (a directory ignored by git) as JSON files with the format <machine-identifier>/0001_main.json, where <machine-identifier> is a directory whose name relates to the machine specifications, 0001 is a counter for the benchmark run, and main corresponds to the string passed in the --benchmark-save option.

  2. Switch to your PR branch and run the benchmarks again:

    git checkout pr-branch
    pytest -m benchmark --benchmark-save=pr
    
  3. Show the results from both runs together:

    pytest-benchmark compare <path-to-main-result.json> <path-to-pr-result.json> --group-by=name
    

    Instead of providing the paths to the results, you can also provide the identifiers of the runs (e.g. 0001_main and 0002_pr), or use glob patterns to match the results (e.g. *main* and *pr*).

    You can sort the results by run name using the --sort='name' option, or group them with the --group-by=<label> option (e.g. --group-by=name to group by the name of the run, --group-by=func to group by the name of the test function, or --group-by=param to group by the parameters used to test the function). For further options, check the comparison CLI documentation.

We recommend reading the pytest-benchmark documentation for more information on the available CLI arguments. Some useful options are:

  • --benchmark-warmup=on: to enable warmup to prime caches and reduce variability between runs. This is recommended for tests involving I/O or external resources.

  • --benchmark-warmup-iterations=N: to set the number of warmup iterations.

  • --benchmark-compare: to run benchmarks and compare against the last saved run.

  • --benchmark-min-rounds=10: to run more rounds for stable results.

Note

High standard deviation in benchmark results often indicates bad isolation or non-deterministic behaviour (I/O, side-effects, garbage collection overhead). Before comparing past runs, it is advisable to make the benchmark runs as consistent as possible. See the pytest-benchmark guidance on comparing runs and the pytest-benchmark FAQ for troubleshooting tips.

Logging#

We use the loguru-based MovementLogger for logging. The logger is configured to write logs to a rotating log file at the DEBUG level and to sys.stderr at the WARNING level.

To import the logger:

from movement.utils.logging import logger

Once the logger is imported, you can log messages with the appropriate severity levels using the same syntax as loguru (e.g. logger.debug("Debug message"), logger.warning("Warning message")).

Logging and raising exceptions#

Both logger.error() and logger.exception() can be used to log Errors and Exceptions, with the difference that the latter will include the traceback in the log message. As these methods will return the logged Exception, you can log and raise the Exception in a single line:

raise logger.error(ValueError("message"))
raise logger.exception(ValueError("message")) # with traceback

When to use print, warnings.warn, logger.warning and logger.info#

We aim to adhere to the When to use logging guide to ensure consistency in our logging practices. In general:

  • Use print() for simple, non-critical messages that do not need to be logged.

  • Use warnings.warn() for user input issues that are non-critical and can be addressed within movement, e.g. deprecated function calls that are redirected, or an invalid fps number in ValidPosesInputs that is implicitly set to None; or when processing data containing excessive NaNs, which the user can potentially address using appropriate methods, e.g. interpolate_over_time().

  • Use logger.info() for informational messages about expected behaviours that do not indicate problems, e.g. where default values are assigned to optional parameters.
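Following these conventions, a hypothetical input check might warn about an invalid fps value and fall back to None rather than raising (the function name and message are illustrative):

```python
import warnings


def validate_fps(fps):
    """Hypothetical check: a non-critical input issue triggers a warning
    and a safe fallback, rather than an error."""
    if fps is not None and fps <= 0:
        warnings.warn(f"fps must be a positive number; got {fps}. Setting fps to None.")
        return None
    return fps
```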

Implementing new loaders#

Implementing a new loader to support additional file formats in movement involves the following steps:

  1. Create validator classes for the file format (recommended).

  2. Implement the loader function.

  3. Update the SourceSoftware type alias.

Create file validators#

movement enforces separation of concerns by decoupling file validation from data loading, so that loaders can focus solely on reading and parsing data, while validation logic is encapsulated in dedicated file validator classes. Besides allowing users to get early feedback on file issues, this also makes it easier to reuse validation logic across different loaders that may support the same file format.

All file validators are attrs-based classes and live in movement.validators.files. They define the rules an input file must satisfy before it can be loaded, and they conform to the ValidFile protocol. At minimum, this requires defining:

  • suffixes: The expected file extensions for the format.

  • file: The path to the file or an NWBFile object, depending on the loader.

Additional attributes can also be defined to store pre-parsed information that the loader may need later.

Using a hypothetical format “MySoftware” that produces CSV files containing the columns scorer, bodyparts, and coords, we illustrate the full pattern file validators follow:

  • Declare expected file suffixes.

  • Normalise the input file and apply reusable validators.

  • Implement custom, format-specific validation.

from pathlib import Path
from typing import ClassVar

from attrs import define, field

from movement.utils.logging import logger
from movement.validators.files import _file_validator


@define
class ValidMySoftwareCSV:
    """Validator for MySoftware .csv output files."""
    suffixes: ClassVar[set[str]] = {".csv"}
    file: Path = field(
        converter=Path,
        validator=_file_validator(permission="r", suffixes=suffixes),
    )
    col_names: list[str] = field(init=False, factory=list)

    @file.validator
    def _file_contains_expected_header(self, attribute, value):
        """Ensure that the .csv file contains the expected header row.
        """
        expected_cols = ["scorer", "bodyparts", "coords"]
        with open(value) as f:
            col_names = f.readline().split(",")[:3]
            if col_names != expected_cols:
                raise logger.error(
                    ValueError(
                        ".csv header row does not match the known format for "
                        "MySoftware output files."
                    )
                )
            self.col_names = col_names

Declare expected file suffixes#

The suffixes class variable restricts the validator to only accept files with the specified extensions. If a suffix check is not required, this can be set to an empty set (set()). In the ValidMySoftwareCSV example, only files with a .csv extension are accepted.

Normalise input file and apply reusable validators#

An attrs converter is typically used to normalise input files into Path objects, along with one or more validators to ensure the file meets the expected criteria.

In addition to the built-in attrs validators, movement provides several reusable file-specific validators (as callables) in movement.validators.files:

  • _file_validator: A composite validator that ensures file is a Path, is not a directory, is accessible with the required permission, and has one of the expected suffixes (if any).

  • _hdf5_validator: Checks that an HDF5 file contains the expected dataset(s).

  • _json_validator: Checks that a file contains valid JSON and optionally validates it against a JSON Schema. Schemas are defined as Python dicts in movement/validators/_json_schemas.py. Custom validation checks and an optional attribute name for storing the parsed data can also be provided.

  • _if_instance_of: Conditionally applies a validator only when file is an instance of a given class.

In the current example, the _file_validator is used to ensure that the input file is a readable CSV file.

Combining reusable validators

Reusable validators can be combined using either attrs.validators.and_() or by passing a list of validators to the validator parameter of field(). The file attribute in ValidDeepLabCutH5 combines both _file_validator and _hdf5_validator to ensure the input file is a readable HDF5 file containing the expected dataset df_with_missing:

from pathlib import Path
from typing import ClassVar

from attrs import define, field, validators

from movement.validators.files import _file_validator, _hdf5_validator


@define
class ValidDeepLabCutH5:
    """Class for validating DeepLabCut-style .h5 files."""

    suffixes: ClassVar[set[str]] = {".h5"}
    file: Path = field(
        converter=Path,
        validator=validators.and_(
            _file_validator(permission="r", suffixes=suffixes),
            _hdf5_validator(datasets={"df_with_missing"}),
        ),
    )

Implement format-specific validation#

Most formats require custom validation logic beyond basic file checks. In the current example, the _file_contains_expected_header method uses the file attribute’s validator method as a decorator (@file.validator) to check that the first line of the CSV file matches the expected header row for MySoftware output files.

Implement loader function#

Once the file validator is defined, the next step is to implement the loader function that reads the validated file and constructs the movement dataset. Continuing from the hypothetical “MySoftware” example, the loader function from_mysoftware_file would look like this:

@register_loader(
    source_software="MySoftware",
    file_validators=ValidMySoftwareCSV,
)
def from_mysoftware_file(file: str | Path) -> xr.Dataset:
    """Load data from MySoftware files."""
    # The decorator returns an instance of ValidMySoftwareCSV
    # which conforms to the ValidFile protocol
    # so we need to let the type checker know this
    valid_file = cast("ValidFile", file)
    file_path = valid_file.file  # Path
    # The _parse_* functions are pseudocode
    ds = load_poses.from_numpy(
        position_array=_parse_positions(file_path),
        confidence_array=_parse_confidences(file_path),
        individual_names=_parse_individual_names(file_path),
        keypoint_names=_parse_keypoint_names(file_path),
        fps=_parse_fps(file_path),
        source_software="MySoftware",
    )
    logger.info(f"Loaded poses from {file_path.name}")
    return ds

Loader functions live in movement.io.load_poses or movement.io.load_bboxes, depending on the data type (poses or bounding boxes).

A loader function must conform to the LoaderProtocol; its requirements are described in the subsections below.

Decorate the loader with @register_loader#

The @register_loader() decorator associates a loader function with a source_software name so that users can load files from that software via the unified load_dataset() interface:

from movement.io import load_dataset
ds = load_dataset("path/to/mysoftware_output.csv", source_software="MySoftware")

which is equivalent to calling the loader function directly:

from movement.io.load_poses import from_mysoftware_file
ds = from_mysoftware_file("path/to/mysoftware_output.csv")

If a file_validators argument is supplied to the @register_loader() decorator, the decorator selects the appropriate validator—based on its declared suffixes—and uses it to normalise and validate the input file before invoking the loader. As a result, the loader receives the validated file object instead of the raw path or handle.

If no validator is provided, the loader is passed the raw file argument as-is.

Handling multiple file formats for the same software

Many software packages produce multiple file formats (e.g. DeepLabCut outputs both CSV and HDF5). In that case, we recommend one loader per source software, which internally dispatches to per-format parsing functions, to ensure a consistent entry point for each supported source software. If formats require very different validation logic, you may pass multiple validators to file_validators=[...]. The decorator will select the appropriate validator based on file suffix and the validator’s suffixes attribute.

@register_loader(
    source_software="MySoftware",
    file_validators=[ValidMySoftwareCSV, ValidMySoftwareH5],
)
def from_mysoftware_file(file: str | Path) -> xr.Dataset:
    """Load data from MySoftware files (CSV or HDF5)."""
    ...

Construct the dataset#

After parsing the input file, the loader function should construct the movement dataset using the from_numpy() helper functions in movement.io.load_poses or movement.io.load_bboxes, as shown in the example above. These helper functions create the xarray.Dataset object from NumPy arrays and metadata, ensuring that the dataset conforms to the movement dataset specification.

Update SourceSoftware type alias#

The SourceSoftware type alias is defined in movement.io.load as a Literal containing all supported source software names. When adding a new loader, update this type alias to include the new software name to maintain type safety across the codebase:

SourceSoftware: TypeAlias = Literal[
    "DeepLabCut",
    "SLEAP",
    ...,
    "MySoftware",  # Newly added software
]

Developing the napari plugin#

The movement plugin for napari is built following the napari plugin guide. All widgets subclass qtpy.QtWidgets.QWidget (see the napari guide on widgets).

The plugin lives in movement.napari and is structured as follows:

  • movement.napari.meta_widget: the top-level container widget registered as the napari plugin entry point, which brings together all other subwidgets:

    • movement.napari.loader_widgets: a Qt form widget for loading tracked datasets from supported file formats as points, tracks and boxes.

    • movement.napari.regions_widget: a Qt table widget for managing named regions of interest drawn as napari shapes layers. See the next section for more details on this widget’s architecture.

  • movement.napari.layer_styles: dataclasses that encapsulate visual properties for each layer type.

  • movement.napari.convert: functions for converting movement datasets into the NumPy arrays and properties DataFrames that napari layer constructors expect.

Qt Model/View architecture#

movement.napari.regions_widget follows Qt’s Model/View pattern to separate the data (what is stored) from the display (how it is shown). Understanding this pattern is helpful before making changes to this module, and for creating new widgets that follow the same design principles.

The three components are:

  • RegionsTableModel (subclasses QAbstractTableModel): wraps a napari shapes layer and exposes its data to Qt (i.e., region names from layer.properties["name"] and shape types). It listens to napari layer events and emits Qt signals when the data changes.

  • RegionsTableView (subclasses QTableView): renders the model’s data as a table and handles user interactions (e.g. row selection, inline name editing). Keeps table row selection in sync with napari’s current shape selection.

  • RegionsWidget: connects the table model and table view. It manages layer selection, creates and links models to views, and handles layer lifecycle events.

Data flows in both directions:

napari shapes layer <-> RegionsTableModel <-> RegionsTableView <-> User

Preventing circular updates in bidirectional syncs

Bidirectional syncing between napari and the Qt table can cause circular updates. For example, selecting a napari shape selects the corresponding table row, which would then re-trigger shape selection.

Guard flags such as _syncing_row_selection and _syncing_layer_selection break this cycle: while a sync is in progress, the corresponding flag is set to True and any events that would re-trigger it are ignored. Preserve this pattern when adding new two-way sync logic.
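The guard-flag idea can be sketched in plain Python, independent of Qt and napari (all class, attribute, and method names here are illustrative, not the actual implementation):

```python
class TwoWaySync:
    """Toy sketch of guard flags preventing circular updates."""

    def __init__(self):
        self._syncing = False
        self.table_row = None
        self.layer_shape = None

    def on_shape_selected(self, shape):
        """Called when the napari layer selection changes."""
        if self._syncing:
            return  # this event was triggered by our own table update
        self._syncing = True
        try:
            self.table_row = shape  # would select the matching table row
        finally:
            self._syncing = False

    def on_row_selected(self, row):
        """Called when the table selection changes."""
        if self._syncing:
            return  # this event was triggered by our own layer update
        self._syncing = True
        try:
            self.layer_shape = row  # would select the matching napari shape
        finally:
            self._syncing = False
```

Each handler sets the flag before propagating the change and clears it afterwards, so the echo event it triggers on the other side is ignored instead of looping forever.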

Continuous integration#

All pushes and pull requests will be built by GitHub actions. This will usually include linting, testing and deployment.

A GitHub actions workflow (.github/workflows/test_and_deploy.yml) has been set up to run (on each push/PR):

  • Linting checks (pre-commit).

  • Testing (only if linting checks pass)

  • Release to PyPI (only if a git tag is present and if tests pass).

Versioning and releases#

We use semantic versioning, which includes MAJOR.MINOR.PATCH version numbers:

  • PATCH = small bugfix

  • MINOR = new feature

  • MAJOR = breaking change

We use setuptools_scm to automatically version movement; it is pre-configured in the pyproject.toml file and infers the version from git. To set a new semantic version manually, first commit any changes you wish to include in the release, then create a tag and push it to GitHub. E.g. to bump the version to 1.0.0:

git add .
git commit -m "Add new changes"
git tag -a v1.0.0 -m "Bump to version 1.0.0"
git push --follow-tags

Alternatively, you can also use the GitHub web interface to create a new release and tag.

The addition of a GitHub tag triggers the package’s deployment to PyPI. The version number is automatically determined from the latest tag on the main branch.

Contributing documentation#

The documentation is hosted via GitHub pages at movement.neuroinformatics.dev. Its source files are located in the docs folder of this repository. They are written in either Markdown or reStructuredText. The index.md file corresponds to the homepage of the documentation website. Other .md or .rst files are linked to the homepage via the toctree directive.

We use Sphinx and the PyData Sphinx Theme to build the source files into HTML output. This is handled by a GitHub actions workflow (.github/workflows/docs_build_and_deploy.yml). The build job runs on each PR, ensuring that the documentation build is not broken by new changes. The deployment job runs on tag pushes (for PyPI releases) or manual triggers on the main branch. This keeps the documentation aligned with releases, while allowing manual redeployment when necessary.

Editing the documentation#

To edit the documentation, ensure you have already set up a development environment.

To build the documentation locally, install the extra dependencies by running the following command from the repository root:

pip install -e ".[docs]"      # conda env
uv pip install -e ".[docs]"   # uv env

Now create a new branch, edit the documentation source files (.md or .rst in the docs folder), and commit your changes. Submit your documentation changes via a pull request, following the same guidelines as for code changes. Make sure that the header levels in your .md or .rst files are incremented consistently (H1 > H2 > H3, etc.) without skipping any levels.

Adding new pages#

If you create a new documentation source file (e.g. my_new_file.md or my_new_file.rst), you will need to add it to the toctree directive in index.md for it to be included in the documentation website:

```{toctree}
:maxdepth: 2
:hidden:

existing_file
my_new_file
```

Linking to external URLs#

If you are adding references to an external URL (e.g. https://github.com/neuroinformatics-unit/movement/issues/1) in a .md file, you will need to check if a matching URL scheme (e.g. https://github.com/neuroinformatics-unit/movement/) is defined in myst_url_schemes in docs/source/conf.py. If it is, the following [](scheme:loc) syntax will be converted to the full URL during the build process:

[link text](movement-github:issues/1)

If it is not yet defined and you have multiple external URLs pointing to the same base URL, you will need to add the URL scheme to myst_url_schemes in docs/source/conf.py.

Updating the API reference#

The API reference is auto-generated by the docs/make_api.py script, and the sphinx-autodoc and sphinx-autosummary extensions. The script inspects the source tree and generates the docs/source/api_index.rst file, which lists the modules to be included in the API reference, skipping those listed in EXCLUDE_MODULES.

For each package module listed in PACKAGE_MODULES—a module that re-exports selected classes and functions from its submodules via __init__.py (e.g. movement.kinematics)—the script also generates a .rst file in docs/source/api/ with autosummary entries for the top-level objects exposed by the module.

The Sphinx extensions then generate the API reference pages for each module listed in api_index.rst, based on their docstrings. See Docstrings for the docstring formatting conventions.

If your PR introduces new modules that should not be documented in the API reference, or if there are changes to existing modules that necessitate their removal from the documentation, make sure to update EXCLUDE_MODULES in docs/make_api.py accordingly.

Likewise, if you want to document a module that exposes its public API via its __init__.py, rather than through its submodules individually, make sure to add it to PACKAGE_MODULES in docs/make_api.py.

Updating the examples#

We use sphinx-gallery to create the examples. To add new examples, you will need to create a new .py file in examples/, or in examples/advanced/ if your example targets experienced users. The file should be structured as specified in the relevant sphinx-gallery documentation.
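In sphinx-gallery, an example file starts with a module docstring providing the title and description, followed by alternating text and code cells separated by # %% markers. A minimal sketch (the computation is a placeholder):

```python
"""A descriptive example title
==============================

A sentence or two describing what this example demonstrates.
"""

# %%
# Narrative text for the first section, written as a comment block in
# reStructuredText, followed by the code it explains.
import math

circumference = 2 * math.pi * 5.0

# %%
# Another section of narrative text, followed by more code.
print(f"Circumference: {circumference:.2f}")
```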

We are using sphinx-gallery’s integration with binder, to provide interactive versions of the examples. This is configured in docs/source/conf.py under the sphinx_gallery_conf variable, and further customised for our repository by the .binder/postBuild script. If your examples rely on packages that are not among movement’s dependencies, you will need to add them to the .binder/requirements.txt file.

Cross-referencing Python objects#

Note

Docstrings in the .py files for the API reference and the examples are converted into .rst files, so these should use reStructuredText syntax.

Internal references#

For referencing movement objects in .md files, use the {role}`target` syntax with the appropriate Python object role.

For example, to reference the movement.io.load_poses module, use:

{mod}`movement.io.load_poses`

For referencing movement objects in .rst files, use the :role:`target` syntax with the appropriate Python object role.

For example, to reference the movement.io.load_poses module, use:

:mod:`movement.io.load_poses`

External references#

For referencing external Python objects using intersphinx, ensure the mapping between module names and their documentation URLs is defined in intersphinx_mapping in docs/source/conf.py. Once the module is included in the mapping, use the same syntax as for internal references.

For example, to reference the xarray.Dataset.update() method, use:

{meth}`xarray.Dataset.update`

For example, to reference the xarray.Dataset.update() method, use:

:meth:`xarray.Dataset.update`

Updating the contributors list#

The contributors list is automatically updated on the first day of each month by a GitHub actions workflow (.github/workflows/update_contributors_list.yml). It uses the Contributors-Readme-Action to generate the list of contributors based on the commits to the repository.

It is also possible to manually add other contributors who have not contributed code to the repository, but have contributed in other ways (e.g. by providing sample data, or by actively participating in discussions). The way to add them differs depending on whether they are GitHub users or not.

To add a contributor who has a GitHub account, locate the section marked with MANUAL: OTHER GITHUB CONTRIBUTORS in docs/source/community/people.md.

Next, add their GitHub username (e.g. newcontributor) to the comma-separated username list in both the <!-- readme: -start --> and <!-- readme: -end --> marker lines, as follows:

<!-- readme: githubUser1,githubUser2,newcontributor -start -->
existing content...
<!-- readme: githubUser1,githubUser2,newcontributor -end -->

The aforementioned GitHub Actions workflow will then automatically update the contributors list with newcontributor’s GitHub profile picture, name, and a link to their GitHub profile.

To add a contributor who does not have a GitHub account, locate the section marked with MANUAL: OTHER NON-GITHUB CONTRIBUTORS in docs/source/community/people.md.

Next, add a row containing the contributor’s image, name, and link to their website to the existing list-table as follows:

*   - existing content...
*   - [![newcontributor](https://newcontributor.image.jpg) <br /> <sub><b>New Contributor</b></sub>](https://newcontributor.website.com)

Building the documentation locally#

We recommend that you build and view the documentation website locally, before you push your proposed changes.

First, ensure your development environment with the required dependencies is active (see Editing the documentation for details on how to create it). Then, navigate to the docs/ directory:

cd docs

All subsequent commands should be run from this directory.

Note

Windows PowerShell users should prepend make commands with .\ (e.g. .\make html).

To build the documentation, run:

make html

The local build can be viewed by opening docs/build/html/index.html in a browser.

To re-build the documentation after making changes, we recommend removing existing build files first. The following command will remove all generated files in docs/, including the auto-generated files source/api_index.rst and source/snippets/admonitions.md, as well as all files in build/, source/api/, and source/examples/. It will then re-build the documentation:

make clean html

To check that external links are correctly resolved, run:

make linkcheck

If the linkcheck step incorrectly marks links with valid anchors as broken, you can skip checking the anchors in specific links by adding the URLs to linkcheck_anchors_ignore_for_url in docs/source/conf.py, e.g.:

# The linkcheck builder will skip verifying that anchors exist when checking
# these URLs
linkcheck_anchors_ignore_for_url = [
    "https://gin.g-node.org/G-Node/Info/wiki/",
    "https://neuroinformatics.zulipchat.com/",
]

Tip

The make commands can be combined to run multiple tasks sequentially. For example, to re-build the documentation and check the links, run:

make clean html linkcheck

Previewing the documentation in continuous integration#

We use artifact.ci to preview the documentation that is built as part of our GitHub Actions workflow. To do so:

  1. Go to the “Checks” tab in the GitHub PR.

  2. Click on the “Docs” section on the left.

  3. If the “Build Sphinx Docs” action is successful, a summary section will appear under the block diagram with a link to preview the built documentation.

  4. Click on the link and wait for the files to be uploaded (it may take a while the first time). You may be asked to sign in to GitHub.

  5. Once the upload is complete, look for docs/build/html/index.html under the “Detected Entrypoints” section.

Sample data#

We maintain some sample datasets to be used for testing, examples and tutorials on an external data repository. Our hosting platform of choice is called GIN and is maintained by the German Neuroinformatics Node. GIN offers a GitHub-like web interface and Git-like CLI functionality.

Currently, the data repository contains sample pose estimation data files stored in the poses folder, and tracked bounding box data files under the bboxes folder. For some of these files, we also host the associated video file (in the videos folder) and/or a single video frame (in the frames folder). These can be used to develop and test visualisations, e.g. to overlay the data on video frames. The metadata.yaml file holds metadata for each sample dataset, including information on data provenance as well as the mapping between data files and related video/frame files.

For most sample datasets, the tracking data lives in a single file under poses or bboxes. However, some tools—like TRex—may split their tracking outputs across multiple files. In those cases, the dataset is distributed as a ZIP archive containing every relevant file, and is automatically extracted when fetched.
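movement relies on pooch to unpack such archives automatically when the dataset is fetched. Conceptually, the extraction step is equivalent to this standard-library sketch (the archive and file names below are made up for illustration):

```python
import tempfile
import zipfile
from pathlib import Path

# Illustrative sketch: a multi-file dataset (e.g. TRex output) is shipped
# as a ZIP archive and extracted next to the downloaded file. In movement
# this is handled by pooch; the underlying operation boils down to:
with tempfile.TemporaryDirectory() as tmp:
    # Build a stand-in archive with two hypothetical tracking files
    archive = Path(tmp) / "trex_dataset.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("tracks_fish0.csv", "x,y\n1,2\n")
        zf.writestr("tracks_fish1.csv", "x,y\n3,4\n")

    # Extract all members into a sibling directory
    out_dir = Path(tmp) / "trex_dataset.zip.unzip"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)

    print(sorted(p.name for p in out_dir.iterdir()))
```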

Fetching data#

To fetch the data from GIN, we use the pooch Python package, which can download data from pre-specified URLs and store them locally for all subsequent uses. It also provides some nice utilities, like verification of sha256 hashes and decompression of archives.

The relevant functionality is implemented in the movement.sample_data module. The most important parts of this module are:

  1. The SAMPLE_DATA download manager object.

  2. The list_datasets() function, which returns a list of the available poses and bounding boxes datasets (file names of the data files).

  3. The fetch_dataset_paths() function, which returns a dictionary containing local paths to the files associated with a particular sample dataset: poses or bboxes, frame, video. If the relevant files are not already cached locally, they will be downloaded.

  4. The fetch_dataset() function, which downloads the files associated with a given sample dataset (same as fetch_dataset_paths()) and additionally loads the pose or bounding box data into movement, returning an xarray.Dataset object. If available, the local paths to the associated video and frame files are stored as dataset attributes, with names video_path and frame_path, respectively.

By default, the downloaded files are stored in the ~/.movement/data folder. This can be changed by setting the DATA_DIR variable in the sample_data.py file.
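For instance, the default cache location can be expressed with pathlib (illustrative only; the actual constant is defined in movement's sample_data.py):

```python
from pathlib import Path

# Default cache location for downloaded sample data
# (illustrative sketch of the DATA_DIR setting in sample_data.py)
DATA_DIR = Path.home() / ".movement" / "data"
print(DATA_DIR)
```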

Adding new data#

Only core movement developers may add new files to the external data repository. Make sure to run the following procedure on a UNIX-like system, as we have observed inconsistent behaviour on Windows (some sha256sums may end up being different). To add a new file, you will need to:

  1. Create a GIN account.

  2. Request collaborator access to the movement data repository if you don’t already have it.

  3. Install and configure the GIN CLI by running gin login in a terminal with your GIN credentials.

  4. Clone the movement data repository to your local machine using gin get neuroinformatics/movement-test-data, then run gin download --content to download all the files.

  5. Add your new files to the appropriate folders (poses, bboxes, videos, and/or frames) following the existing file naming conventions.

  6. Add metadata for your new files to metadata.yaml using the example entry below as a template. You can leave all sha256sum values as null for now.

  7. Update file hashes in metadata.yaml by running python update_hashes.py from the root of the movement data repository. This script computes SHA256 hashes for all data files and updates the corresponding sha256sum values in the metadata file. Make sure you’re in a Python environment with movement installed.

  8. Commit your changes using gin commit -m <message> <filename> for specific files or gin commit -m <message> . for all changes.

  9. Upload your committed changes to the GIN repository with gin upload. Use gin download to pull the latest changes or gin sync to synchronise changes bidirectionally.

  10. Verify the new files can be fetched and loaded correctly using the movement.sample_data module.
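The hash update in step 7 boils down to chunked SHA256 hashing of each data file. A minimal sketch of that core operation (not the actual update_hashes.py script, which also rewrites the sha256sum values in metadata.yaml):

```python
import hashlib
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks
    so large video files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage: hash every file under the repository's data folders
for folder in ("poses", "bboxes", "videos", "frames"):
    for path in sorted(Path(folder).glob("*")):
        print(f"{path.name}: {sha256sum(path)}")
```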

metadata.yaml example entry#

SLEAP_three-mice_Aeon_proofread.analysis.h5:
  sha256sum: null
  source_software: SLEAP
  type: poses
  fps: 50
  species: mouse
  number_of_individuals: 3
  shared_by:
    name: Chang Huan Lo
    affiliation: Sainsbury Wellcome Centre, UCL
  frame:
    file_name: three-mice_Aeon_frame-5sec.png
    sha256sum: null
  video:
    file_name: three-mice_Aeon_video.avi
    sha256sum: null
  note: All labels were proofread (user-defined) and can be considered ground truth.
    It was exported from the .slp file with the same prefix.

Verifying sample data#

To verify that a sample dataset can be fetched and loaded correctly:

from movement import sample_data

# Fetch and load the dataset
ds = sample_data.fetch_dataset("SLEAP_three-mice_Aeon_proofread.analysis.h5")

# Verify it loaded correctly
print(ds)

This displays the dataset’s structure (dimensions, coordinates, data variables, and attributes), confirming the data was loaded successfully.

If the sample dataset also includes a video, pass with_video=True to verify that the video is correctly linked to the dataset:

ds = sample_data.fetch_dataset(
    "SLEAP_three-mice_Aeon_proofread.analysis.h5",
    with_video=True,
)
print(ds.video_path)