Contributing#
Before you start#
Before starting work on a contribution, please check the issue tracker to see if there’s already an issue describing what you have in mind.
If there is, add a comment to let others know you’re willing to work on it.
If there isn’t, please create a new issue to describe your idea.
We strongly encourage discussing your plans before you start coding—either in the issue itself or on our Zulip chat. This helps avoid duplicated effort and ensures your work aligns with the project’s scope and roadmap.
Keep in mind that we use issues liberally to track development. Some may be vague or aspirational, serving as reminders for future work rather than tasks ready to be tackled. There are a few reasons an issue might not be actionable yet:
It depends on other issues being resolved first.
It hasn’t been clearly scoped. In such cases, helping to clarify the scope or breaking the issue into smaller parts can be a valuable contribution. Maintainers typically lead this process, but you’re welcome to participate in the discussion.
It doesn’t currently fit into the roadmap or the maintainers’ priorities, meaning we may be unable to commit to timely guidance and prompt code reviews.
If you’re unsure whether an issue is ready to work on, just ask!
Some issues are labelled as good first issue.
These are especially suitable if you’re new to the project, and we recommend starting there.
Contribution workflow#
If you want to contribute to movement and don’t have permission to make changes directly, you can create your own copy of the project, make updates, and then suggest those updates for inclusion in the main project. This process is often called a “fork and pull request” workflow.
When you create your own copy (or “fork”) of a project, it’s like making a new workspace that shares code with the original project. Once you’ve made your changes in your copy, you can submit them as a pull request, which is a way to propose changes back to the main project.
If you are not familiar with git, we recommend reading this guide.
Forking the repository#
Fork the repository on GitHub. You can read more about forking in the GitHub docs.
Clone your fork to your local machine and navigate to the repository folder:
git clone https://github.com/<your-github-username>/movement.git
cd movement
Set the upstream remote to the base `movement` repository. This links your local copy to the original project so you can pull the latest changes:
git remote add upstream https://github.com/neuroinformatics-unit/movement.git
Note
Your repository now has two remotes:
`origin` (your fork, where you push changes) and `upstream` (the main repository, where you pull updates from).
Creating a development environment#
Now that you have the repository locally, you need to set up a Python environment and install the project dependencies.
Create an environment using conda or uv and install `movement` in editable mode, including development dependencies.

If you use conda, first create and activate a conda environment:

conda create -n movement-dev -c conda-forge python=3.13
conda activate movement-dev

Then, install the package in editable mode with development dependencies:

pip install -e ".[dev]"

If you use uv, first create and activate a virtual environment:

uv venv --python=3.13
source .venv/bin/activate    # On macOS and Linux
.venv\Scripts\activate       # On Windows PowerShell

Then, install the package in editable mode with development dependencies:

uv pip install -e ".[dev]"

If you also want to edit the documentation and preview the changes locally, you will additionally need the `docs` extra dependencies. See Editing the documentation for more details.

Finally, initialise the pre-commit hooks:
pre-commit install
Pull requests#
In all cases, please submit code to the main repository via a pull request (PR). We recommend, and adhere to, the following conventions:
Please submit draft PRs as early as possible to allow for discussion.
The PR title should be descriptive, e.g. “Add new function to do X” or “Fix bug in Y”.
The PR description should be used to provide context and motivation for the changes.
If the PR is solving an issue, please add the issue number to the PR description, e.g. “Fixes #123” or “Closes #123”.
Make sure to include cross-links to other relevant issues, PRs and Zulip threads, for context.
The maintainers triage PRs and assign suitable reviewers using the GitHub review system.
One approval of a PR (by a maintainer) is enough for it to be merged.
Unless someone approves the PR with optional comments, the PR is immediately merged by the approving reviewer.
PRs are preferably merged via the “squash and merge” option, to keep a clean commit history on the main branch.
A typical PR workflow would be:
Create a new branch, make your changes, and stage them.
When you try to commit, the pre-commit hooks will be triggered.
Stage any changes made by the hooks, and commit.
You may also run the pre-commit hooks manually, at any time, with pre-commit run -a.
Make sure to write tests for any new features or bug fixes. See testing below.
Don’t forget to update the documentation, if necessary. See contributing documentation below.
Push your changes to your fork on GitHub (git push origin <branch-name>).
Open a draft pull request from your fork to the upstream `movement` repository, with a meaningful title and a thorough description of the changes.
Note
When creating the PR, ensure the base repository is neuroinformatics-unit/movement (the upstream) and the head repository is your fork. GitHub sometimes defaults to comparing against your own fork. Also make sure to tick the “Allow edits by maintainers” checkbox, so that maintainers can make small fixes directly to your branch.
If all checks (e.g. linting, type checking, testing) run successfully, you may mark the pull request as ready for review.
Respond to review comments and implement any requested changes.
One of the maintainers will approve the PR and add it to the merge queue.
Success 🎉 !! Your PR will be (squash-)merged into the main branch.
Development guidelines#
Formatting and pre-commit hooks#
Running pre-commit install will set up pre-commit hooks to ensure a consistent formatting style. Currently, these include:
ruff does a number of jobs, including code linting and auto-formatting.
mypy as a static type checker.
check-manifest to ensure that the right files are included in the pip package.
codespell to check for common misspellings.
The hooks will prevent code from being committed if any of them fail. To run all the hooks before committing:
pre-commit run # for staged files
pre-commit run -a # for all files in the repository
Some problems will be automatically fixed by the hooks. In this case, you should stage the auto-fixed changes and run the hooks again:
git add .
pre-commit run
If a problem cannot be auto-fixed, the corresponding tool will provide
information on what the issue is and how to fix it. For example, ruff might
output something like:
movement/io/load_poses.py:551:80: E501 Line too long (90 > 79)
This pinpoints the problem to a single code line and a specific ruff rule violation.
Sometimes you may have good reasons to ignore a particular rule for a specific line of code. You can do this by adding an inline comment, e.g. # noqa: E501. Replace E501 with the code of the rule you want to ignore.
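As an illustration, a line that exceeds the length limit can carry an inline suppression like the one below (the mapping itself is made up for this example and is not part of the movement codebase):

```python
# Hypothetical mapping, purely to illustrate an inline rule suppression.
# The trailing comment tells ruff to skip the line-length check (E501) here.
SAMPLE_FILE_SUFFIXES = {"DeepLabCut": ".h5", "SLEAP": ".slp", "LightningPose": ".csv"}  # noqa: E501
```

Use such suppressions sparingly, and always name the specific rule rather than silencing all checks for the line.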
Docstrings#
We adhere to the numpydoc style. All public functions, classes, and methods must include docstrings, as these enable automatic generation of the API reference.
To document module‑level variables or class attributes, place a string literal immediately after the definition—a convention recognised by sphinx-autodoc; see also PEP 257:
class MyClass:
x: int = 42
"""Description of x."""
Testing#
We use pytest for testing, aiming for ~100% test coverage where feasible. All new features should be accompanied by tests.
Tests are stored in the tests directory, structured as follows:
test_unit/: Contains unit tests that closely follow the `movement` package structure.
test_integration/: Includes tests for interactions between different modules.
fixtures/: Holds reusable test data fixtures, automatically imported via conftest.py. Check for existing fixtures before adding new ones, to avoid duplication.
For tests requiring experimental data, you can use sample data from our external data repository.
These datasets are accessible through the pytest.DATA_PATHS dictionary, populated in conftest.py.
Avoid including large data files directly in the GitHub repository.
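A unit test for a hypothetical helper (the function and names below are illustrative, not actual movement code) might look like:

```python
# Hypothetical helper and its unit test; pytest collects any
# function named test_* without extra boilerplate.
def normalise_keypoint_names(names):
    """Strip whitespace and lower-case keypoint names."""
    return [name.strip().lower() for name in names]


def test_normalise_keypoint_names():
    assert normalise_keypoint_names([" Snout", "TAIL "]) == ["snout", "tail"]
```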
Running benchmark tests#
Some tests are marked as benchmark because we use them along with pytest-benchmark to measure the performance of a section of the code. These tests are excluded from the default test run to keep CI and local test runs fast.
This applies to all ways of running pytest (via command line, IDE, tox or CI).
To run only the benchmark tests locally:
pytest -m benchmark
To run all tests, including those marked as benchmark:
pytest -m ""
Comparing benchmark runs across branches#
To compare performance between branches (e.g., main and a PR branch), we use pytest-benchmark’s save and compare functionality:
Run benchmarks on the main branch and save the results:

git checkout main
pytest -m benchmark --benchmark-save=main

By default, the results are saved to .benchmarks/ (a directory ignored by git) as JSON files with the format <machine-identifier>/0001_main.json, where <machine-identifier> is a directory whose name relates to the machine specifications, 0001 is a counter for the benchmark run, and main corresponds to the string passed in the --benchmark-save option.

Switch to your PR branch and run the benchmarks again:

git checkout pr-branch
pytest -m benchmark --benchmark-save=pr

Show the results from both runs together:

pytest-benchmark compare <path-to-main-result.json> <path-to-pr-result.json> --group-by=name

Instead of providing the paths to the results, you can also provide the identifiers of the runs (e.g. 0001_main and 0002_pr), or use glob patterns to match the results (e.g. *main* and *pr*).
You can sort the results by the name of the run using --sort='name', or group them with the --group-by=<label> option (e.g. group-by=name to group by the name of the run, group-by=func to group by the name of the test function, or group-by=param to group by the parameters used to test the function). For further options, check the comparison CLI documentation.
We recommend reading the pytest-benchmark documentation for more information on the available CLI arguments. Some useful options are:
--benchmark-warmup=on: to enable warmup to prime caches and reduce variability between runs. This is recommended for tests involving I/O or external resources.
--benchmark-warmup-iterations=N: to set the number of warmup iterations.
--benchmark-compare: to run benchmarks and compare against the last saved run.
--benchmark-min-rounds=10: to run more rounds for stable results.
Note
High standard deviation in benchmark results often indicates bad isolation or non-deterministic behaviour (I/O, side-effects, garbage collection overhead). Before comparing past runs, it is advisable to make the benchmark runs as consistent as possible. See the pytest-benchmark guidance on comparing runs and the pytest-benchmark FAQ for troubleshooting tips.
Logging#
We use the loguru-based MovementLogger for logging.
The logger is configured to write logs to a rotating log file at the DEBUG level and to sys.stderr at the WARNING level.
To import the logger:
from movement.utils.logging import logger
Once the logger is imported, you can log messages with the appropriate severity levels using the same syntax as loguru (e.g. logger.debug("Debug message"), logger.warning("Warning message")).
Logging and raising exceptions#
Both logger.error() and logger.exception() can be used to log Errors and Exceptions, with the difference that the latter will include the traceback in the log message.
As these methods will return the logged Exception, you can log and raise the Exception in a single line:
raise logger.error(ValueError("message"))
raise logger.exception(ValueError("message")) # with traceback
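The one-line log-and-raise works because these logger methods return the exception they were given. A rough standard-library analogue of that behaviour (a sketch, not the actual MovementLogger implementation) is:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
_logger = logging.getLogger("sketch")


def log_error(exception):
    """Log the exception message, then return the exception
    so the caller can log and raise in a single line."""
    _logger.error(str(exception))
    return exception


try:
    raise log_error(ValueError("fps must be a positive number"))
except ValueError as err:
    handled = str(err)
```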
When to use print, warnings.warn, logger.warning and logger.info#
We aim to adhere to the When to use logging guide to ensure consistency in our logging practices. In general:
Use print() for simple, non-critical messages that do not need to be logged.
Use warnings.warn() for user input issues that are non-critical and can be addressed within movement, e.g. deprecated function calls that are redirected, an invalid fps number in ValidPosesInputs that is implicitly set to None, or data containing excessive NaNs, which the user can potentially address using appropriate methods, e.g. interpolate_over_time().
Use logger.info() for informational messages about expected behaviours that do not indicate problems, e.g. where default values are assigned to optional parameters.
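For example, the fps-handling behaviour described above could be sketched as follows (the function is hypothetical, not the actual ValidPosesInputs logic):

```python
import warnings


def coerce_fps(fps):
    """Return fps if valid, otherwise warn and fall back to None."""
    if fps is not None and fps <= 0:
        warnings.warn(f"Invalid fps value: {fps}. Setting fps to None.")
        return None
    return fps
```

Calling coerce_fps(-5) emits a UserWarning and returns None, while coerce_fps(30) passes the value through silently.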
Implementing new loaders#
Implementing a new loader to support additional file formats in movement involves the following steps:
Create validator classes for the file format (recommended).
Implement the loader function.
Update the SourceSoftware type alias.
Create file validators#
movement enforces separation of concerns by decoupling file validation from data loading, so that loaders can focus solely on reading and parsing data, while validation logic is encapsulated in dedicated file validator classes.
Besides allowing users to get early feedback on file issues, this also makes it easier to reuse validation logic across different loaders that may support the same file format.
All file validators are attrs-based classes and live in movement.validators.files.
They define the rules an input file must satisfy before it can be loaded, and they conform to the ValidFile protocol.
At minimum, this requires defining:
suffixes: The expected file extensions for the format.
file: The path to the file or an NWBFile object, depending on the loader.
Additional attributes can also be defined to store pre-parsed information that the loader may need later.
Using a hypothetical format “MySoftware” that produces CSV files containing the columns scorer, bodyparts, and coords, we illustrate the full pattern file validators follow:
Declare expected file suffixes.
Normalise the input file and apply reusable validators.
Implement custom, format-specific validation.
@define
class ValidMySoftwareCSV:
    """Validator for MySoftware .csv output files."""

    suffixes: ClassVar[set[str]] = {".csv"}
    file: Path = field(
        converter=Path,
        validator=_file_validator(permission="r", suffixes=suffixes),
    )
    col_names: list[str] = field(init=False, factory=list)

    @file.validator
    def _file_contains_expected_header(self, attribute, value):
        """Ensure that the .csv file contains the expected header row."""
        expected_cols = ["scorer", "bodyparts", "coords"]
        with open(value) as f:
            # Strip the trailing newline before splitting the header row
            col_names = f.readline().strip().split(",")[:3]
        if col_names != expected_cols:
            raise logger.error(
                ValueError(
                    ".csv header row does not match the known format for "
                    "MySoftware output files."
                )
            )
        self.col_names = col_names
Declare expected file suffixes#
The suffixes class variable restricts the validator to only accept files with the specified extensions.
If a suffix check is not required, this can be set to an empty set (set()).
In the ValidMySoftwareCSV example, only files with a .csv extension are accepted.
Normalise input file and apply reusable validators#
An attrs converter is typically used to normalise input files into Path objects, along with one or more validators to ensure the file meets the expected criteria.
In addition to the built-in attrs validators, movement provides several reusable file-specific validators (as callables) in movement.validators.files:
_file_validator: A composite validator that ensures file is a Path, is not a directory, is accessible with the required permission, and has one of the expected suffixes (if any).
_hdf5_validator: Checks that an HDF5 file contains the expected dataset(s).
_json_validator: Checks that a file contains valid JSON and optionally validates it against a JSON Schema. Schemas are defined as Python dicts in movement/validators/_json_schemas.py. Custom validation checks and an optional attribute name for storing the parsed data can also be provided.
_if_instance_of: Conditionally applies a validator only when file is an instance of a given class.
In the current example, the _file_validator is used to ensure that the input file is a readable CSV file.
Combining reusable validators
Reusable validators can be combined using either attrs.validators.and_() or by passing a list of validators to the validator parameter of field().
The file attribute in ValidDeepLabCutH5 combines both _file_validator and _hdf5_validator to ensure the input file is a readable HDF5 file containing the expected dataset df_with_missing:
@define
class ValidDeepLabCutH5:
"""Class for validating DeepLabCut-style .h5 files."""
suffixes: ClassVar[set[str]] = {".h5"}
file: Path = field(
converter=Path,
validator=validators.and_(
_file_validator(permission="r", suffixes=suffixes),
_hdf5_validator(datasets={"df_with_missing"}),
),
)
Implement format-specific validation#
Most formats require custom validation logic beyond basic file checks.
In the current example, the _file_contains_expected_header method uses the file attribute’s validator method as a decorator (@file.validator) to check that the first line of the CSV file matches the expected header row for MySoftware output files.
See also
attrs by Example: Overview of writing attrs classes.
attrs Validators: Details on writing custom validators for attributes.
Implement loader function#
Once the file validator is defined, the next step is to implement the loader function that reads the validated file and constructs the movement dataset.
Continuing from the hypothetical “MySoftware” example, the loader function from_mysoftware_file would look like this:
@register_loader(
source_software="MySoftware",
file_validators=ValidMySoftwareCSV,
)
def from_mysoftware_file(file: str | Path) -> xr.Dataset:
"""Load data from MySoftware files."""
# The decorator returns an instance of ValidMySoftwareCSV
# which conforms to the ValidFile protocol
# so we need to let the type checker know this
valid_file = cast("ValidFile", file)
file_path = valid_file.file # Path
# The _parse_* functions are pseudocode
ds = load_poses.from_numpy(
        position_array=_parse_positions(file_path),
confidence_array=_parse_confidences(file_path),
individual_names=_parse_individual_names(file_path),
keypoint_names=_parse_keypoint_names(file_path),
fps=_parse_fps(file_path),
source_software="MySoftware",
)
logger.info(f"Loaded poses from {file_path.name}")
return ds
Loader functions live in movement.io.load_poses or movement.io.load_bboxes, depending on the data type (poses or bounding boxes).
A loader function must conform to the LoaderProtocol, which requires the loader to:
Accept file as its first parameter, which may be the raw path or file handle, or a validated file object (when file validators are registered with the loader).
Return an xarray.Dataset object containing the movement dataset.
Decorate the loader with @register_loader#
The @register_loader() decorator associates a loader function with a source_software name so that users can load files from that software via the unified load_dataset() interface:
from movement.io import load_dataset
ds = load_dataset("path/to/mysoftware_output.csv", source_software="MySoftware")
which is equivalent to calling the loader function directly:
from movement.io.load_poses import from_mysoftware_file
ds = from_mysoftware_file("path/to/mysoftware_output.csv")
If a file_validators argument is supplied to the @register_loader() decorator, the decorator selects the appropriate validator—based on its declared suffixes—and uses it to normalise and validate the input file before invoking the loader.
As a result, the loader receives the validated file object instead of the raw path or handle.
If no validator is provided, the loader is passed the raw file argument as-is.
Handling multiple file formats for the same software
Many software packages produce multiple file formats (e.g. DeepLabCut outputs both CSV and HDF5).
In that case, we recommend writing one loader per source software that internally dispatches to per-format parsing functions, ensuring a consistent entry point for each supported software.
If formats require very different validation logic, you may pass multiple validators to file_validators=[...].
The decorator will select the appropriate validator based on file suffix and the validator’s suffixes attribute.
@register_loader(
source_software="MySoftware",
file_validators=[ValidMySoftwareCSV, ValidMySoftwareH5],
)
def from_mysoftware_file(file: str | Path) -> xr.Dataset:
"""Load data from MySoftware files (CSV or HDF5)."""
...
Construct the dataset#
After parsing the input file, the loader function should construct the movement dataset using:
movement.io.load_poses.from_numpy() for pose tracks.
movement.io.load_bboxes.from_numpy() for bounding box tracks.
These helper functions create the xarray.Dataset object from numpy arrays and metadata, ensuring that the dataset conforms to the movement dataset specification.
Update SourceSoftware type alias#
The SourceSoftware type alias is defined in movement.io.load as a Literal containing all supported source software names.
When adding a new loader, update this type alias to include the new software name to maintain type safety across the codebase:
SourceSoftware: TypeAlias = Literal[
"DeepLabCut",
"SLEAP",
...,
"MySoftware", # Newly added software
]
Developing the napari plugin#
The movement plugin for napari is built following the
napari plugin guide.
All widgets subclass qtpy.QtWidgets.QWidget (see the
napari guide on widgets).
The plugin lives in movement.napari
and is structured as follows:
movement.napari.meta_widget: the top-level container widget registered as the napari plugin entry point, which brings together all other subwidgets.
movement.napari.loader_widgets: a Qt form widget for loading tracked datasets from supported file formats as points, tracks and boxes.
movement.napari.regions_widget: a Qt table widget for managing named regions of interest drawn as napari shapes layers. See the next section for more details on this widget’s architecture.
movement.napari.layer_styles: dataclasses that encapsulate visual properties for each layer type.
movement.napari.convert: functions for converting movement datasets into the NumPy arrays and properties DataFrames that napari layer constructors expect.
Qt Model/View architecture#
movement.napari.regions_widget follows
Qt’s Model/View pattern
to separate the data (what is stored) from the display (how it is shown).
Understanding this pattern is helpful before making changes to this module,
and for creating new widgets that follow the same design principles.
The three components are:
RegionsTableModel (subclasses QAbstractTableModel): wraps a napari shapes layer and exposes its data to Qt (i.e., region names from layer.properties["name"] and shape types). It listens to napari layer events and emits Qt signals when the data changes.
RegionsTableView (subclasses QTableView): renders the model’s data as a table and handles user interactions (e.g. row selection, inline name editing). Keeps table row selection in sync with napari’s current shape selection.
RegionsWidget: connects the table model and table view. It manages layer selection, creates and links models to views, and handles layer lifecycle events.
Data flows in both directions:
napari shapes layer <-> RegionsTableModel <-> RegionsTableView <-> User
Preventing circular updates in bidirectional syncs
Bidirectional syncing between napari and the Qt table can cause circular
updates. For example, selecting a napari shape selects the corresponding
table row, which would then re-trigger shape selection.
Guard flags such as _syncing_row_selection and _syncing_layer_selection
break this cycle: while a sync is in progress, the corresponding flag is set
to True and any events that would re-trigger it are ignored. Preserve this
pattern when adding new two-way sync logic.
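Stripped of Qt and napari specifics, the guard-flag pattern boils down to the following sketch (the class and attribute names here are illustrative, not the actual widget code):

```python
class SelectionSync:
    """Toy model of two-way selection syncing with a guard flag."""

    def __init__(self):
        self._syncing_row_selection = False
        self.selected_row = None
        self.calls = 0  # counts how many times the handler does real work

    def on_shape_selected(self, row):
        # Ignore events fired by our own update, breaking the cycle
        if self._syncing_row_selection:
            return
        self._syncing_row_selection = True
        try:
            self.calls += 1
            self.selected_row = row
            # Updating the table would normally re-emit a selection
            # event; simulate that re-entrant call here:
            self.on_shape_selected(row)
        finally:
            self._syncing_row_selection = False
```

The try/finally ensures the flag is reset even if the update raises, so a single failure cannot leave the sync permanently disabled.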
Continuous integration#
All pushes and pull requests will be built by GitHub actions. This will usually include linting, testing and deployment.
A GitHub actions workflow (.github/workflows/test_and_deploy.yml) has been set up to run (on each push/PR):
Linting checks (pre-commit).
Testing (only if linting checks pass).
Release to PyPI (only if a git tag is present and if tests pass).
Versioning and releases#
We use semantic versioning, which includes MAJOR.MINOR.PATCH version numbers:
PATCH = small bugfix
MINOR = new feature
MAJOR = breaking change
We use setuptools_scm to automatically version movement.
It has been pre-configured in the pyproject.toml file.
setuptools_scm will automatically infer the version using git.
To manually set a new semantic version, create a tag and make sure the tag is pushed to GitHub.
Make sure you commit any changes you wish to be included in this version. E.g. to bump the version to 1.0.0:
git add .
git commit -m "Add new changes"
git tag -a v1.0.0 -m "Bump to version 1.0.0"
git push --follow-tags
Alternatively, you can also use the GitHub web interface to create a new release and tag.
The addition of a GitHub tag triggers the package’s deployment to PyPI. The version number is automatically determined from the latest tag on the main branch.
Contributing documentation#
The documentation is hosted via GitHub pages at
movement.neuroinformatics.dev.
Its source files are located in the docs folder of this repository.
They are written in either Markdown
or reStructuredText.
The index.md file corresponds to the homepage of the documentation website.
Other .md or .rst files are linked to the homepage via the toctree directive.
We use Sphinx and the PyData Sphinx Theme
to build the source files into HTML output.
This is handled by a GitHub actions workflow (.github/workflows/docs_build_and_deploy.yml).
The build job runs on each PR, ensuring that the documentation build is not broken by new changes.
The deployment job runs on tag pushes (for PyPI releases) or manual triggers on the main branch.
This keeps the documentation aligned with releases, while allowing manual redeployment when necessary.
Editing the documentation#
To edit the documentation, ensure you have already set up a development environment.
To build the documentation locally, install the extra dependencies by running the following command from the repository root:
pip install -e ".[docs]" # conda env
uv pip install -e ".[docs]" # uv env
Now create a new branch, edit the documentation source files (.md or .rst in the docs folder),
and commit your changes. Submit your documentation changes via a pull request,
following the same guidelines as for code changes.
Make sure that the header levels in your .md or .rst files are incremented
consistently (H1 > H2 > H3, etc.) without skipping any levels.
Adding new pages#
If you create a new documentation source file (e.g. my_new_file.md or my_new_file.rst),
you will need to add it to the toctree directive in index.md
for it to be included in the documentation website:
:maxdepth: 2
:hidden:
existing_file
my_new_file
Linking to external URLs#
If you are adding references to an external URL (e.g. https://github.com/neuroinformatics-unit/movement/issues/1) in a .md file, you will need to check if a matching URL scheme (e.g. https://github.com/neuroinformatics-unit/movement/) is defined in myst_url_schemes in docs/source/conf.py. If it is, the following [](scheme:loc) syntax will be converted to the full URL during the build process:
[link text](movement-github:issues/1)
If it is not yet defined and you have multiple external URLs pointing to the same base URL, you will need to add the URL scheme to myst_url_schemes in docs/source/conf.py.
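A scheme definition in docs/source/conf.py might look like the sketch below (the movement-github entry is illustrative; check the actual conf.py for the real mappings):

```python
# Illustrative sketch of MyST URL schemes; {{path}} is replaced by
# the part after the colon in [](movement-github:issues/1).
myst_url_schemes = {
    "http": None,   # None leaves standard schemes untouched
    "https": None,
    "movement-github": "https://github.com/neuroinformatics-unit/movement/{{path}}",
}
```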
Updating the API reference#
The API reference is auto-generated by the docs/make_api.py script, and the sphinx-autodoc and sphinx-autosummary extensions.
The script inspects the source tree and generates the docs/source/api_index.rst file, which lists the modules to be included in the API reference, skipping those listed in EXCLUDE_MODULES.
For each package module listed in PACKAGE_MODULES—a module that re-exports selected classes and functions from its submodules via __init__.py (e.g. movement.kinematics)—the script also generates a .rst file in docs/source/api/ with autosummary entries for the top-level objects exposed by the module.
The Sphinx extensions then generate the API reference pages for each module listed in api_index.rst, based on their docstrings.
See Docstrings for the docstring formatting conventions.
If your PR introduces new modules that should not be documented in the API reference, or if there are changes to existing modules that necessitate their removal from the documentation, make sure to update EXCLUDE_MODULES in docs/make_api.py accordingly.
Likewise, if you want to document a module that exposes its public API via its __init__.py, rather than through its submodules individually, make sure to add it to PACKAGE_MODULES in docs/make_api.py.
Updating the examples#
We use sphinx-gallery
to create the examples.
To add new examples, you will need to create a new .py file in examples/,
or in examples/advanced/ if your example targets experienced users.
The file should be structured as specified in the relevant
sphinx-gallery documentation.
We are using sphinx-gallery’s integration with binder, to provide interactive versions of the examples.
This is configured in docs/source/conf.py under the sphinx_gallery_conf variable,
and further customised for our repository by the .binder/postBuild script.
If your examples rely on packages that are not among movement’s dependencies,
you will need to add them to the .binder/requirements.txt file.
Cross-referencing Python objects#
Note
Docstrings in the .py files for the API reference and the examples are converted into .rst files, so these should use reStructuredText syntax.
Internal references#
For referencing movement objects in .md files, use the {role}`target` syntax with the appropriate Python object role.
For example, to reference the movement.io.load_poses module, use:
{mod}`movement.io.load_poses`
For referencing movement objects in .rst files, use the :role:`target` syntax with the appropriate Python object role.
For example, to reference the movement.io.load_poses module, use:
:mod:`movement.io.load_poses`
External references#
For referencing external Python objects using intersphinx,
ensure the mapping between module names and their documentation URLs is defined in intersphinx_mapping in docs/source/conf.py.
Once the module is included in the mapping, use the same syntax as for internal references.
For example, to reference the xarray.Dataset.update() method in a .md file, use:
{meth}`xarray.Dataset.update`
To reference the same method in an .rst file, use:
:meth:`xarray.Dataset.update`
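The corresponding entry in intersphinx_mapping pairs a short name with the base URL of the project's documentation. The xarray entry below is a typical example (verify the exact URL against the real docs/source/conf.py):

```python
# Illustrative intersphinx mapping entry; the second tuple element is
# an optional local path to an objects.inv file (None = fetch remotely).
intersphinx_mapping = {
    "xarray": ("https://docs.xarray.dev/en/stable/", None),
}
```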
Updating the contributors list#
The contributors list is automatically updated on the first day of each month by a GitHub actions workflow (.github/workflows/update_contributors_list.yml).
It uses the Contributors-Readme-Action to generate the list of contributors based on the commits to the repository.
It is also possible to manually add other contributors who have not contributed code to the repository, but have contributed in other ways (e.g. by providing sample data, or by actively participating in discussions). The way to add them differs depending on whether they are GitHub users or not.
To add a contributor who has a GitHub account, locate the section marked with MANUAL: OTHER GITHUB CONTRIBUTORS in docs/source/community/people.md.
Next, add their GitHub username (e.g. newcontributor) to the <!-- readme: -start --> and <!-- readme: -end --> lines as follows:
<!-- readme: githubUser1,githubUser2,newcontributor -start -->
existing content...
<!-- readme: githubUser1,githubUser2,newcontributor -end -->
The aforementioned GitHub actions workflow will then automatically update the contributors list with newcontributor’s GitHub profile picture, name, and link to their GitHub profile.
To add a contributor who does not have a GitHub account, locate the section marked with MANUAL: OTHER NON-GITHUB CONTRIBUTORS in docs/source/community/people.md.
Next, add a row containing the contributor’s image, name, and link to their website to the existing list-table as follows:
* - existing content...
* - [ <br /> <sub><b>New Contributor</b></sub>](https://newcontributor.website.com)
Building the documentation locally#
We recommend that you build and view the documentation website locally, before you push your proposed changes.
First, ensure your development environment with the required dependencies is active (see Editing the documentation for details on how to create it). Then, navigate to the docs/ directory:
cd docs
All subsequent commands should be run from this directory.
Note
Windows PowerShell users should prepend make commands with .\ (e.g. .\make html).
To build the documentation, run:
make html
The local build can be viewed by opening docs/build/html/index.html in a browser.
To re-build the documentation after making changes,
we recommend removing existing build files first.
The following command will remove all generated files in docs/,
including the auto-generated files source/api_index.rst and
source/snippets/admonitions.md, as well as all files in
build/, source/api/, and source/examples/.
It will then re-build the documentation:
make clean html
To check that external links are correctly resolved, run:
make linkcheck
If the linkcheck step incorrectly marks links with valid anchors as broken, you can skip checking the anchors in specific links by adding the URLs to linkcheck_anchors_ignore_for_url in docs/source/conf.py, e.g.:
# The linkcheck builder will skip verifying that anchors exist when checking
# these URLs
linkcheck_anchors_ignore_for_url = [
"https://gin.g-node.org/G-Node/Info/wiki/",
"https://neuroinformatics.zulipchat.com/",
]
Tip
The make commands can be combined to run multiple tasks sequentially.
For example, to re-build the documentation and check the links, run:
make clean html linkcheck
Previewing the documentation in continuous integration#
We use artifact.ci to preview the documentation that is built as part of our GitHub Actions workflow. To do so:
Go to the “Checks” tab in the GitHub PR.
Click on the “Docs” section on the left.
If the “Build Sphinx Docs” action is successful, a summary section will appear under the block diagram with a link to preview the built documentation.
Click on the link and wait for the files to be uploaded (it may take a while the first time). You may be asked to sign in to GitHub.
Once the upload is complete, look for docs/build/html/index.html under the “Detected Entrypoints” section.
Sample data#
We maintain some sample datasets to be used for testing, examples and tutorials on an external data repository. Our hosting platform of choice is called GIN and is maintained by the German Neuroinformatics Node. GIN has a GitHub-like interface and git-like CLI functionalities.
Currently, the data repository contains sample pose estimation data files
stored in the poses folder, and tracked bounding boxes data files under the bboxes folder. For some of these files, we also host
the associated video file (in the videos folder) and/or a single
video frame (in the frames folder). These can be used to develop and
test visualisations, e.g. to overlay the data on video frames.
The metadata.yaml file holds metadata for each sample dataset,
including information on data provenance as well as the mapping between data files and related
video/frame files.
For most sample datasets, the tracking data lives in a single file under poses or bboxes.
However, some tools—like TRex—may split their tracking outputs across multiple files.
In those cases, the dataset is distributed as a ZIP archive containing every relevant file, and is automatically extracted when fetched.
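In practice, the download machinery handles the extraction automatically; as an illustration only, unpacking such a multi-file archive with the standard library might look like this (function name and file layout are hypothetical):

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path: str, out_dir: str) -> list[Path]:
    """Extract a ZIP archive and return the paths of the extracted files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out)
    # Return every regular file, so callers can locate all tracking outputs
    return sorted(p for p in out.rglob("*") if p.is_file())
```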
Fetching data#
To fetch the data from GIN, we use the pooch Python package, which can download data from pre-specified URLs and store them locally for all subsequent uses. It also provides some nice utilities, like verification of sha256 hashes and decompression of archives.
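The hash verification that pooch performs is conceptually equivalent to computing a file's SHA256 digest and comparing it against the recorded value, e.g.:

```python
import hashlib


def sha256sum(path: str) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large data files don't fill memory
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()
```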
The relevant functionality is implemented in the movement.sample_data module.
The most important parts of this module are:
The SAMPLE_DATA download manager object.
The list_datasets() function, which returns a list of the available poses and bounding boxes datasets (file names of the data files).
The fetch_dataset_paths() function, which returns a dictionary containing local paths to the files associated with a particular sample dataset: poses or bboxes, frame, video. If the relevant files are not already cached locally, they will be downloaded.
The fetch_dataset() function, which downloads the files associated with a given sample dataset (same as fetch_dataset_paths()) and additionally loads the pose or bounding box data into movement, returning an xarray.Dataset object. If available, the local paths to the associated video and frame files are stored as dataset attributes, with names video_path and frame_path, respectively.
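For a dataset that ships with both a frame and a video, the dictionary returned by fetch_dataset_paths() would have a shape along these lines (the paths shown are hypothetical):

```python
# Hypothetical return value of fetch_dataset_paths(); actual paths depend
# on the local cache location and the dataset requested.
paths = {
    "poses": "~/.movement/data/poses/SLEAP_three-mice_Aeon_proofread.analysis.h5",
    "frame": "~/.movement/data/frames/three-mice_Aeon_frame-5sec.png",
    "video": "~/.movement/data/videos/three-mice_Aeon_video.avi",
}
```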
By default, the downloaded files are stored in the ~/.movement/data folder.
This can be changed by setting the DATA_DIR variable in the sample_data.py file.
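Assuming the default location, you can inspect what has been cached locally with a few lines of standard-library Python (a sketch, not part of the movement API):

```python
from pathlib import Path

# Default cache location used by movement's sample data fetchers
data_dir = Path.home() / ".movement" / "data"
if data_dir.exists():
    for path in sorted(data_dir.rglob("*")):
        print(path.relative_to(data_dir))
```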
Adding new data#
Only core movement developers may add new files to the external data repository.
Make sure to run the following procedure on a UNIX-like system, as we have observed inconsistent behaviour on Windows (some sha256sum values may end up being different).
To add a new file, you will need to:
Create a GIN account.
Request collaborator access to the movement data repository if you don’t already have it.
Install and configure the GIN CLI by running gin login in a terminal with your GIN credentials.
Clone the movement data repository to your local machine using gin get neuroinformatics/movement-test-data, then run gin download --content to download all the files.
Add your new files to the appropriate folders (poses, bboxes, videos, and/or frames) following the existing file naming conventions.
Add metadata for your new files to metadata.yaml using the example entry below as a template. You can leave all sha256sum values as null for now.
Update file hashes in metadata.yaml by running python update_hashes.py from the root of the movement data repository. This script computes SHA256 hashes for all data files and updates the corresponding sha256sum values in the metadata file. Make sure you’re in a Python environment with movement installed.
Commit your changes using gin commit -m <message> <filename> for specific files or gin commit -m <message> . for all changes.
Upload your committed changes to the GIN repository with gin upload. Use gin download to pull the latest changes or gin sync to synchronise changes bidirectionally.
Verify the new files can be fetched and loaded correctly using the movement.sample_data module.
metadata.yaml example entry#
SLEAP_three-mice_Aeon_proofread.analysis.h5:
  sha256sum: null
  source_software: SLEAP
  type: poses
  fps: 50
  species: mouse
  number_of_individuals: 3
  shared_by:
    name: Chang Huan Lo
    affiliation: Sainsbury Wellcome Centre, UCL
  frame:
    file_name: three-mice_Aeon_frame-5sec.png
    sha256sum: null
  video:
    file_name: three-mice_Aeon_video.avi
    sha256sum: null
  note: All labels were proofread (user-defined) and can be considered ground truth.
    It was exported from the .slp file with the same prefix.
Verifying sample data#
To verify that a sample dataset can be fetched and loaded correctly:
from movement import sample_data
# Fetch and load the dataset
ds = sample_data.fetch_dataset("SLEAP_three-mice_Aeon_proofread.analysis.h5")
# Verify it loaded correctly
print(ds)
This displays the dataset’s structure (dimensions, coordinates, data variables, and attributes), confirming the data was loaded successfully.
If the sample dataset also includes a video, pass with_video=True to
verify that the video is correctly linked to the dataset:
ds = sample_data.fetch_dataset(
"SLEAP_three-mice_Aeon_proofread.analysis.h5",
with_video=True,
)
print(ds.video_path)