Input/Output

Overview
Our goal with movement is to enable pipelines that are input-agnostic, meaning they are not tied to a specific motion tracking tool or data format. Therefore, our input/output functions are designed to facilitate data flows between various third-party formats and movement’s own native data structure, based on xarray.

It may be useful to think of movement as supporting two types of data loading/saving:

- Supported third-party formats: movement provides convenient functions for loading/saving data in formats written by popular motion tracking tools, as well as established data specifications. You can think of these as “Import” and “Export/Save As” functions.
- Native saving and loading with netCDF: movement leverages xarray’s built-in netCDF support to save and load datasets while preserving all variables and metadata. This is the recommended way to save your analysis state, allowing your movement-powered workflows to resume exactly where they left off.

You are also welcome to try movement by loading some sample data included with the package.
Supported third-party formats

movement supports the analysis of trajectories of keypoints (pose tracks) and of bounding box centroids (bounding box tracks). These are represented as movement datasets and can be loaded from and saved to various third-party formats.
| Source Software | Abbreviation | Source Format | Dataset Type | Supported Operations |
|---|---|---|---|---|
| DeepLabCut | DLC | DLC-style .h5 or .csv file, or corresponding pandas DataFrame | Pose | Load & Save |
| SLEAP | SLEAP | analysis .h5 or .slp file | Pose | Load & Save |
| LightningPose | LP | DLC-style .csv file, or corresponding pandas DataFrame | Pose | Load & Save |
| Anipose | Anipose | triangulation .csv file, or corresponding pandas DataFrame | Pose | Load |
| VGG Image Annotator | VIA | .csv file for tracks annotation | Bounding box | Load |
| Neurodata Without Borders | NWB | .nwb file or NWBFile object with the ndx-pose extension | Pose | Load & Save |
| Any | - | NumPy arrays | Pose or Bounding box | Load & Save* |
*Exporting any movement DataArray to a NumPy array is as simple as calling xarray’s built-in xarray.DataArray.to_numpy() method, so no specialised “Export/Save As” function is needed; see xarray’s documentation for more details.
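For example, to pull the position data out of any loaded movement dataset ds as a plain NumPy array (a minimal sketch):

# Convert the position DataArray to a plain NumPy array
position_array = ds.position.to_numpy()
# For a poses dataset, the array has shape (time, space, keypoints, individuals)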
Note

Currently, movement only works with tracked data: keypoints or bounding boxes whose identities are known from one frame to the next. For pose estimation, this means it only supports the predictions output by the supported software packages listed above. Loading manually labelled data, which is often defined over a non-continuous set of frames, is not currently supported.
Below, we explain how to load pose and bounding box tracks from these supported formats, as well as how to save pose tracks back to some of them.
Loading pose tracks

Functions for loading pose tracks are provided by the movement.io.load_poses module, which can be imported as follows:
from movement.io import load_poses
To read a pose tracks file into a movement poses dataset, we provide a specific function for each of the supported formats. We additionally provide a more general from_numpy() method, with which you can build a movement poses dataset from a set of NumPy arrays.
To load DeepLabCut files in .h5 format:
ds = load_poses.from_dlc_file("/path/to/file.h5", fps=30)
# or equivalently
ds = load_poses.from_file(
"/path/to/file.h5", source_software="DeepLabCut", fps=30
)
To load DeepLabCut files in .csv format:
ds = load_poses.from_dlc_file("/path/to/file.csv", fps=30)
You can also directly load any pandas DataFrame df that’s formatted in the DeepLabCut style:
ds = load_poses.from_dlc_style_df(df, fps=30)
To load SLEAP analysis files in .h5 format (recommended):
ds = load_poses.from_sleap_file("/path/to/file.analysis.h5", fps=30)
# or equivalently
ds = load_poses.from_file(
"/path/to/file.analysis.h5", source_software="SLEAP", fps=30
)
To load SLEAP files in .slp format (experimental; see the notes in movement.io.load_poses.from_sleap_file()):
ds = load_poses.from_sleap_file("/path/to/file.predictions.slp", fps=30)
To load LightningPose files in .csv format:
ds = load_poses.from_lp_file("/path/to/file.analysis.csv", fps=30)
# or equivalently
ds = load_poses.from_file(
"/path/to/file.analysis.csv", source_software="LightningPose", fps=30
)
Because LightningPose follows the DeepLabCut dataframe format, you can also directly load an appropriately formatted pandas DataFrame df:
ds = load_poses.from_dlc_style_df(df, fps=30, source_software="LightningPose")
To load Anipose files in .csv format:
ds = load_poses.from_anipose_file(
"/path/to/file.analysis.csv", fps=30, individual_name="individual_0"
) # Optionally specify the individual name; defaults to "individual_0"
# or equivalently
ds = load_poses.from_file(
"/path/to/file.analysis.csv",
source_software="Anipose",
fps=30,
individual_name="individual_0",
)
You can also directly load any pandas DataFrame df that’s formatted in the Anipose triangulation style:
ds = load_poses.from_anipose_style_df(
df, fps=30, individual_name="individual_0"
)
To load NWB files in .nwb format:
ds = load_poses.from_nwb_file(
    "path/to/file.nwb",
    # Optionally, specify the name of the ProcessingModule to load
    processing_module_key="behavior",
    # Optionally, specify the name of the PoseEstimation object to load
    pose_estimation_key="PoseEstimation",
)
# or equivalently
ds = load_poses.from_file(
"path/to/file.nwb",
source_software="NWB",
processing_module_key="behavior",
pose_estimation_key="PoseEstimation",
)
The above functions also accept an NWBFile object as input:
import pynwb

with pynwb.NWBHDF5IO("path/to/file.nwb", mode="r") as io:
    nwb_file = io.read()
    ds = load_poses.from_nwb_file(
        nwb_file, pose_estimation_key="PoseEstimation"
    )
In the example below, we create random position data for two individuals, Alice and Bob, with three keypoints each: snout, centre, and tail_base. These keypoints are tracked in 2D space for 100 frames, at 30 fps. The confidence scores are set to 1 for all points.
import numpy as np
rng = np.random.default_rng(seed=42)
ds = load_poses.from_numpy(
position_array=rng.random((100, 2, 3, 2)),
confidence_array=np.ones((100, 3, 2)),
individual_names=["Alice", "Bob"],
keypoint_names=["snout", "centre", "tail_base"],
fps=30,
)
The resulting poses data structure ds will include the predicted trajectories for each individual and keypoint, as well as the associated point-wise confidence values reported by the pose estimation software.
For more information on the poses data structure, see the movement datasets page.
Loading bounding box tracks

To load bounding box tracks into a movement bounding boxes dataset, we need the functions from the movement.io.load_bboxes module, which can be imported as follows:
from movement.io import load_bboxes
We currently support loading bounding box tracks in the VGG Image Annotator (VIA) format only. However, as with poses datasets, we additionally provide a from_numpy() method, with which you can build a movement bounding boxes dataset from a set of NumPy arrays.
To load a VIA tracks .csv file:
ds = load_bboxes.from_via_tracks_file("path/to/file.csv", fps=30)
# or equivalently
ds = load_bboxes.from_file(
"path/to/file.csv",
source_software="VIA-tracks",
fps=30,
)
Note that the x,y coordinates in the input VIA tracks .csv file represent the top-left corner of each bounding box. In contrast, the corresponding movement dataset ds will hold the centroid of each bounding box in its position array.
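If you ever need the top-left corners back (e.g. to compare against the original VIA file), you can recover them from the centroids and the box sizes; a minimal sketch, assuming a loaded bounding boxes dataset ds:

# The centroid minus half the box width/height gives the top-left corner
top_left_position = ds.position - ds.shape / 2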
In the example below, we create random position data for two bounding boxes, id_0 and id_1, both with the same width (40 pixels) and height (30 pixels). These are tracked in 2D space for 100 frames, which will be numbered in the resulting dataset from 0 to 99. The confidence score for all bounding boxes is set to 0.5.
import numpy as np
rng = np.random.default_rng(seed=42)
ds = load_bboxes.from_numpy(
position_array=rng.random((100, 2, 2)),
shape_array=np.ones((100, 2, 2)) * [40, 30],
confidence_array=np.ones((100, 2)) * 0.5,
individual_names=["id_0", "id_1"]
)
The resulting data structure ds will include the centroid trajectories for each tracked bounding box, the boxes’ widths and heights, and their associated confidence values if provided.
For more information on the bounding boxes data structure, see the movement datasets page.
Saving pose tracks

To export movement poses datasets to any of the supported third-party formats, we’ll need functions from the movement.io.save_poses module:
from movement.io import save_poses
Depending on the desired format, use one of the following functions:
To save as a DeepLabCut file, in .h5 or .csv format:
save_poses.to_dlc_file(ds, "/path/to/file.h5") # preferred format
save_poses.to_dlc_file(ds, "/path/to/file.csv")
The movement.io.save_poses.to_dlc_file() function also accepts a split_individuals boolean argument. If set to True, the function will save the data as separate single-animal DeepLabCut-style files.
To save as a SLEAP analysis file in .h5 format:
save_poses.to_sleap_analysis_file(ds, "/path/to/file.h5")
When saving to SLEAP-style files, only track_names, node_names, tracks, track_occupancy, and point_scores are saved. labels_path will only be saved if the source file of the dataset is a SLEAP .slp file; otherwise, it will be an empty string. Other attributes and data variables (i.e., instance_scores, tracking_scores, edge_names, edge_inds, video_path, video_ind, and provenance) are not currently supported. To learn more about what each attribute and data variable represents, see the SLEAP documentation.
To save as a LightningPose file in .csv format:
save_poses.to_lp_file(ds, "/path/to/file.csv")
Because LightningPose follows the single-animal DeepLabCut .csv format, the above command is equivalent to:
save_poses.to_dlc_file(ds, "/path/to/file.csv", split_individuals=True)
To convert a movement poses dataset to NWBFile objects:
nwb_files = save_poses.to_nwb_file(ds)
To allow adding additional data to NWB files before saving, to_nwb_file does not write to disk directly. Instead, it returns a list of NWBFile objects, one per individual in the dataset, since NWB files are designed to represent data from a single individual.
The to_nwb_file function also accepts an NWBFileSaveConfig object as its config argument, for customising metadata such as session or subject information in the resulting NWBFiles (see the API reference for examples).
These NWBFile objects can then be saved to disk as .nwb files using pynwb.NWBHDF5IO:
from pynwb import NWBHDF5IO
for file in nwb_files:
    with NWBHDF5IO(f"{file.identifier}.nwb", "w") as io:
        io.write(file)
Saving bounding box tracks

We currently do not provide explicit methods to export a movement bounding boxes dataset to a specific format. However, you can save the bounding box tracks to a .csv file using the csv module from the Python standard library.
Here is an example of how you can save a bounding boxes dataset to a .csv file:
import csv

# define name for output csv file
filepath = "tracking_output.csv"

# open the csv file in write mode
with open(filepath, mode="w", newline="") as file:
    writer = csv.writer(file)

    # write the header
    writer.writerow(
        ["frame_idx", "bbox_ID", "x", "y", "width", "height", "confidence"]
    )

    # write the data
    for individual in ds.individuals.data:
        for frame in ds.time.data:
            x, y = ds.position.sel(time=frame, individuals=individual).data
            width, height = ds.shape.sel(time=frame, individuals=individual).data
            confidence = ds.confidence.sel(time=frame, individuals=individual).data
            writer.writerow([frame, individual, x, y, width, height, confidence])
Alternatively, we can convert the movement bounding boxes dataset to a pandas DataFrame with the xarray.DataArray.to_dataframe() method, wrangle the dataframe as required, and then use the pandas.DataFrame.to_csv() method to save the data as a .csv file.
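A minimal sketch of this alternative (the exact wrangling will depend on the columns you need):

# Convert the position DataArray to a long-format pandas DataFrame,
# with one row per (time, space, individual) combination
df = ds.position.to_dataframe().reset_index()
# ... wrangle the dataframe as required,
# e.g. pivot the "space" dimension into separate x/y columns ...
df.to_csv("tracking_output.csv", index=False)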
Native saving and loading with netCDF

Because movement datasets are xarray.Dataset objects, we can rely on xarray’s built-in support for the netCDF file format. netCDF is a binary file format for self-describing datasets that originated in the geosciences, and netCDF files on disk directly correspond to xarray.Dataset objects. Saving to netCDF is the recommended way to preserve the complete state of your analysis, including all variables, coordinates, and attributes.
To save any xarray dataset ds to a netCDF file:
ds.to_netcdf("/path/to/my_data.nc")
To load the dataset back:
import xarray as xr
ds = xr.open_dataset("/path/to/my_data.nc")
Similarly, an xarray.DataArray object (e.g. the position variable of a movement dataset) can be saved to disk using the to_netcdf() method, and loaded from disk using the xarray.open_dataarray() function. As netCDF files correspond to Dataset objects, these functions internally convert the DataArray to a Dataset before saving, and convert it back when loading.
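For example, a minimal sketch (the filename is arbitrary):

import xarray as xr

# Save just the position DataArray to its own netCDF file
ds.position.to_netcdf("position.nc")
# Load it back as a DataArray
position = xr.open_dataarray("position.nc")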
Note
xarray also supports compression and chunking options with netCDF, which can be useful for managing large datasets. For more details, see the xarray documentation on netCDF.
Below is an example of how you may integrate netCDF into your movement-powered workflows:
from movement.io import load_poses
from movement.filtering import rolling_filter
from movement.kinematics import compute_speed
ds = load_poses.from_file(
"path/to/my_data.h5", source_software="DeepLabCut", fps=30
)
# Apply a rolling median filter to smooth the position data
ds["position_smooth"] = rolling_filter(
ds["position"], window=5, statistic="median"
)
# Compute speed based on the smoothed position data
ds["speed"] = compute_speed(ds["position_smooth"])
# Save the dataset to a netCDF file
# This includes the original position and confidence data,
# the smoothed position, and the computed speed
ds.to_netcdf("my_data_processed.nc")
Sample data

movement includes some sample data files that you can use to try out the package. These files contain pose and bounding box tracks from various supported third-party formats.
You can list the available sample data files using:
from movement import sample_data
file_names = sample_data.list_datasets()
print(*file_names, sep='\n')  # print each sample file on a separate line
Each sample file is prefixed with the name (or abbreviation) of the software package that was used to generate it.
To load one of the sample files as a movement dataset, use the fetch_dataset function:
filename = "SLEAP_three-mice_Aeon_proofread.analysis.h5"
ds = sample_data.fetch_dataset(filename)
Some sample datasets also have an associated video file (the video on which the predictions were computed). You can request to download the sample video by setting with_video=True:
ds = sample_data.fetch_dataset(filename, with_video=True)
If available, the video file is downloaded and its path is stored in the video_path attribute of the dataset (i.e., ds.video_path). This attribute will not be set if no video is available for the dataset, or if you did not request it.
Some datasets also include a sample frame file, which is a single still frame extracted from the video. This can be useful for visualisation (e.g., as a background image for plotting trajectories). If available, this file is always downloaded when fetching the dataset, and its path is stored in the frame_path attribute (i.e., ds.frame_path). If no frame file is available for the dataset, the frame_path attribute will not be set.
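For instance, you could plot trajectories over the sample frame; a minimal sketch using matplotlib, assuming a poses dataset ds with a frame_path attribute:

import matplotlib.pyplot as plt

# Load the still frame and use it as the plot background
frame = plt.imread(ds.frame_path)
fig, ax = plt.subplots()
ax.imshow(frame)

# Overlay the trajectory of the first individual's first keypoint
traj = ds.position.isel(individuals=0, keypoints=0)
ax.plot(traj.sel(space="x"), traj.sel(space="y"), color="red")
plt.show()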
Under the hood

When you import the sample_data module with from movement import sample_data, movement downloads a small metadata file to your local machine with information about the latest available sample datasets. Then, the first time you call the fetch_dataset() function, movement downloads the requested file to your machine and caches it in the ~/.movement/data directory. On subsequent calls, the data are loaded directly from this local cache.