(target-poses-and-bboxes-dataset)=
# The movement datasets

In `movement`, poses or bounding box tracks are represented as an {class}`xarray.Dataset` object.

An {class}`xarray.Dataset` object is a container for multiple arrays. Each array is an {class}`xarray.DataArray` object holding different aspects of the collected data (position, time, confidence scores...). You can think of an {class}`xarray.DataArray` object as a multi-dimensional {class}`numpy.ndarray` with pandas-style indexing and labelling.

So a `movement` dataset is simply an {class}`xarray.Dataset` with a specific structure to represent pose tracks or bounding box tracks. Because pose data and bounding box data are somewhat different, `movement` provides two types of datasets: `poses` datasets and `bboxes` datasets.

To discuss the specifics of both types of `movement` datasets, it is useful to clarify some concepts such as **data variables**, **dimensions**, **coordinates** and **attributes**. In the next section, we will describe these concepts and the `movement` datasets' structure in some detail.

To learn more about `xarray` data structures in general, see the relevant [documentation](xarray:user-guide/data-structures.html).

## Dataset structure

```{figure} ../_static/dataset_structure.png
:alt: movement dataset structure

An {class}`xarray.Dataset` is a collection of several data arrays that share some dimensions. The schematic shows the data arrays that make up the `poses` and `bboxes` datasets in `movement`.
```

The structure of a `movement` dataset `ds` can be inspected by printing it.

::::{tab-set}

:::{tab-item} Poses dataset
To inspect a sample poses dataset, we can run:

```python
from movement import sample_data

ds = sample_data.fetch_dataset(
    "SLEAP_three-mice_Aeon_proofread.analysis.h5",
)
print(ds)
```

and we would obtain an output such as:

```
<xarray.Dataset> Size: 27kB
Dimensions:      (time: 601, space: 2, keypoints: 1, individuals: 3)
Coordinates:
  * time         (time) float64 5kB 0.0 0.02 0.04 0.06 ... 11.96 11.98 12.0
  * space        (space)
...
```
:::

:::{tab-item} Bounding boxes dataset
```
<xarray.Dataset> Size: 19kB
Dimensions:      (time: 5, space: 2, individuals: 86)
Coordinates:
  * time         (time) int64 40B 0 1 2 3 4
  * space        (space)
...
```
:::

::::

... is an {class}`xarray.DataArray` object, with the same **dimensions** as the original `position` **data variable**, so adding it to the existing `ds` makes sense and works seamlessly.

We can also update existing **data variables** in-place, using {meth}`xarray.Dataset.update`. For example, if we wanted to update the `position` and `velocity` arrays in our dataset, we could do:

```python
ds.update({"position": position_filtered, "velocity": velocity_filtered})
```

Custom **attributes** can be added to the dataset with:

```python
ds.attrs["my_custom_attribute"] = "my_custom_value"

# we can now access this value using dot notation on the dataset
ds.my_custom_attribute
```

(target-attrs-data-type-warning)=
:::{warning}
Keep in mind that only certain attribute data types are compatible with the [netCDF format](https://docs.unidata.ucar.edu/nug/current/). If you plan to [save your dataset to a netCDF file](target-netcdf), make sure to only use attributes that are scalars, strings, or 1D arrays. Complex data types and arbitrary Python objects will likely lead to errors when saving. The error message will include the term "illegal data type for attribute..".
:::
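
To catch problematic attributes before saving, we could run a rough pre-check along the lines of the rule in the warning above. The sketch below is only an approximation of that rule, not the validation that `xarray` or the netCDF backends actually perform. The helper names `is_netcdf_friendly` and `find_incompatible_attrs` are hypothetical, and the `attrs` dictionary is made-up example data standing in for `ds.attrs`.

```python
from numbers import Number


def is_netcdf_friendly(value):
    """Return True for a string, a numeric scalar, or a flat (1D) collection.

    Hypothetical helper: it approximates the "scalars, strings, or 1D arrays"
    rule and is not the check performed by xarray itself.
    """
    if isinstance(value, (str, Number)):
        return True
    # Array-like objects (e.g. numpy arrays) expose `ndim`; accept 1D only.
    ndim = getattr(value, "ndim", None)
    if ndim is not None:
        return ndim == 1
    if isinstance(value, (list, tuple)):
        # Accept flat sequences that are all numbers or all strings;
        # nested sequences (2D or deeper) are rejected.
        return all(isinstance(item, Number) for item in value) or all(
            isinstance(item, str) for item in value
        )
    return False


def find_incompatible_attrs(attrs):
    """List the attribute names whose values fail the check above."""
    return [name for name, value in attrs.items() if not is_netcdf_friendly(value)]


# Made-up attributes, standing in for `ds.attrs`
attrs = {
    "my_custom_attribute": "my_custom_value",
    "frame_rate": 50.0,
    "frame_size": [1440, 1080],
    "metadata": {"experimenter": "A"},  # dict: an arbitrary Python object
    "corners": [[0, 0], [10, 10]],  # nested list: 2D rather than 1D
}

print(find_incompatible_attrs(attrs))  # ['metadata', 'corners']
```

With a real dataset, we would pass `ds.attrs` instead of the example dictionary, and then convert or remove any flagged attributes before saving.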