Sample scene of the Drunkard’s Dataset. The dataset provides several levels of scene deformation. Top row: Sample frames from scene 0 at each difficulty level, 0 to 3. Bottom row: External views showing the ground truth camera trajectory in green and the camera frame in purple. As the deformation level increases, the camera motion becomes more abrupt.
Estimating camera motion in deformable scenes is a complex and open research challenge. Most existing non-rigid structure from motion techniques assume that static scene parts are observed alongside the deforming ones, so that they can serve as an anchoring reference. However, this assumption does not hold in certain relevant applications such as endoscopy. Deformable odometry and SLAM pipelines, which tackle the most challenging scenario of exploratory trajectories, suffer from a lack of robustness and of proper quantitative evaluation methodologies. To address this with a common benchmark, we introduce the Drunkard's Dataset, a challenging collection of synthetic data targeting visual navigation and reconstruction in deformable environments. This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes where every surface exhibits non-rigid deformations over time. Simulations in realistic 3D buildings let us obtain a vast amount of data and ground truth labels, including camera poses, RGB images and depth, optical flow and normal maps at high resolution and quality. We further present a novel deformable odometry method, dubbed the Drunkard's Odometry, which decomposes optical flow estimates into rigid-body camera motion and non-rigid scene deformations. To validate our data, our work contains an evaluation of several baselines as well as a novel tracking error metric which does not require ground truth data.
The Drunkard's Dataset can be downloaded from the link above. The root folder contains two versions of the dataset that differ only in image resolution: 1024x1024 and 320x320 pixels. Both versions share the following folder structure:
- The Drunkard's Dataset
  - Dataset resolutions
    - Scenes
      - Difficulty levels
        - Color
        - Depth
        - Optical flow
        - Normal
        - Pose
Each of the 19 scenes has 4 levels of deformation difficulty, and each level contains color and depth images, optical flow and normal maps, and the camera trajectory.
- Color: RGB uint8 .png images.
- Depth: uint16 .png grayscale images whose pixel values must be multiplied by (2 ** 16 - 1) * 30 to obtain metric scale in meters.
- Optical flow: NumPy image arrays stored as compressed .npz files. They have two channels: the horizontal and vertical pixel translation from the current frame to the next one.
- Normal: NumPy image arrays stored as compressed .npz files. They have three channels, x, y and z, representing the normal vector of the surface on which the pixel falls.
- Camera trajectory pose: .txt file with one line per frame, each containing the SE(3) world-to-camera transformation for that frame. Format: timestamp, translation (tx, ty, tz), quaternion (qx, qy, qz, qw).
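The formats listed above can be read with a few lines of NumPy. The following is a minimal sketch rather than the loader used in the repository, and it rests on several assumptions: the function names are hypothetical, the key of the array inside each .npz archive is unknown so the first stored array is taken, and the pose file is assumed to be whitespace-separated. For depth, multiplying raw uint16 values directly by (2 ** 16 - 1) * 30 would give implausibly large distances, so the sketch assumes the raw range [0, 2 ** 16 - 1] maps linearly onto 0 to 30 meters. That reading should be checked against the GitHub repository before relying on it.

```python
import numpy as np


def load_npz_array(path):
    """Return the first array stored in a compressed .npz archive (flow or normals)."""
    with np.load(path) as archive:
        # ASSUMPTION: the key name is not documented here, so take the first entry.
        return archive[archive.files[0]]


def depth_to_meters(raw_uint16, max_depth_m=30.0):
    """Convert raw uint16 depth values to meters.

    ASSUMPTION: raw values in [0, 2**16 - 1] map linearly onto [0, max_depth_m].
    """
    return raw_uint16.astype(np.float64) / (2 ** 16 - 1) * max_depth_m


def quaternion_to_rotation(qx, qy, qz, qw):
    """Build a 3x3 rotation matrix from a quaternion given in (x, y, z, w) order."""
    q = np.array([qx, qy, qz, qw], dtype=np.float64)
    x, y, z, w = q / np.linalg.norm(q)  # normalize to guard against rounding
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def load_poses(path):
    """Read 'timestamp tx ty tz qx qy qz qw' lines into timestamps and 4x4 matrices."""
    # ASSUMPTION: columns are separated by whitespace.
    rows = np.loadtxt(path, ndmin=2)
    timestamps = rows[:, 0]
    poses = np.tile(np.eye(4), (len(rows), 1, 1))
    for i, (tx, ty, tz, qx, qy, qz, qw) in enumerate(rows[:, 1:8]):
        poses[i, :3, :3] = quaternion_to_rotation(qx, qy, qz, qw)
        poses[i, :3, 3] = (tx, ty, tz)
    return timestamps, poses
```

Since the poses are described as world-to-camera transformations, `np.linalg.inv(poses[i])` would give the corresponding camera-to-world matrix, whose last column holds the camera position in world coordinates.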
The source code of the Drunkard's Odometry is available on GitHub via the link above. There you will find detailed instructions for running the training and evaluation scripts.
Paper and Bibtex
NeurIPS paper available here.
@article{recasens2024drunkard,
title={The Drunkard’s Odometry: Estimating Camera Motion in Deforming Scenes},
author={Recasens Lafuente, David and Oswald, Martin R and Pollefeys, Marc and Civera, Javier},
journal={Advances in Neural Information Processing Systems},
volume={36},
year={2024}
}
Funding
This work was supported by the EU Commission (EU-H2020 EndoMapper GA 863146), the Spanish Government
(PID2021-127685NB-I00 and TED2021-131150BI00), the Aragon Government (DGA-T45 17R/FSE), and a
research grant from FIFA.
License
The code, dataset and additional resources of this work are released under the MIT License.
Some parts of the code are modified from other repositories and remain subject to their own licenses.
Check the GitHub repository for further details.