The Drunkard’s Odometry: Estimating Camera Motion in Deforming Scenes

[NeurIPS 2023]

David Recasens1
Martin R. Oswald2,3
Marc Pollefeys2,4
Javier Civera1
1University of Zaragoza
2ETH Zurich
3University of Amsterdam
4Microsoft
[The Drunkard's Dataset]
[The Drunkard's Odometry]
[Paper]


Sample scene of the Drunkard’s Dataset. The dataset provides various levels of scene deformation. Top row: Sample frames from scene 0 over all difficulty levels 0–3. Bottom row: External views showing the ground truth camera trajectory in green and the camera frame in purple. With increasing deformation level the camera motion is more abrupt.


Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure from motion techniques assume that static scene parts are observed alongside the deforming ones in order to establish an anchoring reference. However, this assumption does not hold in certain relevant applications such as endoscopies. Deformable odometry and SLAM pipelines, which tackle the most challenging scenario of exploratory trajectories, suffer from a lack of robustness and proper quantitative evaluation methodologies. To tackle this issue with a common benchmark, we introduce the Drunkard's Dataset, a challenging collection of synthetic data targeting visual navigation and reconstruction in deformable environments. This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes where every surface exhibits non-rigid deformations over time. Simulations in realistic 3D buildings let us obtain a vast amount of data and ground truth labels, including camera poses, RGB images and depth, optical flow and normal maps at high resolution and quality. We further present a novel deformable odometry method, dubbed the Drunkard's Odometry, which decomposes optical flow estimates into rigid-body camera motion and non-rigid scene deformations. In order to validate our data, our work contains an evaluation of several baselines as well as a novel tracking error metric which does not require ground truth data.


The Drunkard's Dataset


[Download Dataset]

In the link above you can download The Drunkard's Dataset. The root folder contains two similar versions of the dataset with different image resolutions, 1024x1024 and 320x320 pixels. Both versions share the same folder structure.


For each of the 19 scenes there are 4 levels of deformation difficulty, and inside each of them you can find color and depth images, optical flow and normal maps, and the camera trajectory.

- Color: RGB uint8 .png images.

- Depth: uint16 .png grayscale images whose pixel values must be divided by (2 ** 16 - 1) and multiplied by 30 to obtain metric depth in meters.

- Optical flow: .npy image numpy arrays stored with .npz compression. They have two channels: the horizontal and vertical pixel displacement from the current frame to the next one.

- Normal: .npy image numpy arrays stored with .npz compression. They have three channels, x, y and z, representing the normal vector of the surface the pixel falls on.

- Camera trajectory pose: .txt file containing one SE(3) world-to-camera transformation per line, one line per frame. Format: timestamp, translation (tx, ty, tz), quaternion (qx, qy, qz, qw).
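As a rough sketch of how two of these formats can be decoded, assuming the depth scale described above (values normalized over the uint16 range and a 30 m maximum) and space-separated trajectory lines with a scalar-last quaternion; the exact conventions should be double-checked against the official dataloader:

```python
import numpy as np

def depth_to_meters(depth_u16: np.ndarray) -> np.ndarray:
    # Raw uint16 depth image -> metric depth in meters.
    # Assumed scale: divide by (2**16 - 1), multiply by the 30 m range.
    return depth_u16.astype(np.float64) / (2 ** 16 - 1) * 30.0

def pose_line_to_matrix(line: str) -> np.ndarray:
    # One trajectory line "timestamp tx ty tz qx qy qz qw"
    # -> 4x4 world-to-camera SE(3) matrix.
    _, tx, ty, tz, qx, qy, qz, qw = map(float, line.split())
    # Quaternion (x, y, z, w) to rotation matrix.
    R = np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [tx, ty, tz]
    return T
```

For example, the maximum uint16 value (65535) maps to 30 m, and an identity quaternion (0, 0, 0, 1) yields a pure translation.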

Check the Drunkard's Odometry dataloader for further implementation details on working with the data.
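Since the flow channels store per-pixel displacements toward the next frame, a minimal sketch of using them is to shift the current frame's pixel grid; this helper is illustrative and not part of the official code:

```python
import numpy as np

def warp_coords(flow: np.ndarray):
    # flow: (H, W, 2) array, channel 0 = horizontal displacement,
    # channel 1 = vertical displacement, from current frame to next.
    # Returns the (x, y) coordinates each pixel maps to in the next frame.
    h, w, _ = flow.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    return xs + flow[..., 0], ys + flow[..., 1]
```

With zero flow the grid is unchanged; nonzero flow values move each pixel coordinate by the stored displacement.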


The Drunkard's Odometry


[GitHub]

The source code of the Drunkard's Odometry is available on GitHub at the link above, together with detailed instructions for running the training and evaluation scripts.


Paper and Bibtex


The NeurIPS paper is available here.


@inproceedings{recasens2023the,
title={The Drunkard{\textquoteright}s Odometry: Estimating Camera Motion in Deforming Scenes},
author={David Recasens and Martin R. Oswald and Marc Pollefeys and Javier Civera},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2023},
url={https://openreview.net/forum?id=Kn6VRkYqYk}
}

    

Funding


This work was supported by the EU Commission (EU-H2020 EndoMapper GA 863146), the Spanish Government 
(PID2021-127685NB-I00 and TED2021-131150BI00), the Aragon Government (DGA-T45 17R/FSE), and a 
research grant from FIFA.


    

License


The code, dataset and additional resources of this work are released under the MIT License. 
Some parts of the code are modified from other repositories and are also subject to their own licenses. 
Check the GitHub repository for further details.


    


Based on Clara Fernandez's template.