I like to work on computer vision problems with various applications such as monocular depth
estimation, super-resolution, and remote sensing.
I collaborated with the ICRC to map
vulnerable populations in developing countries.
In the past, I interned at and collaborated with Meta's Reality Labs Research.
I obtained my bachelor's and master's degrees in Geomatics Engineering from ETH Zürich. During my
master's, I specialized in deep learning, computer vision, and remote sensing.
I will graduate in February 2026 and am actively seeking full-time opportunities in industry. If
you’re hiring or know of relevant opportunities, I’d love to connect.
Research
I'm interested in computer vision, deep learning, and their applications to remote sensing.
Most of my work is related to super-resolution, depth estimation, or both simultaneously.
Some papers are highlighted. * indicates equal contribution.
Elastic3D is a controllable, end-to-end method for monocular-to-stereo video conversion. Based
on latent diffusion with a novel guided VAE decoder, it ensures sharp and epipolar-consistent
output while allowing intuitive control over the stereo effect at inference time.
ML-Bokeh extends the SHARP codebase with physically-based rendering and smart autofocus for
cinematic depth-of-field effects. It features synthetic aperture simulation, artifact-free
spiral sampling, and an automated autofocus system based on subject detection.
We investigate whether medium-resolution Copernicus Sentinel-1 and Sentinel-2 imagery can
support rapid building damage assessment after disasters.
We introduce the xBD-S12 dataset and show that, despite 10 m resolution, building damage
can be mapped reliably across many events, making Copernicus data a practical complement to
limited very-high-resolution imagery.
Marigold-DC is a zero-shot depth completion framework. We repurpose Marigold as an
off-the-shelf monocular depth estimator and guide it with sparse depth observations.
We present a 42-year pan-Arctic land surface temperature dataset, downscaled from AVHRR GAC to
1 km resolution with a deep anisotropic diffusion super-resolution model trained on MODIS
LST and guided by high-resolution land cover, elevation, and vegetation height.
The resulting twice-daily, 1 km LST record enables improved permafrost and near-surface air
temperature modelling, Greenland Ice Sheet surface mass balance assessment, and climate
monitoring in the pre-MODIS era.
Marigold (TPAMI) generalizes the original CVPR'24 monocular depth estimator into a
diffusion-based foundation model for dense prediction,
supporting tasks such as depth, surface normals, and intrinsic image decomposition with only a
few diffusion steps and efficient fine-tuning.
Marigold is an affine-invariant monocular depth estimation method based on Stable Diffusion,
leveraging its rich prior knowledge for better generalization. It achieves state-of-the-art
performance with significant improvements, even though it is trained purely on synthetic data.
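"Affine-invariant" means the predicted depth is defined only up to an unknown global scale and shift, so evaluation typically aligns predictions to ground truth by least squares before computing metrics. A minimal sketch of that alignment step (an illustrative standalone snippet, not the paper's code):

```python
import numpy as np

def align_affine_invariant(pred, gt, mask=None):
    """Fit scale s and shift t minimizing ||s * pred + t - gt||^2 via least squares."""
    if mask is None:
        mask = np.ones_like(gt, dtype=bool)
    p, g = pred[mask].ravel(), gt[mask].ravel()
    # Design matrix [pred, 1] so lstsq returns (scale, shift)
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s * pred + t

# Example: a prediction off by scale 2 and shift -1 aligns exactly.
gt = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = (gt + 1.0) / 2.0
aligned = align_affine_invariant(pred, gt)
```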
POPCORN is a lightweight population mapping method that relies on free satellite imagery and
minimal supervision, surpassing the accuracy of existing approaches while producing
interpretable population maps for data-scarce regions.
We automate SLF's ground-based snow cover monitoring pipeline in the Dischma valley by combining
deep learning-based fog classification with pixel-wise snow segmentation for ground camera
imagery.
Our approach removes manual thresholds, generalizes across multiple cameras, and enables more
scalable, reliable alpine snow cover mapping to support avalanche research and satellite product
validation.
We introduce neural heat fields, a neural field formulation that inherently models a physically
exact point spread function, enabling analytically correct anti-aliasing at any super-resolution
scale without extra computation.
Building on this, Thera achieves aliasing-free arbitrary-scale single image super-resolution,
substantially outperforming previous methods while remaining parameter-efficient and supported
by strong theoretical guarantees.
We propose DADA, a novel approach to depth image super-resolution by combining guided
anisotropic diffusion with a deep convolutional network, enhancing both edge detail and
contextual reasoning.
This method achieves state-of-the-art results on three benchmarks, especially at large
upsampling factors such as ×32.
Information about the focal length with which a photo was taken may be stripped (internet
photos) or never recorded (vintage photos).
Inferring the focal length from a single monocular view is an ill-posed task that requires
knowledge of object scale and distance to the camera, i.e. scene understanding.
I trained a deep learning model to acquire such scene understanding and predict the focal
length, and open-sourced the model in this repository.
POMELO is a deep learning model that creates fine-grained population maps using coarse census
counts and open geodata,
achieving high accuracy in sub-Saharan Africa and effectively estimating population numbers even
without any census data.
We propose a method for forecasting the emergence and timing of new buildings using a deep
neural network with a custom pretraining procedure, validated on the SpaceNet7 dataset.
We propose using neural ordinary differential equations (NODEs) combined with RNNs to improve
crop classification from irregularly spaced satellite images,
showing enhanced accuracy over common methods, especially with few observations,
and better early-season forecasting due to the continuous representation of latent dynamics.
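The ODE-RNN pattern underlying this: between irregularly spaced observations, a latent state evolves continuously under a learned ODE; at each observation it is updated discretely. A minimal sketch with hand-picked stand-ins for the learned dynamics and update networks (assumptions for illustration, not our trained model):

```python
import numpy as np

def ode_rnn_encode(obs_times, obs_values, dynamics, update, h0, dt=0.05):
    """Encode an irregularly sampled series ODE-RNN style (illustrative).

    Between observations the hidden state follows dh/dt = dynamics(h),
    integrated with explicit Euler; at each observation it jumps via
    update(h, x). `dynamics` and `update` stand in for learned networks.
    """
    h, t = np.asarray(h0, dtype=float), obs_times[0]
    for t_obs, x in zip(obs_times, obs_values):
        while t < t_obs:                      # continuous latent evolution
            step = min(dt, t_obs - t)
            h = h + step * dynamics(h)
            t += step
        h = update(h, x)                      # discrete update at observation
    return h

# Toy usage with irregular timestamps (e.g. cloudy acquisitions skipped).
decay = lambda h: -0.5 * h                    # latent state decays between obs
blend = lambda h, x: 0.8 * h + 0.2 * x        # observation nudges the state
h = ode_rnn_encode([0.0, 0.3, 1.1],
                   [np.array([1.0]), np.array([0.5]), np.array([2.0])],
                   decay, blend, h0=np.zeros(1))
```

The continuous formulation is what lets the model interpolate and extrapolate latent crop dynamics between sparse satellite passes.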
This work presents a method for automatically refining 3D city models generated from aerial
images: a neural network, trained with reference data and a tailored loss function, improves
digital surface models (DSMs),
effectively preserving geometric structures while removing noise and artifacts.
News
November 2025: I am giving a talk at ZurichCV.
October 2025: I am attending ICCV, presenting Marigold-DC.
April 2025: I started as a Student Researcher at Google in Federico Tombari's team.