Real-Time 3D Reconstruction and 6-DoF Tracking with an Event Camera

Abstract

We present a method for simultaneous real-time 3D reconstruction and 6-DoF camera tracking using a single event camera. Event cameras are bio-inspired vision sensors that output pixel-level brightness changes asynchronously, offering very high dynamic range (140 dB), microsecond-level temporal resolution, and very low power consumption. Unlike standard cameras, they do not suffer from motion blur and can operate in extreme lighting conditions. We propose a probabilistic framework that jointly estimates the camera motion and a semi-dense 3D map of the scene from a stream of events. Our method runs in real time at over 300 Hz and remains robust in scenarios where conventional cameras fail, such as high-speed motion and high dynamic range scenes.

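Paragraph function: Introduces the unique advantages of event cameras and proposes a joint 3D reconstruction and tracking method.

Logical role: Uses the physical advantages of event cameras as the foundation for the methodological contribution.

Argumentation technique / potential weakness: The 140 dB dynamic range and 300 Hz figures are impressive, but the availability and cost of event cameras limit practical deployment.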

1. Introduction

Simultaneous localization and mapping (SLAM) is a fundamental problem in robotics and computer vision. Traditional frame-based SLAM systems are limited by the characteristics of conventional cameras: they suffer from motion blur at high speeds, have limited dynamic range, and consume significant power. Event cameras, such as the Dynamic Vision Sensor (DVS), represent a paradigm shift in visual sensing. Rather than capturing frames at a fixed rate, each pixel independently and asynchronously reports brightness changes as they occur. This results in a continuous stream of events with microsecond resolution, enabling perception at speeds unattainable by standard cameras. We present the first system that achieves real-time 3D reconstruction and full 6-DoF tracking using only events.

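Paragraph function: Contrasts the limitations of conventional cameras with the advantages of event cameras.

Logical role: Frames event cameras as a "paradigm shift" in order to set up the claim of being the "first" such system.

Argumentation technique / potential weakness: The "first" claim rests on a narrowly defined scope (events only, real time, 3D reconstruction, 6-DoF tracking), which keeps it academically defensible.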

2. Event Camera Model

An event camera generates events asynchronously whenever the change in log intensity at a pixel exceeds a threshold C: an event e = (x, y, t, p) is triggered when |log I(x,y,t) - log I(x,y,t-dt)| >= C, where dt is the time elapsed since the previous event at that pixel and p is the polarity (the sign of the change). This model has several key properties: (i) no redundant information is transmitted, since only changing pixels generate events; (ii) the temporal resolution is on the order of microseconds, independent of scene illumination; (iii) the dynamic range exceeds 140 dB, compared to about 60 dB for standard cameras. These properties make event cameras well suited to robotics applications that need fast, low-power, and robust perception.
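The threshold rule above can be illustrated for a single pixel. The following is a minimal sketch, not the sensor's actual circuitry or the paper's code: the function name `generate_events`, the sampled-intensity input, and the choice to stamp every event with the sample time at which the crossing is detected are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Event = Tuple[int, int, float, int]  # (x, y, t, p), as in the model above


def generate_events(x: int, y: int,
                    times: Sequence[float],
                    intensities: Sequence[float],
                    C: float) -> List[Event]:
    """Emit events for one pixel from sampled (strictly positive) intensities.

    Hypothetical helper: a real sensor works in continuous time, whereas this
    sketch only checks the threshold at the given sample times.
    """
    events: List[Event] = []
    # Reference level: log intensity at the last event (or at the first sample).
    ref = math.log(intensities[0])
    for t, intensity in zip(times[1:], intensities[1:]):
        diff = math.log(intensity) - ref
        # A large jump between two samples may cross the threshold several times.
        while abs(diff) >= C:
            p = 1 if diff > 0 else -1
            events.append((x, y, t, p))
            ref += p * C  # move the reference by one threshold step
            diff = math.log(intensity) - ref
    return events


# Illustrative input: log intensity goes 0.0 -> 0.25 -> 0.25 -> -0.3.
times = [0.0, 1e-3, 2e-3, 3e-3]
intensities = [math.exp(v) for v in (0.0, 0.25, 0.25, -0.3)]
for event in generate_events(5, 7, times, intensities, C=0.2):
    print(event)
```

With C = 0.2, the rise to 0.25 yields one positive event, the unchanged sample yields none (property (i)), and the drop to -0.3 yields two negative events, because the change spans more than two threshold steps.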

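Paragraph function: Formally defines the mathematical model and physical properties of the event camera.

Logical role: Gives the subsequent algorithm design a precise mathematical sensor model to build on.

Argumentation technique / potential weakness: The quantitative comparison (140 dB vs 60 dB) makes the performance advantage of event cameras clear.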
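To put the two decibel figures on a linear scale, the worked conversion below assumes the 20 log10 convention commonly used for image-sensor dynamic range. The symbols I_max and I_min (the brightest and darkest intensities the sensor can resolve) are introduced here for illustration and do not appear in the paper.

```latex
\[
\mathrm{DR}_{\mathrm{dB}} = 20 \log_{10}\!\left(\frac{I_{\max}}{I_{\min}}\right)
\quad\Longrightarrow\quad
\frac{I_{\max}}{I_{\min}} = 10^{\mathrm{DR}_{\mathrm{dB}}/20}
\]
\[
140\ \mathrm{dB} \;\rightarrow\; 10^{7} : 1,
\qquad
60\ \mathrm{dB} \;\rightarrow\; 10^{3} : 1
\]
```

Under that convention, the 80 dB gap corresponds to a factor of 10^4 in the ratio of resolvable intensities.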

3. Method

Our approach uses a probabilistic generative model that relates the observed events to the camera motion and scene structure. We formulate the problem as maximum likelihood estimation, where we jointly optimize the camera trajectory (parameterized as a continuous-time spline) and a semi-dense 3D map. The map is represented as a collection of 3D points with associated normal vectors. Events are processed in small batches, and for each batch we perform interleaved optimization of the camera pose and the map. The key insight is that event generation can be predicted from the camera motion and scene geometry, enabling a photometric-style error formulation adapted to the event stream.
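The batch-wise interleaving of pose and map updates can be illustrated on a deliberately simplified problem. The sketch below is an assumption-laden toy, not the paper's method: it replaces the event likelihood and the continuous-time spline with a 1D world in which each observation is z = m_j - c (landmark position minus camera position), and the names `track_pose`, `update_map`, and `run_interleaved` are hypothetical.

```python
from typing import Dict, List, Tuple

Observation = Tuple[int, float]  # (landmark id j, measurement z = m_j - c)


def track_pose(map_est: Dict[int, float], batch: List[Observation]) -> float:
    """Tracking step: map held fixed, least-squares camera position."""
    residuals = [map_est[j] - z for j, z in batch if j in map_est]
    if not residuals:
        raise ValueError("batch contains no previously mapped landmark")
    return sum(residuals) / len(residuals)


def update_map(map_est: Dict[int, float], counts: Dict[int, int],
               pose: float, batch: List[Observation]) -> None:
    """Mapping step: pose held fixed, running mean of each landmark position."""
    for j, z in batch:
        n = counts.get(j, 0)
        prev = map_est.get(j, 0.0)
        map_est[j] = (prev * n + (z + pose)) / (n + 1)
        counts[j] = n + 1


def run_interleaved(batches: List[List[Observation]],
                    first_pose: float = 0.0):
    """Process batches in order, alternating the two steps for each batch."""
    map_est: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    poses: List[float] = []
    for k, batch in enumerate(batches):
        # The first pose is fixed by convention to anchor the coordinate frame.
        pose = first_pose if k == 0 else track_pose(map_est, batch)
        update_map(map_est, counts, pose, batch)
        poses.append(pose)
    return poses, map_est


# Noise-free toy data: landmarks at 1.0, 2.0, 4.0; camera at 0.0, 0.5, 1.0.
batches = [
    [(0, 1.0), (1, 2.0)],
    [(1, 1.5), (2, 3.5)],
    [(0, 0.0), (2, 3.0)],
]
poses, map_est = run_interleaved(batches)
print(poses, map_est)
```

Each step solves an easy subproblem with the other variable frozen, which is what keeps the per-batch cost low. The flip side is that errors in one estimate feed into the other, and in a non-convex setting the alternation carries no general convergence guarantee.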

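Paragraph function: Describes the core design of the joint optimization framework.

Logical role: Turns the unconventional data format of the event camera into a solvable optimization problem.

Argumentation technique / potential weakness: Interleaved (alternating) optimization is a well-established strategy, but convergence guarantees for non-convex problems need further analysis.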

4. Experiments

We evaluate our system on both synthetic and real-world datasets. On synthetic sequences, we achieve sub-millimeter accuracy in 3D reconstruction and sub-degree accuracy in rotation estimation. On real-world sequences captured with a DVS128 event camera, the system runs at over 300 Hz on a single CPU core, demonstrating true real-time performance. We show qualitative results on challenging scenarios including rapid camera rotation and scenes with extreme lighting variations. In comparison with frame-based methods, our approach maintains accuracy in conditions where conventional cameras produce completely unusable images due to motion blur or over/under-exposure.

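Paragraph function: Reports quantitative and qualitative results on synthetic and real data.

Logical role: Uses success in extreme scenarios to validate the distinctive value of event cameras.

Argumentation technique / potential weakness: Comparing against scenarios where conventional cameras "fail completely" maximizes the apparent advantage of event cameras.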

5. Conclusions

We have presented the first real-time system for simultaneous 3D reconstruction and 6-DoF tracking using only an event camera. Our probabilistic framework exploits the unique properties of event cameras (high temporal resolution, high dynamic range, and low latency) to achieve performance that is impossible with conventional cameras. The system opens new possibilities for visual perception in extreme conditions that are relevant for fast-moving robots, autonomous vehicles, and other demanding applications.

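Paragraph function: Restates the pioneering contribution and looks ahead to applications.

Logical role: Uses application scenarios (robots, autonomous vehicles) to raise the practical impact of the paper.

Argumentation technique / potential weakness: As a Best Paper, its pioneering contribution lies in pushing event cameras toward a complete SLAM application, but the limited market adoption of event cameras remains an obstacle.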

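Argument structure overview

Problem: conventional cameras fail in extreme conditions
➔ Claim: event camera plus probabilistic framework
➔ Evidence: real-time operation at 300 Hz with sub-millimeter accuracy
➔ Rebuttal: scenarios where conventional cameras fail completely
➔ Conclusion: opens up visual perception in extreme conditions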

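Core claim

By exploiting the asynchronous output, high temporal resolution, and high dynamic range of event cameras, the authors build the first real-time, event-driven system for 3D reconstruction and 6-DoF tracking.

Strongest argument

The system maintains sub-millimeter accuracy in extreme scenarios (high-speed motion, extreme lighting) where conventional cameras fail completely, which gives it irreplaceable application value.

Weakest link

The cost and limited availability of event camera hardware restrict broad adoption of the method, and in static or slowly changing scenes an event camera yields little information.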