Globally-Optimal Inlier Set Maximisation for Simultaneous Camera Pose and Feature Correspondence

Abstract — 摘要

Camera pose estimation from 2D-to-3D feature correspondences is a fundamental problem in computer vision. Existing methods typically decompose this into two sequential steps: first establishing correspondences, then estimating the pose robustly using RANSAC or similar methods. This paper proposes a globally-optimal method that simultaneously solves for camera pose and feature correspondences by maximising the cardinality of the inlier set. The approach uses a branch-and-bound algorithm over the space of camera rotations, guaranteeing a globally-optimal solution without requiring initial correspondences.

從二維到三維特徵對應關係進行攝影機姿態估測，是電腦視覺中的基礎問題。現有方法通常將此分解為兩個序列步驟：先建立對應關係，再使用 RANSAC 或類似方法穩健地估測姿態。本文提出一種全域最佳方法，同時求解攝影機姿態與特徵對應關係，透過最大化內點集合的基數。此方法使用分支定界演算法在旋轉空間上搜尋，保證在不需要初始對應關係的情況下獲得全域最佳解。

段落功能全文總覽——從經典的兩步流程出發，引出同時求解的統一框架。

邏輯角色摘要以「分離 vs. 聯合」的對比建構論證：先指出既有方法的分離式缺陷，再以全域最佳保證作為核心賣點。

論證技巧 / 潛在漏洞「全域最佳」是極具吸引力的理論保證，但計算複雜度可能是實務瓶頸。摘要迴避了效率問題，將讀者的注意力集中在最佳性上。

1. Introduction — 緒論

The Perspective-n-Point (PnP) problem — estimating camera pose from n 2D-3D point correspondences — is central to visual localisation, structure-from-motion, and augmented reality. Standard pipelines first extract and match local features (e.g., SIFT, ORB), then apply RANSAC to find an inlier set consistent with a geometric model. However, the quality of the initial correspondences critically affects the final pose accuracy, and RANSAC provides only probabilistic guarantees of finding the optimal solution.

透視 n 點（PnP）問題——從 n 個二維到三維點對應關係估測攝影機姿態——是視覺定位、運動恢復結構與擴增實境的核心。標準流程先提取並匹配局部特徵（如 SIFT、ORB），再應用 RANSAC 找出與幾何模型一致的內點集合。然而，初始對應關係的品質嚴重影響最終姿態精度，且 RANSAC 僅提供找到最佳解的機率性保證。

段落功能建立研究場域——介紹 PnP 問題與標準 RANSAC 流程。

邏輯角色論證鏈起點：先確認 PnP 的重要性，再指出 RANSAC 的兩個弱點（依賴對應品質、機率性保證），為確定性方法鋪路。

論證技巧 / 潛在漏洞將 RANSAC 的「機率性保證」框架為缺點是有效的修辭策略，但實務上 RANSAC 已足夠可靠。作者需證明確定性保證帶來的實際效能提升值得額外計算成本。

The authors argue that decoupling correspondence estimation from pose estimation is suboptimal because errors in the first stage propagate irreversibly to the second. They propose to jointly optimise both problems by searching for the camera pose that maximises the number of geometrically consistent feature matches. This formulation naturally handles ambiguous and many-to-many correspondences that arise in practice, particularly in scenes with repetitive structures or textureless regions.

作者主張，將對應關係估測與姿態估測解耦是次佳的，因為第一階段的誤差會不可逆地傳播至第二階段。他們提出聯合最佳化兩個問題，透過搜尋最大化幾何一致性特徵匹配數量的攝影機姿態。此公式化自然地處理實務中出現的模糊與多對多對應關係，尤其在具有重複結構或無紋理區域的場景中。

段落功能問題深化——指出解耦方法的根本缺陷，為聯合方法提供理論動機。

邏輯角色從「RANSAC 有弱點」到「解耦本身是錯誤」的邏輯升級，將問題從「改良」層次提升到「範式轉換」層次。

論證技巧 / 潛在漏洞以「不可逆傳播」的因果論證說服力強，但多對多對應在實務中的頻率與嚴重性未被量化。若大多數場景的對應品質已足夠好，聯合最佳化的優勢可能有限。

RANSAC and its variants (PROSAC, LO-RANSAC, USAC) remain the dominant paradigm for robust estimation in geometry problems. These methods randomly sample minimal sets, compute model hypotheses, and evaluate consensus. While highly practical, they lack deterministic optimality guarantees. Branch-and-bound (BnB) methods have been applied to rotation search and registration problems, providing certifiable global optimality. However, prior BnB approaches for pose estimation require known correspondences as input.

RANSAC 及其變體（PROSAC、LO-RANSAC、USAC）仍是幾何問題中穩健估測的主流範式。這些方法隨機取樣最小集合、計算模型假設並評估共識。雖然高度實用，但缺乏確定性的最佳性保證。分支定界（BnB）方法已被應用於旋轉搜尋與配準問題，提供可驗證的全域最佳性。然而，先前用於姿態估測的分支定界方法需要已知的對應關係作為輸入。

段落功能文獻對比——將 RANSAC 族與 BnB 族方法的優劣並列。

邏輯角色建立兩個方法家族各自的優勢與缺口：RANSAC 實用但非最佳，BnB 最佳但需對應。本文方法定位為兩者的融合。

論證技巧 / 潛在漏洞以對稱式的文獻回顧暗示最佳解是結合兩者優勢，論證結構清晰。但 BnB 的指數級最壞情況複雜度被淡化處理。

3. Method — 方法

3.1 Problem Formulation — 問題公式化

Given a set of 2D image features and a set of 3D scene points, the goal is to find the camera rotation R and translation t that maximise the number of geometrically consistent 2D-3D matches. A match is considered an inlier if the reprojection error is below a threshold. Unlike standard PnP, the correspondences are not given a priori — the method must determine them simultaneously with the pose. The search space is the rotation group SO(3), with translation solved analytically for each rotation candidate.

給定一組二維影像特徵與一組三維場景點，目標是找到最大化幾何一致性二維到三維匹配數量的攝影機旋轉 R 與平移 t。若重投影誤差低於門檻值，則匹配被視為內點。與標準 PnP 不同，對應關係不是事先給定的——方法必須與姿態同時確定。搜尋空間為旋轉群 SO(3)，平移量則對每個旋轉候選進行解析求解。

段落功能形式化定義——嚴謹地陳述最佳化問題的目標函數與搜尋空間。

邏輯角色將直覺性的「聯合求解」轉化為數學問題：在 SO(3) 上最大化內點基數。平移的解析求解降低了搜尋維度。

論證技巧 / 潛在漏洞將平移分離並解析求解是降維的巧妙策略。但門檻值的選擇對內點數量有直接影響，其敏感度分析是驗證穩健性的關鍵。

3.2 Branch-and-Bound over SO(3) — 旋轉空間上的分支定界

The rotation space SO(3) is parameterised using the angle-axis representation and partitioned into nested cubes in the pi-ball. For each rotation subcube, an upper bound on the inlier count is computed by relaxing the geometric consistency constraint. The branch-and-bound algorithm recursively subdivides the most promising subcubes and prunes those whose upper bound is below the current best solution. This guarantees convergence to the globally-optimal rotation — the one that admits the largest inlier set — without exhaustive enumeration.

旋轉空間 SO(3) 使用角-軸表示進行參數化，並分割為 pi 球中的巢狀立方體。對於每個旋轉子立方體，透過放鬆幾何一致性約束來計算內點數的上界。分支定界演算法遞迴地細分最有前景的子立方體，並剪除上界低於當前最佳解的子立方體。這保證收斂到全域最佳旋轉——即容許最大內點集合的旋轉——而無需窮舉列舉。

段落功能核心演算法——詳述分支定界在旋轉空間上的運作機制。

邏輯角色此段是全文的技術核心：分支定界提供了全域最佳性的理論支撐，上界計算的品質直接決定演算法效率。

論證技巧 / 潛在漏洞以「不需窮舉」修飾分支定界，暗示效率可接受。但上界的緊緻程度（tightness）未被量化——若上界過於寬鬆，剪枝效果有限，演算法可能退化為接近窮舉的搜尋。

A critical component is the upper bound computation. For a given rotation subcube, the bound counts the maximum number of potentially consistent matches by considering the worst-case reprojection within the angular uncertainty of the subcube. The bound is tight in the sense that it converges to the true inlier count as the subcube shrinks. Additionally, the authors introduce efficient data structures and early termination criteria to accelerate the search in practice, achieving runtimes from seconds to minutes depending on problem size.

關鍵組件是上界計算。對於給定的旋轉子立方體，上界透過考慮子立方體角度不確定性內的最壞情況重投影，來計算最大可能一致匹配數。此上界是緊緻的，意即當子立方體縮小時會收斂到真實內點數。此外，作者引入高效的資料結構與提前終止準則來加速實際搜尋，在不同問題規模下達到數秒到數分鐘的執行時間。

段落功能效率保障——說明上界品質與加速策略。

邏輯角色回應可能的「計算效率」質疑：緊緻上界確保有效剪枝，加速策略使實務應用可行。

論證技巧 / 潛在漏洞「數秒到數分鐘」的表述坦誠但模糊——與 RANSAC 的毫秒級相比，實時應用可能受限。但全域最佳性在精度敏感的場景中可能值得此代價。

4. Experiments — 實驗

Experiments are conducted on both synthetic data with controlled noise levels and real datasets for visual localisation. On synthetic data, the method consistently finds the true globally-optimal solution, while RANSAC-based methods occasionally miss it, especially under high outlier ratios (above 80%). On real localisation benchmarks, the approach achieves higher pose accuracy than state-of-the-art methods, particularly in challenging scenarios with repetitive textures and wide baselines. The runtime scales reasonably with problem size, making the method suitable for offline applications requiring high reliability.

實驗在控制雜訊水準的合成資料與視覺定位的真實資料集上進行。在合成資料上，方法一致地找到真正的全域最佳解，而基於 RANSAC 的方法偶爾會遺漏，尤其在高離群值比例（超過 80%）下。在真實定位基準上，此方法達到比最先進方法更高的姿態精度，特別是在具有重複紋理與寬基線的挑戰性場景中。執行時間隨問題規模合理地增長，使方法適用於需要高可靠性的離線應用。

段落功能全面驗證——在合成與真實場景下證明方法的優越性。

邏輯角色實證支柱：合成實驗驗證最佳性保證，真實實驗驗證實用性。高離群值比例的測試特別有力。

論證技巧 / 潛在漏洞將方法定位為「離線高可靠性」應用是務實的策略，但也承認了即時性的限制。未與深度學習特徵匹配方法（如 SuperPoint + SuperGlue）比較，可能是時代局限。

5. Conclusion — 結論

This paper presents a globally-optimal approach to simultaneously estimating camera pose and feature correspondences by maximising the inlier set cardinality over the rotation space using branch-and-bound. The method provides deterministic optimality guarantees that RANSAC-based approaches cannot offer. Experiments demonstrate superior accuracy in challenging scenarios with high outlier ratios and ambiguous correspondences. The work opens avenues for certifiable algorithms in geometric vision tasks where reliability is paramount.

本文提出一種全域最佳方法，同時估測攝影機姿態與特徵對應關係，透過在旋轉空間上使用分支定界最大化內點集合基數。此方法提供了確定性最佳性保證，為 RANSAC 類方法所無法企及。實驗證明在挑戰性場景中具有優越的精度，包括高離群值比例與模糊對應。本研究為幾何視覺任務中可靠性至關重要的可驗證演算法開闢了道路。

段落功能總結全文——重申核心貢獻與理論保證，展望可驗證演算法的未來。

邏輯角色結論呼應摘要的「全域最佳」承諾，並將貢獻昇華為「可驗證演算法」的更廣泛研究方向。

論證技巧 / 潛在漏洞以「可驗證演算法」的概念框架結尾具有前瞻性。但未充分討論在大規模三維點雲下的可擴展性挑戰，以及與端對端學習方法的競爭前景。

論證結構總覽

問題
對應與姿態解耦
導致次佳結果

→

論點
SO(3) 上分支定界
聯合最佳化

→

證據
合成+真實資料
高離群值場景驗證

→

反駁
計算成本換取
確定性保證

→

結論
可驗證演算法
幾何視覺新方向

作者核心主張（一句話）

透過在旋轉空間上使用分支定界演算法最大化內點集合基數，可同時且全域最佳地求解攝影機姿態與特徵對應關係。

論證最強處

全域最佳性的理論保證：分支定界的收斂性證明為方法提供了 RANSAC 無法企及的確定性保證，特別在高離群值場景中展現出的穩定性優勢令人信服。

論證最弱處

實時性的限制：數秒到數分鐘的執行時間限制了即時應用的可能性。此外，隨著三維點雲規模增大，分支定界的實際收斂速度可能急遽下降，可擴展性未被充分驗證。