Sublabel-Accurate Relaxation of Nonconvex Energies

Abstract — 摘要

We propose a spatially continuous convex relaxation framework based on functional lifting that provides sublabel-accurate solutions to multilabel problems. While previous approaches to functional lifting required the energy to be linear between labels, necessitating a large number of labels for accurate approximation, our method uses a piecewise convex approximation of the data term, which significantly reduces the required number of labels. In contrast to MRF-based approaches, the spatially continuous setting produces less grid bias. Our formulation represents the tightest possible convex relaxation in a local sense. The implementation is straightforward and enables efficient primal-dual optimization on GPUs. We demonstrate the effectiveness of our approach on several computer vision applications including optical flow, stereo matching, and image denoising.

本文提出一個基於函數提升的空間連續凸鬆弛框架，為多標籤問題提供次標籤精度的解。先前的函數提升方法要求能量函數在標籤之間為線性，因此需要大量標籤才能精確近似；而本文方法使用資料項的分段凸近似，顯著降低了所需的標籤數量。相較於基於馬可夫隨機場的方法，空間連續設定產生較少的網格偏差。本文的公式化在局部意義上代表了最緊的凸鬆弛。實作方式簡潔明瞭，可在 GPU 上進行高效的原始-對偶最佳化。實驗在光流估計、立體匹配及影像去雜訊等電腦視覺應用上展示了方法的有效性。

段落功能全文總覽——精確定義「次標籤精度」問題，提出分段凸近似方案，並預告多項應用驗證。

邏輯角色摘要以遞進方式建構論點：先指出先前方法的線性假設限制，再引出分段凸近似的改進，最後以「最緊凸鬆弛」的理論保證與 GPU 實作的工程可行性收尾。

論證技巧 / 潛在漏洞「局部意義上最緊」的限定詞精確但可能讓讀者高估全域最佳性。此外，「次標籤精度」的實際改善幅度需依實驗量化驗證。

1. Introduction — 緒論

Many problems in computer vision can be formulated as labeling problems, where the goal is to assign a label from a discrete or continuous set to each pixel or spatial location. Examples include optical flow estimation, stereo matching, image segmentation, and denoising. A common approach is to formulate these as energy minimization problems consisting of a data term measuring fidelity to observations and a regularization term encouraging spatial smoothness. When the label space is discrete, Markov Random Fields (MRFs) provide a principled framework, but they suffer from metrication artifacts due to the underlying grid discretization. Continuous convex relaxation methods alleviate grid bias but have traditionally been limited to binary or convex data terms.

電腦視覺中的許多問題可被形式化為標籤問題，目標是為每個像素或空間位置從離散或連續集合中分配一個標籤。範例包括光流估計、立體匹配、影像分割和去雜訊。常見方法是將其形式化為能量最小化問題，包含衡量觀測值擬合度的資料項與鼓勵空間平滑性的正則項。當標籤空間為離散時，馬可夫隨機場提供了一個有原則的框架，但它因底層網格離散化而受到計量偽影的影響。連續凸鬆弛方法減輕了網格偏差，但傳統上僅限於二元或凸的資料項。

段落功能建立研究場域——定義標籤問題與能量最小化框架，指出 MRF 與凸鬆弛方法的各自局限。

邏輯角色論證起點：以「能量最小化」為統一框架，將多個視覺問題歸納為同一形式，然後指出兩大主流方法（MRF vs 凸鬆弛）各自的缺陷。

論證技巧 / 潛在漏洞以廣泛的應用範圍開場，建立方法的實用價值。但「計量偽影」對實際效能的影響程度因問題而異，可能在某些應用中微不足道。

Functional lifting extends convex relaxation to multilabel problems by lifting the label assignment function to a higher-dimensional space where convexity can be exploited. However, existing functional lifting approaches assume that the data term is piecewise linear between consecutive labels, which is only a good approximation when a large number of labels is used. This dramatically increases computational cost. We propose to replace the piecewise linear approximation with a piecewise convex one, which provides sublabel accuracy with far fewer labels. This is possible because many data terms in computer vision are naturally convex or can be well approximated by piecewise convex functions.

函數提升透過將標籤指派函數提升至更高維空間以利用凸性，將凸鬆弛擴展到多標籤問題。然而，現有的函數提升方法假設資料項在連續標籤間為分段線性，這僅在使用大量標籤時才是良好的近似，使計算成本大幅增加。本文提出以分段凸近似取代分段線性近似，以遠更少的標籤達到次標籤精度。這是可行的，因為電腦視覺中的許多資料項天生為凸，或可被分段凸函數良好地近似。

段落功能提出核心創新——從分段線性到分段凸的關鍵改進。

邏輯角色承接上段的問題界定，此段精確定位本文貢獻：不是提出全新框架，而是在函數提升框架內做出一個看似微小但影響深遠的改進。

論證技巧 / 潛在漏洞「許多資料項天生為凸」的聲明為分段凸近似的可行性提供了理論依據。但非凸資料項（如多模態匹配代價）在此框架中的處理能力值得探討。

Discrete optimization methods for labeling problems, including graph cuts and message passing algorithms, have been widely used in computer vision. While highly effective for certain energy classes, they are fundamentally limited by the discretization of both the spatial domain and the label space. Continuous convex relaxation approaches, pioneered by works on total variation regularization, avoid spatial discretization artifacts. The functional lifting framework of Pock et al. extends these ideas to multilabel problems but requires piecewise linear data terms, leading to the need for dense label sampling. Our approach directly addresses this limitation by introducing piecewise convex data terms that capture sublabel information within each label interval.

離散最佳化方法（包括圖切割與訊息傳遞演算法）在電腦視覺中被廣泛使用。雖然對某些能量類型非常有效，但它們根本上受限於空間域與標籤空間的離散化。連續凸鬆弛方法（由全變分正則化的先驅工作開創）避免了空間離散化偽影。Pock 等人的函數提升框架將這些概念擴展到多標籤問題，但要求分段線性資料項，導致需要密集的標籤取樣。本文方法透過引入分段凸資料項直接解決此限制，在每個標籤區間內捕獲次標籤資訊。

段落功能文獻定位——梳理從離散方法到連續方法再到函數提升的演進脈絡。

邏輯角色建立學術譜系：圖切割 -> 連續凸鬆弛 -> 函數提升 -> 本文的分段凸改進。每一步解決前一步的缺陷，使本文定位為自然的下一步。

論證技巧 / 潛在漏洞線性演進敘事清晰但可能過度簡化——實際上離散方法（如 FastPD）在許多應用中仍優於連續方法，兩者並非簡單的替代關係。

3. Method — 方法

We consider energy minimization problems of the form E(u) = integral of [f(x, u(x)) + g(x, Du(x))] dx, where u: Omega -> Gamma maps from the image domain to the label space, f is the data term, and g is the regularizer depending on the gradient Du. The key challenge is that f is generally nonconvex in u, making direct minimization intractable. The functional lifting approach replaces the scalar function u with a vector-valued indicator function phi in a higher-dimensional space, transforming the nonconvex problem into one that admits tight convex relaxation.

本文考慮形式為 E(u) = 積分 [f(x, u(x)) + g(x, Du(x))] dx 的能量最小化問題，其中 u: Omega -> Gamma 從影像域映射到標籤空間，f 為資料項，g 為依賴於梯度 Du 的正則項。核心挑戰在於 f 對 u 通常為非凸，使得直接最小化難以處理。函數提升方法以高維空間中的向量值指示函數 phi 取代純量函數 u，將非凸問題轉換為允許緊凸鬆弛的形式。

段落功能數學建模——正式定義能量最小化問題與函數提升的基本框架。

邏輯角色方法推導的數學基礎：從變分問題的一般形式出發，引入函數提升作為處理非凸性的標準工具。

論證技巧 / 潛在漏洞以簡潔的數學符號精確定義問題，為後續推導建立嚴謹的出發點。但函數提升帶來的維度膨脹問題未在此處提及。

3.2 Piecewise Convex Approximation — 分段凸近似

The central contribution of this work is to replace the piecewise linear approximation of the data term between labels with a piecewise convex approximation. Given a set of labels gamma_0 < gamma_1 < ... < gamma_K, instead of interpolating f linearly between consecutive labels, we compute the convex envelope of f on each interval [gamma_i, gamma_{i+1}]. This results in the tightest possible convex relaxation in a local sense: on each label interval, no tighter convex underestimator of the data term exists. The practical consequence is dramatic: problems that previously required hundreds of labels for acceptable accuracy can now be solved with as few as 5-10 labels, reducing memory and computation by orders of magnitude.

本文的核心貢獻是以分段凸近似取代標籤間資料項的分段線性近似。給定一組標籤 gamma_0 < gamma_1 < ... < gamma_K，不再在連續標籤間對 f 進行線性內插，而是在每個區間 [gamma_i, gamma_{i+1}] 上計算 f 的凸包絡。這在局部意義上產生了最緊的凸鬆弛：在每個標籤區間上，不存在更緊的凸下估計。實際影響非常顯著：先前需要數百個標籤才能達到可接受精度的問題，現在僅需 5-10 個標籤即可求解，將記憶體與計算量降低了數個數量級。

段落功能核心創新——詳述分段凸近似的數學機制與其帶來的計算效率提升。

邏輯角色全文論證的支柱：凸包絡取代線性內插這一看似簡單的改變，帶來了理論保證（最緊鬆弛）與實際效益（標籤數大幅減少）的雙重收穫。

論證技巧 / 潛在漏洞「數量級的改善」具有強烈的說服力。但凸包絡的計算本身有額外開銷，且對於高度非凸的資料項，分段凸近似的品質取決於標籤的放置位置——此敏感性未被深入討論。

3.3 Primal-Dual Optimization — 原始-對偶最佳化

The resulting convex optimization problem is solved using a first-order primal-dual algorithm in the style of Chambolle and Pock. The algorithm alternates between proximal gradient steps on the primal variable and the dual variable, with the proximal operators having closed-form solutions for typical regularizers (total variation, Huber norm). The implementation is highly parallelizable and runs efficiently on modern GPUs. The straightforward structure of the algorithm — requiring only pointwise operations and linear operators — makes the implementation remarkably simple despite the theoretical sophistication of the relaxation.

所得的凸最佳化問題使用 Chambolle 與 Pock 風格的一階原始-對偶演算法求解。演算法在原始變數與對偶變數的近端梯度步驟之間交替進行，其近端運算子在典型正則項（全變分、Huber 範數）下具有封閉形式解。此實作高度可平行化，可在現代 GPU 上高效運行。演算法的簡潔結構——僅需逐點運算與線性運算子——使得儘管鬆弛的理論基礎複雜，實作卻非常簡單。

段落功能演算法實作——說明最佳化求解器的具體步驟與工程優勢。

邏輯角色從理論到實踐的橋樑：證明分段凸鬆弛不僅在理論上更緊，在實作上也同樣簡潔可行。

論證技巧 / 潛在漏洞強調「實作簡單」是極有效的推廣策略——降低了方法被採用的門檻。但一階方法的收斂速度可能較慢，對大規模問題的實際運行時間未明確報告。

4. Experiments — 實驗

We evaluate our approach on three computer vision applications. For optical flow estimation, our method with only 10 labels achieves comparable accuracy to the piecewise linear approach with 200 labels, demonstrating the sublabel accuracy claim. For stereo matching on the Middlebury benchmark, the continuous formulation produces smoother disparity maps with less grid bias than MRF-based methods. For image denoising with nonconvex regularizers, our framework handles the ROF model with truncated total variation, producing results that better preserve edges while removing noise. Across all experiments, the GPU implementation converges within seconds, making the approach practical for real-time applications.

本文在三項電腦視覺應用上評估方法。在光流估計方面，僅使用 10 個標籤即達到分段線性方法使用 200 個標籤的同等精度，驗證了次標籤精度的主張。在 Middlebury 基準的立體匹配方面，連續公式化產生比基於 MRF 方法更平滑、網格偏差更少的視差圖。在使用非凸正則項的影像去雜訊方面，本框架處理截斷全變分的 ROF 模型，產出在去除雜訊的同時更好地保留邊緣的結果。在所有實驗中，GPU 實作在數秒內即可收斂，使方法適用於即時應用。

段落功能全面的實驗驗證——在光流、立體匹配、去雜訊三個應用上展示方法的有效性。

邏輯角色實證支柱：「10 vs 200 個標籤」的對比是全文最有說服力的數據點，直接量化了分段凸近似帶來的效率提升。

論證技巧 / 潛在漏洞三個應用涵蓋了光流（連續標籤）、立體（半連續）與去雜訊（正則項變體），展示了框架的通用性。但未與當時最先進的深度學習方法（如 FlowNet）比較，可能因研究典範不同而被忽視。

5. Conclusion — 結論

We have presented a sublabel-accurate convex relaxation framework for nonconvex energy minimization in computer vision. By replacing the piecewise linear approximation in functional lifting with a piecewise convex one, we achieve the locally tightest convex relaxation while dramatically reducing the number of required labels. The approach combines theoretical rigor with practical efficiency: the optimization is straightforward to implement, GPU-accelerable, and produces high-quality results across multiple applications. Our work demonstrates that careful mathematical modeling can lead to both better theory and better practice in computer vision optimization.

本文提出了一個用於電腦視覺中非凸能量最小化的次標籤精度凸鬆弛框架。透過以分段凸近似取代函數提升中的分段線性近似，實現了局部最緊的凸鬆弛，同時大幅降低所需的標籤數量。此方法結合了理論嚴謹性與實際效率：最佳化過程易於實作、可 GPU 加速，並在多項應用中產出高品質結果。本研究展示了審慎的數學建模能在電腦視覺最佳化中同時帶來更好的理論與更好的實踐。

段落功能總結全文——重申理論貢獻與實踐價值的雙重意義。

邏輯角色結論以「理論與實踐兼顧」收尾，呼應摘要中的雙重承諾，形成完整的論證閉環。

論證技巧 / 潛在漏洞結尾的哲學性陳述（「審慎的數學建模帶來更好的理論與實踐」）提升了論文的格局。但隨著深度學習方法在低層視覺任務上的快速崛起，基於能量最小化的方法在精度上逐漸被超越，此框架的長期影響力可能受限。

論證結構總覽

問題
函數提升需大量標籤
計算代價高昂

→

論點
分段凸取代分段線性
達到次標籤精度

→

證據
10 標籤 = 200 標籤精度
三項應用驗證

→

反駁
局部最緊凸鬆弛
理論保證

→

結論
數學建模同時改善
理論與實踐

作者核心主張（一句話）

以分段凸近似取代函數提升中的分段線性近似，能在保持局部最緊凸鬆弛的理論保證下，將多標籤問題所需的標籤數量減少一至兩個數量級。

論證最強處

理論與實踐的完美呼應：「10 個標籤等效 200 個標籤」的實驗結果精準對應了「次標籤精度」的理論主張，且 GPU 實作在秒級收斂進一步強化了實用性。分段凸近似的概念簡潔而深刻，是數學優雅性的典範。

論證最弱處

與深度學習的比較缺失：論文發表於深度學習方法席捲低層視覺任務的時期，但未與 FlowNet 等端到端學習方法進行精度和速度的比較。此外，「局部最緊」並非「全域最佳」，在資料項高度非凸的場景下，鬆弛品質可能顯著下降。