Saliency Detection via Absorbing Markov Chain

Abstract

In this paper, we formulate saliency detection as a problem of computing the absorption time of absorbing Markov chains on image graphs. We construct a graph where image superpixels serve as transient nodes and boundary superpixels are designated as absorbing nodes. The expected absorption time from each transient node to the absorbing boundary nodes effectively measures the dissimilarity between interior regions and the image boundary, naturally distinguishing salient objects from the background. To handle false positives that arise from homogeneous background regions near the image center, we further leverage the equilibrium distribution of an ergodic Markov chain to refine the saliency map. Experiments on four benchmark datasets demonstrate that our method achieves competitive or superior performance compared to state-of-the-art approaches.

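Paragraph function: Overview of the whole paper. It moves step by step from Markov chain modeling to absorption time computation to the ergodic refinement, and previews the experimental results.

Logical role: The abstract maps precisely onto three layers of technical contribution: (1) a new framework based on absorbing Markov chains; (2) a mathematical realization of the boundary prior; (3) a refinement mechanism based on the equilibrium distribution. The structure is compact and complete.

Argumentative technique / potential weaknesses: Reframing saliency detection as a Markov chain problem is conceptually novel. However, the "boundary is background" assumption can fail in some scenes (for example, when the object is cropped by the image border). The authors try to mitigate this with the ergodic refinement, but the abstract does not state the scope within which that refinement applies.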

1. Introduction

Salient object detection aims to identify the most visually prominent regions in an image. It has wide applications in image segmentation, content-aware resizing, object recognition, and visual tracking. Most existing approaches compute saliency from local or global contrast: local methods compare each pixel or region with its neighbors, while global methods compare each region against statistics of the entire image. Although effective in many cases, contrast-based methods may fail when the salient object has a color distribution similar to that of the background, or when the background is complex and heterogeneous.

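Paragraph function: Establishes the research setting by defining the problem and identifying two major failure modes of contrast-based methods.

Logical role: The starting point of the argument chain. It first establishes importance through application scenarios, then exposes a structural flaw in the contrast paradigm, which motivates the introduction of the Markov chain method.

Argumentative technique / potential weaknesses: The two failure modes listed (similar colors, complex backgrounds) are concrete and persuasive. Whether the Markov chain method suffers from the same failure modes, however, needs to be addressed later in the paper.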

Recent work has explored the use of a boundary prior: the observation that image boundaries are predominantly occupied by background regions. Methods such as geodesic saliency exploit this prior by computing the shortest geodesic distance from each region to the image boundary. In this paper, we propose a principled formulation based on absorbing Markov chains that naturally integrates the boundary prior with a global diffusion process. The absorption time captures both local appearance similarity and global connectivity structure, providing a more robust saliency measure than simple distance metrics.

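Paragraph function: Presents the core idea of combining the boundary prior with Markov chain diffusion.

Logical role: Building on the problem statement in the previous paragraph, this paragraph positions the method as an upgrade of geodesic saliency, moving from a static distance to a dynamic diffusion process and thereby gaining theoretical depth.

Argumentative technique / potential weaknesses: The phrase "principled formulation" suggests a rigorous mathematical foundation, in contrast to heuristic methods. Yet setting the transition probabilities of the Markov chain still requires heuristic choices (such as the kernel applied to feature distances), so the method is not entirely "principled".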

2. Related Work

Saliency detection methods can be broadly categorized into local contrast-based, global contrast-based, and boundary prior-based approaches. Itti et al.'s classic model uses multi-scale center-surround differences. Cheng et al. proposed global contrast using histogram-based color distance. Wei et al. introduced geodesic saliency based on boundary connectivity. Our work is most closely related to graph-based diffusion methods such as Yang et al.'s manifold ranking approach, but differs in using absorbing rather than regular random walks. The absorbing formulation provides a natural and principled way to incorporate the boundary prior through the designation of absorbing states.

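Paragraph function: Literature review that positions the method at the intersection of three lines of work.

Logical role: Establishes a clear academic lineage and pinpoints the key difference from the closest related method (manifold ranking): absorbing versus regular random walks.

Argumentative technique / potential weaknesses: Making the distinction between absorbing and regular random walks the core differentiator is precise technical positioning. The method section, however, needs to show that this difference actually yields a significant performance gain in practice.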

3. Method

3.1 Graph Construction

We first over-segment the input image into superpixels using SLIC. A graph G = (V, E) is constructed where each superpixel is a node and edges connect spatially adjacent superpixels. The edge weight between nodes i and j is defined as w_ij = exp(-||c_i - c_j||^2 / (2 * sigma^2)), where c_i and c_j are the mean color vectors in CIELAB space and sigma is a bandwidth parameter. The transition probability matrix P of the Markov chain is derived by row-normalizing the weight matrix. Superpixels touching the image boundary are designated as absorbing nodes — once the random walker reaches these nodes, it is absorbed and stops.

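Paragraph function: First step of the method derivation, defining the graph structure and the Markov chain.

Logical role: This is the mathematical foundation of the whole framework. SLIC superpixels provide computational efficiency (compared with working at the pixel level), the Gaussian-kernel edge weights encode color similarity, and the absorbing boundary nodes implement the boundary prior.

Argumentative technique / potential weaknesses: The mathematical formulation is clear and rigorous. However, the choice of the bandwidth parameter sigma may strongly affect the results: if it is too large, all nodes look alike; if it is too small, the graph becomes effectively sparse. The sensitivity to this hyperparameter needs further analysis.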
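
The graph construction in Section 3.1 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the SLIC step has already produced one mean CIELAB color per superpixel and a list of adjacent superpixel pairs. The function name `transition_matrix` and its arguments are hypothetical, and `sigma` has to be supplied by the caller because the text does not give a value.

```python
import numpy as np

def transition_matrix(mean_lab, edges, sigma):
    """Build the weight matrix W and the row-normalized transition matrix P.

    mean_lab : (n, 3) array-like, mean CIELAB color of each superpixel
    edges    : iterable of (i, j) index pairs of spatially adjacent superpixels
    sigma    : bandwidth of the Gaussian kernel (hypothetical value chosen by caller)
    """
    mean_lab = np.asarray(mean_lab, dtype=float)
    n = mean_lab.shape[0]
    W = np.zeros((n, n))
    for i, j in edges:
        # w_ij = exp(-||c_i - c_j||^2 / (2 * sigma^2)), symmetric by construction
        d2 = np.sum((mean_lab[i] - mean_lab[j]) ** 2)
        W[i, j] = W[j, i] = np.exp(-d2 / (2.0 * sigma ** 2))
    row_sums = W.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        # An isolated node (or a sigma so small that all weights underflow)
        # has no valid transition distribution.
        raise ValueError("every node needs at least one edge with nonzero weight")
    return W, W / row_sums
```

The guard on zero row sums also makes the sigma concern above concrete: with a very small bandwidth, all weights of a node can underflow to zero and its row can no longer be normalized.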

3.2 Absorption Time

In an absorbing Markov chain, the expected absorption time y_i of a transient node i is the expected number of steps a random walker starting at node i takes before being absorbed by any boundary node. It can be computed from the fundamental matrix N = (I - Q)^{-1}, where Q is the sub-matrix of the transition matrix restricted to transient nodes. The vector of absorption times is y = N * 1, where 1 is the all-ones vector, so y_i is the i-th row sum of N. Intuitively, salient regions that differ from the background have longer absorption times because the random walker must cross dissimilar regions, where transition probabilities are low, to reach the boundary. Conversely, background regions similar to the boundary have short absorption times. To address false positives in homogeneous regions far from the boundary, we compute the equilibrium distribution of an ergodic version of the chain (obtained by removing the absorbing states) and use it to normalize the absorption times, suppressing large central background regions.
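
The step from the definition of y_i to the formula with the fundamental matrix can be made explicit by conditioning on the walker's first move. The derivation below is a standard one for absorbing chains rather than something quoted from the paper. The symbol T (the set of transient nodes) and q_ij (the entries of Q) are notation introduced here for illustration.

```latex
\begin{aligned}
y_i &= 1 + \sum_{j \in T} q_{ij}\, y_j
  && \text{(take one step, then continue from transient node } j\text{)} \\
(I - Q)\, y &= \mathbf{1}
  && \text{(the same relation stacked over all } i \in T\text{)} \\
y &= (I - Q)^{-1}\, \mathbf{1} = N\, \mathbf{1}
\end{aligned}
```

Moves into an absorbing node contribute nothing beyond the single step already counted, which is why only the transient block Q appears in the sum.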

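Paragraph function: The core algorithm, deriving the computation of the absorption time and the refinement mechanism.

Logical role: This paragraph is the pillar of the paper's argument. Inverting I - Q to obtain the fundamental matrix gives a rigorous mathematical solution, and the equilibrium-distribution refinement responds to a known weakness of the boundary prior. Together they form a complete "base method plus refinement" architecture.

Argumentative technique / potential weaknesses: The "intuitively" explanation makes an abstract mathematical concept accessible. But computing (I - Q)^{-1} may become a bottleneck when the number of superpixels is large (O(n^3) complexity). The equilibrium-distribution refinement remedies a weakness of the boundary prior, but how to balance the two mechanisms requires experimental tuning.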
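
A small numerical sketch of the two quantities in Section 3.2 may help. It is an illustration under stated assumptions, not the authors' code, and the function names are hypothetical. `absorption_times` solves the linear system (I - Q) y = 1 instead of forming the inverse explicitly; for a dense matrix this is still cubic in the number of transient nodes. `stationary_distribution` uses the standard fact that a random walk on a connected undirected weighted graph has equilibrium probabilities proportional to node degree. How the equilibrium distribution is then combined with the absorption times is not spelled out in the text, so that step is left out.

```python
import numpy as np

def absorption_times(P, absorbing):
    """Expected number of steps to absorption for each transient node.

    P         : (n, n) row-stochastic transition matrix
    absorbing : indices of the nodes treated as absorbing
    Returns (transient, y): transient node indices and their absorption times.
    """
    P = np.asarray(P, dtype=float)
    is_absorbing = np.zeros(P.shape[0], dtype=bool)
    is_absorbing[np.asarray(absorbing, dtype=int)] = True
    transient = np.flatnonzero(~is_absorbing)
    # Q: transitions among transient nodes only
    Q = P[np.ix_(transient, transient)]
    k = len(transient)
    # Solve (I - Q) y = 1, equivalent to y = N 1 with N = (I - Q)^{-1}
    y = np.linalg.solve(np.eye(k) - Q, np.ones(k))
    return transient, y

def stationary_distribution(W):
    """Equilibrium distribution of the random walk on a symmetric weight matrix W.

    Assumes the graph is connected and undirected, so pi_i = d_i / sum_j d_j.
    """
    degree = np.asarray(W, dtype=float).sum(axis=1)
    return degree / degree.sum()
```

On a five-node path with unit weights and both end nodes absorbing, the three interior nodes get absorption times 3, 4, and 3: the node farthest from either absorbing end takes the longest to be absorbed.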

4. Experiments

We evaluate the proposed method on four benchmark datasets: MSRA-B (5000 images), ECSSD (1000 images), DUT-OMRON (5168 images), and PASCAL-S (850 images). We compare against fourteen state-of-the-art methods using standard metrics including Precision-Recall curves, F-measure, and Mean Absolute Error (MAE). Our method consistently achieves top-tier performance across all datasets. On MSRA-B, we achieve the highest F-measure of 0.903, outperforming geodesic saliency (0.871) and manifold ranking (0.889). On the more challenging DUT-OMRON dataset, our method shows particularly strong improvements, demonstrating robustness to complex backgrounds. Ablation experiments confirm that both the absorbing Markov chain formulation and the equilibrium distribution refinement contribute positively to the final performance.

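Paragraph function: Provides comprehensive experimental evidence, validating the method on four datasets with three metrics.

Logical role: The empirical pillar. The large-scale comparison (14 baselines) and multi-dataset validation provide statistical robustness. The particular advantage on DUT-OMRON echoes the "complex background" challenge discussed in the introduction.

Argumentative technique / potential weaknesses: Directly reporting numerical comparisons with the closest related methods (geodesic saliency, manifold ranking) is an effective differentiation strategy. However, no runtime comparison is reported, and the cost of matrix inversion may be significantly higher than that of a simple distance computation.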
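
For readers unfamiliar with the metrics named in Section 4, the sketch below shows how MAE and the F-measure are typically computed for a single saliency map against a binary ground-truth mask. It is illustrative only. The function names are hypothetical, and the default `beta_sq=0.3` is a weighting commonly used in the saliency literature, assumed here because the text does not state the value the authors used.

```python
import numpy as np

def mean_absolute_error(saliency, mask):
    """Mean absolute difference between a saliency map in [0, 1] and a binary mask."""
    saliency = np.asarray(saliency, dtype=float)
    mask = np.asarray(mask, dtype=float)
    return float(np.mean(np.abs(saliency - mask)))

def f_measure(saliency, mask, threshold, beta_sq=0.3):
    """F-measure of the map binarized at `threshold`.

    F = (1 + beta^2) * P * R / (beta^2 * P + R); beta_sq is an assumed default.
    """
    pred = np.asarray(saliency, dtype=float) >= threshold
    truth = np.asarray(mask).astype(bool)
    tp = np.logical_and(pred, truth).sum()
    # Guard against empty predictions or empty ground truth
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / truth.sum() if truth.sum() else 0.0
    denom = beta_sq * precision + recall
    return float((1 + beta_sq) * precision * recall / denom) if denom else 0.0
```

Sweeping `threshold` over the range of saliency values and recording precision and recall at each setting yields the points of a Precision-Recall curve.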

5. Conclusion

We have proposed a novel saliency detection method based on absorbing Markov chains. By treating boundary superpixels as absorbing states and computing the expected absorption time for each interior region, our method naturally integrates the boundary prior with a principled diffusion process. The additional refinement using equilibrium distributions effectively addresses the false positive problem. Our approach achieves state-of-the-art results on multiple benchmarks with a clean and mathematically grounded formulation. Future directions include extending the framework to video saliency detection with temporal absorbing states and exploring learning-based approaches for adaptive graph construction.

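Paragraph function: Summarizes the paper, restating the core contributions and looking ahead to future directions.

Logical role: The conclusion mirrors the structure of the abstract and presents "clean and mathematically grounded" as the method's main selling point. Both future directions (temporal extension, learning-based graph construction) are clearly technically feasible.

Argumentative technique / potential weaknesses: The outlook on "learning-based graph construction" hints at the limitations of a hand-designed graph structure, which is in effect a tactful admission of a potential weakness of the method. With the rise of deep learning, the long-term competitiveness of such hand-crafted methods is in question.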

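Overview of the Argument Structure

Problem: Contrast-based methods fail when colors are similar or the background is complex
→
Claim: Absorbing Markov chains integrate the boundary prior with a diffusion process
→
Evidence: Evaluation on four datasets, with an F-measure of 0.903 on MSRA-B
→
Rebuttal: The equilibrium-distribution refinement addresses false positives near the image center
→
Conclusion: A mathematically rigorous framework for saliency detection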

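The Authors' Core Claim (in One Sentence)

The expected absorption time of an absorbing Markov chain provides a mathematically rigorous and computationally feasible saliency measure that naturally integrates the boundary prior with global diffusion structure.

Strongest Point of the Argument

Elegance of the mathematical framework: reframing saliency detection as an absorbing Markov chain problem makes the boundary prior a natural component of the model rather than an externally imposed condition. The closed-form solution given by the fundamental matrix avoids the uncertainty of iterative solvers.

Weakest Point of the Argument

Computational efficiency and the boundary assumption: inverting I - Q to obtain the fundamental matrix (O(n^3)) may become a bottleneck on high-resolution images. More fundamentally, the "boundary is background" assumption fails systematically on cropped images or panoramic scenes, and the equilibrium-distribution refinement can only partly mitigate this.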