Unpaired Learning of Deep Image Denoising

Abstract

We propose a novel approach for unpaired learning of deep image denoising that requires neither paired clean-noisy training data nor knowledge of the noise model. Our method trains a Dilated Blind-Spot Network (D-BSN) in a self-supervised manner and then applies knowledge distillation to produce a practical denoiser. The D-BSN learns to predict each pixel from its surrounding context without seeing the pixel itself, and the distilled student network produces clean outputs directly. On real-world noisy images, the approach achieves results competitive with supervised methods.

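Paragraph function: overview of the whole paper; defines the two-stage framework for unpaired denoising.

Logical role: "No paired data or noise model required" directly addresses the core challenge of practical deployment.

Argumentative technique / potential weakness: The blind-spot design exploits the assumption that noise is statistically independent across pixels.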

1. Introduction

Image denoising has been dominated by supervised learning methods that require paired clean-noisy image datasets. In practice, obtaining perfectly aligned clean-noisy pairs is extremely difficult, and for some real-world noise impossible. Existing unsupervised methods either assume a known noise model (e.g., Gaussian) or rely on Noise2Noise-style training, which still requires at least two independent noisy observations of the same scene. We propose a fully unpaired approach that works with only a collection of noisy images, without any clean references or an explicit noise model.

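Paragraph function: establishes motivation by listing the data requirements of existing methods.

Logical role: By ruling out one data assumption after another, it establishes the need for the extreme "fully unpaired" setting.

Argumentative technique / potential weakness: The enumeration of four limitations forms a strong chain of argument and clearly locates the research gap.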

The key theoretical foundation is that if the noise is zero-mean and pixel-wise independent, a network that predicts a pixel without accessing it, trained with a squared-error loss against the noisy image itself, converges to the conditional expectation of the clean pixel given its noisy neighborhood. This is the same principle behind Noise2Self and blind-spot networks. However, existing blind-spot architectures produce suboptimal results because the blind spot forces the receptive field to exclude the very pixel being predicted. Our two-stage approach addresses this limitation.

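Paragraph function: theoretical foundation; establishes the mathematical justification for blind-spot networks.

Logical role: Convergence to the conditional expectation gives the method a theoretical guarantee, valid under the stated noise assumptions.

Argumentative technique / potential weakness: Pixel-wise independence is the central assumption, but not all real noise satisfies it.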
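The convergence argument above can be written out in a few lines. This is a sketch under stated assumptions rather than the paper's own derivation: it assumes a squared-error training loss, writes the noisy image as y = x + n, uses y_{\setminus i} (notation introduced here) for all noisy pixels except pixel i, and assumes n_i has zero mean with variance \sigma_i^2 and is independent of the clean image x and of the noise at all other pixels.

```latex
\begin{aligned}
\mathbb{E}\big[(f(y_{\setminus i}) - y_i)^2\big]
  &= \mathbb{E}\big[(f(y_{\setminus i}) - x_i - n_i)^2\big] \\
  &= \mathbb{E}\big[(f(y_{\setminus i}) - x_i)^2\big]
     - 2\,\mathbb{E}\big[(f(y_{\setminus i}) - x_i)\,n_i\big]
     + \mathbb{E}\big[n_i^2\big] \\
  &= \mathbb{E}\big[(f(y_{\setminus i}) - x_i)^2\big] + \sigma_i^2,
\qquad\Longrightarrow\qquad
f^\star(y_{\setminus i}) = \mathbb{E}\big[x_i \mid y_{\setminus i}\big].
\end{aligned}
```

The cross term vanishes because n_i is independent of (x_i, y_{\setminus i}) and has zero mean, so the noisy-target loss equals the clean-target loss plus a constant and both share the same minimizer. If f could also read y_i, the cross term would no longer vanish, and the identity mapping f = y_i would reach zero noisy-target loss, which is exactly what the blind spot rules out.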

2. Method

Our approach consists of two stages. In Stage 1, we train a Dilated Blind-Spot Network (D-BSN) that predicts each pixel from its surrounding context, using dilated convolutions arranged so that the center pixel never enters the network's receptive field. Under the assumption that the noise is pixel-wise independent, the network can learn to estimate the clean signal without ever seeing clean images. The D-BSN output is a denoised estimate, but it retains some residual artifacts because the blind spot prevents the network from using the observed value of the pixel being predicted.

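Paragraph function: core method, Stage 1; self-supervised denoising with a blind-spot network.

Logical role: The blind spot is the key idea of self-supervised denoising: excluding the target pixel ensures the network learns to denoise rather than learning the identity mapping.

Argumentative technique / potential weakness: Pixel-wise independence holds approximately for real sensor noise but is not strictly correct.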
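The blind-spot property of a dilated stack can be checked numerically. The following is a toy, single-channel sketch under assumptions not taken from the paper: the first layer is a 3x3 convolution with its centre tap zeroed, every later layer is a 3x3 convolution with dilation 2 followed by ReLU, there are no biases, and the weights are random because only the dependency structure is being tested. The names `conv2d_same` and `toy_dbsn` are hypothetical.

```python
import numpy as np


def conv2d_same(x, w, dilation=1):
    """Single-channel 2-D cross-correlation with zero padding ('same' output size)."""
    k = w.shape[0]
    c = k // 2
    pad = c * dilation
    xp = np.pad(x, pad)
    h, wd = x.shape
    out = np.zeros((h, wd))
    for a in range(k):
        for b in range(k):
            da, db = (a - c) * dilation, (b - c) * dilation
            # Shifted view of the padded input: xp[pad+da+i, pad+db+j] == x[i+da, j+db].
            out += w[a, b] * xp[pad + da:pad + da + h, pad + db:pad + db + wd]
    return out


def toy_dbsn(x, rng, num_dilated=3):
    """Hypothetical single-channel blind-spot stack (not the paper's exact D-BSN)."""
    w0 = rng.standard_normal((3, 3))
    w0[1, 1] = 0.0  # blind spot: the centre tap never reads the pixel being predicted
    h = np.maximum(conv2d_same(x, w0), 0.0)
    for _ in range(num_dilated):
        # Dilation 2: each layer only combines features at even offsets.
        w = rng.standard_normal((3, 3))
        h = np.maximum(conv2d_same(h, w, dilation=2), 0.0)
    return h


rng = np.random.default_rng(0)
x = rng.standard_normal((17, 17))
x_perturbed = x.copy()
x_perturbed[8, 8] += 10.0  # change only the centre pixel

out = toy_dbsn(x, np.random.default_rng(1))
out_perturbed = toy_dbsn(x_perturbed, np.random.default_rng(1))  # same weights
print(out[8, 8] == out_perturbed[8, 8])  # prints True: the output at (8, 8) ignores x[8, 8]
```

The first layer reaches offsets of only -1, 0 or +1 and never 0 in both directions at once, while every later layer moves in steps of 2, so no path leads back to offset (0, 0). Replacing `dilation=2` with `dilation=1` would break this, because a step of +1 followed by -1 returns to the centre.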

2.1 Knowledge Distillation

In Stage 2, we use the trained D-BSN as a teacher to generate pseudo-clean targets from noisy images. A standard U-Net student without the blind-spot constraint is then trained to map noisy inputs to these pseudo-clean targets. This distillation step removes the artifacts caused by the blind-spot constraint and produces a practical denoiser that uses each pixel's full local context, including the pixel itself.

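Paragraph function: Stage 2; lifts the limitations of the blind-spot constraint through distillation.

Logical role: The two-stage design decouples two goals: self-supervised learning and practical inference.

Argumentative technique / potential weakness: Distillation quality depends on the denoising quality of the Stage 1 D-BSN, so errors may accumulate.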
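The data flow of the two stages can be mimicked with linear filters fitted by least squares. This is a deliberately minimal stand-in, not the paper's training procedure: the "teacher" is a 5x5 linear filter with its centre tap removed, regressed onto the noisy pixel itself (the self-supervised stage), and the "student" is a full 5x5 linear filter regressed onto the teacher's pseudo-clean output (the distillation stage). The synthetic image, noise level, and helper names (`neighbourhoods`, `fit_linear`) are illustrative assumptions; the clean image is used only for evaluation.

```python
import numpy as np


def neighbourhoods(img, k, drop_centre):
    """Rows = interior pixels, columns = their k x k neighbours (centre optionally dropped)."""
    r = k // 2
    h, w = img.shape
    cols = []
    for a in range(-r, r + 1):
        for b in range(-r, r + 1):
            if drop_centre and a == 0 and b == 0:
                continue  # blind spot: exclude the pixel being predicted
            cols.append(img[r + a:h - r + a, r + b:w - r + b].ravel())
    return np.stack(cols, axis=1)


def fit_linear(features, target):
    """Least-squares (MSE) fit of a linear filter."""
    weights, *_ = np.linalg.lstsq(features, target, rcond=None)
    return weights


def mse(a, b):
    return float(np.mean((a - b) ** 2))


# Synthetic data: smooth clean image plus i.i.d. zero-mean Gaussian noise.
rng = np.random.default_rng(0)
size, k = 64, 5
r = k // 2
yy, xx = np.mgrid[0:size, 0:size]
clean = np.sin(xx / 6.0) + np.cos(yy / 9.0)
noisy = clean + 0.5 * rng.standard_normal(clean.shape)
clean_in = clean[r:size - r, r:size - r].ravel()  # evaluation only
noisy_in = noisy[r:size - r, r:size - r].ravel()

# Stage 1 (teacher): blind-spot filter trained against the noisy image only.
blind = neighbourhoods(noisy, k, drop_centre=True)
teacher_w = fit_linear(blind, noisy_in)
pseudo_clean = blind @ teacher_w

# Stage 2 (student): full-context filter trained on (noisy, pseudo-clean) pairs.
full = neighbourhoods(noisy, k, drop_centre=False)
student_w = fit_linear(full, pseudo_clean)
student_out = full @ student_w

print("noisy   MSE vs clean:", mse(noisy_in, clean_in))
print("teacher MSE vs clean:", mse(pseudo_clean, clean_in))
print("student MSE vs clean:", mse(student_out, clean_in))
```

In this linear toy the student can represent the teacher exactly, so it simply reproduces it with a centre weight near zero. The sketch therefore shows the mechanics of distillation (noisy input, pseudo-clean target, MSE fit) but cannot reproduce the reported 0.8 dB gain; any gain over the teacher depends on the student differing from the teacher, as a U-Net does.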

3. Experiments

On the DND benchmark for real-world denoising, our method achieves 39.28 dB PSNR, compared with 39.75 dB for the supervised CBDNet and 37.50 dB for BM3D. On the SIDD benchmark, it achieves 38.92 dB. Our unpaired method thus comes within 0.5 dB (0.47 dB) of the supervised CBDNet on DND while requiring no paired training data. The distilled student outperforms the D-BSN teacher by 0.8 dB, confirming the value of the distillation stage.

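Paragraph function: quantitative evaluation; approaching supervised methods on real-world benchmarks.

Logical role: Coming within 0.5 dB of a supervised method on DND is a very strong result that demonstrates the feasibility of unpaired denoising.

Argumentative technique / potential weakness: The 0.8 dB gain from distillation validates the need for the two-stage design.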
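The dB figures above can be translated into error ratios. This sketch assumes PSNR is defined as 10 log10(peak^2 / MSE) with a common peak value; because benchmarks such as DND average PSNR over images, the ratios are approximate readings of the reported averages rather than exact MSE ratios. The function names are illustrative.

```python
import math


def psnr(mse, peak=1.0):
    """Peak signal-to-noise ratio in dB."""
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio(gap_db):
    """A PSNR gap of g dB corresponds to an MSE ratio of 10 ** (g / 10)."""
    return 10.0 ** (gap_db / 10.0)


gap_to_cbdnet = 39.75 - 39.28  # DND figures reported above
print(f"gap to CBDNet: {gap_to_cbdnet:.2f} dB, MSE ratio {mse_ratio(gap_to_cbdnet):.3f}")
print(f"distillation gain: 0.80 dB, MSE ratio {mse_ratio(0.8):.3f}")
```

Read this way, the 0.47 dB gap corresponds to roughly 11% more squared error than CBDNet, and the 0.8 dB distillation gain to roughly 17% less squared error than the teacher.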

We also compare different blind-spot architectures. Our D-BSN with dilated convolutions outperforms the standard blind-spot network by 0.6 dB, as dilated convolutions provide a larger effective receptive field while maintaining the blind-spot property. We further show that the method works across different noise types, including Gaussian, Poisson, and real sensor noise.

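Paragraph function: architecture comparison and generalization; the advantage of dilated convolutions and validation across noise types.

Logical role: Effectiveness across noise types strengthens the claim of practicality.

Argumentative technique / potential weakness: The gain from dilated convolutions shows that architectural details have a substantial effect on performance.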
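The receptive-field claim can be checked by tracking which input offsets can influence one output pixel. This sketch assumes a hypothetical stack of one 3x3 convolution with its centre tap removed, followed by `num_dilated` plain 3x3 convolutions with a given dilation. It is neither the paper's exact D-BSN nor the "standard" blind-spot network it is compared against, and it ignores channels and pointwise nonlinearities, which do not change which offsets are reachable.

```python
from itertools import product


def receptive_field(num_dilated, dilation):
    """Set of input offsets (dy, dx) that can affect the output at (0, 0)."""
    # First layer: 3x3 taps with the centre tap removed (the blind spot).
    field = {(a, b) for a, b in product((-1, 0, 1), repeat=2) if (a, b) != (0, 0)}
    taps = [(dilation * a, dilation * b) for a, b in product((-1, 0, 1), repeat=2)]
    for _ in range(num_dilated):
        # Composing layers adds offsets: output <- feature at +tap <- input at +tap+offset.
        field = {(u + a, v + b) for (u, v) in field for (a, b) in taps}
    return field


for dilation in (1, 2):
    field = receptive_field(num_dilated=3, dilation=dilation)
    radius = max(max(abs(u), abs(v)) for u, v in field)
    print(f"dilation={dilation}: radius={radius}, centre visible={(0, 0) in field}")
```

With dilation 1 a later layer can step back onto the centre (+1 then -1), so the blind spot is lost; with dilation 2 every later step is even, so the centre is never reached and the radius grows by 2 per layer instead of 1. The same parity argument shows that this toy stack also misses every offset whose coordinates are both even, such as (2, 0), so this exact configuration should not be read as the paper's architecture.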

4. Conclusion

We have presented a fully unpaired approach to deep image denoising that combines self-supervised blind-spot learning with knowledge distillation. Our method achieves near-supervised performance without paired data or an explicit noise model, relying only on the assumption that the noise is pixel-wise independent. This work demonstrates that self-supervised learning can largely close the gap with supervised methods in low-level vision tasks.

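Paragraph function: summary; establishes the feasibility and significance of self-supervised denoising.

Logical role: Elevates the conclusion to the general potential of self-supervised methods in low-level vision.

Argumentative technique / potential weakness: Later work such as Noise2Score and AP-BSN further advanced this direction.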

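Argument structure overview

Problem: denoising requires paired data → Claim: blind-spot networks can learn in a self-supervised way → Method: D-BSN + distillation → Evidence: within 0.5 dB of a supervised method → Conclusion: self-supervision can approach supervised performance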

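Core claim

A two-stage design, self-supervised training of a dilated blind-spot network followed by knowledge distillation, achieves denoising performance close to supervised methods with no paired data at all.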

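Strongest point of the argument

On the real-world DND benchmark the method comes within 0.5 dB of the supervised CBDNet (and reaches 38.92 dB on SIDD), and the distillation stage adds a further 0.8 dB.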

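Weakest point of the argument

The assumption of pixel-wise independent noise may not hold in all real scenarios (e.g., spatially structured noise), and two-stage training adds pipeline complexity.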