NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images

Abstract

Neural Radiance Fields (NeRF) have shown impressive results for novel view synthesis from well-lit photographs. However, standard NeRF relies on processed low dynamic range (LDR) images, losing valuable high dynamic range (HDR) information and introducing bias from the camera's image signal processor. The authors propose RawNeRF, which modifies NeRF to train directly on linear raw images rather than tonemapped outputs. This enables high dynamic range view synthesis from noisy inputs captured in near-darkness, leveraging 3D multiview consistency across 25-200 input frames to effectively denoise while preserving full HDR information. The method enables novel applications including post-capture exposure adjustment, tonemapping manipulation, and synthetic defocus with physically accurate bokeh.

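Paragraph function: Overview of the whole paper. Starting from NeRF's LDR limitation, it introduces RawNeRF's innovation in the HDR domain.

Logical role: The abstract builds a three-step argument: the implicit assumption of existing methods (LDR input) -> breaking that assumption (raw input) -> new capabilities (HDR synthesis, denoising, post-processing).

Argumentative technique / potential weakness: The phrase "near-darkness" is highly dramatic and effectively conveys the method's capability in extreme scenes. However, the requirement of 25-200 frames may be a limitation in practice, since capturing sufficiently diverse viewpoints in a dark scene is itself a challenge.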

1. Introduction

NeRF and its extensions have become the dominant paradigm for neural scene representation and novel view synthesis. Yet virtually all existing methods assume well-exposed, noise-free input images processed by the camera's image signal processor (ISP). This processing pipeline applies nonlinear transformations including demosaicing, white balancing, gamma correction, and tonemapping, which irreversibly compress the scene's dynamic range and discard information in highlights and shadows. In low-light conditions, the ISP further amplifies sensor noise through its nonlinear processing, producing images with severe artifacts. The authors argue that training NeRF directly on raw sensor data, bypassing the ISP entirely, can simultaneously solve the problems of noise, limited dynamic range, and inflexible post-processing.

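Paragraph function: Problem identification. Reveals NeRF's implicit dependence on ISP-processed images and its consequences.

Logical role: This paragraph challenges a widespread but rarely questioned assumption in NeRF research, namely that input images should be ISP-processed. By enumerating the ISP's nonlinear operations and the information they lose, it establishes the need to bypass the ISP.

Argumentative technique / potential weakness: Phrases such as "irreversibly compress" and "further amplifies noise" effectively cast the ISP as the source of the problem rather than a solution. But the ISP exists for good reasons: it handles necessary operations such as decoding the Bayer pattern and correcting sensor nonlinearity. Bypassing it entirely means RawNeRF must deal with these low-level issues itself.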
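
The claim that nonlinear processing biases noisy dark pixels can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes zero-mean Gaussian sensor noise and uses clipping at zero as a stand-in for one nonlinear ISP step. The function name and the numeric values are hypothetical.

```python
import random
import statistics


def clipped_mean(true_signal, noise_std, n=20000, seed=0):
    """Return (mean of raw samples, mean of samples clipped at zero).

    Assumption: sensor noise is zero-mean Gaussian, so raw samples of a
    dark pixel can fall below zero after black-level subtraction.
    """
    rng = random.Random(seed)
    raw = [true_signal + rng.gauss(0.0, noise_std) for _ in range(n)]
    clipped = [max(v, 0.0) for v in raw]  # a nonlinear step: negatives are discarded
    return statistics.fmean(raw), statistics.fmean(clipped)


# A dark pixel whose noise is larger than its signal.
raw_mean, clip_mean = clipped_mean(true_signal=0.01, noise_std=0.05)
print("mean of raw samples:    ", raw_mean)
print("mean of clipped samples:", clip_mean)
```

Averaging the linear raw samples recovers the true signal, while averaging after clipping settles at a noticeably brighter value. This is one way to read the argument that denoising should happen before, not after, nonlinear processing.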

The key insight is that NeRF's multiview aggregation provides a natural mechanism for denoising through 3D consistency. When multiple noisy observations of the same 3D point are available from different viewpoints, fitting a smooth radiance field to these observations implicitly averages out the noise. Unlike traditional single-image denoisers that must hallucinate missing details, RawNeRF leverages genuine multiview information to recover both geometry and appearance. Furthermore, by operating in linear raw space, the reconstructed radiance field preserves the full dynamic range of the scene, enabling post-capture manipulation such as exposure changes, tonemapping, and synthetic defocus.

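Paragraph function: Core insight. Explains the principle by which multiview consistency acts as a natural denoising mechanism.

Logical role: This is the most important argument of the paper: it repositions NeRF from a method that needs clean inputs to one that is itself a denoiser. This conceptual shift makes RawNeRF not merely an extension of NeRF but a new computational photography tool.

Argumentative technique / potential weakness: Reinterpreting NeRF's multiview aggregation as denoising is an excellent conceptual innovation. However, "implicitly averaging out the noise" presupposes independent and identically distributed noise. In real cameras, fixed-pattern noise (such as dark current) appears consistently in every frame and may not be removed by multiview averaging.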
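
The averaging argument, and the caveat about noise shared across frames, can both be checked numerically. The following is a minimal sketch, not RawNeRF itself: it replaces the radiance field with a plain per-point average and assumes independent zero-mean Gaussian noise. All function names and values are illustrative.

```python
import random
import statistics


def noisy_observations(true_radiance, noise_std, n_views, rng):
    """Simulate n_views independent zero-mean Gaussian readings of one scene point."""
    return [true_radiance + rng.gauss(0.0, noise_std) for _ in range(n_views)]


def rms_error_of_average(true_radiance, noise_std, n_views, trials=2000, seed=0):
    """RMS error of the per-point average, measured over many simulated captures."""
    rng = random.Random(seed)
    sq_errors = []
    for _ in range(trials):
        obs = noisy_observations(true_radiance, noise_std, n_views, rng)
        sq_errors.append((statistics.fmean(obs) - true_radiance) ** 2)
    return statistics.fmean(sq_errors) ** 0.5


def error_with_shared_offset(true_radiance, offset, noise_std, n_views, seed=0):
    """Signed error of the average when every observation carries the same offset."""
    rng = random.Random(seed)
    obs = [v + offset for v in noisy_observations(true_radiance, noise_std, n_views, rng)]
    return statistics.fmean(obs) - true_radiance


for n in (1, 25, 200):
    print(n, "views -> RMS error", rms_error_of_average(0.01, 0.05, n))
print("shared offset left after 200 views:",
      error_with_shared_offset(0.01, offset=0.02, noise_std=0.01, n_views=200))
```

Under these assumptions the error of the average shrinks roughly with the square root of the number of views, whereas a bias common to all observations passes through the average unchanged.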

2. Related Work

This work synthesizes three research areas. In novel view synthesis, methods from NeRF to mip-NeRF have progressively improved quality but always assume clean, tonemapped inputs. In image denoising, learning-based approaches have shown impressive results on raw-domain denoising, including SID (See in the Dark), which demonstrated extreme low-light enhancement from raw sensor data. However, these methods process single images or video frames independently, without exploiting 3D scene structure. In computational photography, HDR imaging typically requires bracketed exposures or specialized hardware, while synthetic defocus requires depth estimation or multi-aperture systems. RawNeRF uniquely addresses all three capabilities (denoising, HDR recovery, and computational photography) through a single 3D-consistent framework.

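Paragraph function: Cross-disciplinary literature positioning. Places RawNeRF at the intersection of three research areas.

Logical role: The structure is highly strategic: it points out the limitations of each of the three areas in turn, then presents RawNeRF as the unifying solution. This establishes the method's claim to a cross-disciplinary contribution.

Argumentative technique / potential weakness: Attributing the capabilities of three areas to a single framework is an ambitious claim. Whether performance in each area reaches the level of dedicated methods needs to be verified one by one. The appeal of a "unified framework" should not obscure possible performance compromises on individual tasks.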

3. Method

3.1 RawNeRF Architecture

RawNeRF is built upon mip-NeRF, which uses integrated positional encoding over conical frustums for anti-aliased rendering. The key modification is that the network outputs linear raw color values instead of sRGB colors. An exponential activation function is used for the color output to ensure positivity and to naturally represent the wide dynamic range of linear raw values, which can span several orders of magnitude. The network directly processes full-resolution mosaicked Bayer pattern data, treating each color channel (R, G, G, B) as a separate observation at each pixel location, which provides additional training signal.

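Paragraph function: Architecture design. Describes the key modifications that turn mip-NeRF into RawNeRF.

Logical role: The core of this paragraph is a "minimal change, maximal effect" argument: changing only the output space (sRGB -> raw) and the activation function (sigmoid -> exp) unlocks HDR capability.

Argumentative technique / potential weakness: Treating each channel of the Bayer pattern as an independent observation is a clever move, since it amounts to getting more training data for free. But it assumes that scene radiance is smooth across neighboring pixels, which may introduce errors in regions of high-frequency texture.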
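
The two modifications described in this subsection can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: it assumes an RGGB mosaic layout, and it takes one plausible reading of the text in which each pixel supervises only the channel that the mosaic measures at that location. The function names are hypothetical.

```python
import math


def exp_activation(pre_activation):
    """Map an unbounded network output to a strictly positive linear radiance."""
    return math.exp(pre_activation)


def bayer_channel(row, col):
    """Index (0=R, 1=G, 2=B) of the channel measured at a pixel.

    Assumption: RGGB layout, i.e. even rows are R G R G ..., odd rows are G B G B ...
    """
    if row % 2 == 0:
        return 0 if col % 2 == 0 else 1
    return 1 if col % 2 == 0 else 2


def mosaic_squared_error(pred_rgb, raw_value, row, col):
    """Squared error against the single raw value measured at (row, col)."""
    channel = bayer_channel(row, col)
    return (pred_rgb[channel] - raw_value) ** 2


# Equal steps in the pre-activation give equal ratios in radiance,
# so a modest output range covers several orders of magnitude.
print([exp_activation(x) for x in (-10.0, -5.0, 0.0)])
# The pixel at (0, 1) is a green site in RGGB, so only the green prediction is compared.
print(mosaic_squared_error((0.5, 0.2, 0.1), raw_value=0.2, row=0, col=1))
```

The sigmoid used for sRGB outputs is bounded to (0, 1), which is why swapping it for an exponential matters once targets are linear and unbounded above.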

3.2 Loss Function

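Standard L2 loss on linear raw values is dominated by bright pixels, causing the network to ignore dark regions where noise is most problematic. The authors propose a tone-curve-weighted loss that scales each error by the derivative of a logarithmic tone curve, evaluated at the prediction, before squaring: ((y_hat - y) / (sg(y_hat) + epsilon))^2, where y_hat is the rendered value, y is the noisy raw target, epsilon is a small constant, and sg() denotes stop-gradient. Because the derivative of log(y + epsilon) is 1 / (y + epsilon), this weighting penalizes relative rather than absolute error, giving dark and bright regions comparable importance and ensuring the network devotes capacity to reconstructing shadow details. Critically, the stop-gradient operator treats the weight as a constant, so the gradient stays linear in the noisy target y and zero-mean noise averages out in expectation. The gradient estimates therefore remain unbiased even when training targets (raw pixels) contain noise, preventing the network from fitting to the noise.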

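Paragraph function: Technical innovation. Resolves the bias problem of loss functions in linear space.

Logical role: This paragraph tackles the core difficulty of training in the raw domain: how to learn a clean representation from noisy training data. The stop-gradient trick is a mathematically elegant solution that removes the bias without adding model complexity.

Argumentative technique / potential weakness: The mathematical argument that the stop-gradient ensures unbiasedness is rigorous and convincing. But the weighting scheme is essentially a heuristic design: is the logarithmic tone curve the optimal choice, and is there a theoretical basis for it being the best perceptual weighting? The paper does not compare alternative weighting schemes.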
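
The unbiasedness argument can be checked numerically by comparing gradients at the point where the prediction equals the true signal. The sketch below writes the derivatives by hand instead of using an autodiff framework, so the stop-gradient appears as "treat the denominator as a constant". The comparison loss, which applies the log tone curve to both prediction and noisy target, is included as a naive alternative for contrast. The bounded uniform noise, the epsilon value, and all names are assumptions made for illustration.

```python
import math
import random

EPS = 1e-3  # small stabilizing constant; the value is chosen for illustration


def weighted_loss_grad(y_hat, y, eps=EPS):
    """d/dy_hat of ((y_hat - y) / (sg(y_hat) + eps))**2.

    With the stop-gradient the denominator is a constant, so only the
    numerator contributes to the derivative.
    """
    return 2.0 * (y_hat - y) / (y_hat + eps) ** 2


def naive_log_loss_grad(y_hat, y, eps=EPS):
    """d/dy_hat of (log(y_hat + eps) - log(y + eps))**2, tone-mapping both sides."""
    return 2.0 * (math.log(y_hat + eps) - math.log(y + eps)) / (y_hat + eps)


def mean_grad(grad_fn, y_hat, true_value, half_width, n=20000, seed=0):
    """Average gradient over noisy targets y = true_value + zero-mean uniform noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y = true_value + rng.uniform(-half_width, half_width)
        total += grad_fn(y_hat, y)
    return total / n


# Evaluate both gradients where the prediction already equals the true signal.
print("weighted loss :", mean_grad(weighted_loss_grad, 0.05, 0.05, 0.04))
print("naive log loss:", mean_grad(naive_log_loss_grad, 0.05, 0.05, 0.04))
```

In this setup the weighted loss has a mean gradient close to zero at the true value, so the correct answer is a stationary point. The naive log loss has a clearly positive mean gradient there, which would push the estimate below the true signal under gradient descent.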

3.3 Variable Exposure Training

To maximize dynamic range, input captures may include bracketed exposures with varying shutter speeds. RawNeRF handles this by scaling network outputs by the shutter speed ratio: the network learns a canonical radiance at a reference exposure, and different exposures are modeled by multiplying the output by the appropriate shutter time. Additionally, per-channel affine calibration factors are learned to correct for sensor miscalibration and vignetting that may vary across frames. This design enables training from heterogeneous exposure collections, and at inference time, the user can render at any desired exposure level and apply sophisticated tonemapping algorithms such as HDR+.

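Paragraph function: Capability extension. Describes how heterogeneous exposure inputs are handled and how HDR control is achieved.

Logical role: This paragraph extends RawNeRF from single-exposure denoising to multi-exposure HDR reconstruction, elevating it from a technical improvement to a general platform for computational photography.

Argumentative technique / potential weakness: Scaling the output by the shutter speed ratio is physically correct (in linear space, exposure is proportional to shutter time). But this assumption ignores nonlinear effects in real cameras (such as shutter efficiency and sensor saturation), which may introduce errors under extreme exposure conditions.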
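
The exposure model in this subsection amounts to a multiplication in linear space. The following sketch assumes raw values normalized so that sensor saturation sits at 1.0 and models the learned calibration as a plain per-channel scale. The clipping step, the function name, and the parameter names are illustrative assumptions, not details confirmed by the text above.

```python
def render_at_exposure(radiance_rgb, shutter_s, reference_shutter_s,
                       channel_scale=(1.0, 1.0, 1.0), white_level=1.0):
    """Predict the raw value a frame would record at a given shutter time.

    radiance_rgb is the canonical linear radiance at the reference exposure.
    channel_scale stands in for learned per-channel calibration (assumed here
    to be a simple multiplicative factor). Values are clipped at white_level
    to mimic sensor saturation.
    """
    ratio = shutter_s / reference_shutter_s
    return tuple(min(r * ratio * s, white_level)
                 for r, s in zip(radiance_rgb, channel_scale))


canonical = (0.1, 0.2, 0.4)
# Doubling the shutter time doubles every linear value.
print(render_at_exposure(canonical, shutter_s=1 / 30, reference_shutter_s=1 / 60))
# A longer exposure saturates the brightest channel.
print(render_at_exposure(canonical, shutter_s=1 / 15, reference_shutter_s=1 / 60))
```

Because the scaling happens outside the network, frames with different shutter speeds can all supervise the same canonical radiance, which is the property that makes bracketed captures usable in a single training run.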

4. Experiments

RawNeRF is evaluated on indoor and outdoor scenes captured in low-light conditions. For denoising performance, the method achieves results competitive with state-of-the-art denoisers including SID, Unprocess, RViDeNet, and UDVD, despite having no clean training data and using only camera pose information rather than the test image itself. For HDR applications, the method demonstrates exposure variation, sophisticated tonemapping (HDR+), and synthetic defocus with physically accurate bokeh effects on out-of-focus light sources. A key advantage is that RawNeRF produces 3D-consistent results across viewpoints, while single-image methods can produce temporally inconsistent outputs when applied to video sequences.

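Paragraph function: Experimental validation. Provides quantitative and qualitative evidence along two dimensions, denoising and HDR.

Logical role: The key point here is that the method matches dedicated denoisers with much weaker supervision (no clean data, only pose information), which is RawNeRF's most convincing empirical contribution.

Argumentative technique / potential weakness: The wording "competitive results" leaves room for doubt: does the method match dedicated denoisers on every scene? RawNeRF needs 25-200 multiview input frames, whereas conventional denoisers need only a single image, so the comparison is fundamentally asymmetric.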
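
Why HDR data matters for bokeh on light sources can be shown with a one-dimensional toy example. This is a sketch under simple assumptions, not the paper's defocus rendering: a box blur stands in for the lens blur, and the scene is a single bright source, far above the display white point, on a dim background.

```python
def box_blur(signal, radius):
    """Average each sample with its neighbors within `radius` (edges use a shorter window)."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def clip01(signal):
    """Clamp values to the displayable range [0, 1]."""
    return [min(max(v, 0.0), 1.0) for v in signal]


scene = [0.1] * 11
scene[5] = 50.0  # a light source 50x brighter than display white

hdr_bokeh = clip01(box_blur(scene, 2))  # blur in linear HDR space, then clip for display
ldr_bokeh = box_blur(clip01(scene), 2)  # clip first (as an LDR pipeline would), then blur

print("blur HDR then clip:", hdr_bokeh)
print("clip then blur:    ", ldr_bokeh)
```

Blurring the unclipped signal spreads the source into a disc that stays saturated across its full width, which matches how a real defocused light looks. Blurring after clipping yields only a dim smear, because the source's true intensity was discarded before the blur.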

5. Conclusion

RawNeRF demonstrates that training neural radiance fields directly on raw sensor data enables high dynamic range, low-noise view synthesis from challenging captures. By operating in linear raw space, the method preserves the full dynamic range of the scene and enables post-capture manipulation of exposure, tonemapping, and focus. The 3D multiview consistency inherent in NeRF serves as a powerful implicit denoiser. The authors acknowledge computational demands, dependence on COLMAP pose estimation, and inability to handle dynamic scenes as current limitations, but view RawNeRF as progress toward robust, high-quality capture of real-world environments in any lighting condition.

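Paragraph function: Summary and outlook. Restates the contributions, acknowledges limitations, and sketches the vision.

Logical role: The conclusion closes the argumentative loop, from NeRF's LDR limitation to RawNeRF's HDR capability, and ends on the broader vision of "any lighting condition".

Argumentative technique / potential weakness: Acknowledging three limitations (computational cost, pose dependence, static scenes) shows academic integrity. But the most critical limitation, reliably estimating camera poses in the dark (COLMAP tends to fail in low-contrast scenes), is the biggest bottleneck for the method's practical applicability and deserves deeper discussion.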

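Argument structure overview

Problem: NeRF depends on ISP processing and loses HDR information
-> Claim: train the radiance field directly on raw data
-> Evidence: competitive denoising performance; HDR and bokeh application demos
-> Rebuttal: multiview consistency as a natural denoising mechanism
-> Conclusion: robust scene capture in any lighting condition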

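Core claim of the authors (in one sentence)

By training NeRF directly on linear raw sensor data, RawNeRF uses 3D multiview consistency for implicit denoising while preserving full high dynamic range information, making post-capture manipulation of exposure, tonemapping, and focus possible.

Strongest point of the argument

Elegance of the conceptual shift: reinterpreting NeRF's multiview aggregation as a denoising mechanism is a deep insight. Reaching performance competitive with dedicated denoisers without clean training data, together with the unbiased-gradient guarantee of the tone-curve-weighted loss, shows an excellent combination of theory and practice.

Weakest point of the argument

The preconditions for practical use are too demanding: the method needs 25-200 multiview images of a static scene and relies on COLMAP successfully estimating camera poses in low light, yet COLMAP's reliability in dark, low-contrast scenes is itself an open problem. In addition, the inability to handle dynamic scenes severely limits applicability to real low-light scenarios such as nighttime street scenes.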