TensoRF: Dual-Column Annotations

Abstract

We present TensoRF, a novel approach to modeling and reconstructing radiance fields. Unlike NeRF, which relies purely on MLPs, we model the radiance field of a scene as a 4D tensor that represents a 3D voxel grid with multi-channel features per voxel. We factorize this 4D tensor into compact components, first with the classic CP decomposition and then with a novel vector-matrix (VM) decomposition. The VM decomposition relaxes the low-rank constraints on two modes of the tensor, factorizing it into compact vector and matrix factors. Beyond achieving superior rendering quality, models with CP and VM decompositions have a significantly lower memory footprint than previous works that directly optimize per-voxel features.


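Paragraph function: Overview of the whole paper; recasts the radiance-field representation as a tensor decomposition.

Logical role: The abstract promises a twofold gain in quality and efficiency; factorizing the scene tensor with CP and the new VM decomposition is the core methodological contribution.

Argumentation technique / potential gaps: Linking tensor decomposition, a classical mathematical tool, to modern neural rendering draws in readers from both communities. However, the best choice of rank for the decomposition may require per-scene tuning.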
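The memory claim can be made concrete by counting stored values. The sketch below assumes a 300^3 grid, P = 27 feature channels, R = 16 components and float32 storage; these numbers are illustrative only, not the paper's configuration, and the count ignores the decoder and any other parameters. For VM it assumes one line vector, one plane matrix and one length-P feature vector per component and spatial axis.

```python
# Count stored values for a feature grid of shape X x Y x Z x P.
# All sizes below are illustrative assumptions, not the paper's configuration.

def dense_params(X, Y, Z, P):
    """Explicit grid: every voxel stores P feature channels."""
    return X * Y * Z * P

def cp_params(X, Y, Z, P, R):
    """CP: R rank-one components, each made of four vectors (one per mode)."""
    return R * (X + Y + Z + P)

def vm_params(X, Y, Z, P, R):
    """VM: for each of R components and each spatial axis, one line vector,
    one plane matrix over the other two axes, and one length-P feature vector."""
    return R * ((X + Y * Z + P) + (Y + X * Z + P) + (Z + X * Y + P))

def mib(n_values, bytes_per_value=4):
    """Size in MiB, assuming float32 storage."""
    return n_values * bytes_per_value / 2**20

X = Y = Z = 300  # assumed grid resolution
P = 27           # assumed feature channels
R = 16           # assumed components per axis

for name, n in [("dense", dense_params(X, Y, Z, P)),
                ("CP", cp_params(X, Y, Z, P, R)),
                ("VM", vm_params(X, Y, Z, P, R))]:
    print(f"{name:>5}: {n:>13,d} values, {mib(n):9.2f} MiB")
```

Under these assumptions the dense grid needs about 2,781 MiB, VM about 16.5 MiB, and CP well under 1 MiB. The plane matrices dominate the VM cost (roughly 3R·N² values for an N³ grid), which is why VM sits between CP and a dense grid in size.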

1. Introduction

Neural Radiance Fields (NeRF) have revolutionized novel view synthesis by representing scenes as continuous volumetric functions parameterized by MLPs. However, NeRF's purely MLP-based representation requires lengthy training times (hours to days) and slow rendering speeds. Recent works have addressed this by replacing MLPs with explicit voxel grids (Plenoxels, DVGO), achieving dramatic speedups at the cost of much higher memory usage. We propose to bridge this gap by representing radiance fields as low-rank tensors, which combine the efficiency of explicit representations with the compactness of implicit ones.


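Paragraph function: Establishes the problem: MLP-based methods are slow, while voxel-based methods are memory-hungry.

Logical role: Uses the speed-versus-memory trade-off as a dichotomy that sets up tensor decomposition as a "third way".

Argumentation technique / potential gaps: Framing the problem as a trade-off between two extremes neatly positions TensoRF as the natural middle ground.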

2. Method

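We model the radiance field as a 4D tensor $\mathcal{G} \in \mathbb{R}^{X \times Y \times Z \times P}$, where $X$, $Y$, $Z$ are the spatial resolutions and $P$ is the number of feature channels per voxel. For CP (CANDECOMP/PARAFAC) decomposition, $\mathcal{G}$ is factorized as a sum of $R$ rank-one tensors, $\mathcal{G} = \sum_{r=1}^{R} \mathbf{v}_r^X \circ \mathbf{v}_r^Y \circ \mathbf{v}_r^Z \circ \mathbf{v}_r^P$, where $\circ$ denotes the outer product. This requires storing only $4R$ vectors, i.e. $R(X+Y+Z+P)$ values, instead of $XYZP$ values. For our proposed vector-matrix (VM) decomposition, each term pairs a vector along one spatial axis with a matrix over the remaining two axes and a feature-mode vector: $\mathcal{G} = \sum_{r=1}^{R} \left( \mathbf{v}_r^X \circ \mathbf{M}_r^{YZ} \circ \mathbf{b}_{3r-2} + \mathbf{v}_r^Y \circ \mathbf{M}_r^{XZ} \circ \mathbf{b}_{3r-1} + \mathbf{v}_r^Z \circ \mathbf{M}_r^{XY} \circ \mathbf{b}_{3r} \right)$, where $\mathbf{b}_{3r-2}, \mathbf{b}_{3r-1}, \mathbf{b}_{3r} \in \mathbb{R}^P$ play the role of $\mathbf{v}_r^P$ in CP. Without these feature-mode vectors, each term would only be a 3D tensor.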


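Paragraph function: Core method; gives the mathematical definitions of the CP and VM decompositions.

Logical role: Rigorous notation establishes two decomposition strategies; VM, as an improvement over CP, is the main innovation.

Argumentation technique / potential gaps: Mathematically, VM generalizes CP: a CP component is a VM term whose matrix factor is rank-one ($\mathbf{M}_r^{YZ} = \mathbf{v}_r^Y \circ \mathbf{v}_r^Z$), so relaxing this rank constraint gains expressiveness. Whether the asymmetric, axis-by-axis factorization introduces a directional bias, however, deserves attention.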
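Both factorizations, and the claim that VM generalizes CP, can be checked numerically with a short NumPy sketch. It assumes small random factors with made-up sizes, stacks the R components of each factor along a leading axis, and gives each VM term a length-P feature-mode vector (named `bx`, `by`, `bz` here, corresponding to b_{3r-2}, b_{3r-1}, b_{3r}) so that every term spans all four modes. These names are illustrative, not taken from any released code.

```python
import numpy as np

def cp_reconstruct(vx, vy, vz, vp):
    """CP: G = sum_r v_r^X o v_r^Y o v_r^Z o v_r^P.

    vx: (R, X), vy: (R, Y), vz: (R, Z), vp: (R, P)  ->  G: (X, Y, Z, P)
    """
    return np.einsum("rx,ry,rz,rp->xyzp", vx, vy, vz, vp)

def vm_reconstruct(vx, Myz, bx, vy, Mxz, by, vz, Mxy, bz):
    """VM: G = sum_r (v_r^X o M_r^YZ o b_{3r-2} + v_r^Y o M_r^XZ o b_{3r-1} + v_r^Z o M_r^XY o b_{3r}).

    Line factors vx: (R, X), vy: (R, Y), vz: (R, Z);
    plane factors Myz: (R, Y, Z), Mxz: (R, X, Z), Mxy: (R, X, Y);
    feature-mode factors bx, by, bz: (R, P).
    """
    return (np.einsum("rx,ryz,rp->xyzp", vx, Myz, bx)
            + np.einsum("ry,rxz,rp->xyzp", vy, Mxz, by)
            + np.einsum("rz,rxy,rp->xyzp", vz, Mxy, bz))

# Tiny illustrative sizes (assumed, not the paper's settings).
X, Y, Z, P, R = 8, 9, 10, 4, 3
rng = np.random.default_rng(0)
vx, vy, vz, vp = (rng.standard_normal((R, n)) for n in (X, Y, Z, P))

G_cp = cp_reconstruct(vx, vy, vz, vp)

# A CP component is a VM term whose plane factor is rank-one:
# set M_r^YZ = v_r^Y o v_r^Z, reuse v_r^P as the feature vector,
# and switch off the Y-line and Z-line terms with zero factors.
Myz = np.einsum("ry,rz->ryz", vy, vz)
G_vm = vm_reconstruct(vx, Myz, vp,
                      np.zeros((R, Y)), np.zeros((R, X, Z)), np.zeros((R, P)),
                      np.zeros((R, Z)), np.zeros((R, X, Y)), np.zeros((R, P)))

assert G_cp.shape == (X, Y, Z, P)
assert np.allclose(G_cp, G_vm)
```

Replacing `Myz` with an arbitrary (full-rank) matrix gives a tensor that this rank-R CP model generally cannot express, which is the extra expressiveness the annotation refers to.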

The VM decomposition offers a favorable trade-off between compactness and expressiveness. Whereas CP decomposition restricts every component to be rank-one in all four modes, VM decomposition leaves the matrix factors unconstrained (potentially full-rank), capturing richer spatial correlations along two axes at once. In practice, the features at any 3D location are obtained by linearly interpolating the vector factors and bilinearly interpolating the matrix factors, which is equivalent to trilinearly interpolating the full tensor without ever materializing it. Density is read directly from a separate density tensor, while the appearance features are decoded into view-dependent color by a lightweight MLP (or spherical harmonics). The total model size is under 4 MB with CP and under 75 MB with VM, compared with hundreds of MB for voxel-based methods at comparable quality.

Paragraph function: Argues the advantage; quantifies the memory benefit of the factorized models.

Logical role: The contrast between a few to tens of MB and hundreds of MB provides strong quantitative support for the core claim.

Argumentation technique / potential gaps: The memory comparison is highly persuasive. However, using a lightweight MLP as the color decoder means the method is not a purely explicit representation.
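The equivalence between interpolating the factors and trilinearly interpolating the full tensor can be checked with a small NumPy sketch. It assumes random factors of made-up size, coordinates already expressed in grid-index space (no world-to-grid mapping), and hand-written helpers (`corners`, `lerp`, `bilerp`, `trilerp`, `vm_feature` are hypothetical names). It covers only the feature lookup, not the density activation or the color decoder.

```python
import numpy as np

def corners(x, n):
    """The two grid indices bracketing continuous index x (0 <= x <= n-1) and their linear weights."""
    i0 = min(int(np.floor(x)), n - 2)
    t = x - i0
    return [(i0, 1.0 - t), (i0 + 1, t)]

def lerp(v, x):
    """Linear interpolation of a 1D line factor."""
    return sum(w * v[i] for i, w in corners(x, v.shape[0]))

def bilerp(M, a, b):
    """Bilinear interpolation of a 2D plane factor at (a, b)."""
    return sum(wa * wb * M[i, j]
               for i, wa in corners(a, M.shape[0])
               for j, wb in corners(b, M.shape[1]))

def trilerp(G, x, y, z):
    """Trilinear interpolation of a dense (X, Y, Z, P) grid; returns a length-P vector."""
    return sum(wx * wy * wz * G[i, j, k]
               for i, wx in corners(x, G.shape[0])
               for j, wy in corners(y, G.shape[1])
               for k, wz in corners(z, G.shape[2]))

def vm_feature(x, y, z, vx, Myz, bx, vy, Mxz, by, vz, Mxy, bz):
    """Length-P feature at (x, y, z) read straight from the VM factors,
    without materializing the dense grid."""
    f = np.zeros(bx.shape[1])
    for r in range(vx.shape[0]):
        f += lerp(vx[r], x) * bilerp(Myz[r], y, z) * bx[r]
        f += lerp(vy[r], y) * bilerp(Mxz[r], x, z) * by[r]
        f += lerp(vz[r], z) * bilerp(Mxy[r], x, y) * bz[r]
    return f

# Random factors with small, made-up sizes.
X, Y, Z, P, R = 6, 7, 8, 5, 2
rng = np.random.default_rng(0)
vx, vy, vz = (rng.standard_normal((R, n)) for n in (X, Y, Z))
Myz, Mxz, Mxy = (rng.standard_normal((R, a, b)) for a, b in ((Y, Z), (X, Z), (X, Y)))
bx, by, bz = (rng.standard_normal((R, P)) for _ in range(3))

# Dense reference grid built from the same factors (only for checking).
G = (np.einsum("rx,ryz,rp->xyzp", vx, Myz, bx)
     + np.einsum("ry,rxz,rp->xyzp", vy, Mxz, by)
     + np.einsum("rz,rxy,rp->xyzp", vz, Mxy, bz))

point = (2.3, 4.75, 6.1)
assert np.allclose(vm_feature(*point, vx, Myz, bx, vy, Mxz, by, vz, Mxy, bz),
                   trilerp(G, *point))
```

The check works because trilinear weights factor as a product of per-axis weights, so each VM term splits into a 1D lookup times a 2D lookup. The factor-based path therefore never needs the X × Y × Z × P grid in memory.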

3. Experiments

We evaluate TensoRF on standard benchmarks: Synthetic-NeRF, Synthetic-NSVF, and real-world scenes from Tanks and Temples and the LLFF forward-facing dataset. TensoRF-VM achieves a PSNR of 33.14 dB on Synthetic-NeRF, surpassing NeRF (31.01 dB), Plenoxels (31.71 dB), and DVGO (31.95 dB). Training takes under 30 minutes on a single GPU, compared with hours for NeRF. The model renders at 100+ FPS at 800x800 resolution, making real-time applications feasible. On real-world scenes, TensoRF achieves LPIPS scores comparable or superior to those of concurrent works while using significantly less memory.


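Paragraph function: Core experimental results; a comprehensive comparison across multiple benchmarks and metrics.

Logical role: Demonstrates the advantages along four dimensions: PSNR, training time, rendering speed, and memory.

Argumentation technique / potential gaps: The multi-dimensional comparison strengthens the case. However, readers may ask whether the PSNR gains are visually perceptible, and how the method performs on dynamic scenes.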
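To help judge whether the reported PSNR gaps are meaningful, here is the standard PSNR definition together with the MSE ratio implied by a PSNR gap. It is a sketch assuming images scaled to [0, 1] (`psnr` and `mse_ratio` are illustrative helper names). Benchmark scores are averages of per-image PSNR, so converting the averaged numbers into an MSE ratio is only a rough reading.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def mse_ratio(psnr_low, psnr_high):
    """How many times larger the MSE is at psnr_low than at psnr_high."""
    return 10.0 ** ((psnr_high - psnr_low) / 10.0)

# Synthetic example: a uniform error of 0.1 on a [0, 1] image gives MSE 0.01, i.e. 20 dB.
target = np.zeros((4, 4, 3))
print(psnr(target + 0.1, target))

# Gap between TensoRF-VM (33.14 dB) and NeRF (31.01 dB) as reported above.
print(mse_ratio(31.01, 33.14))
```

On a single image, a 2.13 dB gap corresponds to roughly 1.6 times the mean squared error. Whether that difference is visible depends on where the errors concentrate, which is why perceptual metrics such as LPIPS are reported alongside PSNR.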

Ablation studies reveal that VM decomposition consistently outperforms CP decomposition by 0.8-1.5 dB PSNR across all scenes, validating the benefit of relaxed rank constraints. Increasing the number of components R from 48 to 192 yields diminishing returns beyond R=96, suggesting that model capacity saturates around that point. The coarse-to-fine training strategy, in which resolution is progressively increased, yields a +0.5 dB gain and 20% faster convergence compared with fixed-resolution training.


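Paragraph function: Ablation analysis; validates each design choice.

Logical role: The direct comparison of VM against CP is the best validation of the paper's core innovation.

Argumentation technique / potential gaps: The discovery of a capacity saturation point offers practical guidance for choosing hyperparameters.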
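With factorized grids, coarse-to-fine training can raise the resolution by resampling the factors instead of a dense grid. The sketch below assumes that approach (linear resampling of vector factors, separable bilinear resampling of matrix factors, endpoints aligned) and a hypothetical two-step schedule from 32 to 128 samples per axis; it does not reproduce the paper's schedule or step counts.

```python
import numpy as np

def upsample_vector(v, new_len):
    """Linearly resample a 1D factor to new_len samples, keeping endpoints aligned."""
    old_grid = np.linspace(0.0, 1.0, v.shape[0])
    new_grid = np.linspace(0.0, 1.0, new_len)
    return np.interp(new_grid, old_grid, v)

def upsample_matrix(M, new_rows, new_cols):
    """Bilinearly resample a 2D factor by resampling columns, then rows (separable)."""
    cols = np.stack([upsample_vector(M[:, j], new_rows) for j in range(M.shape[1])], axis=1)
    return np.stack([upsample_vector(cols[i], new_cols) for i in range(new_rows)], axis=0)

# Hypothetical schedule: one VM line/plane pair grown from 32 to 128 samples per axis.
rng = np.random.default_rng(0)
v = rng.standard_normal(32)
M = rng.standard_normal((32, 32))
for res in (64, 128):
    v = upsample_vector(v, res)
    M = upsample_matrix(M, res, res)
    print(res, v.shape, M.shape, "values per line+plane:", v.size + M.size)
```

Resampling the factors keeps the cost of each resolution step proportional to the factor sizes rather than to the full grid. Aligning endpoints (rather than cell centers) is an assumed convention; either works as long as it is applied consistently during lookup.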

4. Conclusion

We have presented TensoRF, a tensor decomposition approach for efficient and high-quality radiance field reconstruction. The proposed VM decomposition provides a compelling middle ground between pure MLP representations and explicit voxel grids, achieving state-of-the-art rendering quality with dramatically reduced memory requirements. TensoRF opens new directions for applying classical tensor algebra to neural scene representations, potentially extending to dynamic scenes, relighting, and large-scale reconstruction.


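Paragraph function: Overall summary; restates the core value of the tensor-decomposition approach.

Logical role: The "middle ground" metaphor precisely positions TensoRF within the NeRF ecosystem.

Argumentation technique / potential gaps: Dynamic scenes and large-scale reconstruction are reasonable future directions, but they would require further technical breakthroughs.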


Argument Structure Overview

Core claim

Strongest argument

Weakest link