Group Normalization: Two-Column Annotation

Abstract

Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems — BN's error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN's usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN's computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes.

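Paragraph function: Overview of the whole paper. It starts from BN's success, points out the limitation of its batch dependence, and introduces GN as the remedy.

Logical role: A classic three-part "praise, critique, solution" structure: first acknowledge BN's contribution, then pinpoint its weakness, and finally propose GN as the alternative.

Argumentative technique / potential weakness: Making BN's defect concrete as "small batches yield inaccurate statistics" is a clear and forceful problem definition. However, the abstract does not state whether GN still matches or beats BN at large batch sizes.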

1. Introduction

Batch Normalization has been established as a very effective component in deep learning. BN normalizes the features by the mean and variance computed within a (mini-)batch. This has been shown by many practices to ease optimization and enable very deep networks to converge. BN has been a foundation of many competition-winning entries and important methodologies developed since 2015.

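Paragraph function: Background. It acknowledges BN's historical contribution.

Logical role: Establishing BN's importance first gives the later critique more weight: the paper is not attacking a weak method but improving a milestone technique.

Argumentative technique / potential weakness: Wording such as "milestone" and "competition-winning" establishes BN's authority, so that work on "improving BN" automatically gains research significance.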

Despite its great success, BN exhibits drawbacks that are also caused by its distinct behavior of normalizing along the batch dimension. In particular, BN requires sufficiently large batch sizes to work effectively (e.g., 32 per worker). A small batch leads to inaccurate estimation of the batch statistics, and reducing BN's batch size increases the model error dramatically. This limits many visual recognition tasks that require large memory, such as object detection, semantic segmentation, and video recognition.

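Paragraph function: Problem diagnosis. BN breaks down at small batch sizes.

Logical role: Pinpoints BN's core defect and extends its impact to several important vision tasks, stressing how widespread the problem is.

Argumentative technique / potential weakness: Listing three application areas (detection, segmentation, video) effectively magnifies the problem's reach. But the paragraph does not mention that distributed training can partly mitigate it (BN synchronized across GPUs).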
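
The claim that a small batch gives an inaccurate estimate of the batch statistics can be illustrated with a toy simulation. The sketch below is not taken from the paper: it assumes i.i.d. standard-normal activations with one scalar per sample, and `batch_mean_spread` is a hypothetical helper introduced only for this illustration. It measures how much the estimated batch mean fluctuates from batch to batch.

```python
import random

def batch_mean_spread(batch_size, trials=2000, seed=0):
    """Standard deviation of the batch-mean estimate over many random batches."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Toy activations: i.i.d. N(0, 1) values (an assumption for illustration).
        batch = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        means.append(sum(batch) / batch_size)
    center = sum(means) / trials
    return (sum((m - center) ** 2 for m in means) / trials) ** 0.5

for n in (2, 8, 32):
    # The spread should shrink roughly like 1 / sqrt(n).
    print(n, round(batch_mean_spread(n), 3))
```

Under these assumptions the fluctuation at a batch size of 2 is about four times that at 32, which is one way to read the paragraph's point that reducing the batch size makes the statistics noisier. Real feature maps also pool over spatial positions and are not independent, so the numbers here are only indicative.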

Prior normalization methods include Layer Normalization (LN), Instance Normalization (IN), and Weight Normalization. LN operates along the channel dimension for each sample, and has been adopted in recurrent networks and Transformers. IN was originally designed for style transfer. However, LN and IN are less effective than BN for visual recognition tasks, because they either normalize over too many or too few channels. GN can be viewed as a natural interpolation between LN and IN.

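Paragraph function: Literature review. It locates GN within the family of normalization methods.

Logical role: Sets up an "axis of normalization dimensions": BN (batch), LN (layer), IN (instance), with GN as a new grouping strategy sitting between LN and IN.

Argumentative technique / potential weakness: "GN is an interpolation between LN and IN" is an extremely elegant positioning that makes the new method look both natural and necessary. It also hints, though, that GN may amount to no more than tuning a hyperparameter (the number of groups).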
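
One way to make "too many or too few channels" concrete is to count how many values each method pools into a single (mean, variance) pair. The sketch below assumes a feature tensor of shape (N, C, H, W) and equal-sized groups of C/G channels. The function name `elements_per_statistic` and the example shape are hypothetical, chosen only for illustration.

```python
def elements_per_statistic(method, N, C, H, W, G=None):
    """Number of values pooled into one (mean, variance) pair for an (N, C, H, W) tensor."""
    if method == "BN":   # per channel, across the batch and spatial axes
        return N * H * W
    if method == "LN":   # per sample, across all channels and spatial axes
        return C * H * W
    if method == "IN":   # per sample and per channel, across spatial axes only
        return H * W
    if method == "GN":   # per sample and per group of C // G channels
        if G is None or C % G != 0:
            raise ValueError("GN needs a group count G that divides C")
        return (C // G) * H * W
    raise ValueError(f"unknown method: {method}")

shape = dict(N=2, C=64, H=8, W=8)  # hypothetical example shape
for method, kwargs in [("BN", {}), ("LN", {}), ("IN", {}), ("GN", {"G": 8})]:
    print(method, elements_per_statistic(method, **shape, **kwargs))
```

Only the BN count depends on N. The GN count lies between the IN and LN counts and reaches them at G = C and G = 1 respectively.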

3. Group Normalization

The central idea of Group Normalization is simple: we divide channels into groups and normalize within each group. The pixels in the same group are normalized together by the shared mean and variance. Formally, GN computes the mean and variance over the set of pixels S_i defined by S_i = {k | k_N = i_N, floor(k_C / (C/G)) = floor(i_C / (C/G))}, where G is the number of groups, C is the number of channels, C/G is the number of channels per group, k_N and i_N index the batch (sample) dimension, and k_C and i_C index the channel dimension. That is, two pixels share statistics only if they belong to the same sample and to the same group of channels. When G = 1, GN becomes equivalent to LN; when G = C, GN becomes equivalent to IN.

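Paragraph function: Core method. The mathematical definition of GN.

Logical role: Defines GN with a concise mathematical statement and unifies it with LN and IN through the extreme values of G, reinforcing the "general framework" narrative.

Argumentative technique / potential weakness: The method's simplicity is itself the strongest argument: it needs only one hyperparameter, G. The unifying relation to LN and IN adds theoretical elegance. But the best choice of G may depend on the task and architecture.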
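
The normalization step described in this section can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes contiguous groups of C/G channels, represents the input as a nested list of shape [N][C][P] (P = H*W flattened pixels), adds a small constant `eps` for numerical stability, and leaves out any learned per-channel scale and shift. The function name `group_norm` is hypothetical.

```python
import math

def group_norm(x, G, eps=1e-5):
    """Group-normalize x, a nested list of shape [N][C][P] (P = H*W pixels, flattened)."""
    C = len(x[0])
    if C % G != 0:
        raise ValueError("C must be divisible by G")
    per_group = C // G
    out = []
    for sample in x:  # statistics never cross samples, so the batch size is irrelevant
        normed = [None] * C
        for g in range(G):
            channels = range(g * per_group, (g + 1) * per_group)
            # Pool every pixel of every channel in this group of this sample.
            values = [v for c in channels for v in sample[c]]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            inv_std = 1.0 / math.sqrt(var + eps)
            for c in channels:
                normed[c] = [(v - mean) * inv_std for v in sample[c]]
        out.append(normed)
    return out

x = [
    [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]],  # sample 0: C=4, P=2
    [[0.0, 1.0], [1.0, 0.0], [5.0, 5.5], [6.0, 6.5]],      # sample 1
]
y = group_norm(x, G=2)
# Normalizing sample 0 on its own gives the same values as inside the batch.
print(group_norm([x[0]], G=2)[0] == y[0])
```

Calling the same function with `G=1` pools all channels of a sample (the LN case), and `G=4` here normalizes each channel separately (the IN case).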

The motivation behind grouping channels can be understood from classical feature engineering. Features like SIFT, HOG, and GIST are group-wise representations by design, where each group of channels is constructed by some kind of histogram. These features are often processed by normalization within each histogram or each orientation. GN inherits this group-wise normalization philosophy and applies it to learned deep features: the groups are fixed blocks of channels rather than hand-designed histograms, and the network learns its features under that grouping.

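Paragraph function: Intuitive explanation. It justifies GN from the viewpoint of classical feature engineering.

Logical role: Gives GN historical roots, positioning it as a bridge between classical computer-vision wisdom and deep learning.

Argumentative technique / potential weakness: Linking to histogram normalization in SIFT/HOG is a clever analogy that makes GN look well grounded rather than abrupt. But whether channels in deep networks really form histogram-like group structure is an assumption, not an established fact.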

4. Experiments

On ImageNet classification with ResNet-50, GN achieves comparable results to BN (only 0.5% higher top-1 error) when using a regular batch size of 32. Critically, when the batch size is reduced to 2, BN's error increases by 10.6%, while GN's error increases by only 1.0%. This validates the core thesis that GN is robust to batch size variations.

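Paragraph function: Core experiment. The classification comparison on ImageNet.

Logical role: Provides credible quantitative evidence on the most standard benchmark (ImageNet + ResNet-50). The 10.6% vs. 1.0% contrast is highly persuasive.

Argumentative technique / potential weakness: Strategically showing both "no worse at regular batch sizes" and "a large advantage at small batch sizes" covers the reader's two likely doubts. Whether the 0.5% gap at regular batch sizes matters depends on the application.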

For object detection and segmentation on COCO using Mask R-CNN, GN outperforms BN when the batch size is small (as is typical in these tasks). Specifically, GN achieves box AP of 40.3 and mask AP of 36.4, compared to BN's 39.2 and 35.4 respectively when using a batch of 2 images per GPU. GN also shows strong results in video classification on Kinetics, demonstrating its broad applicability across visual recognition tasks.

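Paragraph function: Extended experiments. Validation on downstream tasks.

Logical role: Echoes the three application areas named in the introduction (detection, segmentation, video), closing the "problem, solution, validation" loop.

Argumentative technique / potential weakness: Showing an advantage on tasks that actually need small batches is more convincing in practice than an ImageNet comparison alone. But the sensitivity to the hyperparameter G that GN introduces deserves deeper analysis.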

5. Conclusion

We have presented Group Normalization (GN) as a simple yet effective normalization method that is independent of batch sizes. GN can naturally transfer from pre-training to fine-tuning. We hope GN will become a powerful alternative to BN, especially for tasks that are constrained by small batch sizes. Given its simplicity and effectiveness, GN has the potential to become a fundamental building block in deep learning.

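Paragraph function: Summary of the paper. It restates GN's positioning and outlook.

Logical role: The conclusion settles on positioning GN as a "fundamental building block", echoing the opening's framing of BN as a "milestone" and implying that GN has potential at the same level.

Argumentative technique / potential weakness: The "fundamental building block" positioning fits the consistent style of Kaiming He's team: proposing simple, general methods. Later developments (such as the widespread use of LN in Transformers) add some uncertainty to GN's actual adoption.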

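Argument structure overview

Problem
BN depends on batch size
Performance collapses at small batches

→

Thesis
Group Normalization
Independent of batch size
Stable performance

→

Evidence
ImageNet: 0.5% gap
COCO: AP +1.1
Stable at batch size 2

→

Counterpoints
Slightly behind BN at large batches
Number of groups G needs tuning

→

Conclusion
GN is an effective alternative
for small-batch settings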

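The authors' core claim (in one sentence)

Group Normalization is a simple normalization method based on grouping channels; its performance does not vary with batch size, which makes it especially suitable for memory-constrained visual recognition tasks.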

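Strongest point of the argument

The elegance of the unified framework: the number of groups G places LN (G=1) and IN (G=C) in a single framework, giving GN both theoretical clarity and practical flexibility. When the batch size drops from 32 to 2, the error rises by only 1.0% for GN versus 10.6% for BN, the most persuasive figure in the paper.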

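Weakest point of the argument

The cost at regular batch sizes: at a regular batch size (32), GN's top-1 error is still 0.5% higher than BN's. For tasks that are not limited by batch size, there is not yet a clear reason for GN to replace BN. In addition, the comparison with Synchronized BN is not thorough enough, even though Synchronized BN can also address the small per-GPU batch problem.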