Grad-CAM: Visual Explanations from Deep Networks

Abstract

We propose a technique for producing "visual explanations" for decisions from a large class of CNN-based models, making them more transparent. Our approach — Gradient-weighted Class Activation Mapping (Grad-CAM) — uses the gradients of any target concept flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Grad-CAM is applicable to a wide variety of CNN model-families including CNNs with fully-connected layers (e.g., VGG), CNNs used for structured outputs (e.g., captioning), CNNs used in tasks with multi-modal inputs (e.g., VQA), without any architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM. We show that these visualizations help identify dataset bias, build trust in model predictions, and enable model diagnosis.

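Paragraph function: An overview of the whole paper. It defines the technical core of Grad-CAM and its broad applicability, and previews its practical value.

Logical role: The abstract advances in three layers: technical method (gradient-based localization map) -> scope of applicability (multiple architectures) -> practical value (bias detection, trust building).

Argumentative technique / potential weakness: "Without any architectural changes or re-training" is a very strong generality claim, and it directly answers CAM's requirement for a specific architecture (global average pooling). However, the phrase "coarse localization map" signals limited resolution, and Guided Grad-CAM is introduced precisely to compensate for it.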

1. Introduction

Deep neural networks have enabled unprecedented breakthroughs across tasks in computer vision, but their lack of decomposability into intuitive and understandable components makes them hard to interpret. As these models are increasingly deployed in real-world applications, the need for transparency and interpretability becomes paramount. A "good" visual explanation should be class-discriminative (localizing the category in the image) and high-resolution (capturing fine-grained detail). Prior visualization approaches achieve one but not both: pixel-space gradient methods (Guided Backpropagation, Deconvolution) are high-resolution but not class-discriminative; CAM is class-discriminative but requires specific architectures with global average pooling layers directly before the softmax.

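Paragraph function: Establishes the research motivation. Starting from the need for interpretability, it points out the either-or dilemma of existing methods.

Logical role: The "class-discriminative vs. high-resolution" opposition positions Grad-CAM as a method that satisfies both.

Argumentative technique / potential weakness: The binary framing makes the problem definition clear. However, the definitions of "transparency" and "interpretability" remain contested: whether a coarse heatmap truly constitutes an "explanation" is still debated in the explainable AI community.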

Class Activation Mapping (CAM) identifies discriminative image regions by using weights from the global average pooling layer to linearly combine feature maps. While effective, CAM is restricted to architectures ending with convolutional feature maps followed by global average pooling and a softmax layer. Gradient-based methods like Guided Backpropagation and Deconvolution provide pixel-level detail but are not class-discriminative — they highlight the same regions regardless of the target class. We prove that Grad-CAM is a strict generalization of CAM: when applied to the specific architecture CAM requires, Grad-CAM produces identical results, but Grad-CAM can be applied to any CNN architecture without modification.
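
The generalization claim can be checked with a short derivation. The sketch below uses notation introduced here for illustration: A^k_{ij} is the activation of feature map k at location (i, j), Z is the number of spatial locations, F^k is the global-average-pooled feature, w^c_k is the CAM weight connecting feature k to class c, and alpha^c_k is the Grad-CAM weight (the global-average-pooled gradient). It assumes the CAM architecture, where the class score is a linear function of the pooled features.

```latex
\begin{align*}
F^k &= \frac{1}{Z}\sum_{i}\sum_{j} A^k_{ij}, \qquad
Y^c = \sum_k w^c_k \, F^k \\[4pt]
\frac{\partial Y^c}{\partial A^k_{ij}}
  &= \frac{\partial Y^c}{\partial F^k}\,
     \frac{\partial F^k}{\partial A^k_{ij}}
   = \frac{w^c_k}{Z} \\[4pt]
\alpha^c_k
  &= \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial Y^c}{\partial A^k_{ij}}
   = \frac{1}{Z}\cdot Z\cdot\frac{w^c_k}{Z}
   = \frac{w^c_k}{Z}
\end{align*}
```

Under these assumptions the Grad-CAM weights equal the CAM weights up to the constant factor 1/Z, which disappears once the map is normalized for display. A bias term in the final layer would not change the result, since its gradient with respect to the activations is zero.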

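Paragraph function: Positions the work in the literature by establishing Grad-CAM as a strict generalization of CAM.

Logical role: The mathematical proof of "strict generalization" is a very strong positioning: it means Grad-CAM matches CAM wherever CAM can be used, while applying to a wider range of architectures.

Argumentative technique / potential weakness: Establishing the generalization through a mathematical proof is more convincing than relying on experimental comparison alone. However, the proof holds only under the specific architectural conditions CAM requires; for general architectures, Grad-CAM's theoretical guarantees are weaker.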

3. Approach

Grad-CAM computes the gradient of the score for class c (y^c) with respect to feature map activations A^k of the last convolutional layer. These gradients are global-average-pooled to obtain the neuron importance weights alpha_k^c. The Grad-CAM localization map is then: L_Grad-CAM^c = ReLU(sum_k(alpha_k^c * A^k)). The ReLU is applied because we are only interested in features that have a positive influence on the class of interest — pixels whose intensity should be increased to increase y^c. The resulting map has the same spatial dimensions as the last convolutional feature maps (e.g., 14x14 for VGG-16) and is upsampled to image resolution for visualization.
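
The computation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses plain Python lists to stay dependency-free, the function name `grad_cam` is hypothetical, and the toy activations and gradients are made up. In practice the gradients would come from a framework's automatic differentiation, and the map would then be upsampled to image resolution.

```python
from typing import List

Map2D = List[List[float]]  # one H x W feature map or gradient map


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Coarse Grad-CAM map from K feature maps A^k and gradients dy^c/dA^k."""
    height, width = len(activations[0]), len(activations[0][0])
    # alpha_k^c: global average pooling of each gradient map over both spatial axes.
    alphas = [sum(map(sum, grad_k)) / (height * width) for grad_k in gradients]
    # Weighted sum of the feature maps, accumulated location by location.
    cam = [[0.0] * width for _ in range(height)]
    for alpha, act_k in zip(alphas, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * act_k[i][j]
    # ReLU keeps only locations with a positive influence on class c.
    return [[max(0.0, value) for value in row] for row in cam]


# Toy example (assumed values): two 2x2 feature maps with hand-picked gradients.
A = [[[1.0, 2.0], [0.0, 1.0]],
     [[0.0, 1.0], [3.0, 1.0]]]
dY_dA = [[[1.0, 1.0], [1.0, 1.0]],      # averages to alpha_0 = 1.0
         [[-2.0, 0.0], [0.0, -2.0]]]    # averages to alpha_1 = -1.0
heatmap = grad_cam(A, dY_dA)
print(heatmap)  # [[1.0, 1.0], [0.0, 0.0]]
```

In the toy example the second feature map has a negative weight, so the location where it dominates (bottom-left) is clipped to zero by the ReLU.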

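Paragraph function: The core algorithm. It defines how Grad-CAM is computed.

Logical role: This paragraph is the technical heart of the paper. The computation is very compact: one forward pass + one partial backward pass + a weighted sum + ReLU. This simplicity is a key reason for its wide adoption.

Argumentative technique / potential weakness: The ReLU has a clear motivation (keeping only positive influence). But the choice also means Grad-CAM ignores regions with negative influence, which may be equally informative (for example, a region whose presence lowers the confidence in a class).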

3.2 Guided Grad-CAM

While Grad-CAM provides class-discriminative localization, it is coarse (e.g., 14x14). To achieve both high-resolution and class-discriminative visualizations, we fuse Grad-CAM with Guided Backpropagation via element-wise multiplication: the Grad-CAM map is first upsampled to the input resolution, then multiplied with the Guided Backpropagation map. This produces Guided Grad-CAM, which retains fine-grained pixel detail from Guided Backpropagation while being modulated by the class-discriminative coarse map from Grad-CAM. In human studies, Guided Grad-CAM achieved 61.23% accuracy in class discrimination tests, compared to Guided Backpropagation's 44.44%.
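
The fusion step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the text above only says the map is "upsampled", so nearest-neighbor upsampling is used here as a simple stand-in for a smoother interpolation; the Guided Backpropagation map is treated as a single-channel 2D array; and the helper names and toy values are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def upsample_nearest(coarse: Map2D, out_h: int, out_w: int) -> Map2D:
    """Nearest-neighbor upsampling of a coarse map to out_h x out_w."""
    in_h, in_w = len(coarse), len(coarse[0])
    return [[coarse[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def guided_grad_cam(cam: Map2D, guided_backprop: Map2D) -> Map2D:
    """Element-wise product of the upsampled Grad-CAM map and the Guided Backprop map."""
    out_h, out_w = len(guided_backprop), len(guided_backprop[0])
    cam_up = upsample_nearest(cam, out_h, out_w)
    return [[c * g for c, g in zip(cam_row, gb_row)]
            for cam_row, gb_row in zip(cam_up, guided_backprop)]


# Toy example (assumed values): a 2x2 Grad-CAM map and a 4x4 pixel-level map.
coarse_cam = [[1.0, 0.0],
              [0.0, 0.5]]
gb = [[0.5, 0.25, 0.5, 0.25],
      [1.0, 1.0, 1.0, 1.0],
      [0.5, 0.5, 0.5, 0.5],
      [2.0, 2.0, 2.0, 4.0]]
fused = guided_grad_cam(coarse_cam, gb)
for row in fused:
    print(row)
```

Pixel-level detail survives only where the coarse map is non-zero: the top-right and bottom-left quadrants of `fused` are all zeros because the Grad-CAM map assigns them no importance.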

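Paragraph function: Resolution enhancement. It describes how coarse localization is fused with fine-grained detail.

Logical role: A direct answer to the dilemma set up in the introduction: Guided Grad-CAM = high resolution (Guided Backpropagation) + class discriminativeness (Grad-CAM).

Argumentative technique / potential weakness: Fusion by element-wise multiplication is simple and intuitive. But it assumes the two visualizations are spatially consistent and complementary; when they conflict (for example, a region that Grad-CAM marks as important but where Guided Backpropagation has no gradient), the result may be unreliable.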

4. Evaluating Visualizations

We evaluate Grad-CAM across multiple dimensions. For weakly-supervised localization on ILSVRC-15, Grad-CAM achieves 56.51% top-1 localization error on VGG-16 (lower is better), outperforming CAM (57.20%) without compromising classification performance, since it requires no architectural change. On weakly-supervised segmentation (PASCAL VOC 2012), replacing CAM with Grad-CAM as seeds improved Intersection over Union from 44.6 to 49.6. In the pointing game evaluation, Grad-CAM achieved 70.58% accuracy versus c-MWP's 60.30%. For faithfulness evaluation, Grad-CAM's rank correlation with occlusion sensitivity reached 0.254, outperforming Guided Backpropagation (0.168), CAM (0.208), and c-MWP (0.220). This indicates better fidelity to actual model behavior while also being more interpretable.
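
As an illustration of one of these metrics, the pointing game can be sketched as below. This is a simplified version under assumptions that are not in the text above: a prediction counts as a hit when the maximum of the heatmap falls inside a ground-truth bounding box, with no pixel tolerance, and accuracy is hits divided by the number of samples. The function names, the box format, and the toy heatmaps are all hypothetical.

```python
from typing import List, Tuple

Map2D = List[List[float]]
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def argmax_2d(heatmap: Map2D) -> Tuple[int, int]:
    """Return the (row, col) position of the largest heatmap value."""
    positions = ((i, j) for i, row in enumerate(heatmap) for j in range(len(row)))
    return max(positions, key=lambda ij: heatmap[ij[0]][ij[1]])


def is_hit(heatmap: Map2D, box: Box) -> bool:
    """A hit means the heatmap maximum lies inside the ground-truth box."""
    row, col = argmax_2d(heatmap)
    row_min, col_min, row_max, col_max = box
    return row_min <= row <= row_max and col_min <= col <= col_max


def pointing_accuracy(samples: List[Tuple[Map2D, Box]]) -> float:
    """Fraction of samples whose heatmap maximum hits the annotated object."""
    hits = sum(is_hit(heatmap, box) for heatmap, box in samples)
    return hits / len(samples)


# Toy example (assumed values): the first heatmap peaks inside its box, the second does not.
samples = [
    ([[0.1, 0.9], [0.2, 0.3]], (0, 1, 0, 1)),  # peak at (0, 1), inside the box
    ([[0.1, 0.2], [0.8, 0.3]], (0, 0, 0, 1)),  # peak at (1, 0), outside the box
]
print(pointing_accuracy(samples))  # 0.5
```

Because only the single maximum is scored, this metric rewards a well-placed peak and says nothing about how the rest of the heatmap is distributed.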

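Paragraph function: Multi-dimensional quantitative evaluation. Several benchmarks are used to validate different qualities of Grad-CAM.

Logical role: Four evaluation dimensions form a comprehensive validation: localization accuracy, segmentation quality, pointing accuracy, and faithfulness. Grad-CAM surpasses prior methods on each.

Argumentative technique / potential weakness: The multi-dimensional evaluation shows the method's all-round advantage. However, the faithfulness metric (correlation with occlusion sensitivity) has its own limitations: occlusion sensitivity assumes that the effect of removing pixels is local, which may not hold when the CNN's receptive fields are large.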

5. Diagnosing CNNs with Grad-CAM

We demonstrate Grad-CAM's utility for model diagnosis and bias detection. In a doctor/nurse classification task, Grad-CAM revealed that the model learned gender stereotypes by focusing on faces and hairstyles rather than medical equipment. The training data was gender-imbalanced: 78% male doctors, 93% female nurses, causing poor generalization (82% test accuracy). After adding balanced gender representation, accuracy improved to 90% with the model correctly focusing on relevant features. Grad-CAM also revealed that "seemingly unreasonable predictions have reasonable explanations" — misclassifications often occurred because the model focused on legitimate but insufficient features. For adversarial robustness, despite networks assigning near-zero probability to true categories after perturbations, Grad-CAM correctly localized the original objects, suggesting adversarial attacks affect the classification layer more than feature representations.

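Paragraph function: Practical application. It demonstrates the value of Grad-CAM for bias detection and model diagnosis.

Logical role: This paragraph elevates Grad-CAM from a purely technical tool to one with social impact. The gender bias case is especially striking and connects directly to the issue of AI fairness.

Argumentative technique / potential weakness: The doctor/nurse case study is persuasive and grounded in a realistic concern. But the case is relatively simple (binary classification, obvious differences in visual cues); in more complex bias scenarios, Grad-CAM's diagnostic power may be limited by its coarse resolution.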

6. Conclusion

We proposed Grad-CAM, a class-discriminative localization technique that is applicable to any CNN-based architecture without architectural changes or retraining. By combining Grad-CAM with Guided Backpropagation, Guided Grad-CAM achieves both high-resolution and class-discriminative visualizations. We demonstrated that these visualizations are faithful to the model, help identify dataset biases, build appropriate trust, and enable diagnosis across multiple architectures and tasks including image classification, captioning, and visual question answering. Grad-CAM represents a step toward making deep learning models more transparent and trustworthy.

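Paragraph function: Summarizes the paper, restating generality and practical value.

Logical role: The conclusion answers the dilemma set up in the introduction (resolution vs. class discriminativeness) and closes on the higher-level theme of transparency and trustworthiness.

Argumentative technique / potential weakness: The modest phrasing "a step toward" is apt: Grad-CAM is not the final solution for explainable AI, but as a visualization tool its influence has gone well beyond the paper itself (it has become one of the standard tools for debugging deep learning models).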

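Overview of the Argument Structure

Problem: CNN decisions are opaque and lack visual explanations
→ Claim: gradient-weighted localization maps, applicable to any CNN
→ Evidence: outperforms CAM on multiple benchmarks, plus a bias-detection case study
→ Rebuttal: fusion into Guided Grad-CAM addresses the limited resolution
→ Conclusion: a step toward transparent and trustworthy deep learning models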

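The Authors' Core Claim (in One Sentence)

By global-average-pooling the gradients of a target class with respect to the last convolutional layer and using them as weights on its feature maps, Grad-CAM produces class-discriminative visual explanations for any CNN architecture without architectural changes or retraining, and can be used for model diagnosis and bias detection.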

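Strongest Part of the Argument

The mathematical proof of generality: proving that Grad-CAM is a strict generalization of CAM gives the method a solid theoretical footing. The doctor/nurse dataset-bias case links the technical contribution to social impact and is highly persuasive. The comprehensive evaluation across architectures and tasks (VGG, ResNet, image captioning, VQA) is convincing.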

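Weakest Part of the Argument

The trade-off between resolution and depth: Grad-CAM's output resolution is bounded by the spatial dimensions of the last convolutional layer (e.g., 14x14). Guided Grad-CAM offers a remedy, but the fusion strategy lacks theoretical guarantees. In addition, what counts as an "explanation" remains contested: a heatmap shows where the model "looks" but does not explain how the model "reasons".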