OpenGAN: Open-Set Recognition via Open Data Generation

Abstract

An important open problem in computer vision is open-set recognition: determining whether input data is from known or unknown classes. Two dominant approaches exist for open-set discrimination: learning a binary classifier from outlier data and using a GAN discriminator as a likelihood function. Both have limitations: binary classifiers overfit to available outlier data, while GAN discriminators are unstable and unreliable. OpenGAN addresses these through three innovations: (1) carefully selecting GAN discriminators on real outlier data, (2) augmenting open training examples with adversarially synthesized fake data, and (3) building discriminators over features from K-way networks rather than pixels.

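Paragraph function: Overview of the whole paper. It frames OpenGAN's three innovations as a response to the complementary flaws of the two existing approaches.

Logical role: The abstract sets up the problem space with a dichotomy (binary classifier vs. GAN discriminator), then previews the solution as a three-item list. The structure is clear and lets readers grasp the contributions quickly.

Argumentative technique / potential weaknesses: The abstract treats GAN discriminator instability as a known defect and then resolves it through "careful selection", implying that the problem lies in how the method is used rather than in the method itself. This is a clever reframing. However, careful selection depends on outlier data for validation, an extra requirement that may be a limitation in practice.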

1. Introduction

Real-world recognition systems must handle open-set conditions: inputs may belong to classes not seen during training. This is critical for safety applications such as autonomous driving, where "contemporary benchmarks such as Cityscapes focus on K classes of interest for evaluation, ignoring a sizeable set of 'other' pixels that include vulnerable objects like wheelchairs and strollers." Two conceptual approaches exist: learning binary open-vs-closed classifiers from outlier data (which generalizes poorly to diverse test data) and using GAN discriminators as likelihood functions (which performs poorly due to unstable GAN training). OpenGAN combines these insights by operating on learned feature representations rather than raw pixels.

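Paragraph function: Establishes the motivation. The autonomous-driving safety scenario gives open-set recognition urgent real-world significance.

Logical role: Concrete objects such as wheelchairs and strollers raise the reader's safety awareness and turn the abstract open-set problem into a matter of life and death. The subsequent contrast between the two approaches makes OpenGAN's "fusion" strategy appear logically necessary.

Argumentative technique / potential weaknesses: The concrete example of "vulnerable objects" strengthens the paper's appeal to social impact. However, the open-set problem in autonomous driving differs significantly from the one in general image classification, so the paper needs to provide validation in both settings.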

2. Related Work

Open-set recognition methods typically train K-way classifiers and then exploit them to detect unknown classes through density estimation, uncertainty modeling, and reconstruction errors. OpenMax replaces the softmax layer with a calibrated open-set probability. Prior work using GANs for open-set recognition generates fake data to augment training sets and relies on reconstruction error, but the discriminator itself has not been successfully used as a likelihood function due to training instability. Outlier exposure methods show that simple binary classifiers work "surprisingly well" but overfit to available outlier data, failing to generalize across datasets.

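Paragraph function: Literature review. It systematically takes stock of the achievements and limitations of three methodological lines.

Logical role: Establishes the prevailing view in the field that a discriminator cannot be used directly as a likelihood function. This makes OpenGAN's success more striking: it challenges that view and shows that the discriminator is viable under the right conditions.

Argumentative technique / potential weaknesses: Quoting the remark that simple classifiers work "surprisingly well" deftly sets the baseline. It acknowledges the value of the simple method while pointing out its generalization weakness, leaving room for OpenGAN's improvement.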

3. Method

3.1 OpenGAN Formulation

The OpenGAN objective combines real and synthetic open data: max_D min_G E[log D(x)] + lambda_o * E[log(1-D(x_bar))] + lambda_G * E[log(1-D(G(z)))], where D(x) represents the probability of closed-set membership, x are closed-set features, x_bar are real outlier features, and G(z) are synthetic outlier features produced by the generator from noise z. The generator G minimizes the last term, so it learns to produce features that fool the discriminator into classifying them as closed-set, while the discriminator D learns to distinguish closed-set features from both real and synthesized outliers. This adversarial interplay creates a more robust decision boundary than training on real outliers alone.
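To make the roles of the three terms concrete, here is a minimal sketch that evaluates the objective on lists of discriminator outputs. It only computes the value and does not train anything. The function name `opengan_objective`, the default weights of 1.0, and the `eps` guard are illustrative assumptions, not values from the paper.

```python
import math


def opengan_objective(d_closed, d_open_real, d_open_fake,
                      lambda_o=1.0, lambda_g=1.0, eps=1e-12):
    """Estimate the three-term objective from discriminator outputs.

    d_closed:    D(x) on closed-set features
    d_open_real: D(x_bar) on real outlier features
    d_open_fake: D(G(z)) on generator-synthesized features
    All values are probabilities of closed-set membership in [0, 1].
    lambda_o and lambda_g default to 1.0 purely for illustration.
    """
    def mean_log(values):
        # eps guards against log(0) when D saturates at 0 or 1.
        return sum(math.log(max(v, eps)) for v in values) / len(values)

    closed_term = mean_log(d_closed)
    real_open_term = mean_log([1.0 - p for p in d_open_real])
    fake_open_term = mean_log([1.0 - p for p in d_open_fake])
    return closed_term + lambda_o * real_open_term + lambda_g * fake_open_term


# D wants this value high; G wants the last term low.
confident = opengan_objective([0.9, 0.95], [0.1, 0.05], [0.1, 0.2])
fooled = opengan_objective([0.9, 0.95], [0.1, 0.05], [0.9, 0.8])
print(confident > fooled)  # True
```

When the generator succeeds in pushing D(G(z)) toward 1, the last term becomes strongly negative and the overall value drops, which is the direction the min over G is after.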

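Paragraph function: Core of the method. It defines OpenGAN's three-term objective function and its adversarial training mechanism.

Logical role: Defines the method precisely with a mathematical formula: the three loss terms correspond to closed-set data, real outliers, and synthesized outliers. The lambda parameters control the trade-off among them, but this passage does not say how they are set.

Argumentative technique / potential weaknesses: The three-term structure maps cleanly onto the three data sources. However, GAN min-max optimization is itself the root of training instability, so the authors need to explain in what follows how they overcome it.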

3.2 Features vs. Pixels: Feature-Space Discrimination

The critical design choice is building discriminators over "off-the-shelf features computed by closed-world K-way networks" rather than raw pixels. A pre-trained ResNet-18 or HRNet extracts low-dimensional feature vectors (512-dim or 720-dim), so the GAN operates in a compact feature space and the discriminator can be a simple MLP with batch normalization and LeakyReLU (~2MB). This design choice yields dramatic improvements: on Cityscapes segmentation, feature-based OpenGAN achieves 0.885 AUROC vs. only 0.549 for pixel-based variants. The features already encode class-discriminative information learned during closed-world training, giving the discriminator a strong starting representation.
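As a rough sanity check on the "~2MB" figure, the sketch below counts the weights and biases of a small MLP over 512-dim features. The hidden widths are hypothetical, chosen only for illustration, and the summary above does not state the actual layer sizes. Batch-norm parameters are left out because they contribute only a few values per unit.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected MLP.

    layer_sizes lists the width of every layer, input first, output last.
    Batch-norm parameters are ignored; they add only a few values per unit.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical widths: 512-dim input features, four hidden layers, one logit.
hypothetical_layers = [512, 512, 256, 128, 64, 1]
params = mlp_param_count(hypothetical_layers)
size_mb = params * 4 / 1e6  # 4 bytes per float32 parameter
print(params, round(size_mb, 2))  # 435201 1.74
```

Under these assumed widths the model lands just under 2MB in float32, the same order of magnitude as the reported size and far smaller than a pixel-space convolutional discriminator would typically be.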

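Paragraph function: Core insight. It shows empirically that operating in feature space has an overwhelming advantage over operating in pixel space.

Logical role: The 0.885 vs. 0.549 contrast is the single most persuasive data point in the paper. It supports the claim that features beat pixels and also explains why earlier GAN-discriminator methods failed: they operated in the wrong space.

Argumentative technique / potential weaknesses: Using off-the-shelf features makes the method highly practical, since the feature extractor does not need to be retrained. It also means that the method's performance ceiling is bounded by the feature quality of the closed-world classifier. If the classifier itself performs poorly, OpenGAN may have limited room to improve.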

3.3 Open Validation

A crucial finding is that model selection requires a validation set of real outlier data. The authors demonstrate that longer GAN training does not consistently improve open-set performance due to training instability. The solution is "open validation": using a small held-out set of real outlier examples to select the discriminator checkpoint that achieves the best open-vs-closed classification accuracy. This is critical because "synthesized data are insufficient for model selection" — the generator may collapse or drift, producing samples that no longer represent the true open distribution. The requirement for a small outlier validation set is a practical limitation, but the authors show that even a few hundred examples suffice.
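The selection step can be sketched as follows, assuming discriminator scores on a held-out validation set have already been collected for each saved checkpoint. The function names, the 0.5 threshold, and the toy scores are all illustrative assumptions. The summary says only that the checkpoint with the best open-vs-closed classification accuracy is kept.

```python
def open_vs_closed_accuracy(closed_scores, open_scores, threshold=0.5):
    """Fraction of validation examples classified correctly.

    Scores are D(x), the predicted probability of closed-set membership.
    The 0.5 threshold is an assumption made for this sketch.
    """
    correct = sum(s >= threshold for s in closed_scores)
    correct += sum(s < threshold for s in open_scores)
    return correct / (len(closed_scores) + len(open_scores))


def select_checkpoint(checkpoints):
    """Return the key of the checkpoint with the best validation accuracy.

    checkpoints maps an epoch number to (closed_scores, open_scores).
    """
    return max(checkpoints,
               key=lambda epoch: open_vs_closed_accuracy(*checkpoints[epoch]))


# Toy validation scores: later training is not better (epoch 30 has drifted).
checkpoints = {
    10: ([0.9, 0.8, 0.4], [0.2, 0.6, 0.1]),
    20: ([0.9, 0.7, 0.6], [0.3, 0.2, 0.1]),
    30: ([0.6, 0.4, 0.3], [0.7, 0.5, 0.2]),
}
print(select_checkpoint(checkpoints))  # 20
```

The toy data mirrors the point of the passage: the last checkpoint is not the best one, and only scores on real outliers reveal that.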

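Paragraph function: Key concession. It admits that the method needs a small amount of real outlier data for validation.

Logical role: This passage shows academic honesty. It volunteers the limitation that synthesized data are insufficient for model selection, while easing the reader's concern by noting that a few hundred examples are enough. This admit-then-resolve strategy builds trust.

Argumentative technique / potential weaknesses: Introducing open validation is the key to the method's success: it addresses the long-neglected model selection problem for GAN discriminators in open-set recognition. However, the requirement of a few hundred outlier examples may still be hard to meet in a completely unknown open-set scenario.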

4. Experiments

Experiments span three setups. Setup I (single dataset): On MNIST, SVHN, CIFAR, and TinyImageNet, OpenGAN achieves AUROC of 0.999, 0.988, 0.973, and 0.907 respectively, substantially outperforming prior methods. Setup II (cross-dataset): Using TinyImageNet as closed-set with diverse open-sets, OpenGAN achieves 0.984 average AUROC, far surpassing binary classifiers (0.918) which overfit to training outliers. Setup III (semantic segmentation): On Cityscapes with 19 closed classes, feature-based OpenGAN achieves 0.885 AUROC with full data, vastly exceeding pixel-based (0.549) and entropy-based (0.697) methods. The model is also extremely lightweight: ~2MB discriminator vs. 250MB HRNet backbone.
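All results above are reported as AUROC. For readers unfamiliar with the metric, the sketch below computes it from its pairwise definition: the probability that a randomly chosen closed-set example receives a higher score than a randomly chosen open-set example, with ties counted as one half. The function name and toy scores are illustrative, and the quadratic loop is meant for clarity rather than speed.

```python
def auroc(closed_scores, open_scores):
    """Area under the ROC curve via pairwise comparison.

    Scores are D(x), higher meaning "more likely closed-set".
    Returns the fraction of (closed, open) pairs ranked correctly,
    counting ties as half a correct pair.
    """
    wins = 0.0
    for c in closed_scores:
        for o in open_scores:
            if c > o:
                wins += 1.0
            elif c == o:
                wins += 0.5
    return wins / (len(closed_scores) * len(open_scores))


print(auroc([0.9, 0.8], [0.1, 0.2]))  # 1.0, perfect separation
print(auroc([0.9, 0.4], [0.6, 0.1]))  # 0.75, one of four pairs misordered
```

A value of 0.5 corresponds to chance-level ranking, which puts the 0.549 reported for the pixel-based variant in context.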

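Paragraph function: Comprehensive empirical evidence. The three experimental setups cover open-set scenarios ranging from simple to complex.

Logical role: The progressive design of the three setups is well thought out: Setup I verifies basic effectiveness, Setup II verifies generalization (the most critical), and Setup III verifies practical value. The cross-dataset AUROC of 0.984 is the strongest piece of evidence.

Argumentative technique / potential weaknesses: The 2MB vs. 250MB contrast neatly highlights how lightweight the method is: the discriminator is almost "free". However, all benchmarks are image classification or segmentation tasks; whether the method works equally well for open-set problems in other modalities (point clouds, text) has not been explored.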

5. Conclusion

OpenGAN demonstrates that GAN discriminators can achieve state-of-the-art open-set discrimination once properly selected using a validation set of outlier examples. The key insight is that operating in learned feature space, combining real and synthesized outliers, and using open validation for model selection together overcome the long-standing limitations of both binary classifiers and GAN-based approaches. The method is lightweight, practical, and significantly outperforms prior methods across classification and semantic segmentation tasks. This work challenges the assumption that discriminators cannot serve as effective likelihood functions, opening new avenues for GAN-based open-set recognition.

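Paragraph function: Summarizes the paper by uniting the three innovations into one coherent narrative.

Logical role: The core message of the conclusion is not only that the method works but that it challenges the prevailing view that discriminators cannot serve as likelihood functions. This "challenging an existing assumption" narrative raises the paper's level of impact.

Argumentative technique / potential weaknesses: The rhetoric of challenging an assumption has strong academic appeal, but it should be read with care. OpenGAN's success depends on operating in feature space and on validation-set selection, conditions that differ markedly from the setting of the original assumption (pixel space, no validation set). Strictly speaking, it does not fully overturn the original assumption; rather, it identifies specific conditions under which the assumption fails to hold.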

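Argument Structure Overview

Problem: Binary classifiers overfit; GAN discriminators are unstable.

→

Claim: A three-pronged approach of feature space, synthetic augmentation, and open validation.

→

Evidence: Cross-dataset 0.984 AUROC; Cityscapes 0.885 AUROC.

→

Rebuttal: A small outlier validation set is required, but a few hundred examples suffice.

→

Conclusion: GAN discriminators can serve as effective open-set likelihood functions.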

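Authors' Core Claim (One Sentence)

Augmenting the outlier training set with adversarially synthesized data in learned feature space, and carefully selecting the GAN discriminator using a validation set of real outliers, allows the discriminator to serve as a robust open-set likelihood function and achieve state-of-the-art open-set recognition across a range of scenarios.

Strongest Point of the Argument

Cross-dataset generalization: In Setup II, where the training and test outliers come from entirely different distributions, OpenGAN still achieves 0.984 AUROC, far ahead of binary classifiers, which drop to 0.918 because of overfitting. This directly answers the central challenge of open-set recognition: facing unknown unknowns. The 0.885 vs. 0.549 AUROC gap from operating in feature space is further evidence that is hard to dispute.

Weakest Point of the Argument

Dependence on outlier data for validation: The method's success relies heavily on the open validation step, which needs a small set of real outlier examples to select the best discriminator checkpoint. In a truly open-set scenario, representative outlier examples may not be available in advance. In addition, if the validation outliers differ too much in distribution from the unknown classes encountered at test time, model selection may become less effective. The authors ease this concern by noting that a few hundred examples suffice, but they do not systematically analyze how the distribution gap between the validation and test sets affects performance.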