Open Set Domain Adaptation

Abstract

Domain adaptation addresses the problem where training and test data belong to different domains. While previous work on domain adaptation assumes that both domains contain the same set of object classes (closed set), this paper considers a more realistic open set scenario where only a few categories of interest are shared between source and target domains. The authors propose a method that learns a mapping from the source to the target domain by jointly solving an assignment problem that labels those target instances that potentially belong to the categories of interest present in the source dataset. The approach handles both open set and closed set domain adaptation in a unified framework.

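Paragraph function: Overview of the whole paper. Starting from the closed set assumption, it points out why that assumption is unrealistic and introduces the core proposition of open set domain adaptation.

Logical role: The abstract performs both "problem redefinition" and "solution preview": it first challenges the existing assumption (closed set), then outlines the methodology as the joint solution of an assignment problem.

Argumentation technique / potential weakness: Describing the open set scenario as "more realistic" implies that the assumptions of earlier work are overly idealized, which is rhetorically persuasive. However, the precise definition of, and threshold for, "a few shared categories" is not given here and is left to the method section.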

1. Introduction

Acquiring annotated training data is costly, leading practitioners to leverage large public datasets despite domain differences. However, depending on the application, the type of sensor or the perspective of the sensor, the entire captured scene might greatly differ from pictures on the Internet. This creates a significant domain shift between the source and target distributions. Domain adaptation methods aim to bridge this gap by learning domain-invariant representations or transformations.

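Paragraph function: Establishes the research motivation. Starting from practical needs, it sets out the core tension between annotation cost and domain shift.

Logical role: The starting point of the chain of argument. A practical difficulty (annotation cost) leads the reader to accept the need for domain adaptation and paves the way for the later challenge to the closed set assumption.

Argumentation technique / potential weakness: Sensor differences serve as a concrete instance of domain shift, making an abstract problem tangible. However, the passage presumes the reader already knows the basics of domain adaptation, so it may be less accessible to non-specialists.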

Critically, all available evaluation protocols for domain adaptation describe a closed set recognition task, where both domains contain images only of the same set of object classes. In real-world applications, however, the target domain often contains unknown categories that are absent from the source domain. The authors argue that a robust domain adaptation method must be able to identify and reject such unknown classes while still transferring knowledge for the shared categories.

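Paragraph function: Challenges the existing assumption by directly pointing out the fundamental flaw of closed set evaluation protocols.

Logical role: This paragraph is the turning point of the paper. It moves from "domain adaptation is useful" to "existing domain adaptation has a fundamental flaw" and narrows the research gap to the handling of unknown classes.

Argumentation technique / potential weakness: The word "all" stresses how widespread the problem is and has a strong rhetorical effect. In practice, however, some tentative work on partially open set settings may exist, so the absolute wording risks oversimplification.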

Traditional domain adaptation methods can be broadly categorized into instance-based, feature-based, and parameter-based approaches. Instance-based methods re-weight source instances to match the target distribution. Feature-based methods learn shared representations that minimize domain discrepancy, such as through maximum mean discrepancy (MMD) or domain adversarial training. Parameter-based methods adapt model parameters from source to target. However, all these approaches assume a closed set setting where the label spaces of source and target are identical.

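Paragraph function: Literature taxonomy. It systematically organizes three families of domain adaptation methods and identifies their shared limitation.

Logical role: The taxonomy-style literature review gives a panoramic view and then closes with the statement that all of these methods assume a closed set, which reinforces the novelty of the paper's open set setting.

Argumentation technique / potential weakness: The three-way classification is clear and effective, but reducing all existing methods to the "closed set assumption" may overlook attempts in zero-shot learning or transfer learning that handle unknown classes to some degree.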
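
To make the notion of "domain discrepancy" in the feature-based family concrete, here is a minimal sketch of the simplest MMD estimate, the one obtained with a linear kernel, where the squared MMD reduces to the squared distance between the two feature means. The function names `mean_vector` and `linear_mmd2` are illustrative assumptions, and this is background for the literature overview, not part of the method summarized in these notes.

```python
def mean_vector(features):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    dim = len(features[0])
    return [sum(x[d] for x in features) / len(features) for d in range(dim)]


def linear_mmd2(source, target):
    """Biased squared MMD with a linear kernel.

    With k(x, y) = <x, y> the estimate collapses to
    ||mean(source) - mean(target)||^2.
    """
    mu_s = mean_vector(source)
    mu_t = mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))
```

A value of zero means the two sets have the same feature mean, so a linear kernel cannot detect differences in higher moments. Practical methods typically use richer kernels for that reason.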

Open set recognition has been studied independently of domain adaptation. Methods such as nearest non-outlier (NNO) and open set SVM can reject test samples from unknown classes. However, these methods do not address the domain shift problem — they assume training and test data come from the same distribution. The authors identify the gap: combining open set recognition with domain adaptation is an under-explored but practically important direction.

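Paragraph function: Cross-positioning. It connects the two research areas of open set recognition and domain adaptation at their intersection.

Logical role: Finds an unfilled gap between two mature fields and thereby positions the paper's contribution precisely. This is the classic "research gap" argumentation strategy.

Argumentation technique / potential weakness: Defining the intersection of two known fields as a new problem gives the argument a solid structure. However, the paper does not fully discuss why this intersection was neglected for so long: was it technical difficulty or lack of demand?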

3. Method

3.1 Problem Formulation

Given a source domain with labeled data and a target domain with unlabeled (or partially labeled) data, standard domain adaptation assumes that the source and target label sets are identical. In the proposed open set domain adaptation formulation, the target domain may contain classes not present in the source, and vice versa. The method must simultaneously adapt to the domain shift and identify which target instances belong to known versus unknown classes.

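Paragraph function: Formal definition. It states the mathematical setting of open set domain adaptation rigorously.

Logical role: Turns the intuitive description from the introduction into a precise problem definition and lays the foundation for the algorithm derived afterwards.

Argumentation technique / potential weakness: The problem definition is concise and clear, but how the "vice versa" case (classes present in the source but absent from the target) is handled is not discussed in detail and may be overlooked in the experiments.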
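
The setting described in this subsection can be written compactly in set notation. The symbols below are assumptions introduced for illustration and are not taken from the paper: $\mathcal{C}_s$ and $\mathcal{C}_t$ denote the source and target label sets, and $\mathcal{C}$ denotes the shared categories of interest.

```latex
\begin{align*}
\text{closed set:} \quad & \mathcal{C}_s = \mathcal{C}_t \\
\text{open set:} \quad & \mathcal{C}_s \neq \mathcal{C}_t,
  \qquad \mathcal{C} = \mathcal{C}_s \cap \mathcal{C}_t \neq \emptyset \\
\text{unknown classes in the target:} \quad & \mathcal{C}_t \setminus \mathcal{C}_s \\
\text{unknown classes in the source:} \quad & \mathcal{C}_s \setminus \mathcal{C}_t
\end{align*}
```

Under this notation, a target instance should receive a label from $\mathcal{C}$ or be rejected as unknown, and the "vice versa" case corresponds to a non-empty $\mathcal{C}_s \setminus \mathcal{C}_t$.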

3.2 Joint Assignment and Adaptation

The core of the approach is a joint optimization that simultaneously learns a domain mapping and solves an assignment problem. Specifically, the method assigns pseudo-labels to target instances based on their similarity to source class prototypes in a shared feature space. Instances that cannot be confidently assigned to any source class are labeled as "unknown". The domain mapping is then updated using only the confidently assigned target instances, ensuring that unknown-class data does not corrupt the adaptation process. This alternating optimization iterates between assignment refinement and mapping update.

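Paragraph function: Core algorithm. It describes the alternating optimization mechanism of joint assignment and adaptation.

Logical role: This paragraph is the technical backbone of the paper. It decomposes the open set problem into two subproblems, assignment and mapping, that are solved alternately, which directly delivers on the core promise of the abstract.

Argumentation technique / potential weakness: Alternating optimization is a classic strategy and intuitively reasonable. Two concerns remain: (1) errors in the initial assignment may be amplified over the iterations (error accumulation); (2) the choice of confidence threshold may strongly affect performance, so the authors need to verify robustness in the experiments.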
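
The alternating scheme described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the domain mapping is reduced to a single translation vector (`offset`), the assignment is a greedy nearest-prototype rule with a hypothetical distance `threshold`, and all function names are invented for the example.

```python
import math

UNKNOWN = -1  # label given to target instances rejected as "unknown"


def class_means(features, labels):
    """Return {label: mean feature vector} for the labeled source data."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def assign(targets, prototypes, offset, threshold):
    """Pseudo-label each target with its nearest mapped prototype.

    Targets farther than `threshold` from every mapped prototype are
    labeled UNKNOWN (assumed confidence rule).
    """
    labels = []
    for t in targets:
        best_label, best_dist = UNKNOWN, threshold
        for y, mu in prototypes.items():
            mapped = [m + o for m, o in zip(mu, offset)]
            dist = math.dist(t, mapped)
            if dist < best_dist:
                best_label, best_dist = y, dist
        labels.append(best_label)
    return labels


def update_offset(targets, labels, prototypes, dim):
    """Least-squares translation fitted on confidently assigned targets only."""
    known = [(t, prototypes[y]) for t, y in zip(targets, labels) if y != UNKNOWN]
    if not known:
        return [0.0] * dim
    return [sum(t[d] - mu[d] for t, mu in known) / len(known) for d in range(dim)]


def adapt(source, source_labels, targets, threshold=2.0, iterations=5):
    """Alternate between assignment refinement and mapping update."""
    prototypes = class_means(source, source_labels)
    dim = len(source[0])
    offset = [0.0] * dim
    labels = []
    for _ in range(iterations):
        labels = assign(targets, prototypes, offset, threshold)
        offset = update_offset(targets, labels, prototypes, dim)
    return labels, offset
```

Because instances labeled `UNKNOWN` are excluded in `update_offset`, an outlying target cannot pull the mapping toward itself, which is the property the paragraph attributes to the method. The sketch also makes the first concern above visible: if the initial `assign` call mislabels many targets, the next offset is fitted to those errors.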

The assignment problem is formulated as an integer linear program that maximizes the total similarity between assigned target instances and source prototypes, subject to constraints that enforce class balance and allow for an "unknown" assignment. The domain mapping utilizes a linear transformation learned via the assigned pairs, projecting source features into the target feature space. By coupling the assignment and mapping objectives, the framework avoids the pitfall of adapting to irrelevant target classes.

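Paragraph function: Mathematical detail. It spells out the optimization form of the assignment problem and the mechanism for learning the mapping.

Logical role: Provides the mathematical basis for the intuitive description in the preceding paragraph. Choosing an integer linear program gives the assignment step a guarantee of global optimality.

Argumentation technique / potential weakness: An integer linear program yields a globally optimal solution in theory, but it may become a computational bottleneck on large-scale datasets. The linear-transformation assumption also limits the method's ability to capture complex non-linear domain shifts.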
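
The assignment step can be illustrated on a toy instance. The sketch below is an assumption-laden stand-in rather than the paper's formulation: each target takes exactly one label from the source classes or "unknown", a hypothetical constant `unknown_score` is the reward for the "unknown" label, and the class-balance constraint is approximated by requiring every source class to receive at least one target. The tiny integer program is solved exactly by exhaustive enumeration instead of an ILP solver, which is feasible only for a handful of instances.

```python
from itertools import product

UNKNOWN = None  # marker for the "unknown" assignment


def solve_assignment(similarity, unknown_score):
    """Exactly maximize total similarity over all feasible labelings.

    similarity[i][c] is the similarity of target i to source class c.
    Returns (labels, total); labels is None if no labeling is feasible.
    """
    n_targets = len(similarity)
    n_classes = len(similarity[0])
    options = list(range(n_classes)) + [UNKNOWN]
    best_labels, best_total = None, float("-inf")
    for labels in product(options, repeat=n_targets):
        # Assumed balance constraint: every source class gets a target.
        if any(c not in labels for c in range(n_classes)):
            continue
        total = sum(
            unknown_score if c is UNKNOWN else similarity[i][c]
            for i, c in enumerate(labels)
        )
        if total > best_total:
            best_labels, best_total = list(labels), total
    return best_labels, best_total
```

The search space grows as (number of classes + 1) to the power of the number of targets, which illustrates the scalability concern raised above and why a dedicated solver would be needed beyond toy sizes.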

4. Experiments

The method is evaluated on standard domain adaptation benchmarks, including the Office dataset (Amazon, DSLR, Webcam), and on cross-dataset recognition tasks. The authors design evaluation protocols for both open set and closed set scenarios, demonstrating that the approach is versatile. In the open set setting, varying numbers of unknown classes are introduced in the target domain. Results show that the proposed method outperforms state-of-the-art domain adaptation methods that have been modified to handle unknown classes, while also achieving competitive performance in the standard closed set setting.

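Paragraph function: Experimental validation. It evaluates the effectiveness and versatility of the method across multiple scenarios.

Logical role: The empirical pillar. By covering both open set and closed set settings, it shows that the method solves the new problem without sacrificing performance on the traditional one.

Argumentation technique / potential weakness: Presenting results for both open set and closed set settings is a strong argumentative strategy. However, because the open set setting is proposed here for the first time, there is no direct comparison with methods designed specifically for this problem, and the fairness of the baseline selection is open to question.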

5. Conclusion

This paper introduces the problem of open set domain adaptation, a more realistic formulation that relaxes the standard assumption of identical label spaces. The proposed method jointly optimizes domain mapping and instance assignment, effectively transferring knowledge for shared categories while rejecting unknown classes. Thorough evaluation demonstrates that the approach outperforms the state-of-the-art, providing a practical solution for realistic domain adaptation scenarios. Future work may explore deep neural network integration and extension to more complex recognition tasks.

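Paragraph function: Summarizes the paper. It restates the problem definition, the methodological contribution, and the experimental results, and looks ahead to future directions.

Logical role: The conclusion echoes the abstract and closes the argumentative loop: problem definition (open set) -> method (joint optimization) -> validation (outperforms existing methods) -> outlook (deep network integration).

Argumentation technique / potential weakness: By acknowledging that deep networks still need to be integrated, the paper indirectly concedes the limitation of relying mainly on traditional features. This candid self-assessment adds credibility, but it also exposes the risk that the method may become outdated in the deep learning era.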

Overview of the Argument Structure

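Problem: the closed set assumption does not match real-world scenarios
→ Thesis: open set domain adaptation through joint assignment and mapping
→ Evidence: Office and other benchmarks, validated in both open set and closed set settings
→ Rebuttal: closed set performance is maintained, with no sacrifice on the traditional task
→ Conclusion: open set is a more realistic framework for domain adaptation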

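Authors' core claim (one sentence)

By jointly solving an assignment problem and learning a domain mapping, knowledge from the source domain can be transferred effectively, in the open set setting, to a target domain that contains unknown classes.

Strongest point of the argument

Forward-looking problem definition: the paper is the first to formally define the open set domain adaptation problem, filling the research gap between domain adaptation and open set recognition, with high practical value and theoretical significance.

Weakest point of the argument

Scalability concerns: solving the assignment with an integer linear program may hit a computational bottleneck on large-scale datasets, and a linear domain mapping may fail to capture complex non-linear shifts in deep feature spaces.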