Rewriting a Deep Generative Model

Abstract

We present a method for rewriting the rules encoded by a deep generative model. A generative adversarial network (GAN) learns rich semantics about the visual world, encoding rules such as "trees have leaves" or "doors are on buildings." We propose a method to directly modify specific rules in the model's weights through a constrained optimization that changes a small number of model parameters to achieve a desired visual effect while preserving the model's other capabilities.

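Paragraph function: Overview of the whole paper; defines the new problem of "rewriting the rules of a generative model."

Logical role: Reading a GAN's internal representations as "rules" is a creative conceptual framing.

Argumentative technique / potential weakness: The constraint of changing only a small number of parameters is the key design decision that preserves the model's integrity.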

1. Introduction

Deep generative models like StyleGAN produce remarkably realistic images, but controlling their output semantics remains challenging. Existing methods for GAN editing operate in the latent space, modifying individual samples rather than the model's learned rules. We take a fundamentally different approach: instead of editing individual images, we edit the model itself to change the rules it has learned. For example, we can make a model that always generates churches with domes, or horses with stripes.

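Paragraph function: Establishes the motivation by separating "editing a sample" from "editing the model."

Logical role: The contrast between editing the model and editing a sample clearly establishes the novelty.

Argumentative technique / potential weakness: Concrete examples (domed churches, striped horses) make an abstract idea intuitive and easy to grasp.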

This distinction has far-reaching implications: a model-level edit applies to every sample the model generates afterward, not just to one. This enables systematic modifications such as "remove all windows from churches" or "add grass to all outdoor scenes." With latent-space methods, the same operations would require editing each generated image individually, which is impractical for large-scale generation.

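Paragraph function: Argues the value of the approach, namely the systematic advantage of model-level editing.

Logical role: The global effect on "all future samples" is the fundamental advantage of model editing over sample editing.

Argumentative technique / potential weakness: The irreversibility of a global modification may be a double-edged sword, since no exception can be made for individual samples.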

2. Method

Our approach works by identifying a specific layer and set of neurons in the generator that are responsible for the rule to be changed. We then solve a constrained optimization problem that modifies the weights of that layer to produce the desired effect on a set of exemplar images, while minimizing changes to the network's behavior on other inputs. The key constraint is a rank-1 update to the weight matrix, which limits the modification to a single semantic direction.

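Paragraph function: Core method; describes layer localization, constrained optimization, and the rank-1 update.

Logical role: The rank-1 update is an elegant mathematical constraint that keeps the modification precise without breaking other functionality.

Argumentative technique / potential weakness: The rank-1 constraint may limit the ability to make complex semantic modifications, but it guarantees stability.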
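
As a rough illustration of the constrained edit described in this section, the sketch below solves a much-simplified version in closed form. It treats the layer as a plain linear map and asks for the smallest change (in Frobenius norm) that sends one key vector to a target output. The names `matvec`, `least_change_update`, `key`, and `target` are hypothetical, and weighting all other inputs equally is an assumption made for the toy; this is not the paper's optimization procedure.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def least_change_update(W, key, target):
    """Return W' = W + d key^T such that W' key == target.

    d = (target - W key) / (key . key) gives the minimum-Frobenius-norm
    update that maps `key` to `target`. Inputs orthogonal to `key`
    produce exactly the same output as before.
    """
    residual = [t - y for t, y in zip(target, matvec(W, key))]
    norm_sq = sum(kj * kj for kj in key)
    d = [r / norm_sq for r in residual]
    return [[w + di * kj for w, kj in zip(row, key)] for row, di in zip(W, d)]


# Toy 2x2 layer (hypothetical values).
W = [[1.0, 2.0],
     [3.0, 4.0]]
key = [2.0, 0.0]       # input pattern whose rule we want to change
target = [4.0, 0.0]    # desired output for that pattern
other = [0.0, 1.0]     # an unrelated input, orthogonal to the key

W_new = least_change_update(W, key, target)

edited_output = matvec(W_new, key)          # should equal target
other_before = matvec(W, other)
other_after = matvec(W_new, other)          # should equal other_before
```

The update touches the layer only along the key direction, which is why the unrelated input is left alone. In a real generator the key and target would come from exemplar images rather than hand-picked vectors.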

2.1 Rank-1 Model Rewriting

Formally, given a layer with weight matrix W, we compute the optimal rank-1 update W' = W + d * k^T, where d is the desired output direction and k is the key that selects when to apply the change. The key k is computed to activate on the specific visual pattern (e.g., tree canopy) while remaining inactive on other patterns. This formulation allows precise, localized editing of the model's rules without retraining.

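Paragraph function: Mathematical formalization; the key-value decomposition of the rank-1 update.

Logical role: The key-value decomposition lets "when to modify" and "what to modify" be controlled independently.

Argumentative technique / potential weakness: Editing a model without retraining is a strong practical advantage; later work such as ROME applied this idea to LLMs.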
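
To make the roles of d and k in W' = W + d * k^T concrete, here is a minimal sketch on a toy linear layer. The matrix, the key, the direction, and the helper names (`matvec`, `rank1_update`, `output_shift`) are all hypothetical values chosen for illustration. It shows that the change in output for any input x equals d scaled by the dot product k·x, so the key decides how strongly the edit fires.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank1_update(W, d, k):
    """Return W + d k^T, the outer-product (rank-1) update."""
    return [[w + di * kj for w, kj in zip(row, k)] for row, di in zip(W, d)]


# Toy layer: 3 input features, 2 output features (hypothetical values).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
k = [0.0, 0.0, 1.0]    # key: responds only to the third input feature
d = [0.5, -1.0]        # direction added to the output when the key fires

W_new = rank1_update(W, d, k)


def output_shift(x):
    """Difference between the edited and original layer outputs for x."""
    before, after = matvec(W, x), matvec(W_new, x)
    return [a - b for a, b in zip(after, before)]


shift_match = output_shift([0.0, 0.0, 1.0])   # k.x = 1.0: full shift d
shift_half = output_shift([1.0, 0.0, 0.5])    # k.x = 0.5: half of d
shift_other = output_shift([1.0, 2.0, 0.0])   # k.x = 0.0: no change
```

Because the shift is always a multiple of d, the edit can move outputs along a single direction only, which is the sense in which a rank-1 update is limited to one semantic direction.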

3. Experiments

We demonstrate model rewriting on StyleGAN trained on churches, horses, and kitchens. Examples include: adding domes to all churches, changing tree species, adding stripes to horses. User studies show that edited models produce coherent changes that are preferred over latent-space editing in 78% of cases. The changes are consistent across all generated samples, confirming that we are editing rules rather than individual images. Editing takes only seconds per rule change.

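Paragraph function: Quantitative and qualitative evaluation through a user study and a consistency check.

Logical role: The 78% preference rate and the cross-sample consistency strongly support the central claim.

Argumentative technique / potential weakness: An editing time of "seconds" makes the method highly practical and far faster than retraining.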

We also study the scope of modifications. The rank-1 constraint ensures that changes are localized to the targeted semantic concept. Quantitatively, FID scores on the unmodified concepts change by less than 1 point, confirming minimal collateral damage. We compare against fine-tuning the entire layer, which achieves similar target modifications but degrades FID by 5-10 points on other concepts.

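Paragraph function: Scope analysis; verifies the locality of the rank-1 constraint.

Logical role: An FID change of less than 1 point, against a 5-10 point degradation for whole-layer fine-tuning, clearly demonstrates the value of the rank-1 constraint.

Argumentative technique / potential weakness: The comparison with whole-layer fine-tuning is a strong ablation that directly answers the question "why is the rank-1 constraint needed?"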

4. Conclusion

We have shown that the rules encoded by a deep generative model can be directly rewritten through targeted modification of model weights. Our rank-1 rewriting approach provides a new tool for understanding and controlling the knowledge stored in neural networks. This opens up possibilities for model debugging, creative applications, and studying how neural networks encode semantic knowledge.

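Paragraph function: Summary; moves from the technical contribution to the broader perspective of neural network interpretability.

Logical role: Linking the method to interpretability research broadens the academic significance of the work.

Argumentative technique / potential weakness: "Model debugging" as an application is forward-looking; later work such as ROME confirmed that the idea applies broadly.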

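Argument Structure Overview

Problem: semantic control of GANs is difficult
→ Claim: edit the model rather than individual samples
→ Method: rank-1 weight update
→ Evidence: 78% preference in the user study
→ Conclusion: the knowledge in a neural network can be rewritten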

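Core Claim

A rank-1 constrained update to a generative model's weights can precisely rewrite a specific semantic rule the model encodes without affecting its other generative capabilities.

Strongest Point of the Argument

The user study and the cross-sample consistency strongly validate the central claim that the method edits model rules rather than individual images, and each edit takes only seconds.

Weakest Point of the Argument

The rank-1 constraint limits the ability to make complex semantic modifications, and the method is validated mainly on StyleGAN, so its generalization to other generative architectures is uncertain.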