Abstract
We present a method to create Concept Sliders — interpretable LoRA adaptors that enable precise, continuous control over attributes in image generations from diffusion models. Our approach identifies a low-rank parameter direction corresponding to one concept while minimizing interference with other attributes. A slider is created using a small set of prompts or sample images and works as a plug-and-play module: sliders can be composed efficiently and continuously modulated. In quantitative experiments, sliders exhibit stronger targeted edits with lower interference compared to previous techniques. We showcase sliders for weather, age, styles, and expressions, as well as slider compositions. We also demonstrate that sliders can address quality issues in SDXL, such as object deformations and hand distortions.
Paragraph function: Overview of the whole paper. It defines what Concept Sliders do, how they are built, and where they apply.
Logical role: The "plug-and-play" and "continuously modulated" properties make the method both practical and intuitive.
Argument technique / potential weakness: Fixing SDXL hand distortions shows an unexpected application and makes the paper more compelling.
The key technical contribution is a training objective that explicitly disentangles the target concept from other visual attributes. Unlike standard LoRA fine-tuning that modifies model behavior holistically, our approach learns a parameter direction — the resulting weights can be scaled continuously from negative (decreasing the attribute) through zero (original model) to positive (increasing the attribute). This continuous, bidirectional control is fundamentally different from binary on/off modifications and provides the intuitive slider metaphor that gives our method its name.
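Paragraph function: States the core innovation, a parameter direction that gives continuous, bidirectional control.
Logical role: The slider metaphor turns the technical contribution into an intuitive user-experience concept.
Argument technique / potential weakness: The negative/zero/positive control is elegant, but whether continuity still holds at extreme scaling values needs verification.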
1. Introduction
Controlling the output of text-to-image diffusion models remains challenging. While text prompts provide coarse control, they lack the precision needed for fine-grained attribute manipulation. Adding "young" to a prompt might change not just age but also facial features, lighting, and composition. LoRA (Low-Rank Adaptation) offers a parameter-efficient way to modify model behavior, but standard LoRA training does not guarantee disentangled control. We introduce Concept Sliders — LoRA modules specifically designed for disentangled, continuously adjustable attribute control. The key idea is to learn a low-rank update direction that maximally changes the target attribute while minimally affecting other attributes.
Paragraph function: Establishes the research territory: text control is imprecise and standard LoRA is not disentangled.
Logical role: The example of adding "young" and changing everything else makes the problem intuitive.
Argument technique / potential weakness: A concrete example makes an abstract problem tangible and leads the reader to accept the motivation.
The demand for intuitive, fine-grained creative control has grown enormously with the proliferation of generative AI tools. Professional artists and casual users alike need ways to adjust specific aspects of generated images without regenerating from scratch. Existing approaches such as textual inversion and DreamBooth are designed for learning new concepts rather than controlling existing attributes. Prompt engineering provides some control but is fundamentally limited by the discrete nature of text and the unpredictable mapping from words to visual features. Our slider paradigm addresses this by providing a continuous, predictable, and composable control mechanism that operates directly in parameter space.
Paragraph function: Positions the need and the gap: existing methods do not meet the demand for fine-grained control.
Logical role: Lifts the problem from the technical level to the user-experience level, strengthening the work's practical significance.
Argument technique / potential weakness: "Continuous control in parameter space" sidesteps the fundamental limits of the text interface, which is an important conceptual breakthrough.
The creative AI community has long sought tools that provide the intuitive control of traditional image editing software within the generative paradigm. In Photoshop-like tools, adjusting brightness, contrast, or saturation involves moving a slider — the change is continuous, predictable, and reversible. Generative models, however, operate through discrete text prompts that provide no guarantee of monotonic, predictable changes. Concept Sliders bring the slider metaphor into the generative model paradigm, providing users with the same intuitive continuous control over semantic attributes that traditional tools provide over pixel-level attributes. This bridges a critical usability gap that has limited the adoption of generative AI in professional creative workflows.
Paragraph function: Connects to user experience, from traditional editing to generative control.
Logical role: The Photoshop slider analogy makes the method's value clear even to non-technical readers.
Argument technique / potential weakness: Framing the work around "professional creative workflows" strengthens the argument for its practical impact.
2. Related Work
Image editing with diffusion models has seen rapid progress through methods like SDEdit, InstructPix2Pix, and Prompt-to-Prompt. These approaches enable modifications guided by text instructions but operate at the level of individual images rather than providing reusable control modules. LoRA and related parameter-efficient fine-tuning methods have become the standard for customizing diffusion models, but prior work has focused on learning new subjects or styles rather than disentangled attribute control. Latent space manipulation in GANs (e.g., InterfaceGAN, GANSpace) demonstrated that interpretable directions exist in the latent spaces of generative models, inspiring our approach of finding analogous directions in the weight space of diffusion models.
Paragraph function: Surveys prior work, tracing the line from image editing to latent space manipulation.
Logical role: Uses research on "interpretable directions" in GAN latent spaces as a theoretical precedent that gives the method a conceptual basis.
Argument technique / potential weakness: The analogy from GANs to diffusion models is a strong link, but structural differences between the two parameter spaces may keep it from holding fully.
The theoretical foundation for Concept Sliders connects to the broader understanding of linear subspaces in neural network weight spaces. Recent research has shown that fine-tuned models often lie on low-dimensional manifolds in weight space, and that semantically meaningful directions exist within these manifolds. Our work operationalizes this insight for diffusion models: the LoRA update defines a direction in the weight space manifold, and the training objective ensures this direction corresponds to a single, disentangled attribute. The linearity of LoRA composition follows from the approximate linearity of the weight space manifold in the neighborhood of the pretrained model, an assumption that holds well for low-rank updates but may break down for large perturbations.
Paragraph function: Theoretical link: the linear-subspace view of the weight space manifold.
Logical role: Connects an empirically effective method to a theoretical understanding of weight space geometry.
Argument technique / potential weakness: The approximate-linearity assumption gives composability a theoretical basis while clearly stating its limits.
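The claim that composition is approximately linear near the pretrained model, and may break down for large perturbations, can be illustrated with a toy calculation. The sketch below is an assumption-laden stand-in, not the paper's analysis: a single `tanh` unit plays the role of a network nonlinearity, and `additivity_gap` is a hypothetical helper that measures how far the joint effect of two shifts departs from the sum of their separate effects.

```python
import math

def additivity_gap(pre_activation, shift_a, shift_b, scale):
    """Absolute difference between applying two shifts together
    and summing their individual effects, through a tanh unit."""
    f = math.tanh
    base = f(pre_activation)
    together = f(pre_activation + scale * (shift_a + shift_b)) - base
    separate = (f(pre_activation + scale * shift_a) - base) + \
               (f(pre_activation + scale * shift_b) - base)
    return abs(together - separate)

# Same two shifts, applied at a small and a large slider scale.
small = additivity_gap(0.5, 1.0, 1.0, scale=0.01)
large = additivity_gap(0.5, 1.0, 1.0, scale=1.0)
print(f"gap at scale 0.01: {small:.2e}")
print(f"gap at scale 1.00: {large:.2e}")
```

The gap shrinks roughly with the square of the scale, which is the sense in which additive composition is a first-order approximation that holds best for small updates.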
3. Method
Given a target concept (e.g., "age"), we define a pair of opposing text prompts (e.g., "young person" vs. "old person") or a small set of image pairs. We then train a LoRA adaptor with a specially designed objective that maximizes the difference between the model's predictions for the positive and negative directions while incorporating a preservation loss that prevents changes to attributes not related to the target concept. The resulting LoRA weights define a direction in parameter space: scaling the LoRA weights positively increases the target attribute, scaling negatively decreases it, and setting to zero returns to the original model. Multiple sliders can be composed by simply summing their LoRA weights, enabling simultaneous control of multiple attributes.
Paragraph function: Describes the core method: training on opposing prompts with a preservation loss.
Logical role: The positive/negative/zero control is intuitive and elegant, and composition by summation scales well.
Argument technique / potential weakness: The linear composability assumption may not hold exactly in high-rank spaces, and nonlinear interactions may appear when several sliders are combined.
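The scaling and summation rule described in the method can be sketched for a single weight matrix. This is a minimal illustration under stated assumptions: the matrix sizes are toy values, the `age` and `smile` sliders are random placeholders rather than trained weights, and `lora_delta` and `apply_sliders` are hypothetical helper names, not functions from any library.

```python
import numpy as np

def lora_delta(down, up):
    """Low-rank weight update: up (d_out x r) times down (r x d_in)."""
    return up @ down

def apply_sliders(base_weight, sliders, scales):
    """Return base_weight plus the scaled sum of slider updates.

    sliders: list of (down, up) matrix pairs, one per slider.
    scales:  one float per slider. Negative weakens the attribute,
             zero leaves the weight untouched, positive strengthens it.
    """
    merged = base_weight.copy()
    for (down, up), scale in zip(sliders, scales):
        merged += scale * lora_delta(down, up)
    return merged

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 4  # toy sizes; rank 4 matches the default in the text
base = rng.normal(size=(d_out, d_in))
age = (rng.normal(size=(rank, d_in)), rng.normal(size=(d_out, rank)))
smile = (rng.normal(size=(rank, d_in)), rng.normal(size=(d_out, rank)))

# Scale 0 on every slider recovers the original weights.
unchanged = apply_sliders(base, [age, smile], [0.0, 0.0])
# Two sliders composed at once: more "age", slightly less "smile".
older_less_smile = apply_sliders(base, [age, smile], [1.5, -0.5])
```

Because the update is a plain sum, the order in which sliders are applied does not matter, and each scale can be changed independently of the others.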
The training objective consists of two components. The concept direction loss encourages the LoRA update to maximally shift the model's predictions from the negative concept toward the positive concept. Formally, given the diffusion model's noise prediction, we compute the score difference between the positive and negative prompt conditions and optimize the LoRA weights to align with this direction. The preservation loss constrains the LoRA update to maintain the model's behavior on a set of anchor prompts unrelated to the target concept. For example, when training an "age" slider, anchor prompts might include "a landscape", "a building", or "an animal" — ensuring that the slider only affects age-related attributes. We use rank-4 LoRA by default, keeping each slider's parameter footprint minimal.
Paragraph function: Technical detail: the two-part loss design and the anchor prompt strategy.
Logical role: Anchor prompts are how the preservation loss is realized, turning disentanglement from an abstract idea into an operational training strategy.
Argument technique / potential weakness: The very small rank-4 parameter count makes sliders cheap to store and swap, but the choice of anchor prompts may affect how well attributes are disentangled.
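One way to read the two-part objective is as a regression target plus a penalty. The sketch below is an interpretation, not the paper's exact formulation: the shifted target, the guidance strength `eta`, the weight `preserve_weight`, and the function names are all assumptions introduced for illustration, and the arrays stand in for noise predictions from the frozen model and the LoRA-adapted model.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def slider_loss(slider_pred, base_pred, base_pos, base_neg,
                slider_anchor, base_anchor, eta=1.0, preserve_weight=1.0):
    """Return (total, direction, preservation) for one training step."""
    # Direction term: the adapted prediction should match the frozen
    # prediction shifted along (positive - negative).
    shifted_target = base_pred + eta * (base_pos - base_neg)
    direction = mse(slider_pred, shifted_target)
    # Preservation term: on unrelated anchor prompts the adapted model
    # should reproduce the frozen model's prediction.
    preservation = mse(slider_anchor, base_anchor)
    return direction + preserve_weight * preservation, direction, preservation

# Toy predictions (assumed values, not model outputs).
base_pred = np.zeros(4)
base_pos, base_neg = np.ones(4), -np.ones(4)
anchor = np.full(4, 0.3)

untrained = slider_loss(base_pred, base_pred, base_pos, base_neg, anchor, anchor, eta=0.5)
ideal = slider_loss(np.ones(4), base_pred, base_pos, base_neg, anchor, anchor, eta=0.5)
leaky = slider_loss(np.ones(4), base_pred, base_pos, base_neg, anchor + 0.1, anchor, eta=0.5)
```

In this toy setup `untrained` pays only the direction term, `ideal` pays nothing, and `leaky` hits the target but is penalized for drifting on the anchor prompt.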
Beyond text-defined sliders, we support image-based slider creation for concepts that are difficult to describe in text. Given a small set of paired images showing the concept variation (e.g., photos of the same person at different ages), we extract the concept direction from the difference in the model's intermediate representations between the two sets. This enables sliders for subtle visual attributes like specific lighting styles, artistic brushwork textures, or particular facial feature adjustments that cannot be precisely captured by text prompts alone. The image-based approach also enables personalized sliders tailored to individual creative preferences.
Paragraph function: Extends the method to image-based slider creation.
Logical role: Moving from text to images greatly widens the method's scope and addresses the basic limits of text description.
Argument technique / potential weakness: Personalized sliders have strong potential as a creative tool, but the quality and consistency of the paired images may affect results.
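Extracting a direction from paired images can be pictured as averaging feature differences. The sketch below assumes each image has already been mapped to a feature vector by some intermediate layer of the model; that extraction step is not shown, and both `concept_direction` and the toy feature values are hypothetical. The source does not specify the exact procedure, so this is only one simple reading of "difference in intermediate representations".

```python
import numpy as np

def concept_direction(feats_low, feats_high):
    """Unit vector along the mean paired difference (high - low).

    feats_low, feats_high: arrays of shape (n_pairs, dim), where row i of
    each array comes from the same subject at the two ends of the concept.
    """
    diffs = np.asarray(feats_high, dtype=float) - np.asarray(feats_low, dtype=float)
    mean_diff = diffs.mean(axis=0)
    norm = np.linalg.norm(mean_diff)
    if norm == 0:
        raise ValueError("paired features are identical on average; no direction to extract")
    return mean_diff / norm

# Two toy pairs that differ only along the second feature dimension.
feats_low = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
feats_high = [[0.0, 2.0, 1.0], [1.0, 2.0, 0.0]]
direction = concept_direction(feats_low, feats_high)
print(direction)
```

Pairing images of the same subject matters here: per-pair subtraction cancels identity-specific features, so only what changes with the concept survives the average.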
The computational efficiency of Concept Sliders is a key practical advantage. Each slider requires only rank-4 LoRA weights, amounting to approximately 300KB of storage. Training a single slider takes about 200 gradient steps, completed in roughly 5 minutes on a single consumer GPU. At inference time, applying a slider adds negligible computational overhead (less than 1% increase in latency) since LoRA weights are merged into the base model. The slider composition via weight addition requires no additional computation beyond the merging step itself, making it possible to apply dozens of sliders simultaneously without meaningful performance impact.
Paragraph function: Practicality argument: computational efficiency and storage requirements.
Logical role: 300KB of storage and 5 minutes of training put the method within reach of individual users.
Argument technique / potential weakness: The very low resource barrier is a key factor in broad adoption, which the open-source community has already borne out.
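The storage figure follows from how few parameters a rank-4 update needs: a layer of shape (d_out, d_in) costs rank × (d_out + d_in) values instead of d_out × d_in. The arithmetic below uses a hypothetical list of adapted layers and assumes 16-bit storage; neither is taken from the paper, so the result only shows that a footprint of a few hundred kilobytes is plausible, not how the 300KB figure was obtained.

```python
def lora_param_count(layer_shapes, rank=4):
    """Parameters in the two LoRA factors for layers of shape (d_out, d_in)."""
    return sum(rank * (d_out + d_in) for d_out, d_in in layer_shapes)

def size_kb(param_count, bytes_per_param=2):
    """Storage in kilobytes, assuming 2 bytes per parameter (16-bit floats)."""
    return param_count * bytes_per_param / 1024

# Hypothetical set of adapted layers (not the actual model's layer list).
shapes = [(320, 320)] * 16 + [(640, 640)] * 16
params = lora_param_count(shapes)
full_params = sum(d_out * d_in for d_out, d_in in shapes)

print(params, "LoRA parameters,", size_kb(params), "KB")
print(f"{params / full_params:.1%} of the full weight matrices")
```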
4. Experiments
We evaluate Concept Sliders on Stable Diffusion v1.5 and SDXL across diverse concepts. Quantitatively, our age slider achieves a target attribute change of 0.82 (normalized) with only 0.04 interference on non-target attributes, compared to 0.71 change with 0.18 interference for prompt-based editing and 0.63 change with 0.12 interference for standard LoRA. For the SDXL hand-fixing slider, we demonstrate a 31% reduction in hand deformation artifacts as measured by an automated hand quality detector, while preserving overall image quality (FID increase of only 0.3). We showcase compositions of up to 5 sliders simultaneously (age + expression + lighting + weather + style) with minimal degradation.
Paragraph function: Provides the core evidence: quantitative validation of disentanglement, hand fixing, and multi-slider composition.
Logical role: The ratio of 0.82 change to 0.04 interference is far better than the baselines and directly demonstrates effective disentangled control.
Argument technique / potential weakness: Hand fixing has high practical value because it addresses a long-standing community pain point. The 5-slider composition shows good scalability.
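The reported change and interference scores can be compared on a single scale by dividing one by the other. This ratio is a derived illustration, not a metric reported in the source; only the three (change, interference) pairs come from the text, and `selectivity` is a hypothetical name.

```python
# (target attribute change, non-target interference), as reported in the text.
reported = {
    "Concept Sliders": (0.82, 0.04),
    "Prompt-based editing": (0.71, 0.18),
    "Standard LoRA": (0.63, 0.12),
}

def selectivity(change, interference):
    """Target change obtained per unit of non-target interference."""
    return change / interference

ratios = {name: round(selectivity(c, i), 2) for name, (c, i) in reported.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

On this derived scale the slider yields about 20 units of target change per unit of interference, against roughly 4 for prompt-based editing and 5 for standard LoRA.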
We conduct extensive user studies with 150 participants evaluating slider control quality across three criteria: attribute accuracy (does the slider change the intended attribute?), identity preservation (are other aspects unchanged?), and overall quality (is the image natural?). Concept Sliders are preferred over prompt editing in 78% of comparisons for attribute accuracy and 85% for identity preservation. The composability experiment reveals that combining up to 3 sliders maintains 92% of individual slider effectiveness, degrading to 84% at 5 simultaneous sliders — still far superior to multi-attribute prompt engineering. Training each slider requires only about 200 iterations (approximately 5 minutes on a single GPU), making slider creation accessible to individual users.
Paragraph function: User study: quantitative validation of perceived quality and composability.
Logical role: A 150-participant study is sizable and its three criteria are comprehensive. The 5-minute training time makes the method highly practical.
Argument technique / potential weakness: The drop in composability from 92% to 84% is expected, yet it still shows the method is robust.
We further analyze the disentanglement quality of sliders through a systematic cross-attribute evaluation. When applying the age slider, we measure changes in 14 non-target attributes including gender, ethnicity, hair color, expression, pose, background, and lighting. The average non-target attribute change is 0.04 (normalized), compared to 0.18 for prompt-based editing and 0.12 for standard LoRA. Notably, the most difficult attribute to disentangle is expression (which has natural correlation with age in training data), yet our slider still achieves only 0.09 interference on expression. The preservation loss is crucial: removing it increases average non-target interference from 0.04 to 0.15, which exceeds standard LoRA's 0.12 and approaches the 0.18 of prompt-based editing.
Paragraph function: Disentanglement analysis: systematic quantification of cross-attribute interference.
Logical role: Measuring 14 non-target attributes shows a rigorous evaluation methodology.
Argument technique / potential weakness: The natural correlation between expression and age is an honestly reported hard case, and the 0.09 interference value shows the method's robustness.
5. Conclusion
We have presented Concept Sliders, a practical and interpretable approach to fine-grained control in diffusion models. By learning disentangled low-rank directions in parameter space, sliders enable precise, continuous, and composable attribute manipulation with minimal interference. Our work bridges the gap between the powerful but uncontrollable nature of diffusion models and the need for intuitive creative tools.
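Paragraph function: Summarizes the paper, restating practicality and interpretability.
Logical role: Closes on the vision of "intuitive creative tools," linking academic research to the creator community.
Argument technique / potential weakness: Concept Sliders have already been widely adopted in the open-source community, which supports their practicality.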
The slider paradigm opens several promising research directions. Automatic slider discovery — identifying meaningful attribute directions without human specification — could enable self-organizing libraries of controllable factors. Hierarchical sliders that operate at different semantic levels (from low-level texture to high-level composition) could provide even more nuanced control. The current linear composability assumption could be extended with learned composition functions that account for nonlinear interactions between attributes, potentially enabling control over arbitrary attribute combinations without quality degradation.
Paragraph function: Looks ahead: automatic discovery, hierarchical sliders, and nonlinear composition.
Logical role: Three concrete directions show the paradigm's research potential, especially the self-scrutiny of the linear assumption's limits.
Argument technique / potential weakness: Candor about the method's limits, paired with concrete remedies, reflects a responsible scholarly attitude.
The practical deployment of Concept Sliders has revealed several unexpected use cases beyond the originally intended attribute control. The open-source community has created sliders for fixing common generation artifacts (distorted hands, asymmetric faces, blurred backgrounds), effectively using the slider framework as a targeted quality improvement tool. Others have created style transfer sliders that smoothly interpolate between artistic styles, negative sliders that suppress unwanted elements (text artifacts, watermarks), and domain-specific sliders for medical imaging, satellite imagery, and scientific visualization. This organic adoption validates the framework's generality and demonstrates that the parameter-space direction paradigm generalizes far beyond the semantic attributes explored in this paper.
Paragraph function: Community validation: unexpected uses and the framework's generality.
Logical role: Spontaneous adoption by the open-source community is the strongest external validation of the method's practicality.
Argument technique / potential weakness: The extension from attribute control to quality repair and style transfer shows the framework's unexpected generality.
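Argument Structure Overview
Problem: Attribute control in diffusion models is imprecise and entangled
→
Claim: Low-rank directions enable disentangled, continuous control
→
Method: Opposing prompts + preservation loss, LoRA training
→
Evidence: 0.82 change / 0.04 interference; hand deformation artifacts reduced by 31%
→
Conclusion: Intuitive creative control tools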
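Core claim (one sentence)
By learning disentangled low-rank directions in parameter space, one can build plug-and-play, continuously adjustable, and composable attribute control modules that give diffusion models an intuitive creative control interface.
Strongest point of the argument
The hand-fixing slider directly addresses a long-standing quality problem in SDXL and shows unexpected practical value; composing five sliders at once (84% of effectiveness retained) demonstrates the method's scalability. The 5-minute training time lets individual users build their own sliders.
Weakest point of the argument
The linear composability assumption may not fully hold in high-dimensional spaces, and applicability to more abstract or semantically complex concepts (such as "sense of humor" or "atmosphere") remains unclear. The effect of anchor prompt choice on disentanglement lacks systematic analysis.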