Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning

Abstract

Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge is to develop steerable language models that trade off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditional Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP learns steerable models that effectively trade off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through extensive experiments and ablations on two summarization datasets, we show that CLP learns steerable language models that outperform and Pareto-dominate existing approaches for multi-objective finetuning.
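To make the steerability idea concrete, here is a minimal sketch of the generic scalarization behind multi-objective reward finetuning: per-objective rewards are combined under a user-chosen preference weight vector, so different weights yield different trade-offs without retraining. This is an illustrative, hypothetical example; the abstract does not specify CLP's exact objective, and the function and reward names below are assumptions.

```python
def combined_reward(rewards, weights):
    """Linearly combine per-objective rewards under a preference weight vector.

    A generic multi-objective scalarization sketch, not necessarily the
    exact objective used by CLP. `rewards` and `weights` are equal-length
    sequences; weights are assumed to sum to 1.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * r for w, r in zip(rewards, weights))


# Hypothetical scores for two conflicting objectives, e.g. quality vs. safety.
rewards = [0.9, 0.2]

# Different preference weights steer toward different trade-offs at inference time.
balanced = combined_reward(rewards, [0.5, 0.5])   # equal weighting
safety_first = combined_reward(rewards, [0.1, 0.9])  # favor the second objective
```

A conditional policy extends this idea by taking the weight vector as an input, so a single model can produce behavior matching any chosen trade-off rather than maintaining one model per weighting.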

Publication
In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Ryan Sullivan
Computer Science PhD Candidate
rsulli@umd.edu

My research interests include reinforcement learning, curriculum learning, and training agents in multi-agent or open-ended environments.