Large language models (LLMs) have revolutionized natural language processing, enabling applications ranging from AI-assisted writing to conversational agents. However, one major limitation remains: personalization. Most current systems assume a single, universal preference model and fail to adapt to individual user needs. Reinforcement learning from human feedback (RLHF) traditionally optimizes LLMs with preference data aggregated across many users; while this ensures broad alignment with human values, it cannot tailor responses to individual users.
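Concretely, standard RLHF pipelines typically fit a single reward model to comparisons pooled across annotators, for instance by maximizing a Bradley-Terry log-likelihood (shown here only to illustrate the standard setup; the notation $r_{\theta}$, $x$, $y^{+}$, $y^{-}$ is introduced for exposition):

\[
\max_{\theta} \; \sum_{(x,\, y^{+},\, y^{-})} \log \sigma\big( r_{\theta}(x, y^{+}) - r_{\theta}(x, y^{-}) \big),
\]

where $x$ is a prompt, $y^{+}$ and $y^{-}$ are the preferred and dispreferred responses, and $\sigma$ is the logistic function. Because all annotators contribute to the same $r_{\theta}$, systematic disagreements between users are averaged away rather than modeled.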
For example, a business professional might prefer concise and formal responses, while a student may value detailed and explanatory answers. Training separate models per user is computationally expensive and requires large amounts of user-specific data—often impractical in real-world settings.
In this paper, we introduce an approach for efficiently personalizing LLMs based on a low-dimensional reward factorization framework. Our method aligns models with user-specific preferences from as few as 10 preference examples, significantly improving user satisfaction compared to standard RLHF.
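As an illustrative sketch (one natural instantiation of such a factorization, not necessarily the exact formulation developed in this paper; the symbols $\phi$, $w_{u}$, and $k$ are notation introduced here), the idea is to express each user's reward as a user-specific combination of a small set of shared reward features,

\[
r_{u}(x, y) = w_{u}^{\top} \phi(x, y), \qquad w_{u} \in \mathbb{R}^{k} \ \text{with small } k,
\]

so that adapting to a new user reduces to estimating the low-dimensional vector $w_{u}$ from a handful of that user's comparisons, rather than re-annotating data and retraining a full reward model per user.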
In our human experiment, personalizing GPT-4o responses to each user's preferences led to a 67% win rate over non-personalized responses.