Reading: Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
I recently read Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward and found it interesting, so I figured I’d write a quick summary and some thoughts on the work.

Motivation

We want to personalize LLMs in situ, without assuming access to labeled preference datasets for users. If we treat the user as a world that the LLM must explore, then a curiosity or intrinsic exploration reward should help us discover the user’s innate preferences more quickly and effectively....
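To make the "user as a world to explore" idea a bit more concrete, here is a minimal toy sketch. It is my own illustration and not necessarily the paper's formulation: I assume the model keeps a discrete belief over a few hypothetical user types, updates it with Bayes' rule after each user reply, and receives an intrinsic reward equal to the drop in entropy of that belief. The type names and likelihood numbers are made up for the example.

```python
import math


def entropy(belief):
    """Shannon entropy (in nats) of a discrete belief over user types."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)


def update_belief(belief, likelihood):
    """Bayes update: posterior is proportional to prior * P(reply | user type)."""
    unnorm = {t: belief[t] * likelihood[t] for t in belief}
    total = sum(unnorm.values())
    return {t: v / total for t, v in unnorm.items()}


def curiosity_reward(prior, posterior):
    """Assumed intrinsic reward: reduction in uncertainty about the user."""
    return entropy(prior) - entropy(posterior)


# Hypothetical user types and reply likelihoods, for illustration only.
prior = {"prefers_brief": 0.5, "prefers_detailed": 0.5}
informative = {"prefers_brief": 0.9, "prefers_detailed": 0.1}
uninformative = {"prefers_brief": 0.5, "prefers_detailed": 0.5}

r_informative = curiosity_reward(prior, update_belief(prior, informative))
r_uninformative = curiosity_reward(prior, update_belief(prior, uninformative))
```

Under these assumptions, a turn whose reply distinguishes between user types earns a positive reward, while a turn whose reply is equally likely under every type earns none. That is the intuition behind rewarding the model for asking questions that reveal preferences.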