Learning a Personalized arXiv Feed

Keeping up with the latest on arXiv is a hard ask: it requires somehow sifting through hundreds of papers every day. I spent a long time relying on my network to surface important papers for my particular interests, but this usually biases me towards a subset of authors or labs that are particularly adept at becoming popular on social media. ArXiv-Sanity is a great solution to this problem, allowing you to find recommended papers based on TF-IDF vectors for papers....
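To make the TF-IDF idea concrete, here is a minimal, dependency-free sketch of scoring paper similarity with TF-IDF vectors and cosine similarity. This is an illustration of the general technique, not arxiv-sanity's actual implementation; the toy "abstracts" and the weighting scheme (raw term frequency times log inverse document frequency) are my own assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute a sparse TF-IDF weight dict for each tokenized document."""
    n = len(docs)
    # document frequency: how many docs contain each term
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy abstracts, pre-tokenized; a real feed would use full paper text.
abstracts = [
    "sparse autoencoder features language model".split(),
    "reinforcement learning atari planning model".split(),
    "sparse features interpretable language model".split(),
]
vecs = tfidf_vectors(abstracts)
# Abstract 0 shares rare terms with abstract 2, so it scores closer to it.
print(cosine(vecs[0], vecs[2]) > cosine(vecs[0], vecs[1]))
```

Note that a term appearing in every document (here, "model") gets an IDF of zero, so it contributes nothing to similarity; this is what lets TF-IDF surface distinctive vocabulary rather than generic words.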

February 17, 2025

Investigating Relative Life Expectancy in the United States

I recently came across a compelling visualization of American healthcare costs vs. life expectancy, showing that the US considerably outpaces other wealthy nations when it comes to spending on healthcare, but lags behind when it comes to actually living longer. This graphic paints a grim picture on two fronts. For one, wow, US healthcare is expensive. But more importantly: why aren’t we getting anything for it? Why is American life expectancy so far behind other developed nations?...

January 17, 2025

Reading: Scaling Monosemanticity -- Extracting Interpretable Features from Claude 3 Sonnet

In a recent paper from Anthropic, the authors show that they can discover single features that correspond to concepts inside of a large language model (LLM). Specifically, the authors learn an autoencoder with an L1 norm penalty (to encourage sparsity) over the activations of a middle layer of the model, which embeds the LLM's activations into a sparse feature space. This sparsity means that they can take thousands or millions of dense feature activations and reduce all that noise down to just a few hundred high-magnitude features (on average, each token is represented by 300 independent features)....
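A minimal sketch of the objective described above: an autoencoder over model activations whose loss combines reconstruction error with an L1 penalty on the hidden features. The dimensions, random weights, and `lam` coefficient here are illustrative placeholders (the paper trains much larger autoencoders by gradient descent); only the loss structure is the point.

```python
import math
import random

random.seed(0)

D_MODEL, D_HID = 8, 32   # toy sizes; real SAEs are orders of magnitude larger

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

# Random encoder/decoder weights stand in for learned ones.
W_enc = [[random.gauss(0, 0.1) for _ in range(D_MODEL)] for _ in range(D_HID)]
W_dec = [[random.gauss(0, 0.1) for _ in range(D_HID)] for _ in range(D_MODEL)]

def sae_loss(x, lam=1e-3):
    """Reconstruction MSE plus an L1 penalty on the hidden features.

    The L1 term is what pushes most feature activations to exactly zero,
    so each activation vector decomposes into a handful of active features.
    """
    f = relu(matvec(W_enc, x))    # sparse feature activations
    x_hat = matvec(W_dec, f)      # reconstruction of the original activation
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    l1 = sum(abs(a) for a in f)
    return mse + lam * l1, f

x = [random.gauss(0, 1) for _ in range(D_MODEL)]   # a stand-in activation
loss, feats = sae_loss(x)
print(sum(1 for a in feats if a == 0.0), "of", D_HID, "features inactive")
```

Even before any training, the ReLU zeroes a chunk of the features; training against the L1 term is what drives the rest of the sparsity.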

May 22, 2024

Reading: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero Paper)

DeepMind’s Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model was officially published today in Nature, and is a reasonably short read covering a cool direction for future reinforcement learning (RL) research. What is the paper trying to do? Following the incredible successes of AlphaGo and AlphaZero, this work introduces MuZero, which extends AlphaZero-style learning and performance to a new set of RL domains....

December 23, 2020

Interactive Intelligence

Google DeepMind recently put a new 96-page paper on arXiv titled Imitating Interactive Intelligence. The paper has a few really cool things to contribute. Unfortunately, owing to its intimidating length, those insights aren’t easy to extract. DeepMind have put up their own blog post on the work, but I didn’t really get a sense of what happened just by reading that post. They also released three videos showing what happened in the project (overview, training timelapse, and a demo)....

December 20, 2020