Mechanistic Interpretability

Reproducing Emotion Vector Part I

An independent reproduction of Anthropic's emotion vector research using the open-weight Llama 3.1 8B model, with the paper's verbatim 171 emotions, 100 topics, and 64 activities. We confirm 10 of 11 verification criteria, with causal steering r=0.956 closely matching the paper's r=0.85.