Junyuan Hong
Mechanistic Interpretability
Reproducing Emotion Vector Part I
An independent reproduction of Anthropic's emotion vector research using the open-weight Llama 3.1 8B model. We confirm 9 of 11 verification criteria and uncover how safety alignment shapes causal steering behavior.