Junyuan Hong
Low-Rank
LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning
A training-free method that robustifies LLM safety alignment against fine-tuning by extrapolating low-rank safety subspaces, significantly reducing attack success rates while preserving model utility.
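The summary above does not spell out the procedure, so the following is only a minimal sketch of one way "extrapolating a low-rank subspace" could look, assuming the safety-relevant change is the difference between aligned and pre-alignment weights and that its top singular directions are amplified. The function name and the parameters `w_base`, `w_aligned`, `rank`, and `alpha` are hypothetical and are not taken from this page.

```python
import numpy as np


def low_rank_extrapolate(w_base, w_aligned, rank, alpha):
    """Illustrative only: push aligned weights further along the top-`rank`
    singular directions of the alignment update (an assumed formulation)."""
    # Weight change attributed to alignment (assumption: aligned minus base).
    delta = w_aligned - w_base
    # Thin SVD of the update; singular values come back in descending order.
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Best rank-`rank` approximation of the update.
    low_rank_delta = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    # No training step: the result is computed directly from the two weight sets.
    return w_aligned + alpha * low_rank_delta
```

With `alpha = 0` the aligned weights are returned unchanged, and larger values of `alpha` move further along the assumed low-rank directions; how `rank` and `alpha` would be chosen in practice is not stated here.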