A training-free method that robustifies LLM safety alignment against fine-tuning attacks by extrapolating along low-rank safety subspaces, significantly reducing attack success rates while preserving model utility.
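The entry gives no implementation detail, so the following is only a minimal sketch of one plausible reading: take the weight delta between a safety-aligned checkpoint and its base model, keep its top singular directions as the low-rank safety subspace, and extrapolate along that subspace with pure weight arithmetic, i.e., no training. The function name, the low-rank/residual split, and the parameters `rank` and `alpha` are all assumptions, not the paper's actual method.

```python
import torch

def extrapolate_safety_subspace(w_aligned: torch.Tensor,
                                w_base: torch.Tensor,
                                rank: int = 8,
                                alpha: float = 2.0) -> torch.Tensor:
    """Hypothetical sketch: strengthen the safety component of one weight
    matrix by extrapolating the low-rank part of the alignment delta."""
    delta = w_aligned - w_base                       # what alignment changed
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    # Top-`rank` singular directions: the assumed low-rank safety subspace.
    safety = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]
    # Extrapolate (alpha > 1) along the safety subspace only, leaving the
    # residual, presumably utility-bearing, part of the delta intact.
    return w_base + alpha * safety + (delta - safety)
```

Because the update is closed-form per weight matrix it stays training-free, and later fine-tuning would have to undo an amplified safety component before refusal behavior degrades, which is one way the summary's robustness claim could arise.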
A momentum-based, sampling-driven method for scaling textual gradient optimization in LLM prompt engineering, improving performance and efficiency across diverse NLP tasks.
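Again purely as an assumption-labeled sketch: "textual gradients" conventionally means an LLM critic writes natural-language feedback on a prompt's failures and an LLM editor revises the prompt accordingly; "sampling-driven" is read here as critiquing a random minibatch per step, and "momentum" as conditioning the editor on a window of recent feedback rather than only the latest critique. The callables `critic` and `editor` and every parameter below are hypothetical stand-ins for LLM calls, not the paper's API.

```python
import random
from typing import Callable, List, Tuple

def momentum_textual_gradient_step(
    prompt: str,
    train_set: List[Tuple[str, str]],          # (input, expected output) pairs
    critic: Callable[[str, list], str],        # LLM call -> textual feedback
    editor: Callable[[str, list], str],        # LLM call -> revised prompt
    history: List[str],
    batch_size: int = 8,
    window: int = 4,
) -> Tuple[str, List[str]]:
    """One sampling-driven, momentum-style textual-gradient update."""
    # Sampling: the critic sees only a minibatch, so each step is cheap and
    # the loop scales to large task sets.
    batch = random.sample(train_set, min(batch_size, len(train_set)))
    feedback = critic(prompt, batch)           # the "textual gradient"
    history.append(feedback)
    # Momentum: edit against a window of recent feedback, damping oscillation
    # between contradictory single-batch critiques.
    new_prompt = editor(prompt, history[-window:])
    return new_prompt, history
```

As with numeric momentum, the window trades responsiveness for stability: a larger `window` smooths noisy minibatch feedback at the cost of reacting more slowly to genuine prompt improvements.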