A training-free method that robustifies LLM safety alignment against fine-tuning by extrapolating low-rank safety subspaces, significantly reducing attack success rates while preserving model utility.
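A minimal sketch of the idea, assuming the method works by isolating the aligned-minus-base weight delta, taking its top singular directions as the low-rank safety subspace, and extrapolating that subspace past the aligned model; the function name and the `rank`/`alpha` knobs are illustrative assumptions, not the paper's API.

```python
# Hypothetical sketch of low-rank safety-subspace extrapolation.
# `rank` and `alpha` are illustrative; the paper's procedure may differ.
import torch

def extrapolate_safety_subspace(w_base: torch.Tensor,
                                w_aligned: torch.Tensor,
                                rank: int = 8,
                                alpha: float = 2.0) -> torch.Tensor:
    """Amplify the low-rank 'safety' component of an aligned weight matrix.

    The aligned-minus-base delta is treated as the safety update; its top
    `rank` singular directions approximate the safety subspace, which is
    scaled by `alpha` (> 1 extrapolates beyond the aligned model).
    """
    delta = w_aligned - w_base                       # update contributed by alignment
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    low_rank = u[:, :rank] @ torch.diag(s[:rank]) @ vh[:rank, :]
    return w_base + alpha * low_rank + (delta - low_rank)  # amplify subspace only

# Toy usage on random matrices standing in for one transformer weight.
w0 = torch.randn(64, 64)
w1 = w0 + 0.1 * torch.randn(64, 64)
w_robust = extrapolate_safety_subspace(w0, w1)
```

At `alpha = 1.0` the function returns the aligned weights unchanged, which makes the extrapolation factor easy to sanity-check.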
A study revealing safety-specific pitfalls of multi-model synthetic preference data in DPO alignment.
A momentum-based, sampling-driven method for scaling textual gradient optimization in LLM prompt engineering, improving performance and efficiency across diverse NLP tasks.
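A schematic sketch of the loop, assuming "momentum" means carrying recent textual gradients (critiques) across edit steps and "sampling-driven" means critiquing on random minibatches; the `llm` hook and all prompt templates below are hypothetical stand-ins for a real client, not the method's actual prompts.

```python
# Schematic momentum-style textual-gradient prompt optimization.
# `llm` is a hypothetical chat-completion callable supplied by the user.
import random
from typing import Callable

def optimize_prompt(prompt: str, train_set: list,
                    llm: Callable[[str], str],
                    steps: int = 5, batch_size: int = 4,
                    momentum_window: int = 3) -> str:
    """Iteratively edit a prompt using LLM critiques as textual gradients."""
    history: list = []  # recent critiques act as momentum across steps
    for _ in range(steps):
        batch = random.sample(train_set, min(batch_size, len(train_set)))
        failures = [(x, y) for x, y in batch
                    if llm(f"{prompt}\n{x}").strip() != y]
        if not failures:
            continue
        gradient = llm("Critique this prompt given its failing examples.\n"
                       f"Prompt: {prompt}\nFailures: {failures}")
        history = (history + [gradient])[-momentum_window:]
        prompt = llm("Rewrite the prompt to address the critiques while "
                     "keeping earlier fixes intact.\n"
                     f"Prompt: {prompt}\nCritiques: {history}")
    return prompt
```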
The first automated guardrail for LLM agents.
A training-free approach that calibrates chain-of-thought reasoning in LLMs, improving accuracy while reducing computational overhead.
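The line above does not name the calibration mechanism, so the sketch below only illustrates one common training-free pattern with the same accuracy/cost profile: early-stopping self-consistency, which keeps sampling reasoning paths until the majority answer is confident. `sample_cot_answer` is a hypothetical hook that returns one chain-of-thought final answer.

```python
# Hedged illustration: early-stopping self-consistency as one
# training-free way to calibrate CoT accuracy against compute.
from collections import Counter
from typing import Callable

def calibrated_cot(sample_cot_answer: Callable[[], str],
                   max_samples: int = 16,
                   confidence: float = 0.7) -> str:
    """Draw CoT samples until the majority answer is confident enough."""
    votes: Counter = Counter()
    for n in range(1, max_samples + 1):
        votes[sample_cot_answer()] += 1
        answer, count = votes.most_common(1)[0]
        if n >= 3 and count / n >= confidence:  # stop early once calibrated
            return answer
    return votes.most_common(1)[0][0]
```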
We examine how superficial LLM alignment is through a linear distillation method.
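A minimal sketch of what such a probe can look like, assuming "linear distillation" means fitting a single linear map from base-model logits to aligned-model logits, on the view that a good linear fit indicates the alignment is largely superficial. The tensors are random stand-ins for real model outputs; the paper's exact setup may differ.

```python
# Stand-in data: 'aligned' logits generated as a near-linear transform of
# base logits, so the probe should fit them almost exactly.
import torch

vocab, n = 128, 2048
base_logits = torch.randn(n, vocab)
w_true = torch.eye(vocab) + 0.1 * torch.randn(vocab, vocab)
aligned_logits = base_logits @ w_true

# Closed-form least-squares fit of the linear map base -> aligned.
w_hat = torch.linalg.lstsq(base_logits, aligned_logits).solution
residual = ((base_logits @ w_hat - aligned_logits).norm()
            / aligned_logits.norm())
print(f"relative fit error of the linear map: {residual:.4f}")
```

A small residual on real model pairs would be the signal that alignment behavior is recoverable by a linear map alone.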
We develop a chatbot for reminiscence therapy.
Zeroth-order optimization for LLMs.
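A compact sketch of one standard instantiation for LLMs, MeZO-style SPSA: the gradient is estimated from two forward passes under a shared random perturbation, with the RNG re-seeded so the noise never has to be stored and no backprop memory is needed. The toy quadratic stands in for an LLM loss.

```python
# MeZO-style zeroth-order step: two forward passes, no autograd.
import torch

def zo_step(params: list, loss_fn, eps: float = 1e-3, lr: float = 1e-3):
    seed = torch.seed()                 # remember the noise via its seed
    def perturb(scale: float):
        torch.manual_seed(seed)
        for p in params:
            p.add_(torch.randn_like(p), alpha=scale * eps)
    perturb(+1); loss_plus = loss_fn()      # loss at theta + eps * z
    perturb(-2); loss_minus = loss_fn()     # loss at theta - eps * z
    perturb(+1)                             # restore theta
    grad_scale = (loss_plus - loss_minus) / (2 * eps)
    torch.manual_seed(seed)                 # regenerate the same z for the update
    for p in params:
        p.add_(torch.randn_like(p), alpha=-lr * grad_scale)

# Toy usage: minimize ||w||^2 without ever calling backward().
w = torch.randn(10)
for _ in range(1000):
    zo_step([w], lambda: (w ** 2).sum().item(), lr=1e-2)
```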
We develop a chatbot for early dementia prevention and leverage LLMs to build digital twins for evaluating chatbots.
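An illustrative sketch, assuming a digital twin here means an LLM conditioned on a user persona that converses with the chatbot so dialogues can be scored offline; the `chatbot` and `twin` hooks are hypothetical.

```python
# Roll out a chatbot vs. digital-twin conversation for offline evaluation.
from typing import Callable, List, Tuple

def simulate_dialogue(chatbot: Callable[[list], str],
                      twin: Callable[[list], str],
                      persona: str, turns: int = 6) -> List[Tuple[str, str]]:
    """Alternate twin (simulated user) and chatbot turns, return the log."""
    history: List[Tuple[str, str]] = [
        ("system", f"You are simulating this user: {persona}")
    ]
    for _ in range(turns):
        history.append(("user", twin(history)))        # twin plays the user
        history.append(("assistant", chatbot(history)))
    return history  # hand off to an LLM judge or metric for scoring
```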
We use local LLMs to engineer privacy-preserving prompts that transfer to cloud models.
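A schematic sketch, assuming the pipeline has the local model rewrite each request to strip private details before the sanitized version is sent to the cloud model; both client hooks are hypothetical stand-ins.

```python
# Sanitize locally, query remotely: raw private text never leaves the device.
from typing import Callable

def private_query(prompt: str,
                  local_llm: Callable[[str], str],
                  cloud_llm: Callable[[str], str]) -> str:
    """Have a local model redact a prompt, then forward it to the cloud."""
    sanitized = local_llm(
        "Rewrite the following request so the task stays intact but all "
        "personal or sensitive details are removed or abstracted:\n" + prompt
    )
    return cloud_llm(sanitized)
```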