Junyuan Hong
AI Safety
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment
A study revealing safety-specific pitfalls of multi-model synthetic preference data in DPO alignment.
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
The first automated guardrail for LLM agents.