Junyuan Hong
LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning
A training-free method that robustifies LLM safety alignment against fine-tuning by extrapolating low-rank safety subspaces, significantly reducing attack success rates while preserving model utility.
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment
A study revealing safety-specific pitfalls of multi-model synthetic preference data in DPO alignment.
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
The first automated guardrail for LLM agents, built on knowledge-enabled reasoning.
SEAL: Steerable Reasoning Calibration of Large Language Models for Free
A training-free approach that calibrates chain-of-thought reasoning in LLMs, improving accuracy while reducing computational overhead.
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models
A benchmark for detecting medical hallucinations produced by LLMs.
Extracting and Understanding the Superficial Knowledge in Alignment
We examine how superficial LLM alignment is through a linear distillation method.
GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography Interviewing
We develop an LLM-guided autobiography-interviewing chatbot for reminiscence therapy.
LLM-PBE: Assessing Data Privacy in Large Language Models
A comprehensive privacy assessment of LLMs.
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
A comprehensive trustworthiness assessment of compressed LLMs.
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark
A benchmark of zeroth-order optimization for memory-efficient LLM fine-tuning.