Junyuan Hong
Trustworthy
LLMs Can Get "Brain Rot"!
We find that LLMs can get Brain Rot, just like humans, after consuming large amounts of junk social-media content.
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment
A study revealing safety-specific pitfalls of multi-model synthetic preference data in DPO alignment.
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
The first automated guardrail for agents.
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models
A benchmark for detecting medical hallucinations generated by LLMs.
LLM-PBE: Assessing Data Privacy in Large Language Models
A comprehensive privacy assessment of LLMs.
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
A comprehensive trustworthiness assessment of compressed LLMs.
Safe and Robust Watermark Injection with a Single OoD Image
A new method for safely and robustly injecting watermarks into a model after training, without access to the training data.
Who Leaked the Model? Tracking IP Infringers in Accountable Federated Learning
A method for tracking IP infringers in accountable federated learning.
Revisiting Data-Free Knowledge Distillation with Poisoned Teachers
We uncover the security risk of data-free distillation from a poisoned teacher and propose the first countermeasure.
How Robust is Your Fairness? Evaluating and Sustaining Fairness under Unseen Distribution Shifts
Increasing concerns have been raised about fairness in deep learning in recent years. Existing fairness-aware machine learning methods mainly focus on the fairness of in-distribution data. However, in real-world applications, it is common to have …