We find that LLMs can get Brain Rot just like humans do after consuming large amounts of brainless social-media content.
The first automated guardrail for agents.
A benchmark for medical hallucination in LLMs.
We examine how superficial LLM alignment is through a linear distillation method (a minimal sketch follows this list).
A comprehensive privacy assessment of LLMs.
A comprehensive trustworthiness assessment of compressed LLMs.
We develop a chatbot for early dementia prevention and leverage LLMs to build digital twins for evaluating chatbots.
A new method for safely and robustly injecting watermarks into a model after training, without access to training data (sketched below).
We identify a new risk for published generative models: fine-tuning on their generated samples can exacerbate privacy leakage.
We use local LLMs to engineer privacy-preserving prompts that transfer to cloud models (sketched below).
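A minimal sketch of the linear-distillation idea behind the alignment-superficiality study above: if a single linear map fitted on paired hidden states can recover the aligned model's representations from the base model's, the alignment change is, in this sense, shallow. The synthetic data, dimensions, and least-squares probe are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch: probe alignment superficiality with a linear map.
# Assumption: we have paired hidden states from a base model and its
# aligned counterpart on the same prompts (faked here with synthetic data).
import numpy as np

rng = np.random.default_rng(0)
d = 512                                        # hidden size (illustrative)
H_base = rng.normal(size=(10_000, d))          # base-model hidden states
W_true = np.eye(d) + 0.01 * rng.normal(size=(d, d))
H_aligned = H_base @ W_true                    # stand-in for aligned states

# Fit one linear map W minimizing ||H_base @ W - H_aligned||_F.
W, *_ = np.linalg.lstsq(H_base, H_aligned, rcond=None)

# A tiny residual means a linear transform alone explains the
# base -> aligned change, i.e. the alignment looks "superficial".
residual = np.linalg.norm(H_base @ W - H_aligned) / np.linalg.norm(H_aligned)
print(f"relative residual of linear distillation: {residual:.4f}")
```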
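For the data-free watermark injection entry, here is a generic white-box sketch of the idea: nudge a trained weight matrix so that a secret random projection of its flattened weights encodes a payload bit string, with no training data involved. The key scheme, step size, and update rule are assumptions for illustration, not the paper's actual algorithm.

```python
# Hypothetical sketch of post-hoc weight watermarking without training data.
import numpy as np

rng = np.random.default_rng(42)
W = rng.normal(size=(256, 256))           # a trained weight matrix (stand-in)
bits = rng.integers(0, 2, size=64)        # watermark payload
K = rng.normal(size=(bits.size, W.size))  # secret key: random projections

def extract(weights, key):
    # A bit reads as 1 when its secret projection is positive.
    return (key @ weights.ravel() > 0).astype(int)

# Inject: take small steps until every payload bit matches.
w = W.ravel().copy()
signs = 2 * bits - 1                      # desired sign of each projection
for _ in range(1000):
    wrong = extract(w.reshape(W.shape), K) != bits
    if not wrong.any():
        break
    # Push the projections of wrong bits toward the desired sign.
    w += 1e-3 * (K[wrong] * signs[wrong, None]).sum(axis=0)

W_marked = w.reshape(W.shape)
print("payload recovered:", (extract(W_marked, K) == bits).all())
print("weight distortion:", np.linalg.norm(W_marked - W) / np.linalg.norm(W))
```

Since the key rows are random and nearly orthogonal, each step mostly moves its own bit's projection, so the payload embeds with only a small weight perturbation.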
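And a sketch of the local-rewriter pipeline from the last entry: a locally hosted LLM rewrites the user's prompt to strip identifying details, and only the sanitized prompt is sent to the cloud model. Both `rewrite_locally` and `call_cloud_llm` are illustrative placeholders, not the paper's system or any real API.

```python
# Hypothetical sketch: sanitize a prompt locally before calling a cloud LLM.
import re

def rewrite_locally(prompt: str) -> str:
    """Stand-in for a local LLM prompted to keep the task intact while
    replacing private details with generic placeholders. A crude regex
    pass makes the sketch runnable end to end."""
    prompt = re.sub(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", "[NAME]", prompt)
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[ID]", prompt)

def call_cloud_llm(prompt: str) -> str:
    """Stand-in for a cloud-model API call."""
    return f"cloud model sees only: {prompt!r}"

user_prompt = "Draft a letter for John Smith, SSN 123-45-6789, about his claim."
print(call_cloud_llm(rewrite_locally(user_prompt)))
```

Because the rewritten prompt is plain text, it transfers unchanged to any cloud model rather than depending on one provider's interface.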