Safety

MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models

A benchmark for detecting medical hallucinations generated by LLMs.

Extracting and Understanding the Superficial Knowledge in Alignment

Examines how superficial LLM alignment is through a linear distillation method.