Robustness

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

A training-free method that robustifies LLM safety alignment against fine-tuning by extrapolating low-rank safety subspaces, significantly reducing attack success rates while preserving model utility.
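A minimal sketch of the low-rank extrapolation idea, assuming it operates per weight matrix on the alignment update (aligned minus base weights): extract the top singular directions of that update and extrapolate along them. The function name, rank, and scaling factor `alpha` are illustrative assumptions, not the paper's exact procedure or hyperparameters.

```python
import torch

def extrapolate_low_rank(w_base, w_aligned, rank=8, alpha=0.5):
    """Illustrative: amplify the low-rank component of the safety-alignment update."""
    delta = w_aligned - w_base                                     # alignment weight update
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)        # SVD of the update
    low_rank = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]   # top-rank safety subspace
    return w_aligned + alpha * low_rank                            # extrapolate along that subspace

# Toy usage on a random weight matrix (shapes only; not a real model)
w_base = torch.randn(64, 64)
w_aligned = w_base + 0.01 * torch.randn(64, 64)
w_robust = extrapolate_low_rank(w_base, w_aligned)
```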

MECTA: Memory-Economic Continual Test-Time Model Adaptation

Continual Test-time Adaptation (CTA) is a promising approach to securing accuracy gains in continually changing environments. State-of-the-art methods improve out-of-distribution model accuracy via computation-efficient online test-time gradient …
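For context, a hedged sketch of the generic continual test-time adaptation loop that MECTA-like methods build on: adapt only normalization-layer parameters by minimizing prediction entropy on each incoming test batch. This shows the general CTA setting, not MECTA's memory-saving mechanism; all names are illustrative.

```python
import torch
import torch.nn as nn

def collect_norm_params(model):
    """Adapt only normalization-layer affine parameters, a common CTA choice."""
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm2d, nn.LayerNorm)):
            params += [p for p in (m.weight, m.bias) if p is not None]
    return params

def adapt_step(model, x, optimizer):
    """One online adaptation step: minimize prediction entropy on the test batch."""
    logits = model(x)
    probs = logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return logits.detach()
```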

Federated Robustness Propagation: Sharing Adversarial Robustness in Federated Learning

Federated learning (FL) has emerged as a popular distributed learning scheme that learns a model from a set of participating users without requiring raw data to be shared. One major challenge in FL comes from heterogeneity among users, who may have …
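A hedged sketch of the setting: FedAvg-style aggregation where only some clients can afford adversarial training locally (FGSM here for brevity), so robustness reaches the remaining clients only through the shared global model. The client split and training details below are illustrative assumptions, not the paper's propagation algorithm.

```python
import copy
import torch
import torch.nn.functional as F

def local_step(model, x, y, adversarial=False, eps=8 / 255, lr=0.01):
    """One local SGD step; 'robust' clients perturb inputs with FGSM first."""
    if adversarial:
        x = x.clone().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        grad = torch.autograd.grad(loss, x)[0]
        x = (x + eps * grad.sign()).detach()          # adversarial example
    loss = F.cross_entropy(model(x), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
    return {k: v.detach().clone() for k, v in model.state_dict().items()}

def fedavg(states, weights):
    """Weighted average of client state dicts (standard FedAvg aggregation)."""
    avg = copy.deepcopy(states[0])
    for k in avg:
        if avg[k].is_floating_point():
            avg[k] = sum(w * s[k] for w, s in zip(weights, states))
    return avg
```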

Holistic Trustworthy ML

Instead of treating individual properties in isolation, we target holistic trustworthiness, covering all properties in one solution.