3

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

A training-free method that robustifies LLM safety alignment against fine-tuning by extrapolating low-rank safety subspaces, significantly reducing attack success rates while preserving model utility.

More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment

A study revealing safety-specific pitfalls of multi-model synthetic preference data in DPO alignment.

Scaling Textual Gradients via Sampling-Based Momentum

A momentum-based, sampling-driven method for scaling textual gradient optimization in LLM prompt engineering, improving performance and efficiency across diverse NLP tasks.

SEAL: Steerable Reasoning Calibration of Large Language Models for Free

A training-free approach that calibrates chain-of-thought reasoning in LLMs, improving accuracy while reducing computational overhead.

MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models

Benchmark for medical hallucination by LLMs.

A-CONECT: Designing AI-based Conversational Chatbot for Early Dementia Intervention

We develop a chatbot for early dementia prevention and leverage LLMs to build digital twins to evaluate chatbots.

A Privacy-Preserving Hybrid Federated Learning Framework for Financial Crime Detection

We develop a hybrid federated learning for learning financial-crime predictive models from horizontal and vertical federated data structures.

FedNoisy: A Federated Noisy Label Learning Benchmark

The recent decade witnessed a surge of increase in financial crimes across the public and private sectors, with an average cost of scams of $102m to financial institutions in 2022. Developing a mechanism for battling financial crimes is an impending …

Precautionary Unfairness in Self-Supervised Contrastive Pre-training

Recently, self-supervised contrastive pre-training has become the de facto regime, that allows for efficient downstream fine-tuning. Meanwhile, its fairness issues are barely studied, though they have drawn great attention from the machine learning …