LLM

Scaling Textual Gradients via Sampling-Based Momentum

A momentum-based, sampling-driven method for scaling textual gradient optimization in LLM prompt engineering, improving performance and efficiency across diverse NLP tasks.
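
As a rough illustration of the idea (not the paper's implementation), the sketch below keeps a buffer of past textual critiques and samples from it with exponentially decaying weights, the momentum analogue, when proposing each prompt revision. The `llm` callable and the prompt templates are assumed placeholders.

```python
import random

def llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., an API client); assumed, not real."""
    raise NotImplementedError

def optimize_prompt(prompt, train_batch, steps=10, beta=0.9, k=3):
    """Sketch: textual-gradient descent with sampling-based momentum.

    Past critiques ("textual gradients") stay in a buffer and are sampled
    with exponentially decaying weights, so older feedback still shapes
    each prompt revision instead of being discarded.
    """
    history = []  # (critique, weight) pairs
    for _ in range(steps):
        critique = llm(f"Critique this prompt given the failures below.\n"
                       f"Prompt: {prompt}\nFailures: {train_batch}")
        history = [(c, w * beta) for c, w in history] + [(critique, 1.0)]
        sampled = random.choices(
            [c for c, _ in history],
            weights=[w for _, w in history],
            k=min(k, len(history)),
        )
        prompt = llm("Revise the prompt using this feedback:\n"
                     + "\n".join(sampled) + f"\nPrompt: {prompt}")
    return prompt
```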

Reproducing Emotion Vector Part I

An independent reproduction of Anthropic's emotion vector research using the open-weight Llama 3.1 8B model, with the paper's verbatim 171 emotions, 100 topics, and 64 activities. We confirm 10 of 11 verification criteria, with causal steering r=0.956 exceeding the paper's reported r=0.85.
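
For readers unfamiliar with the underlying technique, below is a minimal activation-steering sketch on Llama 3.1 8B with Hugging Face Transformers: an emotion vector is taken as the mean activation difference between emotional and neutral prompts, then injected back via a forward hook. The layer index (16), steering scale, and example prompts are illustrative assumptions, not the reproduction's configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-3.1-8B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

def mean_hidden(texts, layer):
    """Mean last-token residual-stream activation at `layer`."""
    states = []
    with torch.no_grad():
        for t in texts:
            ids = tok(t, return_tensors="pt").input_ids
            h = model(ids, output_hidden_states=True).hidden_states[layer]
            states.append(h[0, -1])
    return torch.stack(states).mean(0)

# Emotion vector = mean activation on emotional prompts minus neutral ones
# (single-prompt contrast here purely for brevity).
vec = mean_hidden(["I feel overwhelming joy."], 16) - \
      mean_hidden(["The meeting is at noon."], 16)

def steer(module, inputs, output, scale=4.0):
    """Forward hook adding the emotion vector to the layer's output."""
    h = output[0] if isinstance(output, tuple) else output
    h = h + scale * vec.to(h.dtype)
    return (h, *output[1:]) if isinstance(output, tuple) else h

handle = model.model.layers[16].register_forward_hook(steer)
```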

AD-VF: LLM-Automatic Differentiation Enables Fine-Tuning-Free Robot Planning from Formal Methods Feedback

A framework leveraging formal verification feedback for scalable and interpretable LLM-driven robot planning.
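
A minimal sketch of the loop such a framework implies: counterexamples from a formal verifier serve as the "textual gradient" that rewrites the plan, with no fine-tuning involved. Both `llm` and `model_check` are assumed stand-ins, not the paper's interfaces.

```python
def llm(prompt: str) -> str:
    """Stand-in for an LLM planner call; assumed, not the paper's API."""
    raise NotImplementedError

def model_check(plan: str, spec: str):
    """Stand-in for a formal verifier (e.g., an LTL model checker).

    Returns (ok, counterexample), where the counterexample is a trace
    violating the specification.
    """
    raise NotImplementedError

def plan_with_feedback(task: str, spec: str, max_iters: int = 5) -> str:
    """Fine-tuning-free loop: verifier counterexamples act as textual
    feedback that revises the plan each iteration."""
    plan = llm(f"Produce a robot plan for: {task}")
    for _ in range(max_iters):
        ok, counterexample = model_check(plan, spec)
        if ok:
            return plan
        plan = llm(f"The plan violates the spec {spec} with trace:\n"
                   f"{counterexample}\nRevise the plan:\n{plan}")
    return plan
```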

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

A training-free method that robustifies LLM safety alignment against fine-tuning by extrapolating low-rank safety subspaces, significantly reducing attack success rates while preserving model utility.
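
A minimal sketch of the core operation, under the assumption that the safety subspace is the top-rank SVD component of the aligned-minus-base weight delta; the rank and extrapolation factor below are illustrative, not the paper's settings.

```python
import torch

def lox_extrapolate(w_aligned: torch.Tensor, w_base: torch.Tensor,
                    rank: int = 8, alpha: float = 0.5) -> torch.Tensor:
    """Sketch of low-rank extrapolation: amplify the top-`rank` SVD
    component of the safety delta so later fine-tuning is less able
    to erase it. Applied per weight matrix, with no training."""
    delta = w_aligned - w_base
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    low_rank = u[:, :rank] @ torch.diag(s[:rank]) @ vh[:rank, :]
    return w_aligned + alpha * low_rank  # extrapolate along the subspace
```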

More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment

A study revealing safety-specific pitfalls of multi-model synthetic preference data in DPO alignment.
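
For context, the DPO objective at issue, written as a short PyTorch sketch over per-sequence log-probabilities (the standard loss from Rafailov et al., not this study's code):

```python
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss over summed log-probs of the chosen (w) and
    rejected (l) responses under the policy and a frozen reference."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()
```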

GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

The first automated guardrail agent that safeguards other LLM agents via knowledge-enabled reasoning.

SEAL: Steerable Reasoning Calibration of Large Language Models for Free

A training-free approach that calibrates chain-of-thought reasoning in LLMs, improving accuracy while reducing computational overhead.

Extracting and Understanding the Superficial Knowledge in Alignment

We examine how superficial LLM alignment is by extracting it with a linear distillation method.
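
One way to read "linear distillation," sketched below as an assumption rather than the paper's method: fit a least-squares linear map from base-model logits to aligned-model logits; whatever alignment behavior such a map recovers is, by construction, superficial.

```python
import torch

def fit_linear_distillation(base_logits: torch.Tensor,
                            aligned_logits: torch.Tensor) -> torch.Tensor:
    """Least-squares linear map W such that base_logits @ W approximates
    aligned_logits, given (num_tokens, vocab_size) logits from the base
    and aligned models on the same inputs."""
    # Solves argmin_W ||base_logits @ W - aligned_logits||_F
    return torch.linalg.lstsq(base_logits, aligned_logits).solution
```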

GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography Interviewing

We develop an LLM-guided chatbot for autobiography interviewing, with applications in reminiscence therapy.

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

A benchmark of zeroth-order optimization methods for memory-efficient LLM fine-tuning.
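
Below is the core two-point (SPSA) estimator that memory-efficient zeroth-order methods such as MeZO build on: perturbations are replayed from a stored seed, so no extra copy of the parameters is kept and memory cost stays at inference level. This is an illustrative sketch, not the benchmark's code.

```python
import torch

def zo_step(model, loss_fn, eps=1e-3, lr=1e-6):
    """One MeZO-style zeroth-order step: estimate the directional
    derivative from two forward passes, then update in place."""
    seed = torch.seed()  # remember the seed to replay identical noise

    def perturb(scale):
        torch.manual_seed(seed)
        for p in model.parameters():
            p.data.add_(scale * eps * torch.randn_like(p))

    perturb(+1); loss_plus = loss_fn(model)    # at theta + eps*z
    perturb(-2); loss_minus = loss_fn(model)   # at theta - eps*z
    perturb(+1)                                # restore theta
    grad_scale = (loss_plus - loss_minus) / (2 * eps)

    torch.manual_seed(seed)  # replay z once more for the update
    for p in model.parameters():
        p.data.add_(-lr * grad_scale * torch.randn_like(p))
```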