A machine-executable format that replaces traditional papers with structured, agent-consumable research artifacts capturing logic, code, exploration, and evidence.
A momentum-based, sampling-driven method for scaling textual gradient optimization in LLM prompt engineering, improving performance and efficiency across diverse NLP tasks.
An independent reproduction of Anthropic's emotion vector research using the open-weight Llama 3.1 8B model, with the paper's verbatim 171 emotions, 100 topics, and 64 activities. We confirm 10 of 11 verification criteria, with causal steering of r=0.956, exceeding the paper's reported r=0.85.
A token-level confidence-calibrated negative preference alignment method for LLM unlearning that removes undesirable knowledge without requiring retention data or contrastive pairs.
A training-free method that robustifies LLM safety alignment against fine-tuning by extrapolating low-rank safety subspaces, significantly reducing attack success rates while preserving model utility.