An adversarial attack that recovers supposedly unlearned multi-modality knowledge from MLLMs via prompt-suffix optimization and fine-tuning, exposing vulnerabilities in machine unlearning defenses.
A token-level confidence-calibrated negative preference alignment method for LLM unlearning that removes undesirable knowledge without requiring retention data or contrastive pairs.