Tags

DPO

RLHF

Gradient Descent

Momentum

NLP

Prompt Optimization

Sampling

Chain-of-Thought

CoT

Efficiency