Abstract
Reward design underpins the alignment of large language models with human intent, yet the landscape of reward formulations, construction pipelines, optimization roles, and evaluation practices remains fragmented. This review unifies recent progress by addressing four guiding questions. How are rewards specified mathematically across scalar, token-level, and multidimensional regimes, and which shaping objectives support each? How are rewards built in practice through rule-based, data-driven, and hybrid workflows that combine heterogeneous supervision sources? How do rewards interact with optimization paradigms spanning reinforcement learning from human feedback, direct preference optimization, and in-context alignment? How are rewards assessed through benchmarks and diagnostics that probe robustness, generalization, safety, and downstream impact? Organizing the field around these questions yields a taxonomy that links structural choices to practical trade-offs, highlights emerging trends such as hybrid supervision and reinforcement-learning-free alignment, and frames the progression of large language model alignment as a continuous refinement of reward design strategies. Finally, the review outlines research priorities for reliable, value-sensitive reward systems.
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Artificial Intelligence |
| Early online date | 23 Jan 2026 |
| DOIs | |
| Publication status | E-pub ahead of print - 23 Jan 2026 |
Keywords
- Human feedback
- Large language model alignment
- Preference learning
- Reward design
Fingerprint
Dive into the research topics of 'A survey of progress in LLM alignment from the perspective of reward design'. Together they form a unique fingerprint.