
A survey of progress in LLM alignment from the perspective of reward design

Miaomiao Ji, Yanqiu Wu, Zhibin Wu, Shoujin Wang, Jian Yang, Mark Dras, Usman Naseem*

*Corresponding author for this work

Research output: Contribution to journal › Review article › peer-review

Abstract

Reward design underpins the alignment of large language models with human intent, yet the landscape of reward formulations, construction pipelines, optimization roles, and evaluation practices remains fragmented. This review unifies recent progress by addressing four guiding questions. How are rewards specified mathematically across scalar, token-level, and multidimensional regimes, and which shaping objectives support each? How are rewards built in practice through rule-based, data-driven, and hybrid workflows that combine heterogeneous supervision sources? How do rewards interact with optimization paradigms spanning reinforcement learning from human feedback, direct preference optimization, and in-context alignment? How are rewards assessed via benchmarks that probe robustness, generalization, safety diagnostics, and downstream impact? Organizing the field around these questions yields a taxonomy that links structural choices to practical trade-offs, highlights emerging trends such as hybrid supervision and reinforcement-learning-free alignment, and shows how the progression of large language model alignment can be viewed as a continuous refinement of reward design strategies. The review closes by outlining research priorities for reliable, value-sensitive reward systems.

Original language: English
Journal: IEEE Transactions on Artificial Intelligence
Early online date: 23 Jan 2026
Publication status: E-pub ahead of print - 23 Jan 2026

Keywords

  • Human feedback
  • Large language model alignment
  • Preference learning
  • Reward design

