About This Book
Chapter 1 begins with AlphaZero's remarkable performance, examines its hard-won development, and presents its underlying mathematical model. Chapter 2 introduces mathematical models of decision problems through deterministic and stochastic dynamic programming. Chapter 3 reviews the wide variety of reinforcement learning algorithms from an abstract viewpoint, highlighting the central roles of approximation in value space and rollout. Chapter 4 uses the classical linear quadratic optimal control problem to illustrate the lessons learned from AlphaZero's success. Chapter 5 examines robust, adaptive, and model predictive control problems, analyzing the potential of approximation in value space and rollout to improve algorithm performance. Chapter 6 revisits AlphaZero's success from the viewpoint of discrete optimization. Chapter 7 concludes the book. It is suitable as a research monograph for researchers in the field, and as a reference for graduate and undergraduate students.
Contents
1. AlphaZero, Off-Line Training, and On-Line Play
1.1. Off-Line Training and Policy Iteration p. 3
1.2. On-Line Play and Approximation in Value Space -
Truncated Rollout p. 6
1.3. The Lessons of AlphaZero p. 8
1.4. A New Conceptual Framework for Reinforcement Learning p. 11
1.5. Notes and Sources p. 14
2. Deterministic and Stochastic Dynamic Programming
2.1. Optimal Control Over an Infinite Horizon p. 20
2.2. Approximation in Value Space p. 25
2.3. Notes and Sources p. 30
3. An Abstract View of Reinforcement Learning
3.1. Bellman Operators p. 32
3.2. Approximation in Value Space and Newton's Method p. 39
3.3. Region of Stability p. 46
3.4. Policy Iteration, Rollout, and Newton's Method p. 50
3.5. How Sensitive is On-Line Play to the Off-Line
Training Process? p. 58
3.6. Why Not Just Train a Policy Network and Use it Without
On-Line Play? p. 60
3.7. Multiagent Problems and Multiagent Rollout p. 61
3.8. On-Line Simplified Policy Iteration p. 66
3.9. Exceptional Cases p. 72
3.10. Notes and Sources p. 79
4. The Linear Quadratic Case - Illustrations
4.1. Optimal Solution p. 82
4.2. Cost Functions of Stable Linear Policies p. 83
4.3. Value Iteration p. 86
4.4. One-Step and Multistep Lookahead - Newton Step
Interpretations p. 86
4.5. Sensitivity Issues p. 91
4.6. Rollout and Policy Iteration p. 94
4.7. Truncated Rollout - Length of Lookahead Issues p. 97
4.8. Exceptional Behavior in Linear Quadratic Problems p. 99
4.9. Notes and Sources p. 100
5. Adaptive and Model Predictive Control
5.1. Systems with Unknown Parameters - Robust and
PID Control p. 102
5.2. Approximation in Value Space, Rollout, and Adaptive
Control p. 105
5.3. Approximation in Value Space, Rollout, and Model
Predictive Control p. 109
5.4. Terminal Cost Approximation - Stability Issues p. 112
5.5. Notes and Sources p. 118
6. Finite Horizon Deterministic Problems - Discrete
Optimization
6.1. Deterministic Discrete Spaces Finite Horizon Problems p. 120
6.2. General Discrete Optimization Problems p. 125
6.3. Approximation in Value Space p. 128
6.4. Rollout Algorithms for Discrete Optimization p. 132
6.5. Rollout and Approximation in Value Space with Multistep
Lookahead p. 149
6.5.1. Simplified Multistep Rollout - Double Rollout . . p. 150
6.5.2. Incremental Rollout for Multistep Approximation in
Value Space p. 153
6.6. Constrained Forms of Rollout Algorithms p. 159
6.7. Adaptive Control by Rollout with a POMDP Formulation p. 173
6.8. Rollout for Minimax Control p. 182
6.9. Small Stage Costs and Long Horizon - Continuous-Time
Rollout p. 190
6.10. Epilogue p. 197
Appendix A: Newton's Method and Error Bounds
A.1. Newton's Method for Differentiable Fixed
Point Problems p. 202
A.2. Newton's Method Without Differentiability of the
Bellman Operator p. 207
A.3. Local and Global Error Bounds for Approximation in
Value Space p. 210
A.4. Local and Global Error Bounds for Approximate
Policy Iteration p. 212
References p. 217