Discrete Event Dynamic Systems. 2003;13(1-2):111-148. doi: 10.1023/a:1022145020786
Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes