曾妮,陈俊豪,傅清爽
摘要:为解决单目标玩家在仅知道当天的天气状况下如何规划最佳行动策略的问题,提出一种基于贪心算法的动态规划策略。通过分析单目标玩家的状态转移过程,提出基于Floyd算法得出最短路径以及贪心算法的最优后续决策期望方法,分析最终收益的期望值,从而选择一种最佳行动策略,并通过蒙特卡洛模拟对天气进行随机模拟,将出现概率最大的视为最佳路线进行对比检验。分析结果表明:该策略能够使玩家在一般情况的未知天气组合下选择出最佳行动路线,使得最终资金收益值达到最大。
关键词:动态规划模型;蒙特卡洛模拟;贪心算法;Floyd算法;决策模型
中图分类号:TP391.9 文献标识码:A
文章编号:1009-3044(2021)20-0141-03
Dynamic Programming Strategy Based on Greedy Algorithm
ZENG Ni1, CHEN Jun-hao2, FU Qing-shuang3
(1.School of Science, Jiangxi University of Science and Technology, Ganzhou 341000,China;2.School of Civil and Surveying Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China; 3.School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China)
Abstract: To solve the problem of single target player under just know the day's weather conditions due to the problem of how to plan the best strategies in this paper, a dynamic planning strategy based on greedy algorithm, through the analysis of the status of the single target player transfer process resource state function model is established and the optimal decision model of funds, it is concluded that the shortest path based on Floyd algorithm, and the optimal expected follow-up decision-making method based on greedy algorithm, the analysis of the subsequent decisions ultimately earnings expectations, to choose a best course of action strategy, and through monte carlo simulation to stochastic simulation of the weather, will be regarded as the best route with the highest probability compared test The analysis results show that this strategy enables the player to choose the best course of action under the general circumstance of unknown weather combination, which makes the final capital gain reach the maximum.
Key words: dynamic programming model; monte carlo simulation; greedy algorithm; floyd algorithm;decision-making mode
1 引言
近年来,越来越多的探险家为了领略沙漠壮观的景色以及对自己毅力的考验进而选择徒步穿越沙漠,为了更方便地对探险家行走方式进行研究,将此过程模擬成一款穿越沙漠的小游戏,从沙漠的起点出发前往所规划的终点过程,会受到多种因素的限制,而探险家穿越沙漠希望能够在预计时间内到达终点且此过程花费的成本最少,因此途中如何进行决策将面临挑战。
程凯等[1]通过将地图数字化后,通过历遍前往矿山以及村庄的所有路径,从中得到在天气已知的情况下第一关和第二关的最优解,其次,在天气未知的情况下,通过最大似然估计得到未来天气的分布函数来预测未来天气,但具体最佳行动策略仍未得出确切解。臧洋等[2]根据Bellman-Ford算法和最短路的思想,通过确定目标函数和约束条件,搭建线性规划模型,得到在天气已知的情况下每种情况的最优策略,但对于单个玩家在天气未知的情况下没有给出具体的分析。
笔者基于贪心算法在单目标玩家仅知当天天气状况下,对比得出最优后续决策期望的选择策略,并通过蒙特卡洛模拟对天气进行随机模拟,将出现概率最大的视为最佳路线进行对比检验,验证了该种选择策略方法的可行度。
2 模型建立
2.1 问题提出