Thomas Bolander: Learning to Plan from Raw Data in Grid-based Games [2017-12-7]
Xixi Logic Forum, No. 76
Speaker: Thomas Bolander (Technical University of Denmark)
Date & Time: 7 December 2017 (Thursday), 18:30 – 20:30
Place: Room 259, Main Teaching Building, Zhejiang University
Title: Learning to Plan from Raw Data in Grid-based Games
Abstract:
An agent that autonomously learns to act in its environment must acquire a model of the domain dynamics. This can be a challenging task, especially in real-world domains, where observations are high-dimensional (e.g. pixels) and noisy. Recent methods in deep reinforcement learning for games learn from vector representations of pixel observations. However, they typically do not acquire an environment model, but a policy for one-step action selection. Even when a model is learned, it cannot generalize to unseen instances of the training domain. Here we propose a neural network-based method that learns from high-dimensional visual observations an approximate, compact, implicit representation of the domain dynamics, which can be used for planning with standard search algorithms, and generalizes to novel domain instances. We evaluate our approach on visual versions of the standard domain Sokoban, and show that it learns a transition model that can be successfully used to solve new levels of the game.
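To make the idea in the abstract concrete, the following is a minimal Python sketch of how a learned transition model can be plugged into a standard search algorithm (breadth-first search here). All names (predict_next, is_goal, and so on) are hypothetical placeholders, not the authors' actual code; the point is only that once predict_next approximates the domain dynamics, ordinary search machinery applies unchanged.

    from collections import deque

    def plan_with_learned_model(initial_state, actions, is_goal, predict_next):
        """Breadth-first search over states generated by a learned
        transition model predict_next(state, action) -> state.
        States are assumed hashable (e.g. a tuple encoding of the grid)."""
        frontier = deque([(initial_state, [])])
        visited = {initial_state}
        while frontier:
            state, plan = frontier.popleft()
            if is_goal(state):
                return plan                    # sequence of actions reaching the goal
            for action in actions:
                successor = predict_next(state, action)
                if successor not in visited:
                    visited.add(successor)
                    frontier.append((successor, plan + [action]))
        return None                            # no plan found in the explored space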
Post-event report:
On 7 December, the 76th session of the Xixi Logic Forum, one of the 2017 BRaD series of events, was held at the Center for the Study of Language and Cognition, Zhejiang University. Thomas Bolander, Associate Professor at the Department of Applied Mathematics and Computer Science of the Technical University of Denmark, presented his research on neural networks and machine learning in a lecture entitled "Solving Grid-based Games with Search-based Planning and Machine Learning".
Grid-based games model a class of scenarios with a fixed layout in which the outcome of each action depends on the player's strategy; to some extent this resembles how an autonomously learning agent operates in a dynamic environment. Taking Sokoban as his running example, Bolander set out to use machine learning to build a general model for solving the game across different level layouts. Under the assumption that the agent always knows its own grid coordinates, he first laid out the logical framework underlying the game: a tuple consisting of a set of states, a set of actions, a set of goals, and a transition function. The agent plans its actions using search. By the rules of the game, each of the agent's actions affects at most one of the neighboring grid cells, so it suffices to recompute a local patch of the grid and compare it with the state before the action (a sketch of such a local update follows below).

Bolander then introduced a neural network to let the agent carry out reinforcement learning. To improve computational efficiency, he reduced the dimensionality of the pixel observations and experimented with a variety of search strategies. In his running tests, the resulting model solved games with 8 boxes on a 30×30 grid with a 100% success rate within at most sixty thousand search steps. Bolander also found that a strategy combining one-step search with planning search greatly improves the efficiency of the learned system.
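The locality observation above is what keeps the learned model small and lets it generalize to new levels: after each action, only a patch around the agent's known coordinates needs to be re-predicted, while the rest of the grid is carried over unchanged. Below is a minimal sketch under these assumptions; predict_patch is a hypothetical stand-in for the learned network, and the agent is assumed never to stand on the grid border (Sokoban levels are walled).

    import numpy as np

    def local_transition(grid, agent_pos, action, predict_patch, radius=1):
        """Apply a learned local transition model to a 2-D grid state.
        Only the (2*radius+1)^2 patch around the agent's known coordinates
        is re-predicted; all other cells are copied unchanged."""
        r, c = agent_pos
        y0, y1 = r - radius, r + radius + 1
        x0, x1 = c - radius, c + radius + 1
        patch = grid[y0:y1, x0:x1]                # local view around the agent
        new_patch = predict_patch(patch, action)  # learned prediction, same shape
        new_grid = grid.copy()
        new_grid[y0:y1, x0:x1] = new_patch        # splice the predicted patch back in
        return new_grid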
Bolander's research offers new approaches to reasoning and decision making in the age of big data, and advances the practical application of the currently much-discussed field of machine learning.
(Reported by Li Chonghui)