This post is the first in the series Sandboxing AI: building blocks to (the) Universe, which will look at different AI methods and algorithms in digital training/testing environments.
A recent blog post touched on the idea that today’s state-of-the-art AI is not really intelligent. Let’s take Google DeepMind’s AlphaGo as an example: it did beat professional human players at Go, one of the most complex games ever devised (and is widely regarded as the benchmark for modern AI), but it is not able to generalise its successes to other fields. While the underlying techniques could potentially be extended to other games such as chess, this would require partially rewriting its source code (e.g. to deal with the different game rules). Even though AlphaGo is one of the most advanced examples of current AI, it still belongs to the Narrow/Weak AI category because of the limited scope in which it can be used. The plethora of amazing AI applications produced in recent years (from facial recognition to medical diagnosis) all belong to this category.
In contrast, the terms General/Strong AI refer to algorithms and methods that are not restricted to individual, well-defined tasks and environments. The AI often pictured in media and Hollywood movies is within this class (with the addition of homicidal tendencies). A major difficulty with General AI is finding or building a suitable environment that is both flexible and computable. Flexibility is needed to provide a dynamic environment in which a broad range of different tasks can be performed. Computability means that the environment can be modelled in a computer program with finite resources. Training environments and tasks without adequate variety will cause the AI to overfit the training environment (and consequently fail when trivial changes are applied to it; Whiteson et al., 2011). This is where Microsoft’s Malmö comes into the picture.
What is Malmö?
No, we’re not talking about the Swedish city, although that is where the project gets its name from (Project Malmö Blog, 2016). Malmö is a relatively new AI research platform, released by Microsoft in mid-2016, that uses the massively popular game Minecraft as an environment for AI entities. It has been released on GitHub under an open source license to “support openness and collaboration in AI research” (Johnson et al., 2016). With the wide range of possible activities in Minecraft, the Malmö Project is designed to promote the development of sophisticated General AI (Linn, 2016).
Wait… What is Minecraft?
Minecraft is a 3D sandbox game, first released by the publisher Mojang in May 2009 (GameSpot), in which the player is given free rein to explore a procedurally generated world made of cubes and populated by a wide range of potentially dangerous creatures. Each cube represents a resource that can be collected to create various tools, combined into composite resources, and used to create shelter from the dangers of the outside (but still in-game) world. The Minecraft world makes an excellent environment for exploring General AI techniques, as it has the flexibility to model a wide variety of real-world problems (e.g. navigating an unknown space, interacting with other entities), while still being simple enough to represent in a computational model.
Okay, back to Malmö
Malmö provides an API to control the player entity within Minecraft, which can perceive and interact with its environment (such an entity is commonly referred to as an Agent). Malmö also provides the tools to create custom environments, allowing researchers to impose the constraints needed to focus their experiments. The agent perceives the Malmö environment through various kinds of observations: from JSON objects representing surrounding cubes (e.g. position, shape and material), to the actual video frames obtained from the player entity’s point of view. The latter opens up the opportunity to apply recent developments in image processing using convolutional neural networks (CNN) to agent/game interactions.
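As a rough illustration of the JSON side of this, the sketch below parses one observation and picks a command from it. The grid name floor3x3, its nine-element row-major layout, the assumption that index 1 is the cell in front of the agent, and the command strings are all assumptions made for this example rather than a description of how Malmö must be configured. It runs without a Malmö installation.

```python
import json

# Hypothetical observation text: a 3x3 grid of block names around the agent,
# stored row-major, with the agent standing on the centre cell.
SAMPLE_OBSERVATION = json.dumps({
    "floor3x3": ["stone", "lava", "stone",
                 "stone", "stone", "stone",
                 "stone", "stone", "stone"],
})


def choose_command(observation_text, grid_name="floor3x3"):
    """Return a command string based on the block assumed to be directly ahead."""
    observation = json.loads(observation_text)
    grid = observation.get(grid_name)
    if not grid or len(grid) != 9:
        return "move 0"  # no usable observation: stand still
    block_ahead = grid[1]  # assumption: index 1 is the cell in front of the agent
    if block_ahead == "lava":
        return "jump 1"  # try to clear the pit
    return "move 1"  # otherwise keep walking forward


print(choose_command(SAMPLE_OBSERVATION))
```

In a real agent, the returned string would be passed on to the Malmö agent host; that wiring is left out here so the decision logic can be read and tested on its own.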
The Malmö API is a cross-platform project (it supports Linux, Mac OS X, and Windows), with bindings for C++, C#, Lua, Java, and Python. Source code and pre-built versions are available from GitHub, along with instructions for building from source and installing. The repository also contains a basic tutorial centred around the Python examples provided. These examples range from a simple demonstration of a player moving aimlessly through the space, to an agent that navigates a maze efficiently by processing only video observations.
I was particularly interested in the possibility of interactions between multiple agents and humans. Unfortunately, to run these multi-agent scenarios each agent requires its own Minecraft credentials (Malmö Issue #225). The Malmö development team is looking into possible workarounds, but until a solution is found the exploration of Multi-Agent Systems is restricted to those willing to fork out the additional registration fees. Although this raises the cost of entry for hobbyists, Malmö still has a tremendous amount of potential as an AI research testing ground, for both Narrow and General AI.
Examples and Potential Projects
The team behind Malmö has provided a number of examples to demonstrate its potential, most of which focus on navigation (spatial awareness and optimal path finding; e.g. reaching a goal cube in various mazes). The first tutorials incorporate actions such as destroying obstacles in the way, and jumping over lava pits using JSON observations of the block directly in front of the agent. Later examples demonstrate techniques such as reinforcement learning and image processing. However, in these cases the techniques shown are rather simplistic and their solutions are inefficient. For instance, one of the examples involves a maze runner that exploits the graphical input. This agent uses a depth map to navigate towards gaps, which often leads it into corners, where it becomes stuck. It is important to note that these examples are only meant to showcase the possibilities of Malmö as an AI research environment.
Screenshot of a Reinforcement Learning example © 2016 Rodney Pilgrim, Deakin Software and Technology Innovation Lab.
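To make the depth-map maze runner mentioned above a little more concrete, here is a minimal sketch of a greedy "steer towards the deepest column" rule. It is not the code of the Malmö example: the function name, the single row of depth values, and the turn-rate convention (negative for left, positive for right) are hypothetical choices for illustration.

```python
def steer_towards_gap(depth_row, max_turn=1.0):
    """Return a turn rate in [-max_turn, max_turn] pointing at the deepest column.

    depth_row is one horizontal slice of a depth map, left to right;
    larger values mean the surface in that direction is further away.
    """
    if not depth_row:
        return 0.0
    # Index of the column with the greatest depth (first one wins on ties).
    deepest = max(range(len(depth_row)), key=lambda i: depth_row[i])
    centre = (len(depth_row) - 1) / 2.0
    if centre == 0:
        return 0.0  # a single column gives no direction information
    # Scale the offset from the centre column to the allowed turn range.
    return max_turn * (deepest - centre) / centre


print(steer_towards_gap([1.0, 1.5, 6.0]))  # gap on the right
print(steer_towards_gap([2.0, 2.0, 2.0]))  # corner-like view, no real gap
```

A rule like this has no memory and no notion of a goal, which is one plausible reason such an agent could end up stuck: when every column is about equally shallow, as in a corner, the "deepest" direction is simply the least-bad wall.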
Future blog posts in this series will explore the potential of Malmö to look at various AI techniques in navigation and machine learning. For example, by integrating a CNN classifier with the video input, it could be possible to identify different resources in the game, and their distance from the player’s position. Since Malmö provides JSON descriptions of the environment, we can use these to label the video frames (many machine learning techniques rely on supervised learning, which needs labelled examples). Other ideas include looking at different navigation techniques (e.g. methods to avoid dangerous obstacles) and investigating evolutionary programming. While all these techniques are still examples of narrow AI by themselves, by using the common environment of Malmö we could then combine them to create human-like players that might be able to survive in a standard Minecraft survival game.
EDIT: At the time of writing, a new sandbox AI training platform, Universe, was released by OpenAI. According to the presentation blog post, Malmö will soon be included in the platform (OpenAI, 2016).
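The labelling idea can be sketched as follows, pairing each video frame with the block type and distance reported in the JSON observation captured at the same time. The LineOfSight key with its type and distance fields, the max_distance cut-off, and the use of raw bytes as frames are assumptions made for this illustration. The sketch stops at building the labelled dataset and does not train a classifier.

```python
import json


def label_frames(frames, observation_texts, max_distance=10.0):
    """Pair each frame with (block type, distance) read from its observation.

    frames and observation_texts are assumed to be aligned lists: the i-th
    observation describes what the agent was looking at in the i-th frame.
    """
    dataset = []
    for frame, text in zip(frames, observation_texts):
        ray = json.loads(text).get("LineOfSight") or {}
        block_type = ray.get("type")
        distance = ray.get("distance")
        if block_type is None or distance is None or distance > max_distance:
            continue  # nothing usable in view: leave this frame unlabelled
        dataset.append((frame, block_type, distance))
    return dataset


# Tiny made-up example: three "frames" and their observations.
example_frames = [b"frame-0", b"frame-1", b"frame-2"]
example_observations = [
    json.dumps({"LineOfSight": {"type": "diamond_ore", "distance": 3.5}}),
    json.dumps({}),  # no ray information for this frame
    json.dumps({"LineOfSight": {"type": "stone", "distance": 50.0}}),
]
print(label_frames(example_frames, example_observations))
```

Frames without a usable label are dropped rather than guessed at, so a downstream classifier would only be trained on frames where the environment description is unambiguous.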
- DeepMind. AlphaGo: https://deepmind.com/research/alphago/
- GameSpot. Minecraft. CBS Interactive: http://www.gamespot.com/minecraft/
- Malmö Issue #225 (2016): https://github.com/Microsoft/malmo/issues/225
- Johnson M., Hofmann K., Hutton T., Bignell D. (2016) The Malmö Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence, Ed. Kambhampati S., p. 4246. AAAI Press, Palo Alto, California USA: http://www.ijcai.org/Proceedings/16/Papers/643.pdf
- Linn A. (2016). Project Malmö, which lets researchers use Minecraft for AI research, makes public debut. Next at Microsoft: http://blogs.microsoft.com/next/2016/07/07/project-malmo-lets-researchers-use-minecraft-ai-research-makes-public-debut/
- Minecraft. Microsoft: https://minecraft.net
- Project Malmö Blog. (2016) Home: http://microsoft.github.io/malmo/blog/#project-malm-blog
- Whiteson S., Tanner B., Taylor M.E., and Stone P. (2011) Protecting against evaluation overfitting in empirical reinforcement learning. IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL): http://www.cs.utexas.edu/users/ai-lab/?ADPRL11-shimon
- OpenAI. (2016) Universe: https://openai.com/blog/universe/
Thanks to Shannon Pace and Nicola Pastorello for proofreading and providing suggestions.