AI bots learn to use tools while playing hide-and-seek

September 20th

An illustration of the hide-and-seek game environment. (OpenAI)

AI bots playing hide-and-seek taught themselves how to use tools without being instructed to do so, researchers say.

Developed by OpenAI, the research project involved teams of 2 or 3 bots playing brief games of hide-and-seek nearly 500 million times. The environments varied but were sparse, often consisting of walls and a collection of movable blocks and ramps that became increasingly important to the bots as they played.

Told only to avoid the seeking team's line of sight, hider bots taught themselves to move blocks into place as cover after playing 22 million times. About 70 million games later, the seekers learned to move ramps and climb them to jump over the hiders' blockades.

The dynamics then became more complicated: hiders learned to lock ramps, rendering them immovable. In turn, seekers began exploiting the game's mechanics. Instead of pushing blocks, seekers discovered that they could move a box while standing on it, allowing them to jump over any obstacles the hiders built. After 458 million games, the hiders taught themselves the ultimate, game-ending exploit: they blockaded themselves in, then rendered all tools immovable.

Researchers wrote that this shows that tool discovery, which they call "emergent tool use," can occur among AI agents inside competitive environments.

To test this, researchers placed individual bots inside similar environments, where they were left to explore and experiment with tools without the need to play any game. Unlike the agents in competitive environments, the non-competitive, experimental bots showcased increasingly erratic behavior as time went on.

OpenAI writes that this suggests that competitive environments better incentivize AI to showcase human-like behavior, and could be used to train AI agents to develop human-like skills.

From OpenAI's blog post:

Agents trained in hide-and-seek qualitatively center around far more human interpretable behaviors such as shelter construction, whereas agents trained with intrinsic motivation move objects around in a seemingly undirected fashion. Furthermore, as the state space increases in complexity, we find that intrinsic motivation methods have less and less meaningful interactions with the objects in their environment. For this reason, we believe multi-agent competition will be a more scalable method for generating human-relevant skills in an unsupervised manner as environments continue to increase in size and complexity.

Related people

Sam Altman

OpenAI CEO and former chairman of Y Combinator

The technology behind artificial intelligence

Neural networks: a computing model inspired by the human brain

September 10th

A pivotal technology in AI that enables machines to discover and recognize patterns and objects.

Reinforcement learning: how computers learn from experience

July 24th

A discipline of machine learning that allows the computer to experiment and discover rewards.

Sections

OpenAI

August 11th

OpenAI was founded as a non-profit research lab by Elon Musk and Sam Altman in 2015.

In February 2018, Musk left, citing a conflict of interest with his work on Tesla's Autopilot system.

In 2019, with Altman in charge, OpenAI formed OpenAI LP, a for-profit company that it wrote would allow it "to rapidly increase our investments in compute and talent while including checks and balances to actualize our mission."

OpenAI has produced some impressive accomplishments: in early 2019, its neural networks beat the world's best Dota 2 players. And in July, Microsoft invested $1 billion in OpenAI to pursue artificial general intelligence, an accomplishment many think is still decades away, if not longer.

Games

September 1st

Games have been an AI proving ground since the 1960s, when early bots competed with humans in checkers.

Today, leading AI startups train their technology on some of the world's most competitive and complicated video games, including StarCraft II and Dota 2.

Often, the training culminates in a real match against top players. Google DeepMind and OpenAI have both competed with human eSports champs.

Below are several high-profile events:

OpenAI's OpenAI Five: OpenAI's AI system beat the world's top players at Dota 2 in April 2019.
Google DeepMind's AlphaStar: StarCraft II is a real-time strategy game. In December 2018, DeepMind's AlphaStar program beat a top-25 StarCraft II player.

Sign up

Diagram is written, run and developed by me, Blake Hunsicker. Diagram is designed to help you explore the news and catch up on what you want to know.

I have a lot of new features in the works. For occasional updates, consider sharing your email below.

Copyright 2019 Diagram