But AI and machine learning promise to turn this paradigm on its head.

In a paper published this week in the journal Science, roughly a year following the preprint, researchers at DeepMind, the London-based subsidiary of Google parent company Alphabet, describe a system capable not only of learning how to play capture the flag in Id Software’s Quake III Arena, but of devising entirely novel human-level team-based strategies.

“No one has told the agents how to play the game - only if they’ve beaten their opponent or not. The beauty of using an approach like this is that you never know what kind of behaviors will emerge as the agents learn,” said Max Jaderberg, a research scientist at DeepMind who recently worked on AlphaStar, a machine learning system that bested a team of professional human players at StarCraft II.

He further explained that the key technique at play is reinforcement learning, which employs rewards to drive software policies toward goals - in the DeepMind agents’ case, whether their team won or not.

“From a research perspective, it’s the novelty of the algorithmic approach that’s really exciting,” he said. “The specific way we trained our … is a good example of how to scale up and operationalize some classic evolutionary ideas.”

DeepMind’s cheekily dubbed For The Win (FTW) agents learn directly from on-screen pixels using a convolutional neural network, a collection of mathematical functions (neurons) arranged in layers modeled after the visual cortex. The ingested data is passed on to two recurrent long short-term memory (LSTM) networks, or networks capable of learning long-term dependencies. One operates on a fast timescale and the other on a slow timescale, and the two are coupled by a variational objective; together they use a form of memory to make predictions about the game world and output actions through an emulated game controller.

The FTW agents were trained as a population of 30 in total, which provided them with a range of teammates and opponents to play with, and stages were selected randomly to prevent the agents from memorizing layouts. Each agent learned its own reward signal, enabling it to generate its own internal goals (like capturing the flag). The agents moreover leveraged a two-tier process: one tier optimized the internal rewards themselves, while reinforcement learning on those rewards learned the overriding policies. In all, agents individually played around 450,000 games of capture the flag, the equivalent of roughly four years of experience.

“This is a really, really powerful learning paradigm,” said Wojciech Marian Czarnecki, a research scientist at DeepMind who also contributed to AlphaStar. “You’re actually boosting performance - it looks like the multiagent aspects are actually making our life much easier in terms of succeeding in our research.”
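To make the pixels-to-actions pipeline concrete, here is a minimal sketch in PyTorch of a convolutional encoder feeding fast- and slow-timescale recurrent cores. The class name, layer sizes, and slow-tick period are illustrative assumptions rather than the paper’s actual architecture, and the variational coupling and shared memory are omitted for brevity.

```python
# Minimal sketch (assumed names/sizes, not DeepMind's actual model) of a
# convolutional encoder feeding fast- and slow-timescale LSTM cores.
import torch
import torch.nn as nn

class TwoTimescaleAgent(nn.Module):
    def __init__(self, n_actions: int, hidden: int = 256, slow_period: int = 8):
        super().__init__()
        self.slow_period = slow_period
        # Convolutional encoder: learns features directly from screen pixels.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.proj = nn.LazyLinear(hidden)  # infers conv output size at first call
        # Slow core summarizes long-horizon context; fast core conditions on it.
        self.slow = nn.LSTMCell(hidden, hidden)
        self.fast = nn.LSTMCell(hidden * 2, hidden)
        self.policy = nn.Linear(hidden, n_actions)  # action logits per step
        self.value = nn.Linear(hidden, 1)           # state-value estimate

    def forward(self, frames: torch.Tensor):
        # frames: (T, 3, H, W) sequence of screen observations.
        h_s = c_s = h_f = c_f = None
        logits = []
        for t in range(frames.shape[0]):
            z = torch.relu(self.proj(self.encoder(frames[t:t + 1])))
            if h_s is None:  # lazily initialize recurrent state
                h_s = c_s = h_f = c_f = torch.zeros_like(z)
            if t % self.slow_period == 0:  # slow core ticks every few steps
                h_s, c_s = self.slow(z, (h_s, c_s))
            h_f, c_f = self.fast(torch.cat([z, h_s], dim=-1), (h_f, c_f))
            logits.append(self.policy(h_f))
        return torch.cat(logits), self.value(h_f)

agent = TwoTimescaleAgent(n_actions=12)
action_logits, value = agent(torch.rand(16, 3, 84, 84))  # 16 frames of 84x84 RGB
```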
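The two-tier, population-based scheme can likewise be sketched in toy form: an inner loop does reinforcement learning against each agent’s own internal reward, while an outer loop ranks agents by match results and lets weaker agents inherit and perturb the reward weights of stronger ones. The event names, selection rule, and mutation scale below are all assumptions for illustration, not DeepMind’s code.

```python
# Toy sketch (illustrative names and mechanics) of the two-tier scheme:
# inner-loop RL on per-agent internal rewards, outer-loop evolution of
# those rewards based on match outcomes.
import random

POP_SIZE = 30
GAME_EVENTS = ["flag_capture", "flag_pickup", "tag_opponent", "got_tagged"]

def new_agent():
    # Each agent owns a mapping from in-game events to internal reward weights.
    return {"reward_weights": {e: random.uniform(-1, 1) for e in GAME_EVENTS},
            "elo": 1000.0}

def internal_reward(agent, event_counts):
    # Inner tier: the RL objective is this agent-specific signal, not the raw
    # win/loss outcome, so agents can invent intermediate goals.
    return sum(agent["reward_weights"][e] * n for e, n in event_counts.items())

def evolve(population):
    # Outer tier: rank agents by match results, then have the weakest inherit
    # and perturb the reward weights of the strongest.
    ranked = sorted(population, key=lambda a: a["elo"], reverse=True)
    top, bottom = ranked[:POP_SIZE // 5], ranked[-(POP_SIZE // 5):]
    for weak in bottom:
        parent = random.choice(top)
        weak["reward_weights"] = {e: w + random.gauss(0, 0.1)
                                  for e, w in parent["reward_weights"].items()}
    return population

population = [new_agent() for _ in range(POP_SIZE)]
# One agent's internal reward for a hypothetical game transcript:
print(internal_reward(population[0], {"flag_capture": 1, "got_tagged": 3}))
# ... play randomly matched games, update Elo, run RL on internal_reward ...
population = evolve(population)
```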
The fully trained FTW agents, which run on commodity PC hardware, employed strategies generalizable across maps, team rosters, and team sizes. They learned humanlike behaviors such as following teammates, camping in the opponent’s base, and defending their own base from waves of attackers, and they shed less advantageous behaviors (like closely following teammates around the map) as training progressed.

So how’d the agents fare, ultimately? In a tournament involving 40 human players in which humans and agents were randomly matched in games (both as opponents and teammates), the FTW agents were more proficient than the baseline methods. In fact, they substantially exceeded the win rate of human players, with an Elo rating (from which the probability of winning can be derived) of 1,600, compared with “strong” human players’ 1,300 and average human players’ 1,050.

The agents had fast reaction times, unsurprisingly, which gave them a slight advantage in initial experiments.
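As a quick illustration of how those ratings map to win probabilities, the snippet below applies the conventional Elo formula with its standard 400-point logistic scale; the exact rating model used in the paper may differ.

```python
# Conventional Elo win-probability formula (standard 400-point logistic
# scale); the paper's exact rating model may differ.
def win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

print(win_probability(1600, 1300))  # FTW agent vs. a "strong" human: ~0.85
print(win_probability(1600, 1050))  # FTW agent vs. an average human: ~0.96
```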