DeepMind, which Google acquired last year, has taken another step forward in artificial intelligence with an algorithm that can learn to master a range of games.
The algorithm, deep Q-network (DQN), combines deep neural networks and reinforcement learning. Deep neural networks have multiple hidden layers between the data inputs and outputs. Reinforcement learning is where a machine learns which actions to take in its environment to maximise cumulative rewards by trial and error.
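The reinforcement learning half of that combination can be sketched in a few lines of tabular Q-learning. This is an illustration only – DQN itself replaces the lookup table with a deep neural network – and every name below is hypothetical:

```python
# Tabular Q-learning sketch: one temporal-difference update nudges the
# value of (state, action) toward the observed reward plus the discounted
# best value of the next state. DQN approximates this table with a deep net.
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.99):
    # Best value achievable from the next state, over the available actions.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * best_next
    current = q.get((state, action), 0.0)
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] = current + alpha * (target - current)
    return q
```

Repeated over many episodes of trial and error, these small updates accumulate into a policy that maximises cumulative reward.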
To date, AI developers have generally been unable to build a single algorithm that lets machines master a variety of tasks or disciplines; it is usually one algorithm per discipline, according to DeepMind.
DeepMind, however, has found a way for machines to learn a variety of challenging tasks from scratch using a single algorithm, advancing artificial intelligence. It used games as a way to test and demonstrate what its algorithm can do.
The algorithm learnt to play 49 Atari 2600 arcade games, starting each one from scratch and progressing to the level of an expert human gamer, without being modified or re-tuned between games. The games it was tested on included Video Pinball, Boxing, Breakout, Star Gunner and Robotank.
The algorithm worked from high-dimensional sensory input alone, relying only on the raw screen pixels and the game score.
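Working from raw pixels typically means compressing each frame before it reaches the network. The Nature paper downsamples and grayscales Atari frames; the sketch below shows the downsampling step only, with illustrative sizes (the function name and block-averaging choice are assumptions, not the paper's exact pipeline):

```python
# Shrink a 2D frame of pixel intensities by averaging each
# factor x factor block of pixels into a single value.
def downsample(frame, factor):
    h, w = len(frame), len(frame[0])
    out = []
    for i in range(0, h - h % factor, factor):
        row = []
        for j in range(0, w - w % factor, factor):
            block = [frame[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```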
DQN surpassed the performance of other machine learning algorithms in 43 of the 49 games, and achieved more than 75 per cent of the level of a professional human player in more than half of the games.
Figure courtesy of Mnih et al., "Human-level control through deep reinforcement learning", Nature, 26 Feb. 2015
“In certain games, DQN even came up with surprisingly far-sighted strategies that allowed it to achieve the maximum attainable score. For example, in Breakout, it learned to first dig a tunnel at one end of the brick wall so the ball could bounce around the back and knock out bricks from behind,” DeepMind’s Dharshan Kumaran and Demis Hassabis wrote on the Google Research Blog.
‘Experience replay’ – remembering past experiences and repeating them in the algorithm to speed up the learning process – was key in developing DQN.
“During the learning phase, DQN was trained on samples drawn from a pool of stored episodes – a process physically realised in a brain structure called the hippocampus through the ultra-fast reactivation of recent experiences during rest periods (e.g. sleep),” wrote Kumaran and Hassabis.
“The incorporation of experience replay was critical to the success of DQN: disabling this function caused a severe deterioration in performance.”
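The mechanism behind experience replay can be sketched as a bounded buffer of past transitions from which training minibatches are drawn at random, so the learner is not trained on long runs of correlated consecutive frames. A minimal sketch (class and method names are hypothetical, not DeepMind's implementation):

```python
import random
from collections import deque

# Replay buffer sketch: store (state, action, reward, next_state, done)
# transitions; sample random minibatches for training.
class ReplayBuffer:
    def __init__(self, capacity=100_000):
        # deque with maxlen silently evicts the oldest transitions.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sample breaks the temporal correlation
        # between consecutive frames.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```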
DeepMind hopes to use DQN to build more intelligent Google data apps – anything from helping people plan their travels, to helping neuroscientists understand how the human brain learns.
“Imagine if you could ask the Google app to complete any kind of complex task [like] ‘Okay Google, plan me a great backpacking trip through Europe!’
“We also hope this kind of domain general learning algorithm will give researchers new ways to make sense of complex large-scale data creating the potential for exciting discoveries in fields such as climate science, physics, medicine and genomics,” Kumaran and Hassabis wrote.
DeepMind’s Neural Turing Machines
DeepMind has taken another step forward in AI with a neural network that mimics short-term memory in the human brain.
In its research paper on the ‘Neural Turing Machines’, released in October 2014, DeepMind states the NTM extends the capabilities of recurrent neural networks (RNNs) – networks with feedback connections – and resembles a working memory system.
In essence, it allows a machine to store algorithms and data as it learns, then retrieve them later to carry out tasks it has not explicitly been taught.
“This enrichment is primarily via a large, addressable memory, so, by analogy to Turing’s enrichment of finite-state machines by an infinite memory tape, we dub our device a ‘Neural Turing Machine’ (NTM),” said DeepMind in its research paper.
“A NTM architecture contains two basic components: a neural network controller and a memory bank… The controller network receives inputs from an external environment and emits outputs in response.
“Unlike a standard network, it also interacts with a memory matrix using selective read and write operations.”
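The selective read the paper describes rests on content-based addressing: the controller emits a query key, each memory row is scored by its similarity to that key, and the read vector is a softmax-weighted blend of the rows. The sketch below follows that scheme in spirit (cosine similarity with a sharpness parameter beta, as in the paper), but the function names and the pure-Python implementation are assumptions for illustration:

```python
import math

# Content-based read sketch: blend memory rows by their similarity
# to a query key. High beta sharpens the focus toward the best match.
def content_read(memory, key, beta=1.0):
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u)) or 1e-9
        nv = math.sqrt(sum(b * b for b in v)) or 1e-9
        return dot / (nu * nv)

    # Softmax over similarity scores gives a normalised weighting.
    scores = [math.exp(beta * cosine(row, key)) for row in memory]
    total = sum(scores)
    weights = [s / total for s in scores]
    # Read vector: weighted sum of memory rows, column by column.
    width = len(memory[0])
    return [sum(w * row[i] for w, row in zip(weights, memory))
            for i in range(width)]
```

A write head works analogously, using the same weighting to decide where (and how strongly) to modify the memory matrix.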
DeepMind tested its NTM against conventional neural networks to see if it could copy data sequences beyond the length it had been trained to copy.
“We were curious to see if a network that had been trained to copy sequences of length up to 20 could copy a sequence of length 100 with no further training.
“NTM (with either a feedforward or LSTM controller) learned much faster than LSTM [long short-term memory, a RNN architecture] alone, and converged to a lower cost… NTM continues to copy as the length increases, while LSTM rapidly degrades beyond length 20.”