Today’s blog covers two recent reinforcement learning projects, both published in the scientific journal Nature. The first covers recent work from DeepMind, the Google subsidiary known for, among other things, AlphaGo and its work on protein folding. The second is from gaming and electronics giant Sony and concerns a self-driving race car in Gran Turismo.
Safely Controlling Nuclear Fusion using Reinforcement Learning
With examples of severe climate change visible every day, researchers keep looking for new sources of clean energy. One of the most promising, and most complex, technologies for clean energy is nuclear fusion. In theory, nuclear fusion can provide an unlimited source of clean energy, but it unfortunately requires incredibly complex infrastructure and technology to function. This is because the temperatures involved in the nuclear fusion process are higher than those in the Sun’s core.
In their latest work, researchers from DeepMind and the École Polytechnique Fédérale de Lausanne (EPFL) use an AI agent, trained using reinforcement learning, to control part of the fusion reactor. In the reactor, a tokamak vessel, the challenge is to keep the plasma in a shape that is optimal for the fusion process. Due to the intense heat of the fusion process, the plasma is not allowed to touch the reactor walls; instead, the plasma is kept suspended in the center of the vessel with the help of magnetic forces. These forces are generated and steered using magnetic actuator coils.
The authors of the work train a reinforcement learning algorithm to control the actuators by sending voltage commands. The system is trained using a simulated version of the plasma reactor, and after the agent is trained, it is connected to the tokamak reactor. When connected to the reactor, the agent sends control commands at a rate of 10,000 per second, and it takes only 50 μs for the neural network to compute the next action. The deployed agent was able to control the magnetic actuators such that the shape of the plasma matched the predefined specifications.
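To make those numbers concrete, a 10,000-commands-per-second control loop leaves a budget of 100 μs per step, of which the paper reports roughly 50 μs is spent on policy inference. The sketch below illustrates that timing budget; the `policy` function is a hypothetical stand-in, not the actual network from the paper.

```python
import time

# Hypothetical stand-in for the trained control policy; the real agent is a
# neural network mapping plasma measurements to coil voltage commands.
def policy(observation):
    return [0.0 for _ in observation]

CONTROL_PERIOD_S = 1e-4  # 10,000 commands per second = 100 microseconds per step

def control_step(observation):
    """One tick of the real-time control loop: observe, act within budget."""
    start = time.perf_counter()
    voltages = policy(observation)
    elapsed = time.perf_counter() - start
    # The paper reports ~50 microseconds of inference, leaving headroom
    # within the 100-microsecond control period.
    within_deadline = elapsed < CONTROL_PERIOD_S
    return voltages, within_deadline
```

The key design constraint is that inference must reliably finish inside the control period, which is why the deployed network has to be small and fast rather than arbitrarily large.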
This impressive work shows the power of reinforcement learning for real-time control in highly dynamic and time-constrained environments. It also highlights how a robust controller can be relied upon to safely manage expensive assets in some of the most extreme conditions ever created. We are excited to be able to develop solutions for our clients that are created using similar technologies as used in this paper.
For more details, see the open-access paper here and the blog post here.
Combining RL with state-of-the-art racing games
AI has always been attractive to the automotive industry, especially for self-driving cars, as seen in the work of Tesla and Waymo. Beyond street vehicles, it has also been one of the primary motivators behind the Roborace events. And now Sony is showcasing its latest developments in self-driving (racing) cars in the Gran Turismo game, trained with reinforcement learning.
In Gran Turismo, the racing cars are modeled in great detail, and physical phenomena like downforce levels, air resistance, and tire friction are included. This results in a highly realistic and detailed environment for enthusiastic gamers, which also makes it ideal for realistic training of reinforcement learning algorithms.
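From the agent's perspective, such a simulator is simply an environment it interacts with in a loop: observe, act, receive a reward. Sony's internal Gran Turismo interface is not public, so the sketch below is only an illustrative toy environment showing the shape of that loop; all names and dynamics are hypothetical.

```python
class RacingEnvSketch:
    """Toy stand-in for a racing simulator; not Sony's actual API."""

    def reset(self):
        # Start a new episode at the beginning of the track.
        self.progress = 0.0
        return self._observe()

    def step(self, action):
        throttle, steering, brake = action
        # Toy dynamics: progress grows with throttle and shrinks with braking.
        self.progress += max(0.0, throttle - brake)
        reward = throttle - brake          # placeholder reward signal
        done = self.progress >= 10.0       # episode ends at the "finish line"
        return self._observe(), reward, done

    def _observe(self):
        # A real simulator would expose speed, track position, rivals, etc.
        return {"progress": self.progress}
```

A training loop would repeatedly call `reset()` and `step()`, using the rewards to improve the policy, with many such environments running in parallel across the console cluster described below.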
In the research paper describing the Gran Turismo Sophy system, the authors describe how they trained the system in the virtual world using a cluster of PlayStation 4 consoles. The training cluster contained over 1,000 consoles and was combined with a large pool of GPU and CPU resources for the training process.
The agent was trained with a reinforcement learning algorithm that taught it how to apply the throttle, steering, and brakes. All this while optimizing the agent not just for controlling the car, but also for learning the best racing tactics (e.g., where to overtake) and etiquette (e.g., not pushing competitors off the road).
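Balancing lap speed against tactics and etiquette is typically done through reward shaping: desirable behaviors earn bonuses, unsporting ones incur penalties. The function below is a minimal sketch of that idea; the terms and weights are hypothetical and not taken from the GT Sophy paper.

```python
def shaped_reward(progress_gain, overtook, pushed_opponent_off, collided):
    """Illustrative shaped reward mixing speed, tactics, and etiquette.

    All weights here are made-up examples, not the paper's values.
    """
    reward = progress_gain              # base term: drive fast around the track
    if overtook:
        reward += 1.0                   # tactics: bonus for a clean overtake
    if pushed_opponent_off:
        reward -= 5.0                   # etiquette: heavy penalty for unsporting moves
    if collided:
        reward -= 2.0                   # discourage collisions
    return reward
```

Tuning such weights is itself an iterative process: set the etiquette penalties too low and the agent drives aggressively, too high and it yields races it could win.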
After being trained on many different scenarios, the system was tested against human drivers. The human team was able to win during the first iteration, but the second iteration of the agent, created after more tuning and training, was able to beat the human team.
There are still some steps to take before this can be deployed in a real racing car. However, this example shows that reinforcement learning, cloud computing, and platforms like DeepSim are ideally suited for developing complex control systems. The agents can take into account a large number of environment variables and can be tuned and optimized using various rules and KPIs. And, just as important, they can be iterated upon over time based on human feedback.
For more information, see the Nature article and the project website.