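AI English | The 2024 Turing Award, the “Nobel Prize of Computing”: How 1980s RL Theory Ignited Today's AI Revolution
Author: WeChat article #2024 Turing Award
ACM has awarded the 2024 ACM Turing Award to Andrew Barto and Richard Sutton in recognition of their pioneering work on the conceptual and algorithmic foundations of reinforcement learning (RL).
How much do you know about the Turing Award? And about RL? Let's get a glimpse from the article published on the ACM website.
Lexile: 1250~1350, 1190 words (the text contains technical terms such as ‘MDP’ and ‘DRL’ and its sentence structure is complex, but the overall logic is clear; source: https://awards.acm.org/about/2024-turing)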
ACM A.M. Turing Award Honors Two Researchers Who Led the Development of Cornerstone AI Technology
Andrew Barto and Richard Sutton Recognized as Pioneers of Reinforcement Learning
ACM has named Andrew G. Barto and Richard S. Sutton as the recipients of the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning. In a series of papers beginning in the 1980s, Barto and Sutton introduced the main ideas, constructed the mathematical foundations(基础), and developed important algorithms(算法) for reinforcement learning—one of the most important approaches for creating intelligent systems.
Barto is Professor Emeritus of Information and Computer Sciences at the University of Massachusetts, Amherst. Sutton is a Professor of Computer Science at the University of Alberta, a Research Scientist at Keen Technologies, and a Fellow at Amii (Alberta Machine Intelligence Institute).
The ACM A.M. Turing Award, often referred to as the “Nobel Prize in Computing,” carries a $1 million prize with financial support provided by Google, Inc. The award is named for Alan M. Turing, the British mathematician who articulated the mathematical foundations of computing.
What is Reinforcement Learning?
The field of artificial intelligence (AI) is generally concerned with constructing agents—that is, entities that perceive and act. More intelligent agents are those that choose better courses of action. Therefore, the notion that some courses of action are better than others is central to AI. Reward—a term borrowed from psychology and neuroscience(神经科学)—denotes a signal provided to an agent related to the quality of its behavior. Reinforcement learning (RL) is the process of learning to behave more successfully given this signal.
The idea of learning from reward has been familiar to animal trainers for thousands of years. Later, Alan Turing’s 1950 paper “Computing Machinery and Intelligence” addressed the question “Can machines think?” and proposed an approach to machine learning based on rewards and punishments.
While Turing reported having conducted some initial experiments with this approach and Arthur Samuel developed a checker-playing program in the late 1950s that learned from self-play, little further progress occurred in this vein of AI in the following decades. In the early 1980s, motivated by observations from psychology, Barto and his PhD student Sutton began to formulate reinforcement learning as a general problem framework.
They drew on the mathematical foundation provided by Markov decision processes (MDPs), wherein an agent makes decisions in a stochastic (randomly determined) environment, receiving a reward signal after each transition and aiming to maximize its long-term cumulative reward. Whereas standard MDP theory assumes that everything about the MDP is known to the agent, the RL framework allows for the environment and the rewards to be unknown. The minimal information requirements of RL, combined with the generality of the MDP framework, allow RL algorithms to be applied to a vast range of problems, as explained further below.
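To make the MDP description above concrete, here is a minimal Python sketch of an agent acting in a stochastic environment and accumulating reward. The two-state environment, its state and action names, its probabilities, and the discount factor gamma are all invented for illustration and do not come from the ACM article; discounting is only one common way to define "long-term cumulative reward".

```python
import random

# Hypothetical two-state MDP, invented purely for illustration.
# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(0.8, "high", 0.0), (0.2, "low", 0.0)],
    },
    "high": {
        "wait": [(1.0, "high", 1.0)],
        "work": [(0.7, "high", 2.0), (0.3, "low", 2.0)],
    },
}


def step(state, action, rng):
    """Sample (next_state, reward) from the stochastic environment."""
    outcomes = TRANSITIONS[state][action]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t], computed back to front."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def run_episode(policy, start_state, n_steps, gamma, seed=0):
    """Follow policy(state) -> action for n_steps and return the discounted return."""
    rng = random.Random(seed)
    state, rewards = start_state, []
    for _ in range(n_steps):
        action = policy(state)
        state, reward = step(state, action, rng)
        rewards.append(reward)
    return discounted_return(rewards, gamma)
```

In the RL setting described above, the agent would not be given the TRANSITIONS table; it would only see the states and rewards that step returns, and would have to improve its policy from that experience.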
Barto and Sutton, jointly and with others, developed many of the basic algorithmic approaches for RL. These include their foremost contribution, temporal difference learning, which made an important advance in solving reward prediction problems, as well as policy-gradient methods and the use of neural networks as a tool to represent learned functions. They also proposed agent designs that combined learning and planning, demonstrating the value of acquiring knowledge of the environment as a basis for planning.
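As a rough illustration of temporal difference learning as a reward-prediction method, the sketch below runs tabular TD(0) on a small random-walk task. The task, the function name td0_random_walk, and the settings (step size alpha, number of episodes, initial values) are assumptions chosen for this example; they are not taken from the ACM article and are not presented as the authors' own implementation.

```python
import random


def td0_random_walk(n_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate state values of a symmetric random walk with tabular TD(0).

    States 1..n_states are non-terminal; 0 and n_states + 1 are terminal.
    Stepping off the right end gives reward 1, every other step gives 0.
    """
    rng = random.Random(seed)
    # Terminal states keep value 0; non-terminal states start at a neutral 0.5.
    values = [0.0] + [0.5] * n_states + [0.0]
    for _ in range(episodes):
        state = (n_states + 1) // 2  # start each episode in the middle
        while 0 < state < n_states + 1:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == n_states + 1 else 0.0
            # TD error: difference between the one-step target and the
            # current prediction for this state.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values[1:-1]


if __name__ == "__main__":
    estimates = td0_random_walk()
    # With gamma = 1 the true value of state i is i / (n_states + 1).
    for i, v in enumerate(estimates, start=1):
        print(f"state {i}: estimate {v:.3f}, true {i / 6:.3f}")
```

The key point is that each prediction is updated toward the next prediction plus the observed reward, so learning happens at every step instead of waiting for the final outcome of the episode.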
Perhaps equally influential was their textbook, Reinforcement Learning: An Introduction (1998), which is still the standard reference in the field and has been cited over 75,000 times. It allowed thousands of researchers to understand and contribute to this emerging field and continues to inspire much significant research activity in computer science today.
Although Barto and Sutton’s algorithms were developed decades ago, major advances in the practical applications of RL came about in the past fifteen years by merging RL with deep learning algorithms (pioneered by 2018 Turing Awardees Bengio, Hinton, and LeCun). This led to the technique of deep reinforcement learning.
The most prominent example of RL was the victory by the AlphaGo computer program over the best human Go players in 2016 and 2017. Another major achievement recently has been the development of the chatbot ChatGPT. ChatGPT is a large language model (LLM) trained in two phases, the second of which employs a technique called reinforcement learning from human feedback (RLHF), to capture human expectations.
RL has achieved success in many other areas as well. A high-profile research example is robot motor skill learning in the in-hand robotic manipulation(操纵) and solution of a physical puzzle (Rubik’s Cube), which showed that it is possible to do all the reinforcement learning in simulation yet ultimately succeed in the significantly different real world.
Other areas include network congestion control, chip design, internet advertising, optimization(优化), global supply chain optimization, improving the behavior and reasoning capabilities of chatbots, and even improving algorithms for one of the oldest problems in computer science, matrix multiplication.
Finally, a technology that was partly inspired by neuroscience has returned the favor. Recent research, including work by Barto, has shown that specific RL algorithms developed in AI provide the best explanations for a wide range of findings concerning the dopamine system in the human brain.
“Barto and Sutton’s work demonstrates the immense potential of applying a multidisciplinary approach to longstanding challenges in our field,” explains ACM President Yannis Ioannidis. “Research areas ranging from cognitive(认知) science and psychology to neuroscience inspired the development of reinforcement learning, which has laid the foundations for some of the most important advances in AI and has given us greater insight into how the brain works. Barto and Sutton’s work is not a stepping stone that we have now moved on from. Reinforcement learning continues to grow and offers great potential for further advances in computing and many other disciplines. It is fitting that we are honoring them with the most prestigious award in our field.”
“In a 1947 lecture, Alan Turing stated ‘What we want is a machine that can learn from experience,’” noted Jeff Dean, Senior Vice President, Google. “Reinforcement learning, as pioneered by Barto and Sutton, directly answers Turing’s challenge. Their work has been a lynchpin of progress in AI over the last several decades. The tools they developed remain a central pillar of the AI boom and have rendered major advances, attracted legions of young researchers, and driven billions of dollars in investments. RL’s impact will continue well into the future. Google is proud to sponsor the ACM A.M. Turing Award and honor the individuals who have shaped the technologies that improve our lives.”
About the ACM A.M. Turing Award
The A.M. Turing Award was named for Alan M. Turing, the British mathematician who articulated the mathematical foundation and limits of computing, and who was a key contributor to the Allied cryptanalysis of the Enigma cipher during World War II. Since its inception in 1966, the Turing Award has honored the computer scientists and engineers who created the systems and underlying theoretical foundations that have propelled the information technology industry.
AI Buzzwords
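Reinforcement Learning (RL)
· An AI learning paradigm in which reward signals guide an agent to improve its behavior step by step.
(https://techvidvan.com/tutorials/reinforcement-learning)
Deep Reinforcement Learning (DRL)
· An AI technique that combines deep learning with reinforcement learning, used for complex decision-making problems (such as AlphaGo).
Markov Decision Process (MDP)
· A mathematical model of an agent making decisions in a stochastic environment on the basis of state transitions and rewards.
Temporal Difference Learning (TD Learning)
· An RL method that updates its reward predictions using the difference between successive predictions of future reward.
(https://www.lancaster.ac.uk/stor-i-student-sites/jordan-j-hood/2021/04/12/reinforcement-learning-temporal-difference-td-learning/)
Neural Networks
· AI models inspired by biological nervous systems, used for function approximation and pattern recognition.
(https://www.ibm.com/think/topics/neural-networks)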
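The 2024 Turing Award at a Glance
01
Date
On March 5, 2025, the Association for Computing Machinery (ACM) announced that the 2024 ACM A.M. Turing Award had been awarded to Andrew G. Barto and Richard S. Sutton.
02
Event
The two were honored “for developing the conceptual and algorithmic foundations of reinforcement learning.” The award is often called the “Nobel Prize of Computing.”
03
Core contributions
Beginning in the 1980s, they introduced the main ideas of reinforcement learning (RL), constructed its mathematical foundations, and developed key algorithms (such as temporal difference learning and policy-gradient methods).
They co-authored the classic textbook Reinforcement Learning: An Introduction (1998), which has been cited over 75,000 times and has become the standard reference in the field.
04
Award background
The prize is $1 million, with financial support provided by Google.
The award is named for the British mathematician Alan Turing, in recognition of his foundational contributions to the theory of computing.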
END