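NeurIPS, the Conference on Neural Information Processing Systems, is a top international conference in machine learning, held every December. When the NeurIPS 2020 acceptance figures came out, many readers will have noticed the researcher with the most accepted papers: Sergey Levine. With 12 accepted papers he ranks first, an enviable record.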
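A scholar profile of Sergey Levine
Sergey Levine is currently an assistant professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He received his bachelor's and master's degrees in computer science from Stanford University in 2009 and his PhD in computer science from Stanford University in 2014.
His research focuses on the intersection of control and machine learning, with the aim of developing algorithms and techniques that let machines autonomously acquire the skills needed to perform complex tasks.
More information about Sergey Levine is available on his profile page.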
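Sergey Levine and NeurIPS
Working deep in machine learning, Sergey Levine naturally does not miss NeurIPS, the field's top conference. His number of accepted NeurIPS papers has been substantial in each of the past three years, and in both 2019 and 2020 he held first place with 12 papers. He is a well-deserved academic heavyweight.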
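How to get closer to Sergey Levine?
Are you already eager to "get closer" to Sergey Levine? You may not be able to shake hands with him in person, but you can easily get closer to his research areas, study his papers, and follow the latest developments.
The AMiner platform indexes 310 papers by Sergey Levine, with detailed conference annotations, and they can be browsed by year or by citation count.
Of course, readers with the ability and the interest can boldly go after the joy of "working hard to fly up and stand shoulder to shoulder with the big names". One Zhihu user, for example, got the chance to do exactly that.
Now let's take a closer look at Sergey Levine's latest NeurIPS 2020 papers.
The 12 papers by Sergey Levine accepted at NeurIPS 2020 are all included in the NeurIPS 2020 collection on the AMiner platform. Details follow: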
1. Paper title: MOPO: Model-based Offline Policy Optimization
Authors: Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma
Recent advances in machine learning using deep neural networks have shown significant successes in scaling to large realistic datasets, such as ImageNet [12] in computer vision, SQuAD [54] in NLP, and RoboNet [9] in robot learning.
While off-policy RL algorithms [42, 26, 19] can in principle utilize previously collected datasets, they perform poorly without online data collection.
These failures are generally caused by large extrapolation error when the Q-function is evaluated on out-of-distribution actions [18, 35].
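The extrapolation error described above can be illustrated with a toy example. This is only a sketch under stated assumptions, not anything taken from the paper: `true_q`, the linear `learned_q`, and the action values are all hypothetical. A value model is fit only on actions the dataset covers, and then queried on actions far outside that range.

```python
def true_q(action):
    # Hypothetical ground-truth value: the best action is 0.5.
    return -(action - 0.5) ** 2

def fit_line(xs, ys):
    # Ordinary least-squares fit of y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# The offline dataset only covers actions in [0.0, 0.4].
dataset_actions = [0.0, 0.1, 0.2, 0.3, 0.4]
slope, intercept = fit_line(dataset_actions, [true_q(a) for a in dataset_actions])

def learned_q(action):
    # Stand-in for a learned Q-function (assumed linear for simplicity).
    return slope * action + intercept

# Greedy action selection also considers actions the dataset never saw.
candidates = [0.0, 0.2, 0.4, 1.0, 2.0]
greedy_action = max(candidates, key=learned_q)

in_dist_error = abs(learned_q(0.2) - true_q(0.2))
out_dist_error = abs(learned_q(2.0) - true_q(2.0))
print(greedy_action, in_dist_error, out_dist_error)
```

In this toy setup the fitted model is accurate on the covered actions but keeps rising outside them, so the greedy choice lands on an out-of-distribution action whose true value is poor.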
2. Paper title: Gradient Surgery for Multi-Task Learning
Authors: Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn
While deep learning and deep reinforcement learning (RL) have shown considerable promise in enabling systems to perform complex tasks, the data requirements of current methods make it difficult to learn a breadth of capabilities when all tasks are learned individually from scratch.
Learning multiple tasks all at once results in a difficult optimization problem, sometimes leading to worse overall performance and data efficiency compared to learning tasks individually (Parisotto et al, 2015; Rusu et al, 2016a).
When considering the combined optimization landscape for multiple tasks, SGD produces gradients that struggle to efficiently find the optimum.
This occurs due to a gradient thrashing phenomenon, where the gradient of one task destabilizes optimization in the valley.
In Section 6.2, the authors find experimentally that this thrashing phenomenon occurs in a neural network multi-task learning problem.
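The excerpt above does not spell out the remedy, but the "gradient surgery" in the title is commonly understood as projecting away the part of one task's gradient that conflicts with another's. The following is a minimal sketch of that projection step for two tasks, assuming plain Python lists as gradients; the function names and example vectors are hypothetical, and this is not the authors' implementation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_conflicting(g_i, g_j):
    """Remove from g_i its component along g_j, but only if they conflict."""
    d = dot(g_i, g_j)
    if d >= 0.0:
        # Non-negative dot product: the gradients do not conflict.
        return list(g_i)
    scale = d / dot(g_j, g_j)
    return [a - scale * b for a, b in zip(g_i, g_j)]

# Two hypothetical task gradients that point in partly opposite directions.
g_task1 = [1.0, 0.0]
g_task2 = [-1.0, 1.0]

# Each gradient is projected against the other's original gradient.
g1_fixed = project_conflicting(g_task1, g_task2)
g2_fixed = project_conflicting(g_task2, g_task1)

# The update direction is the sum of the de-conflicted gradients.
combined = [a + b for a, b in zip(g1_fixed, g2_fixed)]
print(g1_fixed, g2_fixed, combined)
```

After projection, neither gradient has a negative component along the other, so one task's step no longer directly undoes the other's.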
3. Paper title: Conservative Q-Learning for Offline Reinforcement Learning
Authors: Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine
Recent advances in reinforcement learning (RL), especially when combined with expressive deep network function approximators, have produced promising results in domains ranging from robotics [31] to strategy games [4] and recommendation systems [37].
Offline RL algorithms based on this basic recipe suffer from action distribution shift [32, 62, 29, 36] during training, because the target values for Bellman backups in policy evaluation use actions sampled from the learned policy, πk, but the Q-function is trained only on actions sampled from the behavior policy that produced the dataset D, πβ.
The authors develop a conservative Q-learning (CQL) algorithm, such that the expected value of a policy under the learned Q-function lower-bounds its true value.
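One common way to obtain this kind of conservatism is to add a penalty that pushes Q-values down across all actions while pushing them up on the action actually seen in the dataset. The sketch below shows such a penalty for a single state with discrete actions. It is an illustration under assumptions, not the authors' code: the function names, the weight `alpha`, and the example numbers are hypothetical.

```python
import math

def logsumexp(values):
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def conservative_penalty(q_values, dataset_action):
    """q_values holds Q(s, a) for every discrete action a in one state s.

    The soft maximum over all actions is pushed down; the Q-value of the
    action observed in the dataset is pushed up.
    """
    return logsumexp(q_values) - q_values[dataset_action]

def cql_style_loss(q_values, dataset_action, td_target, alpha=1.0):
    # Standard squared Bellman error plus the weighted conservative penalty.
    td_error = (q_values[dataset_action] - td_target) ** 2
    return 0.5 * td_error + alpha * conservative_penalty(q_values, dataset_action)

# Action 0 is in the dataset; action 1 is unseen and has an inflated Q-value.
inflated = conservative_penalty([1.0, 5.0, 1.0], dataset_action=0)
flat = conservative_penalty([1.0, 1.0, 1.0], dataset_action=0)
print(inflated, flat)
```

The penalty is larger when an unseen action carries an inflated Q-value, so minimizing it discourages exactly the overestimation that action distribution shift produces.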