The AMiner platform, developed by the Department of Computer Science at Tsinghua University, is built entirely on independently owned Chinese intellectual property. It hosts a scientific knowledge graph covering more than 230 million academic papers/patents and 136 million researchers, and provides professional scientific-intelligence services such as researcher evaluation, expert finding, intelligent reviewer assignment, and academic mapping. Since going online in 2006, the system has attracted more than 10 million unique IP visitors from 220 countries and regions, with 2.3 million data downloads and over 11 million visits per year, making it an important data and experimental platform for research on academic search and social network mining.
AMiner platform: https://www.aminer.cn
NeurIPS, the Conference on Neural Information Processing Systems, is a top international conference in machine learning, held every December. When the NeurIPS 2020 acceptance statistics came out, you probably noticed the researcher with the most accepted papers: Sergey Levine. He ranked first with an enviable 12 accepted papers.
How does such a prolific researcher come about? Let's find out together!
Sergey Levine's scholar profile
Sergey Levine is currently an assistant professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He received his BS and MS in Computer Science from Stanford University in 2009, and his PhD in Computer Science from Stanford in 2014.
His research focuses on the intersection of control and machine learning, with the goal of developing algorithms and techniques that allow machines to autonomously acquire the skills needed to perform complex tasks.
For the full scholar profile and more detailed information, see his scholar page on the AMiner platform.
The scholar page is rich in content. Besides basic information, it shows the scholar's relationship network, covering students, collaborators, and advisors; the research-interest view shows the scholar's specific research areas; author statistics cover citation counts, h-index, and other academic metrics; work experience and education background help you trace how this leading researcher was "made"; and the similar-authors view introduces other authoritative scholars in the field.
To explore more about Sergey Levine, go directly to his page:
https://www.aminer.cn/profile/sergey-levine/53f42828dabfaeb22f3ce756
Sergey Levine and NeurIPS
Deeply engaged in machine learning, Sergey Levine naturally does not miss NeurIPS, the field's top conference. His NeurIPS acceptance counts over the past three years have been remarkable, and in both 2019 and 2020 he topped the list with 12 accepted papers. A well-deserved academic heavyweight.
How to get closer to Sergey Levine?
Are you already eager to "get closer" to Sergey Levine? You may not be able to shake his hand in person, but you can easily explore his research areas, read his papers, and follow the latest developments!
The AMiner platform indexes 310 of Sergey Levine's papers, with detailed conference annotations, and they can be sorted and filtered by year and citation count.
Direct link: https://www.aminer.cn/profile/sergey-levine/53f42828dabfaeb22f3ce756
Of course, readers with the ability and interest can boldly try the joy of "flying up to stand shoulder to shoulder with the big names". For example, one person on Zhihu managed, as hoped, to work with Sergey Levine.
In summary, three things are needed: (1) strong grades, i.e. a GPA of 4.0 or above or an A+ in an artificial intelligence course; (2) preferably some project experience; (3) the ability to read and analyze English-language research papers.
Now let's take a detailed look at Sergey Levine's latest NeurIPS 2020 papers!
All 12 of Sergey Levine's papers accepted at NeurIPS 2020 are indexed in the NeurIPS 2020 topic on the AMiner platform:
Direct link: https://www.aminer.cn/conf/neurips2020/papers
Let's take a deeper look at three of them!
1. Paper title: MOPO: Model-based Offline Policy Optimization
Authors: Yu Tianhe, Thomas Garrett, Yu Lantao, Ermon Stefano, Zou James, Levine Sergey, Finn Chelsea, Ma Tengyu
Paper link: https://www.aminer.cn/pub/5ecf8d2391e01149f850f4dd?conf=neurips2020
Summary:
Recent advances in machine learning using deep neural networks have shown significant successes in scaling to large realistic datasets, such as ImageNet [12] in computer vision, SQuAD [54] in NLP, and RoboNet [9] in robot learning.
While off-policy RL algorithms [42, 26, 19] can in principle utilize previously collected datasets, they perform poorly without online data collection.
These failures are generally caused by large extrapolation error when the Q-function is evaluated on out-of-distribution actions [18, 35].
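To make the offline setting concrete: MOPO's central idea, as we understand it, is to do policy optimization inside a learned dynamics model while staying conservative wherever that model is unreliable, by penalizing the predicted reward with an estimate of the model's uncertainty. The sketch below is only a minimal illustration of such an uncertainty-penalized model step; the ensemble interface (predict), the penalty weight lam, and the disagreement heuristic are illustrative assumptions, not the authors' implementation.

# Minimal, illustrative sketch of an uncertainty-penalized model step in the
# spirit of model-based offline RL (assumed interfaces, not the authors' code).
import numpy as np

def penalized_step(ensemble, state, action, lam=1.0):
    """ensemble: list of models, each with predict(state, action) -> (next_state, reward)."""
    preds = [m.predict(state, action) for m in ensemble]
    next_states = np.stack([p[0] for p in preds])
    rewards = np.array([p[1] for p in preds])

    # Uncertainty heuristic: disagreement of the ensemble's next-state predictions.
    uncertainty = next_states.std(axis=0).max()

    # Conservative reward: penalize where the learned model is unreliable.
    r_tilde = rewards.mean() - lam * uncertainty

    # Roll forward with a randomly chosen ensemble member.
    idx = np.random.randint(len(ensemble))
    return next_states[idx], r_tilde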
2. Paper title: Gradient Surgery for Multi-Task Learning
Authors: Yu Tianhe, Kumar Saurabh, Gupta Abhishek, Levine Sergey, Hausman Karol, Finn Chelsea
Paper link: https://www.aminer.cn/pub/5e281dbb3a55ac4d187e07a0?conf=neurips2020
Summary:
While deep learning and deep reinforcement learning (RL) have shown considerable promise in enabling systems to perform complex tasks, the data requirements of current methods make it difficult to learn a breadth of capabilities when all tasks are learned individually from scratch.
Learning multiple tasks all at once results in a difficult optimization problem, sometimes leading to worse overall performance and data efficiency compared to learning tasks individually (Parisotto et al, 2015; Rusu et al, 2016a).
When considering the combined optimization landscape for multiple tasks, SGD produces gradients that struggle to find the optimum efficiently. This occurs due to a gradient thrashing phenomenon, where the gradient of one task destabilizes optimization within the valley of the loss landscape.
In Section 6.2, the authors find experimentally that this thrashing phenomenon also occurs in a neural-network multi-task learning problem.
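The paper's remedy, "gradient surgery" (PCGrad), can be illustrated in a few lines: when two task gradients conflict, i.e. have a negative inner product, one gradient is projected onto the normal plane of the other so that the conflicting component is removed before the update. The sketch below only illustrates that projection step on hypothetical flattened gradient vectors; it is not the authors' implementation.

# Illustrative sketch of the "gradient surgery" projection step:
# remove from each task gradient the components that conflict with
# the other tasks' gradients, then update with the summed result.
import numpy as np

def project_conflicting_gradients(task_grads):
    """task_grads: list of flattened per-task gradient vectors (np.ndarray)."""
    projected = [g.copy() for g in task_grads]
    for i, g_i in enumerate(projected):
        for j, g_j in enumerate(task_grads):
            if i == j:
                continue
            dot = g_i @ g_j
            if dot < 0:  # conflicting gradients
                g_i -= dot / (g_j @ g_j) * g_j  # project onto the normal plane of g_j
    # The model would then be updated with the sum of the projected gradients.
    return sum(projected)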
3. Paper title: Conservative Q-Learning for Offline Reinforcement Learning
Authors: Kumar Aviral, Zhou Aurick, Tucker George, Levine Sergey
Paper link: https://www.aminer.cn/pub/5edf5ddc91e011bc656defe2?conf=neurips2020
Summary:
Recent advances in reinforcement learning (RL), especially when combined with expressive deep network function approximators, have produced promising results in domains ranging from robotics [31] to strategy games [4] and recommendation systems [37].
Offline RL algorithms based on this basic recipe suffer from action distribution shift [32, 62, 29, 36] during training, because the target values for Bellman backups in policy evaluation use actions sampled from the learned policy, πk, but the Q-function is trained only on actions sampled from the behavior policy that produced the dataset D, πβ.
The authors develop a conservative Q-learning (CQL) algorithm, such that the expected value of a policy under the learned Q-function lower-bounds its true value.
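The lower bound comes from adding a conservative regularizer to the ordinary Bellman error: Q-values of actions sampled from the learned policy are pushed down, while Q-values of actions that actually appear in the dataset are pushed up. The PyTorch-style sketch below shows one simple form of such a penalty; the q_net/policy interfaces, tensor shapes, and the weight alpha are assumptions for illustration, not the authors' code.

# Illustrative sketch of a conservative Q-learning style loss:
# standard Bellman error plus a penalty that lowers Q on policy actions
# and raises Q on dataset actions (assumed interfaces).
import torch

def conservative_q_loss(q_net, policy, batch, alpha=1.0, gamma=0.99):
    s, a, r, s_next = batch["s"], batch["a"], batch["r"], batch["s_next"]

    # Standard Bellman error on dataset transitions.
    with torch.no_grad():
        a_next = policy.sample(s_next)
        target = r + gamma * q_net(s_next, a_next)
    bellman_error = ((q_net(s, a) - target) ** 2).mean()

    # Conservative term: push down Q on actions from the learned policy,
    # push up Q on actions actually present in the dataset.
    a_pi = policy.sample(s)
    conservative = q_net(s, a_pi).mean() - q_net(s, a).mean()

    return bellman_error + alpha * conservative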
To dig deeper into the NeurIPS 2020 papers of these leading researchers, follow our official account or go directly to the NeurIPS 2020 topic page; the most cutting-edge research directions and the most complete paper data are waiting for you~
Scan the QR code to learn more about NeurIPS 2020.
Add "Xiaomai" on WeChat and send the message "NeurIPS" to join the NeurIPS discussion group, where you can learn from and exchange ideas with more paper authors!