Follow {晓理紫} on WeChat for daily paper updates; if you find this useful, please forward it to others who may need it. Thank you for your support.
If you find this helpful, please follow me for the latest papers, delivered on time every day.
To thank readers for their support, starting today I am offering a free topic-subscription service to 300 readers: follow the official WeChat account and reply with {email + paper topic} (e.g. 123456@xx.com + chatgpt@large language model @LLM). The topics must belong to a single field, with at most three keywords. The blogger reserves the right of final interpretation.
Category:
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2312.12044v2
GitHub: https://github.com/corl-team/xland-minigrid
Abstract: Inspired by the diversity and depth of XLand and the simplicity and minimalism of MiniGrid, we present XLand-MiniGrid, a suite of tools and grid-world environments for meta-reinforcement learning research. Written in JAX, XLand-MiniGrid is designed to be highly scalable and can potentially run on GPU or TPU accelerators, democratizing large-scale experimentation with limited resources. Along with the environments, XLand-MiniGrid provides pre-sampled benchmarks with millions of unique tasks of varying difficulty and easy-to-use baselines that allow users to quickly start training adaptive agents. In addition, we have conducted a preliminary analysis of scaling and generalization, showing that our baselines are capable of reaching millions of steps per second during training and validating that the proposed benchmarks are challenging.
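The headline throughput claim, millions of steps per second, comes from running many environments in lockstep on an accelerator. The sketch below illustrates only the underlying idea with plain NumPy array operations over a batch of toy grid-world states; it is an analogy, not xland-minigrid's actual API (the library achieves this by `jax.vmap`-ping a pure step function and JIT-compiling it for GPU/TPU).

```python
import numpy as np

def batched_step(positions, actions, grid_size=9):
    """Step a whole batch of toy grid-world agents in one vectorized call.

    positions: (N, 2) int array of (row, col) agent positions
    actions:   (N,) int array, 0=up, 1=down, 2=left, 3=right
    Returns new positions, clipped to the grid. All N environments advance
    at once -- the same pattern JAX vmap compiles onto an accelerator.
    """
    moves = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])
    new_pos = positions + moves[actions]
    return np.clip(new_pos, 0, grid_size - 1)

# 4 parallel environments, all starting at the grid centre (4, 4)
pos = np.full((4, 2), 4)
acts = np.array([0, 1, 2, 3])          # each env takes a different action
pos = batched_step(pos, acts)          # -> [[3,4], [5,4], [4,3], [4,5]]
```

In the real library the batch dimension holds thousands of environments, and the per-step cost is amortized across all of them.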
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2402.03807v1
GitHub: https://github.com/dmksjfl/SEABO
Abstract: Offline reinforcement learning (RL) has attracted much attention due to its ability to learn from static offline datasets, eliminating the need to interact with the environment. Nevertheless, the success of offline RL relies heavily on offline transitions annotated with reward labels. In practice, we often need to hand-craft the reward function, which is sometimes difficult, labor-intensive, or inefficient. To tackle this challenge, we focus on the offline imitation learning (IL) setting and aim to obtain a reward function from expert data and unlabeled data. To that end, we propose a simple yet effective search-based offline IL method, tagged SEABO. SEABO assigns a larger reward to a transition that is close to its nearest neighbor in the expert demonstration, and a smaller reward otherwise, all in an unsupervised manner. Experimental results on a variety of D4RL datasets indicate that, given only a single expert trajectory, SEABO can achieve performance competitive with offline RL algorithms trained on ground-truth rewards, and can outperform prior reward-learning and offline IL methods across many tasks. Moreover, we demonstrate that SEABO also works well if the expert demonstrations contain only observations. Our code is publicly available at https://github.com/dmksjfl/SEABO.
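SEABO's core idea, rewarding a transition by its distance to the nearest expert transition, fits in a few lines. The following is a simplified sketch, not the authors' implementation: the `exp(-beta * dist)` squashing and its scale are assumptions for illustration, and a real implementation would use a KD-tree rather than brute-force distances.

```python
import numpy as np

def seabo_reward(query, expert, beta=1.0):
    """Unsupervised reward from nearest-neighbour distance to expert data.

    query:  (N, d) array of transitions to label, e.g. concatenated (s, a, s')
    expert: (M, d) array of expert transitions
    Returns (N,) rewards in (0, 1]: large when a transition lies close to the
    expert data, small otherwise. The exp(-beta * dist) squashing is an
    assumed choice, not necessarily the paper's.
    """
    # Pairwise Euclidean distances (brute force; use a KD-tree at scale)
    dists = np.linalg.norm(query[:, None, :] - expert[None, :, :], axis=-1)
    nn_dist = dists.min(axis=1)        # distance to the closest expert point
    return np.exp(-beta * nn_dist)

expert = np.array([[0.0, 0.0], [1.0, 1.0]])
queries = np.array([[0.0, 0.0],        # exactly on the expert data
                    [5.0, 5.0]])       # far from it
r = seabo_reward(queries, expert)      # r[0] = 1.0; r[1] is much smaller
```

The labeled transitions can then be fed to any off-the-shelf offline RL algorithm in place of ground-truth rewards.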
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2402.04229v1
Abstract: We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective, since the concept of musicality as well as the specific intention behind a caption are user-dependent (e.g. a caption such as “upbeat work-out music” can map to a retro guitar solo or a techno pop beat). Not only does this make supervised training of such models challenging, it also calls for integrating continuous human feedback into their post-deployment finetuning. MusicRL is a pretrained autoregressive MusicLM (Agostinelli et al., 2023) model of discrete audio tokens finetuned with reinforcement learning to maximise sequence-level rewards. We design reward functions related specifically to text adherence and audio quality with the help of selected raters, and use those to finetune MusicLM into MusicRL-R. We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences. Using Reinforcement Learning from Human Feedback (RLHF), we train MusicRL-U, the first text-to-music model that incorporates human feedback at scale. Human evaluations show that both MusicRL-R and MusicRL-U are preferred to the baseline. Ultimately, MusicRL-RU combines the two approaches and results in the best model according to human raters. Ablation studies shed light on the musical attributes influencing human preferences, indicating that text adherence and quality account for only part of them. This underscores the prevalence of subjectivity in musical appreciation and calls for further involvement of human listeners in the finetuning of music generation models.
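The training signal here is a single sequence-level reward applied to an autoregressively sampled token sequence. A minimal REINFORCE-style sketch of that idea on a toy token policy follows; nothing in it reflects MusicLM's architecture or the actual RLHF pipeline, and the toy reward function stands in for a learned preference model.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, SEQ_LEN, LR = 4, 3, 0.2
logits = np.zeros(VOCAB)              # toy position-independent token policy

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sequence_reward(tokens):
    """Stand-in for a learned preference reward: 'listeners' like token 2."""
    return float(np.mean(tokens == 2))

for _ in range(2000):
    probs = softmax(logits)
    tokens = rng.choice(VOCAB, size=SEQ_LEN, p=probs)   # sample a "piece"
    R = sequence_reward(tokens)       # ONE scalar reward for the sequence
    for t in tokens:                  # REINFORCE: grad log pi(t), scaled by R
        grad = -probs.copy()
        grad[t] += 1.0                # d log pi(t) / d logits
        logits += LR * R * grad

# after training, the policy should strongly prefer token 2
```

Real RLHF adds a KL penalty against the pretrained model and a learned reward model trained on the pairwise preferences; this sketch keeps only the sequence-level credit assignment.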
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2401.03301v2
Abstract: We seek to understand what facilitates sample-efficient learning from historical datasets for sequential decision-making, a problem popularly known as offline reinforcement learning (RL). Further, we are interested in algorithms that enjoy sample efficiency while leveraging (value) function approximation. In this paper, we address these fundamental questions by (i) proposing a notion of data diversity that subsumes the previous notions of coverage measures in offline RL and (ii) using this notion to unify three distinct classes of offline RL algorithms based on version spaces (VS), regularized optimization (RO), and posterior sampling (PS). We establish that, under standard assumptions, VS-based, RO-based, and PS-based algorithms achieve comparable sample efficiency, which recovers the state-of-the-art sub-optimality bounds for finite and linear model classes under the standard assumptions. This result is surprising, given that prior work suggested an unfavorable sample complexity of the RO-based algorithm compared to the VS-based algorithm, whereas posterior sampling is rarely considered in offline RL due to its explorative nature. Notably, our proposed model-free PS-based algorithm for offline RL is novel, with sub-optimality bounds that are frequentist (i.e., worst-case) in nature.
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2402.04182v1
Abstract: Reinforcement learning algorithms need exploration to learn. However, unsupervised exploration prevents the use of such algorithms on safety-critical tasks and limits real-world deployment. In this paper, we propose a new algorithm called Ensemble Model Predictive Safety Certification that combines model-based deep reinforcement learning with tube-based model predictive control to correct the actions taken by a learning agent, keeping safety constraint violations at a minimum through planning. Our approach aims to reduce the amount of prior knowledge needed about the actual system by requiring only offline data generated by a safe controller. Our results show that we can achieve significantly fewer constraint violations than comparable reinforcement learning methods.
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2402.04168v1
Abstract: Reinforcement Learning is a highly active research field with promising advancements. In the field of autonomous driving, however, often only very simple scenarios are examined. Common approaches use non-interpretable control commands as the action space and reward designs that lack structure. In this work, we introduce Informed Reinforcement Learning, where a structured rulebook is integrated as a knowledge source. We learn trajectories and assess them with a situation-aware reward design, leading to a dynamic reward that allows the agent to learn situations which require controlled traffic rule exceptions. Our method is applicable to arbitrary RL models. We successfully demonstrate high completion rates on complex scenarios with recent model-based agents.
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2402.03647v1
Abstract: Recent advancements have introduced machine learning frameworks to enhance Branch and Bound (B&B) branching policies for solving Mixed Integer Linear Programming (MILP). These methods, primarily relying on imitation learning of Strong Branching, have shown superior performance. However, collecting expert samples for imitation learning, particularly for Strong Branching, is a time-consuming endeavor. To address this challenge, we propose Contrastive Learning with Augmented MILPs for Branching (CAMBranch), a framework that generates Augmented MILPs (AMILPs) by applying variable shifting to the original MILPs of limited expert data. This approach enables the acquisition of a considerable number of labeled expert samples. CAMBranch leverages both MILPs and AMILPs for imitation learning and employs contrastive learning to enhance the model’s ability to capture MILP features, thereby improving the quality of branching decisions. Experimental results demonstrate that CAMBranch, trained with only 10% of the complete dataset, exhibits superior performance. Ablation studies further validate the effectiveness of our method.
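The "variable shifting" augmentation can be made concrete under one plausible reading (an assumption for illustration; the paper's exact transformation may differ): substituting x → x − d for an integer vector d turns min cᵀx s.t. Ax ≤ b into an equivalent problem with right-hand side b − Ad, so feasible points, and hence expert branching labels, map one-to-one between the two.

```python
import numpy as np

def shift_milp(A, b, c, d):
    """Shift variables x -> x' = x - d in  min c^T x  s.t.  A x <= b.

    x is feasible for the original problem iff x - d is feasible for the
    shifted one (the objective changes only by the constant c^T d), so the
    augmented instance carries the same labeled structure as the original.
    This is one plausible reading of CAMBranch's augmentation, sketched
    for illustration only.
    """
    return A, b - A @ d, c

A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([10.0, 12.0])
c = np.array([1.0, 1.0])
d = np.array([2.0, -1.0])              # integer shift vector

x = np.array([2.0, 3.0])               # feasible: A @ x = [8, 9] <= b
_, b_shift, _ = shift_milp(A, b, c, d)
x_shift = x - d                        # corresponding point in the AMILP
assert np.all(A @ x <= b)
assert np.all(A @ x_shift <= b_shift)  # feasibility is preserved
```

Integrality constraints survive the shift as well, since d is integer.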
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2402.04206v1
Abstract: This paper introduces a system designed to generate explanations for the actions performed by an autonomous robot in Human-Robot Interaction (HRI). Explainability in robotics, encapsulated in the concept of an eXplainable Autonomous Robot (XAR), is a growing research area. The work described in this paper aims to take advantage of the capabilities of Large Language Models (LLMs) in performing natural language processing tasks. This study focuses on the possibility of generating explanations using such models in combination with a Retrieval Augmented Generation (RAG) method to interpret data gathered from the logs of autonomous systems. In addition, this work presents a formalization of the proposed explanation system. The system has been evaluated through a navigation test from the European Robotics League (ERL), a Europe-wide social robotics competition. To assess the obtained results, a validation questionnaire was conducted to measure the quality of the explanations from the perspective of technical users. The results obtained during the experiment highlight the potential utility of LLMs in achieving explanatory capabilities in robots.
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2402.04070v1
Abstract: Aerial robots have the potential to play a crucial role in assisting humans with complex and dangerous tasks. Nevertheless, future industry demands innovative solutions to streamline the interaction process between humans and drones to enable seamless collaboration and efficient co-working. In this paper, we present a novel tele-immersive framework that promotes cognitive and physical collaboration between humans and robots through Mixed Reality (MR). This framework incorporates a novel bi-directional spatial awareness approach and a multi-modal virtual-physical interaction approach. The former seamlessly integrates the physical and virtual worlds, offering bidirectional egocentric and exocentric environmental representations. The latter, leveraging the proposed spatial representation, further enhances the collaboration by combining a robot planning algorithm for obstacle avoidance with variable admittance control. This allows users to issue commands based on virtual forces while maintaining compatibility with the environment map. We validate the proposed approach by performing several collaborative planning and exploration tasks involving a drone and a user equipped with an MR headset.
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2402.03824v1
Abstract: We propose Embodied AI (EAI) as the next fundamental step in the pursuit of Artificial General Intelligence (AGI), juxtaposing it against current AI advancements, particularly Large Language Models. We traverse the evolution of the embodiment concept across diverse fields - philosophy, psychology, neuroscience, and robotics - to highlight how EAI distinguishes itself from the classical paradigm of static learning. By broadening the scope of Embodied AI, we introduce a theoretical framework based on cognitive architectures, emphasizing perception, action, memory, and learning as essential components of an embodied agent. This framework is aligned with Friston’s active inference principle, offering a comprehensive approach to EAI development. Despite the progress made in the field of AI, substantial challenges persist, such as the formulation of a novel AI learning theory and the innovation of advanced hardware. Our discussion lays down a foundational guideline for future Embodied AI research. Highlighting the importance of creating Embodied AI agents capable of seamless communication, collaboration, and coexistence with humans and other intelligent entities within real-world environments, we aim to steer the AI community towards addressing the multifaceted challenges and seizing the opportunities that lie ahead in the quest for AGI.
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2402.00868v2
GitHub: https://github.com/SimarKareer/UnifiedVideoDA
Abstract: There has been abundant work in unsupervised domain adaptation for semantic segmentation (DAS) seeking to adapt a model trained on images from a labeled source domain to an unlabeled target domain. While the vast majority of prior work has studied this as a frame-level Image-DAS problem, a few Video-DAS works have sought to additionally leverage the temporal signal present in adjacent frames. However, Video-DAS works have historically studied a distinct set of benchmarks from Image-DAS, with minimal cross-benchmarking. In this work, we address this gap. Surprisingly, we find that (1) even after carefully controlling for data and model architecture, state-of-the-art Image-DAS methods (HRDA and HRDA+MIC) outperform Video-DAS methods on established Video-DAS benchmarks (+14.5 mIoU on Viper→CityscapesSeq, +19.0 mIoU on Synthia→CityscapesSeq), and (2) naive combinations of Image-DAS and Video-DAS techniques only lead to marginal improvements across datasets. To avoid siloed progress between Image-DAS and Video-DAS, we open-source our codebase with support for a comprehensive set of Video-DAS and Image-DAS methods on a common benchmark. Code available at https://github.com/SimarKareer/UnifiedVideoDA
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2402.02474v2
GitHub: https://github.com/farnooshar/SpecUnIIS
Abstract: Deep spectral methods reframe the image decomposition process as a graph partitioning task by extracting features using self-supervised learning and utilizing the Laplacian of the affinity matrix to obtain eigensegments. However, instance segmentation has received less attention than other tasks within the context of deep spectral methods. This paper addresses the fact that not all channels of the feature map extracted from a self-supervised backbone contain sufficient information for instance segmentation purposes. In fact, some channels are noisy and hinder the accuracy of the task. To overcome this issue, this paper proposes two channel-reduction modules: Noise Channel Reduction (NCR) and Deviation-based Channel Reduction (DCR). NCR retains channels with lower entropy, as they are less likely to be noisy, while DCR prunes channels with low standard deviation, as they lack sufficient information for effective instance segmentation. Furthermore, the paper demonstrates that the dot product, commonly used in deep spectral methods, is not suitable for instance segmentation due to its sensitivity to feature map values, potentially leading to incorrect instance segments. A new similarity metric called Bray-Curtis over Chebyshev (BoC) is proposed to address this issue. It takes into account the distribution of features in addition to their values, providing a more robust similarity measure for instance segmentation. Quantitative and qualitative results on the Youtube-VIS2019 dataset highlight the improvements achieved by the proposed channel-reduction methods and by using BoC instead of the conventional dot product for creating the affinity matrix. These improvements are observed in terms of mean Intersection over Union and extracted instance segments, demonstrating enhanced instance segmentation performance. The code is available at: https://github.com/farnooshar/SpecUnIIS
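The two reduction rules are simple enough to sketch: NCR keeps low-entropy channels, DCR then keeps high-standard-deviation channels. The keep fractions, histogram binning, and the strict-ordering selection below are illustrative assumptions, not the paper's values, and the BoC metric itself is omitted since its exact formula is defined in the paper.

```python
import numpy as np

def channel_entropy(fmap, bins=16):
    """Histogram entropy of each channel of a (C, H, W) feature map."""
    ent = np.empty(fmap.shape[0])
    for c in range(fmap.shape[0]):
        hist, _ = np.histogram(fmap[c], bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]
        ent[c] = -(p * np.log(p)).sum()
    return ent

def reduce_channels(fmap, ncr_keep=0.75, dcr_keep=0.75):
    """NCR then DCR on a (C, H, W) feature map.

    NCR keeps the fraction of channels with the LOWEST entropy (less likely
    to be noise); DCR then keeps those with the HIGHEST standard deviation
    (low-std channels carry too little information). Keep fractions are
    illustrative assumptions.
    """
    ent = channel_entropy(fmap)
    k1 = max(1, int(len(ent) * ncr_keep))
    fmap = fmap[np.argsort(ent)[:k1]]                 # NCR: low entropy
    std = fmap.reshape(fmap.shape[0], -1).std(axis=1)
    k2 = max(1, int(len(std) * dcr_keep))
    return fmap[np.argsort(std)[-k2:]]                # DCR: high std

rng = np.random.default_rng(1)
fmap = np.concatenate([
    np.tile(np.linspace(0, 1, 8), (4, 8, 1)),         # 4 smooth channels
    rng.normal(size=(4, 8, 8)),                       # 4 noisy channels
])
reduced = reduce_channels(fmap)                       # 8 -> 6 -> 4 channels
```

The surviving channels would then feed the affinity matrix construction in place of the full feature map.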
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2306.05418v2
Project: https://ba2det.site
Abstract: With the rapid development of large models, the need for data has become increasingly crucial. Especially in 3D object detection, costly manual annotations have hindered further advancements. To reduce the burden of annotation, we study the problem of achieving 3D object detection solely based on 2D annotations. Thanks to advanced 3D reconstruction techniques, it is now feasible to reconstruct the overall static 3D scene. However, extracting precise object-level annotations from the entire scene and generalizing these limited annotations to the entire scene remain challenges. In this paper, we introduce a novel paradigm called BA²-Det, encompassing pseudo label generation and multi-stage generalization. We devise the DoubleClustering algorithm to obtain object clusters from reconstructed scene-level points, and further enhance the model’s detection capabilities by developing three stages of generalization: progressing from complete to partial, static to dynamic, and close to distant. Experiments conducted on the large-scale Waymo Open Dataset show that the performance of BA²-Det is on par with fully-supervised methods using 10% of the annotations. Additionally, using large raw videos for pretraining, BA²-Det can achieve a 20% relative improvement on the KITTI dataset. The method also has great potential for detecting open-set 3D objects in complex scenes. Project page: https://ba2det.site.
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2402.02430v2
GitHub: https://github.com/zhouhuan-hust/LFD-RoadSeg
Abstract: Achieving real-time performance and accuracy on embedded platforms has long been the goal of road segmentation methods, and many lightweight networks have been proposed to this end. However, they ignore the fact that roads are “stuff” (background or environmental elements) rather than “things” (specific identifiable objects), which inspires us to explore the feasibility of representing roads with low-level instead of high-level features. Surprisingly, we find that the primary stage of mainstream network models is sufficient to represent most pixels of the road for segmentation. Motivated by this, we propose a Low-level Feature Dominated Road Segmentation network (LFD-RoadSeg). Specifically, LFD-RoadSeg employs a bilateral structure. A spatial detail branch is first designed to extract a low-level feature representation of the road via the first stage of ResNet-18. To suppress texture-less regions mistaken for road in the low-level feature, a context semantic branch is then designed to extract context features quickly. To this end, in the second branch, we asymmetrically downsample the input image and design an aggregation module to achieve a receptive field comparable to the third stage of ResNet-18, but with less time consumption. Finally, to segment the road from the low-level feature, a selective fusion module is proposed to calculate pixel-wise attention between the low-level representation and the context feature, and to suppress non-road low-level responses with this attention. On KITTI-Road, LFD-RoadSeg achieves a maximum F1-measure (MaxF) of 95.21% and an average precision of 93.71%, while reaching 238 FPS on a single TITAN Xp and 54 FPS on a Jetson TX2, all with a compact model size of just 936k parameters. The source code is available at https://github.com/zhouhuan-hust/LFD-RoadSeg.
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2402.03708v1
GitHub: https://github.com/Justlovesmile/SISP
Abstract: Fine-grained ship instance segmentation in satellite images holds considerable significance for monitoring maritime activities at sea. However, existing datasets often suffer from a scarcity of fine-grained information or pixel-wise localization annotations, as well as insufficient image diversity and variation, thus limiting research on this task. To this end, we propose a benchmark dataset for fine-grained Ship Instance Segmentation in Panchromatic satellite images, namely SISP, which contains 56,693 well-annotated ship instances across four fine-grained categories in 10,000 sliced images, all collected from the SuperView-1 satellite at a resolution of 0.5 m. Targets in the proposed SISP dataset have characteristics consistent with real satellite scenes, such as high class imbalance, various scenes, large variations in target density and scale, and high inter-class similarity and intra-class diversity, all of which make the SISP dataset more suitable for real-world applications. In addition, we introduce a Dynamic Feature Refinement-assist Instance segmentation network, namely DFRInst, as the benchmark method for ship instance segmentation in satellite images, which can fortify the explicit representation of crucial features, thus improving the performance of ship instance segmentation. Experiments and analysis are performed on the proposed SISP dataset to evaluate the benchmark method and several state-of-the-art methods, establishing baselines for facilitating future research. The proposed dataset and source codes will be available at: https://github.com/Justlovesmile/SISP.
PubTime: 2024-02-06
Downlink: http://arxiv.org/abs/2310.10410v3
Abstract: Current slot-oriented approaches for compositional scene segmentation from images and videos rely on provided background information or slot assignments. We present a segmented location and identity tracking system, Loci-Segmented (Loci-s), which requires neither of these. It learns to dynamically segment scenes into interpretable background and slot-based object encodings, separating RGB, mask, location, and depth information for each. The results reveal largely superior video decomposition performance on the MOVi datasets and on another established dataset collection targeting scene segmentation. The system’s well-interpretable, compositional latent encodings may serve as a foundation model for downstream tasks.