


Gamba: Marry Gaussian Splatting with Mamba for Single-View 3D Reconstruction

Qiuhong Shen1  Xuanyu Yi3  Zike Wu3  Pan Zhou2,4  Hanwang Zhang3,5  Shuicheng Yan5  Xinchao Wang1
1 National University of Singapore   2 Singapore Management University
3 Nanyang Technological University   4 Sea AI Lab   5 Skywork AI
Abstract

We tackle the challenge of efficiently reconstructing a 3D asset from a single image with growing demands for automated 3D content creation pipelines. Previous methods primarily rely on Score Distillation Sampling (SDS) and Neural Radiance Fields (NeRF). Despite their significant success, these approaches encounter practical limitations due to lengthy optimization and considerable memory usage. In this report, we introduce Gamba, an end-to-end amortized 3D reconstruction model from single-view images, emphasizing two main insights: (1) 3D representation: leveraging a large number of 3D Gaussians for an efficient 3D Gaussian splatting process; (2) Backbone design: introducing a Mamba-based sequential network that facilitates context-dependent reasoning and linear scalability with the sequence (token) length, accommodating a substantial number of Gaussians. Gamba incorporates significant advancements in data preprocessing, regularization design, and training methodologies. We assessed Gamba against existing optimization-based and feed-forward 3D generation approaches using the real-world scanned OmniObject3D dataset. Here, Gamba demonstrates competitive generation capabilities, both qualitatively and quantitatively, while achieving remarkable speed, approximately 0.6 seconds on a single NVIDIA A100 GPU.

3 Work in progress, partially done in Sea AI Lab and 2050 Research, Skywork AI

1 Introduction

We tackle the challenge of efficiently extracting a 3D asset from a single image, an endeavor with substantial implications across diverse industrial sectors. This capability facilitates AR/VR content generation from a single snapshot and aids the development of autonomous vehicle path planning through monocular perception Sun et al. (2023); Gul et al. (2019); Yi et al. (2023).

Previous approaches to single-view 3D reconstruction have mainly been achieved through Score Distillation Sampling (SDS) Poole et al. (2022), which leverages pre-trained 2D diffusion models Graikos et al. (2022); Rombach et al. (2022) to guide optimization of the underlying representations of 3D assets. These optimization-based approaches have achieved remarkable success, known for their high fidelity and generalizability. However, they require a time-consuming per-instance optimization process Tang (2022); Wang et al. (2023d); Wu et al. (2024) to generate a single object and also suffer from artifacts such as the “multi-face” problem arising from bias in pre-trained 2D diffusion models Hong et al. (2023a). On the other hand, previous approaches predominantly utilized neural radiance fields (NeRF) Mildenhall et al. (2021); Barron et al. (2021), which are equipped with high-dimensional multi-layer perceptrons (MLPs) and inefficient volume rendering Mildenhall et al. (2021). This computational complexity significantly limits practical applications on limited compute budgets. For instance, the Large Reconstruction Model (LRM) Hong et al. (2023b) is confined to a resolution of 32 using a triplane-NeRF Shue et al. (2023) representation, and the resolution of renderings is limited to 128 due to the bottleneck of online volume rendering.


Figure 1: (a) We propose Gamba, an end-to-end, feed-forward single-view reconstruction pipeline, which marries 3D Gaussian Splatting with Mamba to achieve fast reconstruction. (b) The relationship between the 3DGS generation process and the Mamba sequential predicting pattern.

To address these challenges and thus achieve efficient single-view 3D reconstruction, we seek an amortized generative framework built on the groundbreaking 3D Gaussian Splatting, notable for its memory-efficient and high-fidelity tiled rendering Kerbl et al. (2023); Zwicker et al. (2002); Chen & Wang (2024); Wang et al. (2024). Despite recent exciting progress Tang et al. (2023), how to properly and immediately generate 3D Gaussians remains a less studied topic. Recent prevalent 3D amortized generative models Hong et al. (2023b); Wang et al. (2023b); Xu et al. (2024; 2023); Zou et al. (2023); Li et al. (2023) predominantly use transformer-based architectures as their backbones Vaswani et al. (2017); Peebles & Xie (2023), but we argue that these widely used architectures are sub-optimal for generating 3DGS. The crucial challenge stems from the fact that 3DGS requires a sufficient number of 3D Gaussians to accurately represent a 3D model or scene. However, the spatio-temporal complexity of Transformers increases quadratically with the number of tokens Vaswani et al. (2017), which limits the expressiveness of the 3DGS representation due to the insufficient token count available for 3D Gaussians. Furthermore, the 3DGS parameters possess specific physical meanings, making the simultaneous generation of 3DGS parameters a more challenging task.
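To make the scaling argument concrete, the following back-of-the-envelope sketch compares how a Transformer's attention score matrices grow with the number of Gaussian tokens against the linearly growing state touched by an SSM-style scan. The token counts, head count, model width, and state size below are illustrative assumptions, not values reported for Gamba.

```python
# Back-of-the-envelope comparison of sequence-length scaling (illustrative only).
# A Transformer materializes attention scores of size heads * L * L, while an
# SSM/Mamba-style scan carries state that grows linearly in L.

def attention_score_floats(num_tokens: int, num_heads: int = 16) -> int:
    """Entries in the attention score matrices: num_heads * L * L."""
    return num_heads * num_tokens * num_tokens


def ssm_scan_floats(num_tokens: int, d_model: int = 512, d_state: int = 16) -> int:
    """Hidden states touched by a linear-time scan: L * d_model * d_state."""
    return num_tokens * d_model * d_state


for n_gaussians in (1_024, 16_384, 131_072):
    quad = attention_score_floats(n_gaussians)
    lin = ssm_scan_floats(n_gaussians)
    print(f"{n_gaussians:>7} Gaussian tokens | attention scores: {quad / 1e9:9.2f} G floats"
          f" | scan states: {lin / 1e9:6.2f} G floats")
```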

To tackle the above challenges, we start by revisiting the 3DGS reconstruction process from multi-view images. The analysis presented in Fig. 1(b) reveals that 3DGS densification during the reconstruction process can be conceptualized as a sequential generation based on previously generated tokens. With this insight, we introduce a novel architecture for end-to-end 3DGS generation dubbed Gaussian Mamba (Gamba), which is built upon a new scalable sequential network, Mamba Gu & Dao (2023a). Our Gamba enables context-dependent reasoning and scales linearly with sequence (token) length, allowing it to efficiently mimic the inherent process of 3DGS reconstruction when generating 3D assets enriched with a sufficient number of 3D Gaussians. Due to its feed-forward architecture and efficient rendering, Gamba is exceptionally fast, requiring only about 1 second to generate a 3D asset and 6 ms for novel view synthesis, which is 5000× faster than previous optimization-based methods Wu et al. (2024); Weng et al. (2023); Qian et al. (2023) while achieving comparable generation quality.

We demonstrate the superiority of Gamba on the OmniObject3D dataset Wu et al. (2023). Both qualitative and quantitative experiments clearly indicate that Gamba can instantly generate high-quality and diverse 3D assets from a single image, consistently outperforming other state-of-the-art methods. In summary, we make three-fold contributions:

• We introduce GambaFormer, a simple state space model to process 3D Gaussian Splatting, which has global receptive fields with linear complexity.

• Integrated with the GambaFormer, we present Gamba, an amortized, end-to-end 3D Gaussian Splatting generation pipeline for fast and high-quality single-view reconstruction.

• Extensive experiments show that Gamba outperforms state-of-the-art baselines in terms of reconstruction quality and speed.

2 Related Works

Amortized 3D Generation. Amortized 3D generation is able to instantly generate 3D assets in a feed-forward manner after training on large-scale 3D datasets Wu et al. (2023); Deitke et al. (2023); Yu et al. (2023), in contrast to tedious SDS-based optimization methods Wu et al. (2024); Lin et al. (2023); Weng et al. (2023); Guo et al. (2023); Tang (2022). Previous works Nichol et al. (2022); Nash et al. (2020) married denoising diffusion models with various explicit 3D representations (e.g., point clouds and meshes), which suffer from a lack of generalizability and low texture quality. Recently, pioneered by LRM Hong et al. (2023b), several works utilize the capacity and scalability of the transformer Peebles & Xie (2023) and propose a fully transformer-based regression model to decode a NeRF representation from triplane features. Follow-up works extend LRM to predict multi-view images Li et al. (2023), combine it with diffusion Xu et al. (2023), and perform pose estimation Wang et al. (2023b). However, their triplane NeRF-based representation is restricted to inefficient volume rendering and relatively low resolution with blurred textures. Gamba instead seeks to train an efficient feed-forward model marrying Gaussian splatting with Mamba for single-view 3D reconstruction.

Gaussian Splatting for 3D Generation. The explicit nature of 3DGS facilitates real-time rendering capabilities and unprecedented levels of control and editability, making it highly relevant for 3D generation. Several works have effectively utilized 3DGS in conjunction with optimization-based 3D generation Wu et al. (2024); Poole et al. (2022); Lin et al. (2023). For example, DreamGaussian Tang et al. (2023) utilizes 3D Gaussians as an efficient 3D representation that supports real-time high-resolution rendering via rasterization. Despite the acceleration achieved, generating high-fidelity 3D Gaussians with such optimization-based methods still requires several minutes and a large computational memory footprint. TriplaneGaussian Zou et al. (2023) extends the LRM architecture with a hybrid triplane-Gaussian representation. AGG Xu et al. (2024) decomposes the geometry and texture generation task to produce coarse 3D Gaussians, further improving their fidelity through Gaussian super-resolution. Splatter Image Szymanowicz et al. (2023) and PixelSplat Charatan et al. (2023) propose to predict 3D Gaussians as pixels on the output feature map of two-view images. LGM Tang et al. (2024) generates high-resolution 3D Gaussians by fusing information from multi-view images produced by existing multi-view diffusion models Shi et al. (2023); Wang & Shi (2023) with an asymmetric U-Net. Among them, our Gamba demonstrates its superiority and structural elegance by taking a single image as input and operating in an end-to-end, single-stage, feed-forward manner.

State Space Models. Utilizing ideas from control theory Glasser (1985), the integration of linear state space equations with deep learning has been widely employed to tackle the modeling of sequential data. The promising property of scaling linearly with sequence length in long-range dependency modeling has attracted great interest from researchers. Pioneered by LSSL Gu et al. (2021b) and S4 Gu et al. (2021a), which utilize linear state space equations for sequence data modeling, follow-up works mainly focus on memory efficiency Gu et al. (2021a), fast training speed Gu et al. (2022b; a), and better performance Mehta et al. (2022); Wang et al. (2023a). More recently, Mamba Gu & Dao (2023b) integrates a selective mechanism and efficient hardware design, outperforms Transformers Vaswani et al. (2017) on natural language, and enjoys linear scaling with input length. Building on the success of Mamba, Vision Mamba Zhu et al. (2024) and VMamba Liu et al. (2024) leverage the bidirectional Vim block and the Cross-Scan Module, respectively, to gain data-dependent global visual context for visual representation; U-Mamba Ma et al. (2024) and VM-UNet Ruan & Xiang (2024) further bring Mamba into the field of medical image segmentation. PointMamba Liang et al. (2024a) and Point Cloud Mamba Zhang et al. (2024) adapt Mamba for point cloud understanding through reordering and serialization strategies. In this manuscript, we explore the capabilities of Mamba in single-view 3D reconstruction and introduce Gamba.

3 Preliminary

3.1 3D Gaussian Splatting

3D Gaussian Splatting (3DGS) Kerbl et al. (2023) has gained prominence as an efficient explicit 3D representation, using anisotropic 3D Gaussians to achieve intricate modeling. Each Gaussian, denoted as $G$, is defined by its mean $\mu \in \mathbb{R}^3$, covariance matrix $\Sigma$, associated color $c \in \mathbb{R}^3$, and opacity $\alpha \in \mathbb{R}$. To be better optimized, the covariance matrix $\Sigma$ is constructed from a scaling matrix $S \in \mathbb{R}^3$ and a rotation matrix $R \in \mathbb{R}^{3 \times 3}$ as follows:

$\Sigma = R S S^{T} R^{T}. \quad (1)$

This formulation allows for the optimization of Gaussian parameters separately while ensuring that $\Sigma$ remains positive semi-definite. A Gaussian with mean $\mu$ is defined as follows:

$G(x) = \exp\left(-\tfrac{1}{2} x^{T} \Sigma^{-1} x\right), \quad (2)$

where $x$ represents the offset from $\mu$ to a given point. In the blending phase, the color accumulation $C$ is calculated by:

$C = \sum_{i \in N} c_i \alpha_i G(x_i) \prod_{j=1}^{i-1} \left(1 - \alpha_j G(x_j)\right). \quad (3)$

3DGS utilizes a tile-based rasterizer to facilitate real-time rendering and integrates Gaussian parameter optimization with a dynamic density control strategy. This approach allows for the modulation of Gaussian counts through both densification and pruning operations.
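For concreteness, here is a minimal NumPy sketch of the quantities in Eqs. (1)-(3): the covariance built from a rotation and per-axis scales, the Gaussian weight at an offset, and front-to-back alpha compositing. It only illustrates the formulas; the actual 3DGS renderer projects Gaussians to 2D and uses a CUDA tile-based rasterizer, and the toy values below are made up.

```python
# Minimal sketch of Eqs. (1)-(3) of 3D Gaussian Splatting (illustration only).
import numpy as np

def covariance(R: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Eq. (1): Sigma = R S S^T R^T with S = diag(s); positive semi-definite by construction."""
    S = np.diag(s)
    return R @ S @ S.T @ R.T

def gaussian_weight(x: np.ndarray, Sigma: np.ndarray) -> float:
    """Eq. (2): G(x) = exp(-0.5 * x^T Sigma^{-1} x), where x is the offset from the mean."""
    return float(np.exp(-0.5 * x @ np.linalg.solve(Sigma, x)))

def composite(colors, alphas, weights) -> np.ndarray:
    """Eq. (3): front-to-back blending, accumulating color while updating transmittance."""
    C, transmittance = np.zeros(3), 1.0
    for c, a, g in zip(colors, alphas, weights):
        C += transmittance * a * g * np.asarray(c, dtype=float)
        transmittance *= 1.0 - a * g
    return C

# Toy usage with two Gaussians already sorted front-to-back (values are arbitrary).
Sigma = covariance(np.eye(3), np.array([0.1, 0.2, 0.1]))
weights = [gaussian_weight(np.array([0.05, 0.0, 0.0]), Sigma), 1.0]
print(composite([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [0.8, 0.9], weights))
```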

3.2 State Space Models

State Space Models (SSMs) Gu et al. (2021a) have emerged as a powerful tool for modeling and analyzing complex physical systems, particularly those that exhibit linear time-invariant (LTI) behavior. The core idea behind SSMs is to represent a system using a set of first-order differential equations that capture the dynamics of the system’s state variables. This representation allows for a concise and intuitive description of the system’s behavior, making SSMs well-suited for a wide range of applications. The general form of an SSM can be expressed as follows:

$\dot{h}(t) = A h(t) + B x(t), \quad (4)$
$y(t) = C h(t) + D x(t).$

where $h(t)$ denotes the state vector of the system at time $t$, while $\dot{h}(t)$ denotes its time derivative. The matrices $A$, $B$, $C$, and $D$ encode the relationships between the state vector, the input signal $x(t)$, and the output signal $y(t)$. These matrices play a crucial role in determining the system's response to various inputs and its overall behavior.

One of the challenges in applying SSMs to real-world problems is that they are designed to operate on continuous-time signals, whereas many practical applications involve discrete-time data. To bridge this gap, it is necessary to discretize the SSM, converting it from a continuous-time representation to a discrete-time one. The discretized form of an SSM can be written as:

$h_k = \bar{A} h_{k-1} + \bar{B} x_k, \quad (5)$
$y_k = \bar{C} h_k + \bar{D} x_k.$

Here, $k$ represents the discrete time step, and the matrices $\bar{A}$, $\bar{B}$, $\bar{C}$, and $\bar{D}$ are the discretized counterparts of their continuous-time equivalents. The discretization process involves sampling the continuous-time input signal $x(t)$ at regular intervals, with a sampling period of $\Delta$. This leads to the following relationships between the continuous-time and discrete-time matrices:

$\bar{A} = (I - \Delta/2 \cdot A)^{-1} (I + \Delta/2 \cdot A), \quad (6)$
$\bar{B} = (I - \Delta/2 \cdot A)^{-1} \Delta B,$
$\bar{C} = C.$
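As a worked illustration of Eqs. (4)-(6), the sketch below discretizes a small continuous-time SSM with the bilinear rule and then runs the recurrence of Eq. (5). The matrix values and the step size $\Delta$ are arbitrary choices for the example, and $D$ is set to zero.

```python
# Minimal sketch of SSM discretization (Eq. 6) and the discrete recurrence (Eq. 5).
import numpy as np

def discretize_bilinear(A, B, delta):
    """Eq. (6): A_bar = (I - d/2*A)^{-1}(I + d/2*A), B_bar = (I - d/2*A)^{-1} * d * B."""
    n = A.shape[0]
    inv = np.linalg.inv(np.eye(n) - 0.5 * delta * A)
    return inv @ (np.eye(n) + 0.5 * delta * A), inv @ (delta * B)

def ssm_scan(A_bar, B_bar, C, xs):
    """Eq. (5): h_k = A_bar h_{k-1} + B_bar x_k,  y_k = C h_k (D omitted, i.e. set to 0)."""
    h, ys = np.zeros(A_bar.shape[0]), []
    for x in xs:
        h = A_bar @ h + B_bar.flatten() * x
        ys.append(float(C @ h))
    return ys

A = np.array([[-1.0, 0.0], [0.0, -2.0]])   # stable continuous-time dynamics (toy values)
B = np.array([[1.0], [1.0]])
C = np.array([0.5, 0.5])
A_bar, B_bar = discretize_bilinear(A, B, delta=0.1)
print(ssm_scan(A_bar, B_bar, C, xs=[1.0, 0.0, 0.0, 0.0]))  # sampled impulse response
```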

Selective State Space Models Gu & Dao (2023a) are proposed to address the limitations of traditional SSMs in adapting to varying input sequences and capturing complex, input-dependent dynamics. The key innovation in Selective SSMs is the introduction of a selection mechanism that allows the model to efficiently select data in an input-dependent manner, enabling it to focus on relevant information and ignore irrelevant inputs. The selection mechanism is implemented by parameterizing the SSM matrices $\bar{B}$, $\bar{C}$, and $\Delta$ based on the input $x_k$. This allows the model to dynamically adjust its behavior depending on the input sequence, effectively filtering out irrelevant information and remembering relevant information indefinitely.
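A rough sketch of the selection mechanism described above: the step size $\Delta_k$ and the matrices $B_k$, $C_k$ are computed from each input token, so the recurrence can decide what to retain in its state. The projections, the diagonal $A$, and the scalar input drive used here are simplifying assumptions for illustration and do not reproduce the exact Mamba kernel or its hardware-aware scan.

```python
# Toy selective scan: Delta, B, C become functions of the current token x_k.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_state = 8, 4
A = -np.abs(rng.standard_normal(d_state))              # diagonal continuous-time A (negative => stable)
W_delta = 0.1 * rng.standard_normal(d_model)           # Delta_k = softplus(W_delta . x_k)
W_B = 0.1 * rng.standard_normal((d_state, d_model))    # B_k = W_B x_k  (input-dependent)
W_C = 0.1 * rng.standard_normal((d_state, d_model))    # C_k = W_C x_k  (input-dependent)
W_u = 0.1 * rng.standard_normal(d_model)               # scalar drive u_k = W_u . x_k (toy choice)

def selective_scan(xs):
    h, ys = np.zeros(d_state), []
    for x in xs:
        delta = np.log1p(np.exp(W_delta @ x))          # softplus keeps the step size positive
        A_bar = np.exp(delta * A)                      # per-token discretization of the diagonal A
        h = A_bar * h + delta * (W_B @ x) * (W_u @ x)  # state update with input-dependent B_k
        ys.append(float((W_C @ x) @ h))                # readout with input-dependent C_k
    return ys

print(selective_scan(rng.standard_normal((5, d_model))))
```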

4 Method

In this section, we detail our proposed single-view 3D reconstruction pipeline with 3D Gaussian Splatting (Fig. 2), called “Gamba”, whose core mechanism is the GambaFormer, which predicts 3D Gaussians from a single input image (Section 4.2). We design an elaborate Gaussian parameter constraint and a robust training pipeline (Section 4.3) to ensure stability and high quality.

4.1 Overall Training Pipeline

Given a set of multi-view images and their corresponding camera pose pairs {
