
CVPR 2024 | AIGC-Related Papers (Image Generation, Video Generation, etc.) with Paper Links, Open-Source Code, and Analysis [Continuously Updated]

CVPR 2024 | AIGC-related paper collection (if you find it helpful, likes and bookmarks are welcome)

Awesome-CVPR2024-AIGC

A Collection of Papers and Codes for CVPR2024 AIGC

This post collects and organizes this year's CVPR AIGC-related papers and code, listed below.

Stars, forks, and PRs are welcome~
The list is updated first on GitHub in Awesome-CVPR2024-AIGC; stars are welcome~
Zhihu: https://zhuanlan.zhihu.com/p/684325134

Please credit the source when referencing or reposting.

CVPR 2024 official website: https://cvpr.thecvf.com/Conferences/2024

Full CVPR paper list:

Conference dates: June 17-21, 2024

Paper acceptance announcement date:

【Contents】

1. Image Generation / Image Synthesis
2. Image Editing
3. Video Generation / Video Synthesis
4. Video Editing
5. 3D Generation / 3D Synthesis
6. 3D Editing
7. Multi-Modal Large Language Models
8. Others

1. Image Generation / Image Synthesis

CapHuman: Capture Your Moments in Parallel Universes

  • Paper: https://arxiv.org/abs/2402.18078
  • Code: https://github.com/VamosC/CapHuman

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

  • Paper: https://arxiv.org/abs/2402.00627
  • Code: https://github.com/YanzuoLu/CFLD

DeepCache: Accelerating Diffusion Models for Free

  • Paper: https://arxiv.org/abs/2312.00858
  • Code: https://github.com/horseee/DeepCache

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

  • Paper: https://arxiv.org/abs/2402.09812
  • Code: https://github.com/KU-CVLAB/DreamMatcher

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

  • Paper: https://arxiv.org/abs/2402.19481
  • Code: https://github.com/mit-han-lab/distrifuser

Diversity-aware Channel Pruning for StyleGAN Compression

  • Paper:
  • Code: https://github.com/jiwoogit/DCP-GAN

Discriminative Probing and Tuning for Text-to-Image Generation

  • Paper:
  • Code:

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

  • Paper: https://arxiv.org/abs/2312.04655
  • Code: https://github.com/eclipse-t2i/eclipse-inference

Efficient Dataset Distillation via Minimax Diffusion

  • Paper: https://arxiv.org/abs/2311.15529
  • Code: https://github.com/vimar-gu/MinimaxDiffusion

ElasticDiffusion: Training-free Arbitrary Size Image Generation

  • Paper: https://arxiv.org/abs/2311.18822
  • Code: https://github.com/MoayedHajiAli/ElasticDiffusion-official

High-fidelity Person-centric Subject-to-Image Synthesis

  • Paper: https://arxiv.org/abs/2311.10329
  • Code: https://github.com/CodeGoat24/Face-diffuser

InstanceDiffusion: Instance-level Control for Image Generation

  • Paper: https://arxiv.org/abs/2402.03290
  • Code: https://github.com/frank-xwang/InstanceDiffusion

Instruct-Imagen: Image Generation with Multi-modal Instruction

  • Paper: https://arxiv.org/abs/2401.01952
  • Code:

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models

  • Paper: https://arxiv.org/abs/2306.00973
  • Code: https://github.com/haoningwu3639/StoryGen

InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model

  • Paper: https://arxiv.org/abs/2312.05849
  • Code: https://github.com/jiuntian/interactdiffusion

Inversion-Free Image Editing with Natural Language

  • Paper: https://arxiv.org/abs/2312.04965
  • Code: https://github.com/sled-group/InfEdit

LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model

  • Paper:
  • Code: https://github.com/ewrfcas/LeftRefill

DemoFusion: Democratising High-Resolution Image Generation With No $$$

  • Paper: https://arxiv.org/abs/2311.16973
  • Code: https://github.com/PRIS-CV/DemoFusion

MACE: Mass Concept Erasure in Diffusion Models

  • Paper: https://arxiv.org/abs/2403.06135
  • Code: https://github.com/Shilin-LU/MACE

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

  • Paper: https://arxiv.org/abs/2402.05408
  • Code: https://github.com/limuloo/MIGC

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

  • Paper: https://arxiv.org/abs/2312.04461
  • Code: https://github.com/TencentARC/PhotoMaker

PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis

  • Paper:
  • Code: https://github.com/cszy98/PLACE

Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models

  • Paper: https://arxiv.org/abs/2305.16223
  • Code: https://github.com/SHI-Labs/Prompt-Free-Diffusion

Residual Denoising Diffusion Models

  • Paper: https://arxiv.org/abs/2308.13712
  • Code: https://github.com/nachifur/RDDM

Shadow Generation for Composite Image Using Diffusion Model

  • Paper: https://arxiv.org/abs/2308.09972
  • Code: https://github.com/bcmi/Object-Shadow-Generation-Dataset-DESOBAv2

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

  • Paper: https://arxiv.org/abs/2312.04410
  • Code: https://github.com/SHI-Labs/Smooth-Diffusion

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

  • Paper: https://arxiv.org/abs/2312.01725
  • Code: https://github.com/rlawjdghek/StableVITON

SVGDreamer: Text Guided SVG Generation with Diffusion Model

  • Paper: https://arxiv.org/abs/2312.16476
  • Code: https://github.com/ximinng/SVGDreamer

TokenCompose: Grounding Diffusion with Token-level Supervision

  • Paper: https://arxiv.org/abs/2312.03626
  • Code: https://github.com/mlpc-ucsd/TokenCompose
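
Most of the papers in this section build on latent diffusion backbones such as Stable Diffusion. As a point of reference only (this is not the method or code of any specific paper above), a minimal text-to-image call with the Hugging Face diffusers library looks roughly like the sketch below; the checkpoint name, prompt, and sampler settings are illustrative assumptions.

```python
# Minimal text-to-image sketch with Hugging Face diffusers.
# Illustrative baseline only; not the official code of any paper listed above.
import torch
from diffusers import StableDiffusionPipeline

# "runwayml/stable-diffusion-v1-5" is simply a commonly used public checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a watercolor painting of a lighthouse at sunset",
    num_inference_steps=30,   # fewer steps trade quality for speed
    guidance_scale=7.5,       # classifier-free guidance strength
).images[0]
image.save("sample.png")
```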

2. Image Editing

Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing

  • Paper: https://arxiv.org/abs/2311.18608
  • Code: https://github.com/HyelinNAM/ContrastiveDenoisingScore

Deformable One-shot Face Stylization via DINO Semantic Guidance

  • Paper: https://arxiv.org/abs/2403.00459
  • Code: https://github.com/zichongc/DoesFS

DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

  • Paper: https://arxiv.org/abs/2312.07409
  • Code: https://github.com/Kevin-thu/DiffMorpher

Edit One for All: Interactive Batch Image Editing

  • Paper: https://arxiv.org/abs/2401.10219
  • Code: https://github.com/thaoshibe/edit-one-for-all

Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation

  • Paper: https://arxiv.org/abs/2312.10113
  • Code: https://github.com/guoqincode/Focus-on-Your-Instruction

Inversion-Free Image Editing with Natural Language

  • Paper: https://arxiv.org/abs/2312.04965
  • Code: https://github.com/sled-group/InfEdit

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models

  • Paper: https://arxiv.org/abs/2303.17546
  • Code: https://github.com/Picsart-AI-Research/PAIR-Diffusion

Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing

  • Paper:
  • Code: https://github.com/YangChangHee/CVPR2024_Person-In-Place_RELEASE

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

  • Paper: https://arxiv.org/abs/2312.13964
  • Code: https://github.com/open-mmlab/PIA

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

  • Paper: https://arxiv.org/abs/2312.06739
  • Code: https://github.com/TencentARC/SmartEdit

Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer

  • Paper: https://arxiv.org/abs/2312.09008
  • Code: https://github.com/jiwoogit/Style-InDi
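
Several entries above (e.g., SmartEdit, Focus on Your Instruction) study instruction-driven image editing. As a rough baseline sketch only (not the code of those papers), instruction-based editing with the pre-existing InstructPix2Pix pipeline in diffusers looks like this; the file names, instruction, and guidance values are illustrative assumptions.

```python
# Minimal instruction-based editing sketch with diffusers' InstructPix2Pix
# pipeline (an earlier baseline, not the code of the CVPR 2024 papers above).
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

source = Image.open("input.png").convert("RGB")
edited = pipe(
    "make it look like a snowy winter scene",  # natural-language edit instruction
    image=source,
    num_inference_steps=20,
    image_guidance_scale=1.5,   # how strongly to stay close to the input image
).images[0]
edited.save("edited.png")
```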

3. Video Generation / Video Synthesis

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

  • Paper: https://arxiv.org/abs/2312.15770
  • Code: https://tf-t2v.github.io/

DisCo: Disentangled Control for Realistic Human Dance Generation

  • Paper: https://arxiv.org/abs/2307.00040
  • Code: https://github.com/Wangt-CN/DisCo

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

  • Paper: https://arxiv.org/abs/2311.16498
  • Code: https://github.com/magic-research/magic-animate

Vlogger: Make Your Dream A Vlog

  • Paper: https://arxiv.org/abs/2401.09414
  • Code: https://github.com/Vchitect/Vlogger

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

  • Paper: https://arxiv.org/abs/2311.16813
  • Code: https://github.com/wenyuqing/panacea

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

  • Paper:
  • Code: https://github.com/yzxing87/Seeing-and-Hearing

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

  • Paper: https://arxiv.org/abs/2311.17590
  • Code: https://github.com/ZiqiaoPeng/SyncTalk

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

  • Paper: https://arxiv.org/abs/2401.09047
  • Code: https://github.com/AILab-CVC/VideoCrafter
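
For readers who want to try a text-to-video baseline locally before diving into the papers above, a rough sketch using the pre-existing ModelScope text-to-video pipeline in diffusers is shown below. This is an assumption-laden illustration (checkpoint name, prompt, and frame handling are illustrative; exact output handling varies across diffusers versions), not the code of any paper listed here.

```python
# Minimal text-to-video sketch with diffusers' ModelScope text-to-video
# pipeline (a pre-existing baseline, not the code of the papers above).
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
).to("cuda")

result = pipe("a corgi running on a beach", num_inference_steps=25, num_frames=16)
frames = result.frames[0]  # depending on the diffusers version, .frames may already be unbatched
export_to_video(frames, "sample.mp4")
```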

4. Video Editing

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

  • Paper: https://arxiv.org/abs/2308.07926
  • Code: https://github.com/qiuyu96/CoDeF

VidToMe: Video Token Merging for Zero-Shot Video Editing

  • Paper: https://arxiv.org/abs/2312.10656
  • Code: https://github.com/lixirui142/VidToMe

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

  • Paper: https://arxiv.org/abs/2312.00845
  • Code: https://github.com/HyeonHo99/Video-Motion-Customization

5. 3D Generation / 3D Synthesis

Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling

  • Paper: https://arxiv.org/abs/2311.16096
  • Code: https://github.com/lizhe00/AnimatableGaussians

BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation

  • Paper: https://arxiv.org/abs/2312.02136
  • Code: https://github.com/zqh0253/BerfScene

CAD: Photorealistic 3D Generation via Adversarial Distillation

  • Paper: https://arxiv.org/abs/2312.06663
  • Code: https://github.com/raywzy/CAD

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

  • Paper: https://arxiv.org/abs/2309.00610
  • Code: https://github.com/hzxie/city-dreamer

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

  • Paper: https://arxiv.org/abs/2401.09050
  • Code: https://github.com/sail-sg/Consistent3D

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

  • Paper: https://arxiv.org/abs/2304.00916
  • Code: https://github.com/yukangcao/DreamAvatar

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

  • Paper: https://arxiv.org/abs/2312.03611
  • Code: https://github.com/yhyang-myron/DreamComposer

EscherNet: A Generative Model for Scalable View Synthesis

  • Paper: https://arxiv.org/abs/2402.03908
  • Code: https://github.com/kxhit/EscherNet

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

  • Paper: https://arxiv.org/abs/2310.08529
  • Code: https://github.com/hustvl/GaussianDreamer

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

  • Paper: https://arxiv.org/abs/2401.04092
  • Code: https://github.com/3DTopia/GPTEval3D

Gaussian Shell Maps for Efficient 3D Human Generation

  • Paper: https://arxiv.org/abs/2311.17857
  • Code: https://github.com/computational-imaging/GSM

HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D

  • Paper: https://arxiv.org/abs/2312.15980
  • Code: https://github.com/byeongjun-park/HarmonyView

MoMask: Generative Masked Modeling of 3D Human Motions

  • Paper: https://arxiv.org/abs/2312.00063
  • Code: https://github.com/EricGuo5513/momask-codes

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

  • Paper: https://arxiv.org/abs/2312.06725
  • Code: https://github.com/huanngzh/EpiDiff

PEGASUS: Personalized Generative 3D Avatars with Composable Attributes

  • Paper: https://arxiv.org/abs/2402.10636
  • Code: https://github.com/snuvclab/pegasus

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

  • Paper: https://arxiv.org/abs/2311.16918
  • Code: https://github.com/modelscope/richdreamer

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

  • Paper: https://arxiv.org/abs/2311.17261
  • Code: https://github.com/daveredrum/SceneTex

SceneWiz3D: Towards Text-guided 3D Scene Composition

  • Paper: https://arxiv.org/abs/2312.08885
  • Code: https://github.com/zqh0253/SceneWiz3D

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

  • Paper: https://arxiv.org/abs/2312.06655
  • Code: https://github.com/liuff19/Sherpa3D

Text-to-3D using Gaussian Splatting

  • Paper: https://arxiv.org/abs/2309.16585
  • Code: https://github.com/gsgen3d/gsgen

ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models

  • Paper: https://arxiv.org/abs/2312.01305
  • Code: https://github.com/ubc-vision/vivid123
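
Several entries in this section (Text-to-3D using Gaussian Splatting, GaussianDreamer, Gaussian Shell Maps) and GaussianEditor in the next section represent scenes as sets of 3D Gaussians. For reference, the standard per-pixel alpha-blending rule for rendering depth-sorted projected Gaussians, taken from the original 3D Gaussian Splatting formulation rather than from any specific paper above, is:

$$
C = \sum_{i=1}^{N} c_i \, \alpha_i \prod_{j=1}^{i-1} \left(1 - \alpha_j\right)
$$

where $c_i$ and $\alpha_i$ are the color and opacity of the $i$-th Gaussian after projection to image space, and the sum runs over Gaussians sorted front to back.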

6. 3D Editing

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

  • Paper: https://arxiv.org/abs/2311.14521
  • Code: https://github.com/buaacyw/GaussianEditor

7. Multi-Modal Large Language Models

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

  • Paper: https://arxiv.org/abs/2312.03818
  • Code: https://github.com/SunzeY/AlphaCLIP

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

  • Paper: https://arxiv.org/abs/2311.08046
  • Code: https://github.com/PKU-YuanGroup/Chat-UniVi

Efficient Stitchable Task Adaptation

  • Paper: https://arxiv.org/abs/2311.17352
  • Code: https://github.com/ziplab/Stitched_LLaMA

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

  • Paper: https://arxiv.org/abs/2312.02980
  • Code: https://github.com/Pointcept/GPT4Point

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

  • Paper: https://arxiv.org/abs/2312.14238
  • Code: https://github.com/OpenGVLab/InternVL

LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

  • Paper: https://arxiv.org/abs/2311.11860
  • Code: https://github.com/rshaojimmy/JiuTian

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

  • Paper: https://arxiv.org/abs/2311.18651
  • Code: https://github.com/Open3DA/LL3DA

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

  • Paper: https://arxiv.org/abs/2311.16922
  • Code: https://github.com/DAMO-NLP-SG/VCD

OneLLM: One Framework to Align All Modalities with Language

  • Paper: https://arxiv.org/abs/2312.03700
  • Code: https://github.com/csuhan/OneLLM

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

  • Paper: https://arxiv.org/abs/2311.17911
  • Code: https://github.com/shikiw/OPERA

PixelLM: Pixel Reasoning with Large Multimodal Model

  • Paper: https://arxiv.org/abs/2312.02228
  • Code: https://github.com/MaverickRen/PixelLM

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

  • Paper: https://arxiv.org/abs/2312.04302
  • Code: https://github.com/dvlab-research/Prompt-Highlighter

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

  • Paper: https://arxiv.org/abs/2311.06783
  • Code: https://github.com/Q-Future/Q-Instruct

SEED-Bench: Benchmarking Multimodal Large Language Models

  • Paper: https://arxiv.org/abs/2311.17092
  • Code: https://github.com/AILab-CVC/SEED-Bench

VBench: Comprehensive Benchmark Suite for Video Generative Models

  • Paper: https://arxiv.org/abs/2311.17982
  • Code: https://github.com/Vchitect/VBench
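
Benchmarks such as VBench above and EvalCrafter in Section 8 evaluate video generative models along many dimensions. As a toy proxy only, not their actual metric suites, one common building block is prompt-frame alignment measured with CLIP; a minimal sketch with the Hugging Face transformers CLIP API is below (frame file names are illustrative assumptions).

```python
# Toy prompt-video alignment check: average CLIP similarity between a text
# prompt and sampled video frames. A rough proxy only; NOT the actual metric
# suite used by VBench or EvalCrafter.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def prompt_frame_similarity(prompt: str, frames: list) -> float:
    """Mean cosine similarity between the prompt and each frame (PIL images)."""
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img @ txt.T).mean().item()

# Example (hypothetical frame files sampled from a generated clip):
# frames = [Image.open(f"frame_{i}.png") for i in range(0, 64, 8)]
# score = prompt_frame_similarity("a corgi running on a beach", frames)
```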

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

  • Paper: https://arxiv.org/abs/2312.00784
  • Code: https://github.com/mu-cai/ViP-LLaVA

8. Others

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

  • Paper: https://arxiv.org/abs/2310.11440
  • Code: https://github.com/evalcrafter/EvalCrafter

Continuously updated~

References

CVPR 2024 Papers and Open-Source Projects Collection (Papers with Code)

