RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
Ziqi Pang*,
Tianyuan Zhang*,
Fujun Luan,
Yunze Man,
Hao Tan,
Kai Zhang,
William T. Freeman,
Yu-Xiong Wang
In Submission
Project Page / Code / arXiv
We enable a GPT-style causal transformer to generate images in random orders,
which unlocks a series of new capabilities for decoder-only autoregressive models.
GLUS: Global-Local Reasoning Unified into
A Single Large Language Model for Video Segmentation
Lang Lin*,
Xueyang Yu*,
Ziqi Pang*,
Yu-Xiong Wang
In Submission
Project Page
We propose a simple yet effective MLLM for language-instructed video segmentation. It emphasizes global-local video understanding and achieves state-of-the-art performance on multiple benchmarks.
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Ziqi Pang*,
Xin Xu*,
Yu-Xiong Wang
In Submission
Our paper answers several critical questions on diffusion models for visual perception:
(1) how to train diffusion-based perception models, and (2) how to utilize diffusion models as a unique interactive user interface.
InstructG2I: Synthesizing Images from Multimodal Attributed Graphs
Bowen Jin,
Ziqi Pang,
Bingjun Guo,
Yu-Xiong Wang,
Jiaxuan You,
Jiawei Han
NeurIPS, 2024
Project Page / Code / arXiv
By leveraging the relationships between entities, multi-modal attributed graphs enable better control over image generation.
RMem: Restricted Memory Banks Improve Video Object Segmentation
Junbao Zhou*,
Ziqi Pang*,
Yu-Xiong Wang
CVPR, 2024 (Winner at ECCV 2024 VOTS Challenge)
Project Page / Code / arXiv
Managing memory banks better significantly improves VOS on challenging state changes and long videos. A similar strategy was later adopted in SAM2.
Frozen Transformers from Language Models are Effective Visual Encoder Layers
Ziqi Pang,
Ziyang Xie*,
Yunze Man*,
Yu-Xiong Wang
ICLR, 2024 (Spotlight)
Code / arXiv
Frozen transformers from language models, though trained solely on textual data, can effectively improve diverse visual tasks by directly encoding visual tokens.
It is an essential step in my research on "generative models benefiting perception."
MV-Map: Offboard HD-Map Generation with Multi-view Consistency
Ziyang Xie*,
Ziqi Pang*,
Yu-Xiong Wang
ICCV, 2023
Code / arXiv / Demo
MV-Map is the first offboard auto-labeling pipeline for HD maps, whose crux is to fuse BEV perception results guided by geometric cues from NeRFs.
Streaming Motion Forecasting for Autonomous Driving
Ziqi Pang,
Deva Ramanan,
Mengtian Li,
Yu-Xiong Wang
IROS, 2023
Code / arXiv / Demo
"Streaming forecasting" bridges the gap between conventional "snapshot-based" motion forecasting and real-world streaming traffic.
Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking (Alias: PF-Track)
Ziqi Pang,
Jie Li,
Pavel Tokmakov,
Dian Chen,
Sergey Zagoruyko,
Yu-Xiong Wang
CVPR, 2023
Code / arXiv / Demo
PF-Track is a vision-centric, end-to-end 3D MOT framework for autonomous driving that decreases ID-Switches by 90%.
Embracing Single Stride 3D Object Detector with Sparse Transformer (Alias: SST)
Lue Fan,
Ziqi Pang,
Tianyuan Zhang,
Yu-Xiong Wang,
Hang Zhao,
Feng Wang,
Naiyan Wang,
Zhaoxiang Zhang
CVPR, 2022
Code / arXiv
SST emphasizes the small object sizes and the sparsity of point clouds. Its sparse transformers inspire new backbones for outdoor LiDAR-based detection.
SimpleTrack: Understanding and Rethinking 3D Multi-object Tracking
Ziqi Pang,
Zhichao Li,
Naiyan Wang
ECCV Workshop, 2022
Code / arXiv / Patent
SimpleTrack is a simple yet effective 3D MOT system. It is one of the most widely adopted 3D MOT baselines worldwide.
Model-free Vehicle Tracking and State Estimation in Point Cloud Sequences (Alias: LiDAR-SOT)
Ziqi Pang,
Zhichao Li,
Naiyan Wang
IROS, 2021
Code / arXiv / Demo
LiDAR-SOT is a LiDAR-based data flywheel and auto-labeling pipeline for autonomous driving.