Ziqi Pang (庞子奇)

I am a fourth-year CS Ph.D. student focusing on computer vision and machine learning at the University of Illinois Urbana-Champaign (UIUC), where my advisor is Prof. Yu-Xiong Wang. Before that, I graduated from Peking University (PKU) with a Bachelor's degree in Computer Science.

I interned at Toyota Research Institute (TRI) with Dr. Pavel Tokmakov during my Ph.D. study. Prior to joining UIUC, I interned at Carnegie Mellon University (CMU) with Prof. Martial Hebert, conducted research at Peking University (PKU) with Prof. Shiliang Zhang, and spent an exciting year at TuSimple pushing the boundaries of autonomous driving under the guidance of Dr. Naiyan Wang.

Email  /  CV  /  Google Scholar  /  Twitter  /  Github

Research

Enhancing the knowledge in generative foundation models for embodied perception in long videos. I care about embodied perception in 2D, 3D, and 4D, and I envision generative pre-trained models as the critical component enabling the scaling and self-improvement of embodied perception.

I am actively seeking summer 2025 research internships. Feel free to contact me if you think I could make a contribution to your team.

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
Ziqi Pang*, Tianyuan Zhang*, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang
In Submission
Project Page / Code / arXiv

We enable a GPT-style causal transformer to generate images in random orders, which unlocks a series of new capabilities for decoder-only autoregressive models.

GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
Lang Lin*, Xueyang Yu*, Ziqi Pang*, Yu-Xiong Wang
In Submission
Project Page

We propose a simple yet effective MLLM for language-instructed video segmentation. It emphasizes global-local video understanding and achieves SOTA performance on multiple benchmarks.

Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Ziqi Pang*, Xin Xu*, Yu-Xiong Wang
In Submission

Our paper answers several critical questions about diffusion models for visual perception: (1) how to train diffusion-based perception models, and (2) how to utilize diffusion models as a unique interactive user interface.

InstructG2I: Synthesizing Images from Multimodal Attributed Graphs
Bowen Jin, Ziqi Pang, Bingjun Guo, Yu-Xiong Wang, Jiaxuan You, Jiawei Han
NeurIPS, 2024
Project Page / Code / arXiv

Using the relationships between entities in multimodal attributed graphs, we can better control image generation.

RMem: Restricted Memory Banks Improve Video Object Segmentation
Junbao Zhou*, Ziqi Pang*, Yu-Xiong Wang
CVPR, 2024 (Winner of the ECCV 2024 VOTS Challenge)
Project Page / Code / arXiv

Better management of memory banks significantly improves VOS on challenging state changes and long videos. A similar strategy was later adopted in SAM2.

Frozen Transformers from Language Models are Effective Visual Encoder Layers
Ziqi Pang, Ziyang Xie*, Yunze Man*, Yu-Xiong Wang
ICLR, 2024 (Spotlight)
Code / arXiv

Frozen transformers from language models, though trained solely on textual data, can effectively improve diverse visual tasks by directly encoding visual tokens. This is an essential step in my research on "generative models benefiting perception."

MV-Map: Offboard HD-Map Generation with Multi-view Consistency
Ziyang Xie*, Ziqi Pang*, Yu-Xiong Wang
ICCV, 2023  
Code / arXiv / Demo

MV-Map is the first offboard auto-labeling pipeline for HD-Maps, whose crux is to fuse BEV perception results guided by geometric cues from NeRFs.

Streaming Motion Forecasting for Autonomous Driving
Ziqi Pang, Deva Ramanan, Mengtian Li, Yu-Xiong Wang
IROS, 2023  
Code / arXiv / Demo

"Streaming forecasting" mitigates the gap between "snapshot-based" conventional motion forecasting and the streaming real-world traffic.

Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking (Alias: PF-Track)
Ziqi Pang, Jie Li, Pavel Tokmakov, Dian Chen, Sergey Zagoruyko, Yu-Xiong Wang
CVPR, 2023  
Code / arXiv / Demo

PF-Track is a vision-centric, end-to-end 3D MOT framework for autonomous driving that dramatically decreases ID-switches by 90%.

Embracing Single Stride 3D Object Detector with Sparse Transformer (Alias: SST)
Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang
CVPR, 2022  
Code / arXiv

SST emphasizes the small object sizes and the sparsity of point clouds. Its sparse transformers inspire new backbones for outdoor LiDAR-based detection.

SimpleTrack: Understanding and Rethinking 3D Multi-object Tracking
Ziqi Pang, Zhichao Li, Naiyan Wang
ECCV Workshop, 2022  
Code / arXiv / Patent

SimpleTrack is a simple yet effective 3D MOT system and one of the most widely adopted 3D MOT baselines worldwide.

Model-free Vehicle Tracking and State Estimation in Point Cloud Sequences (Alias: LiDAR-SOT)
Ziqi Pang, Zhichao Li, Naiyan Wang
IROS, 2021  
Code / arXiv / Demo

LiDAR-SOT is a LiDAR-based data flywheel and auto-labeling pipeline for autonomous driving.


Huge thanks to Jon Barron for providing the template for this page.