Towards high-quality, accessible, and universal video matting
Li, Jiachen
Permalink
https://hdl.handle.net/2142/127197
Description
- Title
- Towards high-quality, accessible, and universal video matting
- Author(s)
- Li, Jiachen
- Issue Date
- 2024-12-03
- Director of Research (if dissertation) or Advisor (if thesis)
- Shi, Humphrey
- Doctoral Committee Chair(s)
- Shi, Humphrey
- Committee Member(s)
- Hwu, Wen-Mei
- Hasegawa-Johnson, Mark
- Xiong, Jinjun
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Video Matting
- Abstract
- Video matting seeks to predict an alpha matte for each frame of a given video sequence (the compositing model underlying the task is sketched in the code example after this metadata list). While current approaches leverage deep convolutional neural networks (CNNs) to produce accurate alpha mattes, these methods are typically trained and evaluated on private or otherwise inaccessible matting datasets. Furthermore, existing solutions generate only a single alpha matte per frame, without distinguishing between the individual human instances present in the video. In this dissertation, we address these limitations and extend video matting to more complex scenarios. We first propose VideoMatt, a pair of simple and strong real-time video matting baseline models (VideoMatt-S/T), together with a fully composited, publicly accessible video matting benchmark. We then propose VMFormer, a transformer-based end-to-end method for video matting that outperforms previous CNN-based state-of-the-art solutions, opening a new track for video matting and demonstrating the potential of transformer architectures for this task. Because both VideoMatt and VMFormer handle only semantic-aware video matting, we further propose Video Instance Matting (VIM), a new task that estimates an alpha matte for each instance in each frame of a video sequence, together with the corresponding VIM50 benchmark and a baseline model, MSG-VIM. VIM extends video matting to an instance-aware problem and improves its applicability. Finally, we introduce Matting Anything, a method that estimates instance-aware and class-aware alpha mattes with a single universal model, broadening the matting task to more general use cases and addressing the evolving needs of modern image and video editing. We also explore future directions, such as developing an end-to-end video editing system and integrating video matting capabilities into multimodal large language models (LLMs).
- Graduation Semester
- 2024-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/127197
- Copyright and License Information
- Copyright 2024 Jiachen Li
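For context on the abstract above, every composited frame I in the matting model is a per-pixel blend of a foreground F and a background B weighted by the alpha matte α: I = α·F + (1 − α)·B. Below is a minimal sketch of that standard compositing step in Python with NumPy; the array shapes, random inputs, and the composite helper are illustrative assumptions for this record, not code from the dissertation.

```python
import numpy as np

def composite(foreground, background, alpha):
    """Blend a foreground over a background with an alpha matte.

    Standard compositing model: I = alpha * F + (1 - alpha) * B,
    where the per-pixel alpha in [0, 1] is exactly what a video
    matting model predicts for each frame.
    """
    a = alpha[..., None]  # (H, W) -> (H, W, 1) to broadcast over RGB channels
    return a * foreground + (1.0 - a) * background

# Illustrative usage: replace the background behind a stand-in predicted matte.
h, w = 240, 320
fg = np.random.rand(h, w, 3)      # hypothetical source frame, RGB in [0, 1]
bg = np.zeros((h, w, 3))          # replacement background (solid black)
alpha = np.random.rand(h, w)      # stand-in for a model's per-pixel prediction
frame = composite(fg, bg, alpha)  # recomposited frame, shape (240, 320, 3)
```

Instance-aware matting, as in VIM, would predict one such alpha map per human instance in the frame rather than a single map, so each person can be composited or edited independently.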
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)