Enhancing multi-label object recognition in complex images via region-based continual learning
Chatikyan, Elen
Description
- Title
- Enhancing multi-label object recognition in complex images via region-based continual learning
- Author(s)
- Chatikyan, Elen
- Issue Date
- 2025-07-18
- Director of Research (if dissertation) or Advisor (if thesis)
- Hoiem, Derek W.
- Department of Study
- Siebel School of Computing and Data Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- multi-label object recognition
- region-based learning
- continual learning
- computer vision
- weak supervision
- CLIP
- DINOv2
- segmentation masks
- negative sampling
- open-vocabulary classification
- SAM
- binary cross-entropy loss
- image-level supervision
- COCO dataset
- vision-language models
- Abstract
- This thesis explores a region-based continual learning approach for multi-label object recognition in complex, multi-object scenes, designed to enable efficient learning from limited region-level supervision. We build on the AnytimeCL framework [1] by adapting it to a region-aware, multi-label setting. Our approach supports fine-grained learning with limited supervision by training on object-level regions from a controlled, balanced subset of the COCO dataset (14,000 images). Rather than emphasizing extremely low-shot scenarios at this stage, we focus on validating the effectiveness of region-based learning for multi-label object recognition under moderate supervision; investigating whether similar gains hold in extremely low-shot settings (e.g., 15–20 examples per class) is left for future work. To improve region-level predictions, we incorporate value-projected features from the final layers of transformer models, following the method proposed by Xiao et al. in TextRegion [2], which enhances class-specific region alignment. Our system combines vision-language features from CLIP and DINOv2 and uses a binary cross-entropy loss to support multi-label classification. To strengthen learning under limited supervision, we incorporate negative sampling at both the region and label levels: regions corresponding to background or unrelated objects are treated as negatives, unannotated classes are treated as negative labels, and annotated classes are ignored outside their corresponding regions (see the loss sketch after this record). As a result, the model focuses on fine-grained object representations without requiring dense annotations. At inference time, we employ the Segment Anything Model (SAM) [3] with a filtering layer to propose candidate object regions in unannotated images (see the inference sketch after this record), enabling scalable, region-level predictions without requiring inference-time annotations. We evaluate our approach on a subset of the COCO dataset [4], using both region-level and image-level metrics, including Top-1 accuracy, F1 score, mean average precision (mAP), and subset accuracy. Our findings highlight the effectiveness of combining region-level learning with negative sampling for scalable, fine-grained multi-label recognition under limited supervision.
- Graduation Semester
- 2025-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129917
- Copyright and License Information
- Copyright 2025 Elen Chatikyan
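The negative-sampling scheme described in the abstract amounts to a masked binary cross-entropy over (region, class) pairs: positives come from annotated regions, negatives from background regions and unannotated classes, and annotated classes are masked out away from their own regions. The sketch below is a minimal PyTorch illustration of that idea; the function name, tensor shapes, and the toy masking pattern are assumptions for illustration, not the thesis implementation.

```python
# Minimal sketch (assumed shapes and names) of a masked region-level BCE loss
# with negative sampling: loss_mask zeroes out (region, class) pairs that
# should be ignored, i.e., annotated classes outside their own regions.
import torch
import torch.nn.functional as F

def masked_region_bce(logits, targets, loss_mask):
    """logits, targets, loss_mask: (num_regions, num_classes) tensors.
    targets is 1 for positive labels and 0 for sampled negatives;
    loss_mask is 1 where the pair contributes to the loss, 0 where ignored."""
    per_pair = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    per_pair = per_pair * loss_mask
    return per_pair.sum() / loss_mask.sum().clamp(min=1)

# Toy usage: 3 regions, 4 classes.
logits = torch.randn(3, 4)
targets = torch.tensor([[1., 0., 0., 0.],   # annotated object region (class 0 positive)
                        [0., 0., 0., 0.],   # background region: all sampled as negatives
                        [0., 1., 0., 0.]])  # annotated object region (class 1 positive)
loss_mask = torch.ones(3, 4)
loss_mask[0, 1] = 0.  # class 1 is annotated elsewhere, so ignore it for region 0
loss_mask[2, 0] = 0.  # class 0 is annotated elsewhere, so ignore it for region 2
loss = masked_region_bce(logits, targets, loss_mask)
```

Normalizing by the mask sum rather than the tensor size keeps the loss scale comparable across batches with different numbers of supervised (region, class) pairs.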
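For the inference step, the abstract describes SAM proposals followed by a filtering layer and region-level classification on unannotated images. The sketch below uses the public segment_anything API (sam_model_registry, SamAutomaticMaskGenerator); the checkpoint path, the area and stability thresholds, and the classify_region callback are placeholders standing in for the thesis's CLIP/DINOv2-based region head, not its actual code.

```python
# Hedged sketch of SAM-based region proposal plus filtering at inference time.
# Thresholds and classify_region are illustrative placeholders.
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def propose_and_classify(image: np.ndarray, classify_region,
                         checkpoint: str = "sam_vit_h.pth"):
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    mask_generator = SamAutomaticMaskGenerator(sam)
    proposals = mask_generator.generate(image)  # list of dicts: mask, area, scores

    predictions = []
    for p in proposals:
        # Filtering layer (assumed thresholds): drop tiny or unstable masks.
        if p["area"] < 1024 or p["stability_score"] < 0.9:
            continue
        # classify_region stands in for the region-level multi-label head
        # built on CLIP/DINOv2 features; it returns the predicted labels.
        labels = classify_region(image, p["segmentation"])
        predictions.append({"mask": p["segmentation"], "labels": labels})
    return predictions
```

The same filtering-then-classification structure applies whichever region classifier is plugged in, which is what keeps the inference stage annotation-free.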
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)