Enhancing multi-label object recognition in complex images via region-based continual learning
Chatikyan, Elen
Description
- Title
- Enhancing multi-label object recognition in complex images via region-based continual learning
- Author(s)
- Chatikyan, Elen
- Issue Date
- 2025-07-18
- Director of Research (if dissertation) or Advisor (if thesis)
- Hoiem, Derek W.
- Department of Study
- Siebel School of Computing and Data Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- multi-label object recognition
- region-based learning
- continual learning
- computer vision
- weak supervision
- CLIP
- DINOv2
- segmentation masks
- negative sampling
- open-vocabulary classification
- SAM
- binary cross-entropy loss
- image-level supervision
- COCO dataset
- vision-language models
- Abstract
- This thesis explores a region-based continual learning approach for multi-label object recognition in complex, multi-object scenes, designed to enable efficient learning from limited region-level supervision. We build on the AnytimeCL framework [1] by adapting it to a region-aware, multi-label setting. Our approach supports fine-grained learning with limited supervision by training on object-level regions from a controlled, balanced subset of the COCO dataset (14,000 images). Rather than emphasizing extremely low-shot scenarios at this stage, we focus on validating the effectiveness of region-based learning for multi-label object recognition under moderate supervision; investigating whether similar gains hold in extremely low-shot settings (e.g., 15–20 examples per class) is left for future work. To improve region-level predictions, we incorporate value-projected features from the final layers of transformer models, following the method proposed by Xiao et al. in TextRegion [2], which enhances class-specific region alignment. Our system combines vision-language features from CLIP and DINOv2 and uses a binary cross-entropy loss to support multi-label classification. To strengthen learning under limited supervision, we incorporate negative sampling at both the region and label levels: regions corresponding to background or unrelated objects are treated as negatives, unannotated classes are treated as negative labels, and annotated classes are ignored outside their corresponding regions (see the loss sketch after this record). As a result, the model focuses on fine-grained object representations without requiring dense annotations. At inference time, we employ the Segment Anything Model (SAM) [3] with a filtering layer to propose candidate object regions in unannotated images (see the inference sketch after this record), enabling scalable, region-level predictions without requiring inference-time annotations. We evaluate our approach on a subset of the COCO dataset [4], using both region-level and image-level metrics, including Top-1 accuracy, F1 score, mean average precision (mAP), and subset accuracy. Our findings highlight the effectiveness of combining region-level learning with negative sampling for scalable, fine-grained multi-label recognition under limited supervision.
- Graduation Semester
- 2025-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129917
- Copyright and License Information
- Copyright 2025 Elen Chatikyan
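The negative-sampling scheme described in the abstract amounts to a masked binary cross-entropy over (region, class) pairs: positives come from annotated regions, negatives from background regions and unannotated classes, and annotated classes are masked out away from their own regions. The sketch below is a minimal PyTorch illustration of that idea; the function name, tensor shapes, and the toy masking pattern are assumptions for illustration, not the thesis implementation.

```python
# Minimal sketch (assumed shapes and names) of a masked region-level BCE loss
# with negative sampling: loss_mask zeroes out (region, class) pairs that
# should be ignored, i.e., annotated classes outside their own regions.
import torch
import torch.nn.functional as F

def masked_region_bce(logits, targets, loss_mask):
    """logits, targets, loss_mask: (num_regions, num_classes) tensors.
    targets is 1 for positive labels and 0 for sampled negatives;
    loss_mask is 1 where the pair contributes to the loss, 0 where ignored."""
    per_pair = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    per_pair = per_pair * loss_mask
    return per_pair.sum() / loss_mask.sum().clamp(min=1)

# Toy usage: 3 regions, 4 classes.
logits = torch.randn(3, 4)
targets = torch.tensor([[1., 0., 0., 0.],   # annotated object region (class 0 positive)
                        [0., 0., 0., 0.],   # background region: all sampled as negatives
                        [0., 1., 0., 0.]])  # annotated object region (class 1 positive)
loss_mask = torch.ones(3, 4)
loss_mask[0, 1] = 0.  # class 1 is annotated elsewhere, so ignore it for region 0
loss_mask[2, 0] = 0.  # class 0 is annotated elsewhere, so ignore it for region 2
loss = masked_region_bce(logits, targets, loss_mask)
```

Normalizing by the mask sum rather than the tensor size keeps the loss scale comparable across batches with different numbers of supervised (region, class) pairs.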
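For the inference step, the abstract describes SAM proposals followed by a filtering layer and region-level classification on unannotated images. The sketch below uses the public segment_anything API (sam_model_registry, SamAutomaticMaskGenerator); the checkpoint path, the area and stability thresholds, and the classify_region callback are placeholders standing in for the thesis's CLIP/DINOv2-based region head, not its actual code.

```python
# Hedged sketch of SAM-based region proposal plus filtering at inference time.
# Thresholds and classify_region are illustrative placeholders.
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def propose_and_classify(image: np.ndarray, classify_region,
                         checkpoint: str = "sam_vit_h.pth"):
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    mask_generator = SamAutomaticMaskGenerator(sam)
    proposals = mask_generator.generate(image)  # list of dicts: mask, area, scores

    predictions = []
    for p in proposals:
        # Filtering layer (assumed thresholds): drop tiny or unstable masks.
        if p["area"] < 1024 or p["stability_score"] < 0.9:
            continue
        # classify_region stands in for the region-level multi-label head
        # built on CLIP/DINOv2 features; it returns the predicted labels.
        labels = classify_region(image, p["segmentation"])
        predictions.append({"mask": p["segmentation"], "labels": labels})
    return predictions
```

The same filtering-then-classification structure applies whichever region classifier is plugged in, which is what keeps the inference stage annotation-free.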
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)