Methods for generating visual programs with optimizable vision models
Levine, Joshua
Permalink
https://hdl.handle.net/2142/124357
Description
- Title
- Methods for generating visual programs with optimizable vision models
- Author(s)
- Levine, Joshua
- Issue Date
- 2024-04-29
- Director of Research (if dissertation) or Advisor (if thesis)
- Hoiem, Derek
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Visual Programming
- Visual Question Answering
- Large Language Models
- Computer Vision
- Program Generation
- Language
- eng
- Abstract
- End-to-end vision-language models often fail to handle compositional tasks, necessitating alternative approaches for more complex problem-solving. Leveraging the visual programming paradigm, we propose a novel method for composing foundational vision models through program generation to tackle compositional tasks effectively. We investigate prompting and execution strategies that enable the synthesis of fine-tunable code by trainable large language models aimed at improving the effectiveness of the programs in solving vision-language tasks. Capitalizing on the robust compositional reasoning capabilities of large language models (LLMs), we employ pre-trained LLMs to architect programs constructed using a catalog of pre-defined atomic functions. These atomic functions, implemented with pre-trained vision models, serve as the building blocks for the visual programs generated by our system. Our methodology supports programs in various formats, always offering the flexibility to fine-tune the constituent vision models and the LLM code generator. This study concentrates on image-based question-answering. This focus underscores the critical need for advanced compositional reasoning in interpreting and responding to complex visual queries. Our evaluation encompasses the executability and correctness of the produced programs, providing a comprehensive assessment of our approach's effectiveness. This paper lays the groundwork for a subsequent investigation into the joint training of the LLMs and atomic functions, setting the stage for significant advancements in program generation and compositional reasoning in computer vision.
- Graduation Semester
- 2024-05
- Type of Resource
- Text
- Handle URL
- https://hdl.handle.net/2142/124357
- Copyright and License Information
- Copyright 2024 Joshua Levine
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Siebel School of Computer Science