Files in this item

KUMAR-THESIS-2019.pdf (application/pdf, 26 MB), restricted to U of Illinois
Title: Concepts from unclear textual embeddings for text-to-image synthesis
Author(s): Kumar, Maghav
Advisor(s): Schwing, Alexander
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Computer Vision; deep learning; machine learning
Abstract: Automatically generating images from a natural language description is a challenging problem with key applications in retail, marketing, education, and entertainment. In recent years, notable progress has been made in this direction, particularly with Generative Adversarial Networks (GANs). Although current state-of-the-art models can generate images that roughly adhere to a textual description, much remains to be done to produce high-quality images that capture the nuances of a sentence. To this end, we propose CuteGAN, a simple text-to-image generation approach that encourages the model to leverage attribute information while attending to the most relevant words in a sentence during image generation. We perform experiments on the competitive CUB-200 and MS-COCO datasets and achieve state-of-the-art performance on the standard metrics of Inception Score and R-precision, indicating that our method produces more photo-realistic images that are better correlated with the text.
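The abstract mentions attending to relevant words in a sentence while generating images. As a minimal, hedged sketch (not the thesis's actual CuteGAN implementation, whose details are not given here), word-level attention in text-to-image GANs is typically computed by scoring each image sub-region feature against each word embedding, normalizing with a softmax over words, and forming a per-region word-context vector; the function and variable names below are illustrative assumptions:

```python
import numpy as np

def word_attention(image_feats, word_embs):
    """Illustrative word-level attention sketch.

    image_feats: (N, D) array of image sub-region features.
    word_embs:   (T, D) array of word embeddings for the sentence.
    Returns (attn, context): attention weights (N, T) and
    per-region word-context vectors (N, D).
    """
    # Similarity score between every region and every word.
    scores = image_feats @ word_embs.T            # (N, T)
    # Numerically stable softmax over the word axis.
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # rows sum to 1
    # Weighted combination of word embeddings per region.
    context = attn @ word_embs                    # (N, D)
    return attn, context
```

In attention-driven models of this kind, the context vectors are fed back into the generator so that each region of the synthesized image can be conditioned on the words most relevant to it.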
Issue Date: 2019-12-10
Rights Information: Copyright 2019 Maghav Kumar
Date Available in IDEALS: 2020-03-02
Date Deposited: 2019-12
