Files in this item

CHANG-THESIS-2018.pdf (15 MB, application/pdf)

Title: Adopting the two-branch network to video-text tasks
Author(s): Chang, Hsiao-Ching
Advisor(s): Lazebnik, Svetlana
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Computer vision; Video captioning
Abstract: Modeling visual content and its corresponding text description with a joint embedding network has been an effective way to enable cross-modal retrieval. However, while abundant work has been done on image-text tasks, relatively little exists for the video domain. We adopt a nonlinear embedding model, the two-branch network, for video-text tasks in order to show its robustness. Two kinds of tasks are explored: bidirectional video-sentence retrieval and video description generation. For the retrieval task, we use nearest-neighbor search to find the corresponding video or text for a given query. For video captioning, we incorporate the two-branch network into a traditional LSTM model with an additional embedding loss term in order to demonstrate its ability to preserve a semantic structure between video and text.
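The retrieval procedure sketched in the abstract — embedding both modalities into a shared space and ranking by similarity — can be illustrated with a minimal nearest-neighbor search. This is a hypothetical sketch, not the thesis code: the embeddings are toy vectors, and cosine similarity is assumed as the distance measure (a common choice for joint embedding spaces).

```python
import numpy as np

def cosine_retrieve(query_emb, gallery_emb):
    """Rank gallery items by cosine similarity to the query embedding.

    query_emb:   1-D array, the embedded query (e.g. a video clip).
    gallery_emb: 2-D array, one row per candidate (e.g. sentences).
    Returns gallery indices sorted from most to least similar.
    """
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = g @ q                     # cosine similarity per candidate
    return np.argsort(-sims)         # descending similarity

# Toy example: three sentence embeddings and one video-query embedding
# (all 2-D for illustration; real embeddings are much higher-dimensional).
sentences = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [0.7, 0.7]])
video_query = np.array([0.6, 0.8])
ranking = cosine_retrieve(video_query, sentences)
```

Because the query is the same shape regardless of modality, the identical routine supports both retrieval directions (video-to-sentence and sentence-to-video), which is what makes the search bidirectional.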
Issue Date: 2018-04-23
Rights Information: Copyright 2018 Hsiao-Ching Chang
Date Available in IDEALS: 2018-09-04
Date Deposited: 2018-05
