Files in this item

File: CHENG-THESIS-2021.pdf (1MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: A deeper look into multi-task learning ability of unified text-to-text transformer
Author(s): Cheng, Xiang
Advisor(s): Zhai, Chengxiang
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): natural language processing
structure prediction
multi-task learning
Abstract: Structure prediction (SP) tasks are important in natural language understanding because they extract complex, structured knowledge from text. Recently, unified text-to-text transformer models such as T5 and TANL have produced competitive results on SP tasks. These models cast an SP task as a seq2seq problem, in which a transformer generates sequences with special tokens representing the extracted spans, labels, and relationships. Compared with many popular natural language understanding models that are designed for a single task, the output of a text-to-text transformer is more flexible: with a suitable output format, one model can be trained on multiple tasks together and exploit the knowledge shared between them. To better understand how these models improve performance through multi-task learning, we designed several experiments to measure the knowledge transfer ability of a recently proposed model, TANL. In these experiments, we found that the multi-head attention in the decoder can capture the relationships between tasks, which leads to the performance improvement. We also found that TANL may produce many outputs with invalid formats when trained from scratch, and that starting from a T5 pre-trained model helps mitigate this problem. Based on these observations and some new intuitions, we propose an improved version of TANL called SDCT5 (step-decomposed and constrained text-to-text transformer). Preliminary experimental results show that our model achieves better performance on SP tasks than TANL and benefits more from multi-task learning.
Issue Date: 2021-04-27
Type: Thesis
URI: http://hdl.handle.net/2142/110569
Rights Information: Copyright 2021 Xiang Cheng
Date Available in IDEALS: 2021-09-17
Date Deposited: 2021-05
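
To make the seq2seq formulation in the abstract concrete, here is a minimal sketch (not the thesis code) of how a structure prediction task such as named entity recognition can be cast as text-to-text generation. It assumes the HuggingFace transformers library and a t5-small checkpoint; the bracketed output format follows the augmented-language style of TANL, and the example sentence, task prefix, and single training step are illustrative assumptions.

    # Minimal sketch of a TANL-style text-to-text formulation of NER.
    # Assumptions (not from the thesis): HuggingFace `transformers`,
    # the "t5-small" checkpoint, and this exact bracket/label format.
    import torch
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # Plain input sentence; the target is an "augmented" copy in which
    # special tokens mark the extracted spans and their entity labels.
    source = "ner: Barack Obama was born in Hawaii ."
    target = "[ Barack Obama | person ] was born in [ Hawaii | location ] ."

    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids

    # One fine-tuning step computes the usual seq2seq cross-entropy loss;
    # multi-task training would mix examples from several SP tasks, each
    # with its own task prefix, into the same batches.
    loss = model(input_ids=inputs.input_ids,
                 attention_mask=inputs.attention_mask,
                 labels=labels).loss
    loss.backward()
    print(float(loss))

Because every task shares the same generate-a-string interface, mixing tasks only changes the training data, not the model; this is the property the abstract's multi-task experiments rely on.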

