Long-form analogy extraction, generation, and evaluation with large language models
Bhavya, -
Permalink
https://hdl.handle.net/2142/127229
Description
- Title
- Long-form analogy extraction, generation, and evaluation with large language models
- Author(s)
- Bhavya, -
- Issue Date
- 2024-11-27
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhai, ChengXiang
- Doctoral Committee Chair(s)
- Zhai, ChengXiang
- Committee Member(s)
- Bhat, Suma
- Ji, Heng
- Xiong, Jinjun
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- long-form analogies
- LLM
- generation
- extraction
- evaluation
- ChatGPT
- education
- NLP
- large language model
- natural language processing
- Abstract
- Analogies make connections between two seemingly disparate concepts. For example, “Bohr’s atomic model is like the solar system because the electrons revolve around the nucleus just like the planets revolve around the sun.” In this way, they map an unfamiliar concept (called the target) to a more familiar one (called the source) and explain the connections or similarity between them. Analogies are highly useful for explaining and illustrating educational concepts, solving problems, and inspiring creativity and scientific innovation. Thus, many users, including teachers, students, and writers, could benefit from having easy access to suitable analogies. Although analogy identification and metaphor and simile detection and generation have been studied for a long time in Natural Language Processing, the extraction and generation of long-form analogies, i.e., analogies in natural language that are typically a few sentences long and include a detailed description explaining the similarity between concepts, have not been explored well. They are the focus of this thesis. Although many analogies are scattered across the web today, very little work has been done toward building a dedicated search engine for automatically finding all relevant analogies from Web pages. Moreover, in the absence of suitable analogies on the web, the ability to automatically generate new analogies is also essential, for example, to explain emerging concepts and to inspire creativity. Finally, in both searching for and generating analogies, there is a need to assess the quality and relevance of analogies to identify the suitable ones. Since human evaluation is not always feasible, there is a need to develop automatic metrics for evaluating analogies. In this thesis, we tackle these three challenges, namely, how to extract, generate, and evaluate long-form analogies to enable practical applications such as explanation and illustration of concepts in education and creative writing.
Firstly, we propose and study a new task, called Analogy Detection and Extraction (AnaDE), which includes three synergistic sub-tasks: (1) detecting documents containing analogies (AnaDet), (2) extracting text segments that make up the analogy (AnaSE), and (3) identifying the concepts being compared (AnaConE). To study these tasks, we scrape a dataset of 3.6k analogies and benchmark the performance of state-of-the-art models. We find that the best model achieves high performance (94% F1) on AnaDet, suggesting that it could be used for building a large repository of Web pages with analogies. On the other hand, AnaSE and AnaConE are quite challenging for current models (63.82 and 89.08 best F1, respectively), and thus there is ample opportunity for future research.
Secondly, we study how to generate an analogy about a topic by leveraging large language models (LLMs). Specifically, we propose and study the following two tasks: generating an explanation of the similarity between a given pair of target and source concepts (aka Analogous Explanation Generation or AEG), and generating both a source concept analogous to a given target concept and an explanation of their similarity (aka Analogous Concept Generation or ACG). To study these tasks, we leverage InstructGPT to generate analogies about 108 curated science concepts. Based on manual rating of generated analogies on Amazon Mechanical Turk, we find that the model achieves human-level performance on ACG but struggles on AEG. Our analysis also sheds light on the brittle nature of this model. Building upon this work, we propose and study the Creative Analogy Mining (CAM) framework to mine creative analogies, which consists of the following three stages: LLM-based analogy generation; evaluation of generated analogies based on three criteria (Analogical Style, Meaningfulness, and Novelty) by suitably designed scoring methods; and their iterative refinement by suitably re-prompting the model. Based on a manual evaluation on Amazon Mechanical Turk, we find that CAM can mine 13.7% highly creative analogies. Further, we study how to generate another kind of analogy for education, namely, grade-appropriate analogies. To this end, we introduce ELIx, a novel framework for fine-tuning LLMs with self-generated data, which is iteratively generated based on automatic grade-calibration feedback from readability assessment measures. Our experimental results show that ELIx can help generate more analogies that are both meaningful and grade-appropriate, particularly for lower grade bands, where LLMs struggle the most.
After a preliminary exploration of automatic evaluation of generated analogies in the preceding work, we conduct a more in-depth study of this problem. We identify six representative criteria for comprehensive analogy evaluation and manually rate 50 analogies on the identified criteria. Further, we investigate the performance of GPT-4 on the task. We find that while the model performs as well as humans on rating Meaningfulness, it struggles with rating other fine-grained criteria, suggesting room for future research. Based on our qualitative and quantitative insights, we further refine the evaluation criteria to guide future research.
Finally, leveraging our developed methods for analogy generation and extraction, we develop a human-machine collaborative system, Analego, that would allow a user to both search for and generate analogies. It has 38.5k+ generated analogies about science and computer science concepts, and about 55k analogies extracted from the Common Crawl Web corpus. It also includes useful features for educational applications, such as designing assignments to critique analogies and providing feedback on analogies, which could also be used in the future to tune the underlying models. Preliminary discussions with teachers and education researchers suggest the promise of Analego and also provide important lessons for future improvements, such as exercising caution while deploying it, given the limitations of the underlying LLMs.
- Graduation Semester
- 2024-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/127229
- Copyright and License Information
- Copyright 2024 - Bhavya
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois