Long-form analogy extraction, generation, and evaluation with large language models

Bhavya, -

Permalink

https://hdl.handle.net/2142/127229

Description

Title

Long-form analogy extraction, generation, and evaluation with large language models

Author(s)

Bhavya, -

Issue Date

2024-11-27

Director of Research (if dissertation) or Advisor (if thesis)

Zhai, ChengXiang

Doctoral Committee Chair(s)

Zhai, ChengXiang

Committee Member(s)

Bhat, Suma
Ji, Heng
Xiong, Jinjun

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Long-form Analogies
LLM
Generation
Extraction
Evaluation
ChatGPT
Education
NLP
Large Language Model
Natural Language Processing

Language

eng

Abstract

Analogies make connections between two seemingly disparate concepts. For example, “Bohr’s atomic model is like the solar system because the electrons revolve around the nucleus just like the planets revolve around the sun.” In this way, they map an unfamiliar concept (called the target) to a more familiar one (called the source) and explain the connections or similarities between them. Analogies are highly useful for explaining and illustrating educational concepts, solving problems, and inspiring creativity and scientific innovation. Thus, many users, including teachers, students, and writers, could benefit from easy access to suitable analogies. Although analogy identification and metaphor and simile detection and generation have long been studied in Natural Language Processing, the extraction and generation of long-form analogies, i.e., natural-language analogies that are typically a few sentences long and include a detailed description explaining the similarity between concepts, have not been explored well; they are the focus of this thesis. Although many analogies are scattered across the web today, very little work has been done toward building a dedicated search engine that automatically finds all relevant analogies in Web pages. Moreover, when suitable analogies are absent from the web, the ability to automatically generate new analogies is also essential, for example, to explain emerging concepts and to inspire creativity. Finally, in both searching for and generating analogies, there is a need to assess the quality and relevance of analogies to identify the suitable ones. Since human evaluation is not always feasible, automatic metrics for evaluating analogies are needed. In this thesis, we tackle these three challenges, namely, how to extract, generate, and evaluate long-form analogies to enable practical applications such as the explanation and illustration of concepts in education and creative writing.
Firstly, we propose and study a new task, called Analogy Detection and Extraction (AnaDE), which includes three synergistic sub-tasks: (1) detecting documents containing analogies (AnaDet), (2) extracting the text segments that make up the analogy (AnaSE), and (3) identifying the concepts being compared (AnaConE). To study these tasks, we scrape a dataset of 3.6k analogies and benchmark the performance of state-of-the-art (SOTA) models. We find that the best model achieves high performance (94% F1) on AnaDet, suggesting that it could be used to build a large repository of Web pages with analogies. On the other hand, AnaSE and AnaConE are quite challenging for current models (63.82 and 89.08 best F1, respectively), and thus there is ample opportunity for future research.
Secondly, we study how to generate an analogy about a topic by leveraging large language models (LLMs). Specifically, we propose and study the following two tasks: generating an explanation of the similarity between a given pair of target and source concepts (Analogous Explanation Generation, or AEG), and generating both a source concept analogous to a given target concept and an explanation of their similarity (Analogous Concept Generation, or ACG). To study these tasks, we leverage InstructGPT to generate analogies about 108 curated science concepts. Based on manual ratings of the generated analogies on Amazon Mechanical Turk, we find that the model achieves human-level performance on ACG but struggles on AEG. Our analysis also sheds light on the brittle nature of this model. Building upon this work, we propose and study the Creative Analogy Mining (CAM) framework for mining creative analogies, which consists of three stages: LLM-based analogy generation; evaluation of the generated analogies on three criteria (Analogical Style, Meaningfulness, and Novelty) using suitably designed scoring methods; and iterative refinement of the analogies by suitably re-prompting the model. Based on a manual evaluation on Amazon Mechanical Turk, we find that CAM can mine 13.7% highly creative analogies. Further, we study how to generate another kind of analogy for education, namely, grade-appropriate analogies. To this end, we introduce ELIx, a novel framework for fine-tuning LLMs with self-generated data, which is iteratively generated based on automatic grade-calibration feedback from readability assessment measures. Our experimental results show that ELIx can help generate more analogies that are both meaningful and grade-appropriate, particularly for lower grade bands, where LLMs struggle the most.
Following the preliminary exploration of automatic evaluation of generated analogies in the work described above, we study this problem in more depth. We identify six representative criteria for comprehensive analogy evaluation and manually rate 50 analogies on the identified criteria. Further, we investigate the performance of GPT-4 on the task. We find that while the model performs as well as humans on rating Meaningfulness, it struggles with rating the other fine-grained criteria, suggesting room for future research. Based on our qualitative and quantitative insights, we further refine the evaluation criteria to guide future research.
Finally, leveraging our methods for analogy generation and extraction, we develop a human-machine collaborative system, Analego, that allows a user to both search for and generate analogies. It contains more than 38.5k generated analogies about science and computer science concepts, and about 55k analogies extracted from the Common Crawl Web corpus. It also includes useful features for educational applications, such as designing assignments to critique analogies and providing feedback on analogies, which could also be used in the future to tune the underlying models. Preliminary discussions with teachers and education researchers suggest the promise of Analego and also provide important lessons for future improvements, such as exercising caution when deploying it, given the limitations of the underlying LLMs.

Graduation Semester

2024-12

Type of Resource

Thesis

Handle URL

https://hdl.handle.net/2142/127229

Owning Collections

Graduate Dissertations and Theses at Illinois (primary)

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Siebel School of Computer Science
