Towards adaptive voice-controlled robots

Chang, Peixin

Permalink

https://hdl.handle.net/2142/127269

Description

Title

Towards adaptive voice-controlled robots

Author(s)

Chang, Peixin

Issue Date

2024-12-05

Director of Research (if dissertation) or Advisor (if thesis)

Driggs-Campbell, Katherine

Doctoral Committee Chair(s)

Driggs-Campbell, Katherine

Committee Member(s)

Hockenmaier, Julia
Chowdhary, Girish
Gupta, Saurabh

Department of Study

Electrical & Computer Engineering

Discipline

Electrical & Computer Engineering

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Robotics
Representation Learning
Continual Learning
Reinforcement Learning
Speech Recognition

Abstract

Voice-controlled robots offer a natural interface for non-experts to communicate with robots. Previous methods rely either on modular pipelines, which suffer from cascading errors and poor integration between components, or on end-to-end models, which struggle with robustness and generalization, especially when applied to new tasks or environments. Both approaches often demand significant domain expertise and extensive manual tuning after deployment whenever failures or suboptimal behaviors occur, making it difficult for non-experts to update or adapt the system. These drawbacks undermine seamless human-robot collaboration and limit the adoption of voice-controlled robots in daily life.

In this thesis, we address the challenge of enabling voice-controlled robots to adapt and improve after deployment with minimal supervision and few assumptions, a challenge frequently overlooked in current research and development. To this end, we propose a novel two-stage pipeline. The first stage learns a Visual-Audio Representation (VAR) that unifies the speech recognition, natural language understanding, and grounding modules, allowing the robot to ground multimodal inputs. The second stage employs a reinforcement learning policy trained on the embeddings and rewards provided by the VAR, enabling the robot to keep improving after deployment. We also introduce Dif-VAR, a data-efficient version of the VAR that non-experts can fine-tune intuitively with significantly reduced labeling requirements. We evaluate the system using state-of-the-art sound datasets in both simulated and real-world environments, where it shows robust performance improvements across various navigation and manipulation tasks. The proposed approach allows the robot to continually improve itself after deployment with minimal data and human intervention, making it a scalable and adaptive solution for voice-controlled robots in everyday scenarios.
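The record does not specify how the VAR produces rewards for the second stage. As a purely illustrative sketch, and not the author's implementation, one way to turn a shared visual-audio embedding space into a reward is to score how closely the embedding of the robot's current visual observation matches the embedding of the spoken command. The cosine-similarity reward, the success threshold, the bonus value, and the function names `cosine_similarity` and `var_reward` are all hypothetical choices made for this example.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def var_reward(
    image_embedding: Sequence[float],
    command_embedding: Sequence[float],
    success_threshold: float = 0.9,  # hypothetical: similarity treated as "goal reached"
    success_bonus: float = 10.0,     # hypothetical: extra reward on success
) -> Tuple[float, bool]:
    """Hypothetical VAR-style reward for an RL step.

    Assumes both inputs already live in the same learned embedding space:
    the image embedding comes from the current camera observation and the
    command embedding comes from the user's speech. Returns (reward, done).
    """
    sim = cosine_similarity(image_embedding, command_embedding)
    done = sim >= success_threshold
    # Dense shaping term: higher when the observation looks more like the command.
    reward = sim
    if done:
        reward += success_bonus
    return reward, done
```

For example, `var_reward([0.8, 0.6, 0.0], [1.0, 0.0, 0.0])` yields a reward of about 0.8 without ending the episode, while an observation embedding aligned with the command crosses the threshold and receives the bonus. In a design like this, the reward comes from the learned representation rather than from a hand-written function, so the policy can keep training on new interactions after deployment without manual reward engineering.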

Graduation Semester

2024-12

Type of Resource

Thesis

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois
