Towards adaptive voice-controlled robots
Chang, Peixin
Permalink
https://hdl.handle.net/2142/127269
Description
- Title
- Towards adaptive voice-controlled robots
- Author(s)
- Chang, Peixin
- Issue Date
- 2024-12-05
- Director of Research (if dissertation) or Advisor (if thesis)
- Driggs-Campbell, Katherine
- Doctoral Committee Chair(s)
- Driggs-Campbell, Katherine
- Committee Member(s)
- Hockenmaier, Julia
- Chowdhary, Girish
- Gupta, Saurabh
- Department of Study
- Electrical and Computer Engineering
- Discipline
- Electrical and Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Robotics
- Representation Learning
- Continual Learning
- Reinforcement Learning
- Speech Recognition
- Abstract
- Voice-controlled robots offer a natural interface for non-experts to communicate with robots. Previous methods either rely on modular pipelines, which suffer from cascading errors and poor integration between components, or on end-to-end models that struggle with robustness and generalization, especially when applied to new tasks or environments. Both approaches often demand significant domain expertise and extensive manual tuning after deployment if failures or suboptimal behaviors occur, making it difficult for non-experts to update or adapt the system. These drawbacks undermine seamless human-robot collaboration and limit the adoption of voice-controlled robots in daily life. In this thesis, we address the challenge of enabling voice-controlled robots to adapt and improve after deployment with minimal supervision and assumptions, a challenge frequently overlooked in current research and development. To this end, we propose a novel two-stage pipeline. The first stage involves developing a Visual-Audio Representation (VAR), which unifies speech recognition, natural language understanding, and grounding modules, allowing the robot to ground multimodal inputs. The second stage employs a reinforcement learning policy that uses the learned embeddings and rewards from the VAR, enabling the robot to improve after deployment. We also introduce Dif-VAR, a data-efficient version of the VAR that allows for intuitive fine-tuning by non-experts with significantly reduced labeling requirements. Our system has been rigorously evaluated using state-of-the-art sound datasets in both simulated and real-world environments, showing robust performance improvements in various navigation and manipulation tasks. The proposed approach allows for continual self-improvement of the robot after deployment, with minimal data and human intervention, making it a scalable and adaptive solution for voice-controlled robots in everyday scenarios.
- Graduation Semester
- 2024-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/127269
- Copyright and License Information
- Copyright 2024 Peixin Chang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois