MojoFrame: dataframe library in Mojo language
Huang, Arthur
Permalink
https://hdl.handle.net/2142/129945
Description
- Title
- MojoFrame: dataframe library in Mojo language
- Author(s)
- Huang, Arthur
- Issue Date
- 2025-07-18
- Director of Research (if dissertation) or Advisor (if thesis)
- Park, Yongjoo
- Department of Study
- Siebel School Comp & Data Sci
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- MLIR
- JIT Compilation
- Systems For Data Science
- Systems For Machine Learning
- Dataframe
- Database Management Systems
- Language
- eng
- Abstract
- Mojo is an emerging programming language built on MLIR (Multi-Level Intermediate Representation) and JIT compilation. It enables transparent optimizations for the underlying hardware (e.g., CPUs, GPUs) while letting users express their logic in a user-friendly, Python-like syntax. Mojo has been shown to perform well on tensor operations. However, its performance on relational operations (e.g., filtering, joins, and group-by), which are common in data science workflows, has not been tested. To date, no dataframe implementation exists in the Mojo ecosystem. In this work, we introduce MojoFrame, the first Mojo-native dataframe library, which supports core relational operations and user-defined functions (UDFs). MojoFrame is built on top of Mojo’s tensor type to achieve fast operations on numeric columns, and it uses a cardinality-aware approach to integrate non-numeric columns for flexible data representation. To achieve high efficiency, MojoFrame takes approaches that differ substantially from those of existing libraries. MojoFrame supports all operations required by the TPC-H queries and achieves up to a 2.97× speedup over existing dataframe libraries in other programming languages. Nevertheless, optimization opportunities remain for MojoFrame (and the Mojo language), particularly in data loading and dictionary operations.
- Graduation Semester
- 2025-08
- Type of Resource
- Text
- Handle URL
- https://hdl.handle.net/2142/129945
- Copyright and License Information
- Copyright 2025 Arthur Huang
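The abstract mentions a cardinality-aware approach for non-numeric columns but does not describe how it works. The Python sketch below illustrates one common technique that fits that description: dictionary encoding. It assumes that a string column with few distinct values is mapped to compact integer codes, so numeric kernels (here, a group-by sum) can process it, and that a high-cardinality column is left as raw strings. The names `encode_column` and `group_sum`, the 0.5 ratio threshold, and the sample data are hypothetical. They are not MojoFrame's API or necessarily the thesis's method.

```python
from array import array

# Hypothetical cutoff: encode only if distinct values <= ratio * row count.
# The abstract does not state how MojoFrame defines "low cardinality".
LOW_CARDINALITY_RATIO = 0.5


def encode_column(values, ratio=LOW_CARDINALITY_RATIO):
    """Dictionary-encode a string column when it has few distinct values.

    Returns ("dict", codes, dictionary) when encoded, where codes is a
    compact integer array that numeric kernels can scan, and dictionary
    maps each code back to its original string. Otherwise returns
    ("raw", values, None) because encoding would save little.
    """
    distinct = {}  # string -> code, assigned in first-seen order
    for v in values:
        if v not in distinct:
            distinct[v] = len(distinct)
    if values and len(distinct) <= ratio * len(values):
        codes = array("i", (distinct[v] for v in values))
        dictionary = list(distinct)  # index i holds the string for code i
        return "dict", codes, dictionary
    return "raw", list(values), None


def group_sum(codes, dictionary, measures):
    """Sum a numeric column per group, using the integer codes as array indices."""
    sums = [0.0] * len(dictionary)
    for code, m in zip(codes, measures):
        sums[code] += m
    return {dictionary[i]: s for i, s in enumerate(sums)}


if __name__ == "__main__":
    flags = ["A", "N", "R", "N", "A", "N"]
    prices = [10.0, 5.0, 2.5, 5.0, 1.0, 0.5]
    kind, codes, dictionary = encode_column(flags)
    print(kind, list(codes), dictionary)  # dict [0, 1, 2, 1, 0, 1] ['A', 'N', 'R']
    print(group_sum(codes, dictionary, prices))  # {'A': 11.0, 'N': 10.5, 'R': 2.5}
```

Once a column is encoded, the group-by reduces to integer indexing into a dense array, which is the kind of work a tensor-backed library handles well. Dictionary construction remains a string hash-map operation, however, and the abstract names dictionary operations as a remaining optimization target.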
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Siebel School of Computer Science