Generalizing yield prediction approaches and evaluating the common factors influencing the models

Zhang, Xiaoyu

This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.

Permalink

https://hdl.handle.net/2142/132685

Description

Title

Generalizing yield prediction approaches and evaluating the common factors influencing the models

Author(s)

Zhang, Xiaoyu

Issue Date

2025-12-12

Director of Research (if dissertation) or Advisor (if thesis)

Shajahan, Sunoj

Committee Member(s)

Martin, Nicolas Federico
Alves de Oliveira, Luciano

Department of Study

Engineering Administration

Discipline

Agricultural & Biological Engr

Degree Granting Institution

University of Illinois Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Keyword(s)

Yield prediction
Remote sensing
Machine learning
Vegetation indices
Yield monitor

Language

eng

Abstract

Accurate crop yield prediction is a key area of precision agriculture and has been explored since the 1970s. This thesis integrates insights from three chapters: a literature review and two experimental studies. Together they evaluate how spatial resolution, the proximity of training data, ground truth data error, and modeling methods affect prediction accuracy when remotely sensed satellite imagery is combined with a machine learning approach. By integrating time-series satellite imagery, the study addresses practical and theoretical gaps in predicting corn yield at the sub-field level. A comprehensive literature review outlines the development of prediction techniques, ranging from linear models to advanced machine learning and deep learning frameworks. The review presents multiple data sources and identifies key predictors, including vegetation indices (NDVI, EVI2, GCVI), weather variables, and soil data. It also highlights the advantages of methods such as random forests and the increasing success of neural networks in modeling complex spatial and temporal patterns. Several of the reviewed studies highlight the importance of data cleaning, while others show problems caused by unclear or inconsistent terminology. One experimental study explores how the relationship between the training and test datasets influences the model. It uses Sentinel-2 images from multiple fields in Illinois to test how well the model predicts across different spatial and temporal conditions, evaluating predictions for nearby fields, distant fields, and fields from the same year. The results show that close-range and same-year predictions produce error levels similar to those obtained with the full training dataset while requiring significantly less data. Applying spatial smoothing further improved the model's accuracy by 0.5% to 10.9%.
Another key focus of the research is the impact of satellite spatial resolution and yield monitor flow delay correction on prediction accuracy. Images from three platforms, Planet (3 m/pixel), Sentinel-2 (10 m/pixel), and Landsat-8 (30 m/pixel), were evaluated using a random forest model. The higher-resolution Planet images did not achieve lower RMSE than the coarser-resolution datasets, which may be caused by increased noise in the imagery or by overfitting. The green chlorophyll vegetation index (GCVI) consistently performed better than the normalized difference vegetation index (NDVI), especially during the dense canopy stage. In addition, improper correction of the yield monitor time delay led to spatial distortion in model predictions, whereas applying delay correction based on the optimal time shift greatly improved prediction accuracy across all satellite platforms. Overall, this thesis shows that selecting appropriate training data, correcting yield monitor delay, and understanding the relationship between training and prediction field locations can substantially improve sub-field-scale yield prediction. These contributions advance remote sensing-based yield modeling and form a practical basis for future improvements. With this, the thesis addresses its main goal of generalizing yield prediction frameworks and assessing the key factors that influence model outcomes.
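
The abstract names three vegetation indices (NDVI, EVI2, GCVI) without giving their formulas. As a minimal sketch, the commonly used definitions can be written as below. The function names and the reflectance values are illustrative assumptions, not taken from the thesis, and the thesis may compute the indices differently (for example, per pixel over whole images).

```python
# Commonly used definitions of the three indices named in the abstract.
# Inputs are surface reflectances in the range 0..1 for a single pixel.
# Function names and sample values are hypothetical, for illustration only.

def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def evi2(nir: float, red: float) -> float:
    """Two-band enhanced vegetation index: 2.5 * (NIR - Red) / (NIR + 2.4 * Red + 1)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


def gcvi(nir: float, green: float) -> float:
    """Green chlorophyll vegetation index: NIR / Green - 1."""
    return nir / green - 1.0


if __name__ == "__main__":
    # Hypothetical reflectances for one vegetated pixel.
    nir, red, green = 0.5, 0.05, 0.1
    print("NDVI:", round(ndvi(nir, red), 3))
    print("EVI2:", round(evi2(nir, red), 3))
    print("GCVI:", round(gcvi(nir, green), 3))
```

NDVI is bounded between -1 and 1, while GCVI has no upper bound, so the two indices are not on a shared scale and are normally compared through the accuracy of the models built on them, as the abstract does.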

Graduation Semester

2025-12

Type of Resource

Thesis

Handle URL

https://hdl.handle.net/2142/132685

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Agricultural and Biological Engineering
