Generalizing yield prediction approaches and evaluating the common factors influencing the models
Zhang, Xiaoyu
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with your NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/132685
Description
Title
Generalizing yield prediction approaches and evaluating the common factors influencing the models
Author(s)
Zhang, Xiaoyu
Issue Date
2025-12-12
Director of Research (if dissertation) or Advisor (if thesis)
Shajahan, Sunoj
Committee Member(s)
Martin, Nicolas Federico
Alves de Oliveira, Luciano
Department of Study
Engineering Administration
Discipline
Agricultural & Biological Engr
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Yield prediction
Remote sensing
Machine learning
Vegetation indices
Yield monitor
Language
eng
Abstract
Accurate crop yield prediction is a key area of precision agriculture and has been studied since the 1970s. This thesis integrates insights from three chapters: a literature review and two experimental studies. Together they evaluate how spatial resolution, the proximity of training data, ground-truth data error, and modeling method affect prediction accuracy when remotely sensed satellite imagery is combined with a machine learning approach. By integrating time-series satellite imagery, the thesis addresses practical and theoretical gaps in predicting corn yield at the sub-field level.
A comprehensive literature review outlines the development of prediction techniques, ranging from linear models to advanced machine learning and deep learning frameworks. This review presents multiple data sources and identifies key predictors, including vegetation indices (NDVI, EVI2, GCVI), weather variables, and soil data. It also highlights the advantages of methods such as random forests and the increasing success of neural networks in modeling complex spatial and temporal patterns. Several studies have highlighted the importance of data cleaning, while others have shown issues related to unclear or inconsistent terminology.
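As a brief illustration of the vegetation indices named above, the following minimal sketch computes NDVI, EVI2, and GCVI from per-pixel reflectance using their commonly published definitions. The function names and the sample reflectance values are assumptions for illustration only; this record does not describe how the thesis computed these indices.

```python
# Illustrative only: standard index definitions applied to surface
# reflectance values (assumed to be scaled to the 0-1 range).

def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def evi2(nir: float, red: float) -> float:
    """Two-band enhanced vegetation index: 2.5 * (NIR - Red) / (NIR + 2.4 * Red + 1)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


def gcvi(nir: float, green: float) -> float:
    """Green chlorophyll vegetation index: NIR / Green - 1."""
    return nir / green - 1.0


if __name__ == "__main__":
    # Hypothetical reflectance values for a single vegetated pixel.
    nir, red, green = 0.5, 0.1, 0.1
    print(f"NDVI = {ndvi(nir, red):.3f}")
    print(f"EVI2 = {evi2(nir, red):.3f}")
    print(f"GCVI = {gcvi(nir, green):.3f}")
```

Because GCVI is a simple ratio rather than a normalized difference, it is not bounded at 1 as NDVI is, which is consistent with the abstract's observation that GCVI remains informative during the dense canopy stage.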
One experimental study examined how the relationship between the training and test datasets influences model performance. It used Sentinel-2 images from multiple fields in Illinois to test how well the model predicts across different spatial and temporal conditions. We evaluated predictions for nearby fields, distant fields, and fields from the same year. The results show that close-range and same-year predictions produce error levels similar to those obtained with the full training dataset while requiring significantly less data. Applying spatial smoothing further improved the model's accuracy by 0.5% to 10.9%.
Another key focus of the research is the impact of satellite spatial resolution and yield monitor flow delay correction on prediction accuracy. Images from three platforms, Planet (3 m/pixel), Sentinel-2 (10 m/pixel), and Landsat-8 (30 m/pixel), were evaluated using a random forest model. The results show that the higher-resolution Planet images did not achieve lower RMSE than the coarser-resolution datasets. This may be caused by increased noise in the imagery or by overfitting. The green chlorophyll vegetation index (GCVI) consistently performed better than the normalized difference vegetation index (NDVI), especially during the dense canopy stage. In addition, improper correction of the yield monitor time delay led to spatial distortion in model predictions. Applying delay correction based on the optimal time shift greatly improved prediction accuracy across all satellite platforms.
Overall, this thesis shows that selecting appropriate training data, correcting yield monitor delay, and understanding the relationship between training and prediction field locations can substantially improve sub-field-scale yield prediction. These contributions advance remote sensing-based yield modeling and form a practical basis for future improvements. With this, the thesis successfully addresses its main goal of generalizing yield prediction frameworks and assessing the key factors that influence model outcomes.