Prediction of moisture and protein in corn kernels from multiple origins based on NIR-PLSR with gradient boosting machines for feature selection
Zheng, Runyu
Permalink
https://hdl.handle.net/2142/124601
Description
- Title
- Prediction of moisture and protein in corn kernels from multiple origins based on NIR-PLSR with gradient boosting machines for feature selection
- Author(s)
- Zheng, Runyu
- Issue Date
- 2024-05-02
- Director of Research (if dissertation) or Advisor (if thesis)
- Kamruzzaman, Mohammed
- Committee Member(s)
- Allen, Cody M.
- Rausch, Kent D.
- Singh, Vijay
- Department of Study
- Engineering Administration
- Discipline
- Agricultural & Biological Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Gradient Boosting Machine (GBM)
- Feature Selection
- Shapley Additive Explanations (SHAP)
- Partial Least Squares Regression (PLSR)
- Corn Kernels
- Near-infrared (NIR) Spectroscopy
- Component Prediction
- Language
- eng
- Abstract
- Differences in moisture levels and protein content affect both the nutritional value and the processing efficiency of corn kernels. Near-infrared (NIR) spectroscopy can be used to estimate kernel composition, but the models used are typically trained on samples collected from only a few environments, which can lead to underestimation of both their error rates and their bias. In this study, corn samples grown across an internationally diverse set of environments were assembled. NIR spectroscopy with chemometrics and partial least squares regression (PLSR) was used to determine the moisture and protein content of this international panel of corn grain samples. The potential of five feature selection methods to improve prediction accuracy by extracting sensitive wavelengths for moisture and protein in corn kernels was assessed. SHapley Additive exPlanations (SHAP) values were used to measure the impact of each feature (wavelength) on the model prediction. Gradient boosting machines (GBMs), specifically CatBoost and LightGBM, were effective in selecting crucial wavelengths for moisture (1409, 1900, 1908, 1932, 1953, and 2174 nm) and protein (887, 1212, 1705, 1891, 2097, and 2456 nm), producing PLSR models with coefficients of determination of validation (R²V) of 0.97 and 0.82, root mean square errors of validation (RMSEV) of 0.45% and 0.51%, and ratios of performance to deviation of validation (RPDV) of 6.20 and 2.41, for kernel protein and kernel moisture content, respectively. SHAP plots revealed the significant contribution of 2174 nm to moisture prediction and of 1891 nm to protein prediction, as well as their respective influence tendencies. These results illustrate the effectiveness of GBMs for feature engineering in NIR spectroscopy when predicting chemical components in the agriculture and food sectors, including the development of a multi-country global calibration model for moisture and protein in corn kernels.
- Graduation Semester
- 2024-05
- Type of Resource
- Text
- Handle URL
- https://hdl.handle.net/2142/124601
- Copyright and License Information
- Copyright 2024 Runyu Zheng
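The abstract above reports three validation metrics (the coefficient of determination, RMSEV, and RPDV). As a reading aid, the sketch below shows one common way such metrics are computed from reference values and model predictions. It is not taken from the thesis: the function name `validation_metrics`, the use of the sample standard deviation (n - 1) in the RPD, and the example numbers are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def validation_metrics(y_true, y_pred):
    """Return (R2, RMSE, RPD) for reference values and predictions.

    Hypothetical helper for illustration; conventions (for example the
    n - 1 denominator in the standard deviation) may differ from the thesis.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    y_bar = mean(y_true)
    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot            # coefficient of determination
    rmse = math.sqrt(ss_res / len(y_true))  # root mean square error
    rpd = stdev(y_true) / rmse            # ratio of performance to deviation
    return r2, rmse, rpd


# Made-up numbers for illustration only, not data from the thesis.
reference = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]
r2, rmse, rpd = validation_metrics(reference, predicted)
print(f"R2={r2:.2f} RMSE={rmse:.2f} RPD={rpd:.2f}")
```

Under these definitions the RPD and the coefficient of determination move together: a higher RPD means the prediction error is small relative to the spread of the reference values.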
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois