Feature Review

Genomic Prediction of Yield and Protein Traits in Soybean Using Machine Learning Models  

Xingde Wang, Tianxia Guo
Institute of Life Sciences, Jiyang College, Zhejiang A&F University, Zhuji, 311800, Zhejiang, China
Legume Genomics and Genetics, 2025, Vol. 16, No. 2   
Received: 20 Feb., 2025    Accepted: 06 Apr., 2025    Published: 27 Apr., 2025
© 2025 BioPublisher Publishing Platform
This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract

Soybean is a globally important food and plant-protein crop, and yield and protein content are the core target traits in its breeding. However, both are complex quantitative traits shaped by genetic background, environment, and their interaction, which limits the efficiency of traditional phenotypic selection and genetic improvement. To enhance breeding efficiency and prediction accuracy, this study explored the applicability and effectiveness of multiple machine learning algorithms in the genomic prediction of soybean yield and protein traits. Using genotype (SNP) and phenotype data from multiple soybean breeding populations, prediction models including RR-BLUP, support vector machine (SVM), random forest (RF), gradient boosting machine (GBM), and deep neural network (DNN) were constructed. These models were combined with feature selection and dimensionality reduction methods such as principal component analysis (PCA), LASSO, and Boruta, and their prediction accuracy and stability were systematically evaluated. The results showed that nonlinear models (such as RF and GBM) generalized better for complex traits under multiple environmental conditions. A multi-trait joint prediction strategy further improved model performance on composite indicators such as protein yield. This study demonstrates the potential of machine learning techniques in the genomic prediction of complex quantitative traits, provides an efficient tool for assisted selection in soybean breeding, and lays a foundation for building intelligent, high-throughput breeding decision-making systems.

Keywords
Soybeans; Genomic prediction; Machine learning; Yield traits; Protein content
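
The evaluation workflow summarized in the abstract (fit a model on SNP genotypes, then measure prediction accuracy on held-out lines) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code or data: the genotypes and phenotypes are simulated, the function names (simulate, ridge_fit_predict, cv_accuracy) are hypothetical, ridge regression with a fixed shrinkage parameter stands in for RR-BLUP (which would normally estimate that parameter by REML), and accuracy is taken as the Pearson correlation between predicted and observed phenotypes under 5-fold cross-validation.

```python
import numpy as np


def simulate(n_lines=300, n_snps=500, n_qtl=20, h2=0.6, seed=0):
    """Simulate SNP genotypes coded 0/1/2 and an additive phenotype.

    All values are synthetic; they are not drawn from the study.
    """
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n_lines, n_snps)).astype(float)
    effects = np.zeros(n_snps)
    qtl = rng.choice(n_snps, size=n_qtl, replace=False)
    effects[qtl] = rng.normal(size=n_qtl)
    g = X @ effects  # true genetic values
    # Noise level chosen so that var(g) / var(y) is approximately h2.
    noise_sd = np.sqrt(g.var() * (1.0 - h2) / h2)
    y = g + rng.normal(scale=noise_sd, size=n_lines)
    return X, y


def ridge_fit_predict(X_train, y_train, X_test, lam):
    """Ridge regression on centered markers (an RR-BLUP-style estimator).

    Uses the dual form, which solves an n x n system instead of a
    p x p one; this is cheaper when markers outnumber lines.
    """
    mu_x = X_train.mean(axis=0)
    mu_y = y_train.mean()
    Z = X_train - mu_x
    K = Z @ Z.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu_y)
    marker_effects = Z.T @ alpha
    return mu_y + (X_test - mu_x) @ marker_effects


def cv_accuracy(X, y, lam, k=5, seed=1):
    """Return the Pearson correlation (predicted vs. observed) per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        pred = ridge_fit_predict(X[train], y[train], X[test_idx], lam)
        accs.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return np.array(accs)


if __name__ == "__main__":
    h2 = 0.6  # assumed heritability, used only to pick the shrinkage
    X, y = simulate(h2=h2)
    # Shrinkage from the assumed h2: sum of marker variances * (1 - h2) / h2.
    lam = X.var(axis=0).sum() * (1.0 - h2) / h2
    accs = cv_accuracy(X, y, lam)
    print("per-fold accuracy:", np.round(accs, 3))
    print("mean accuracy:", round(float(accs.mean()), 3))
```

The same cross-validation loop could wrap any of the other model families named in the abstract by swapping out ridge_fit_predict, which is what makes accuracy and stability comparable across models.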