Applying generative adversarial networks to generate artificial genotype data in livestock
Caballero Vargas, Edgar Giesus
This item's files can only be accessed by the System Administrators group.
Permalink
https://hdl.handle.net/2142/130182
Description
- Title
- Applying generative adversarial networks to generate artificial genotype data in livestock
- Author(s)
- Caballero Vargas, Edgar Giesus
- Issue Date
- 2025-07-21
- Director of Research (if dissertation) or Advisor (if thesis)
- Bresolin, Tiago
- Committee Member(s)
- Wheeler, Matthew B
- Roca, Alfred L
- Department of Study
- Animal Sciences
- Discipline
- Bioinformatics
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- generative artificial intelligence models
- synthetic genotypes
- SNP
- Abstract
- Genomic studies in livestock remain limited by financial, ethical, or privacy considerations that restrict collaboration and data accessibility. To address these challenges, generative artificial intelligence models, such as generative adversarial networks (GANs), are being applied to generate biologically plausible synthetic data without compromising ethical or privacy standards. In this study, we trained a Principal Component (PC) Wasserstein GAN (PC-WGAN) with a gradient penalty to synthesize single nucleotide polymorphism (SNP) genotypes in the principal component space. We used a simulated genotype dataset of 4,800 individuals and 37,540 SNP from chromosome 1, and retained 796 PC, which explained 90% of the total variance. The PC scores of the real genotypes were used to train the model, and the synthetic PC scores it generated were inverse transformed to SNP data. The validity of the synthetic SNP data was assessed using both quantitative and visual approaches. Quantitatively, Pearson correlations between the real and synthetic SNP datasets were calculated for the linkage disequilibrium (LD) values and for the minor allele frequencies (MAF). Visually, the PC plots, LD decay curves, and MAF histograms were inspected to compare the distributions and structural patterns of the generated and real data. At 150 training epochs, the model effectively captured the major features of the real population, producing synthetic genotypes that overlapped with real genotypes in PC space and preserved long-range LD patterns and MAF distributions. While synthetic genotypes cannot replace real genomic data, our findings demonstrate that PC-WGAN produces biologically plausible artificial genotypes, offering a promising approach for future work in data augmentation, model benchmarking, and privacy-preserving livestock genomic research.
- Graduation Semester
- 2025-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/130182
- Copyright and License Information
- Copyright 2025 Edgar Caballero Vargas
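The abstract describes a round trip through principal component space (keep enough PC to explain 90% of the variance, generate PC scores, inverse transform to SNP genotypes) followed by Pearson correlations on MAF and LD. The following is a minimal sketch of that round trip and those two checks, not the thesis code. It assumes genotypes coded as allele counts 0, 1, 2, uses a small toy dataset, and replaces the trained PC-WGAN generator with a plain Gaussian sampler as a stand-in. All function names, the toy simulator, and the rounding rule are hypothetical choices made for illustration.

```python
import numpy as np

def fit_pca(genotypes, variance_target=0.90):
    """Fit PCA by SVD; keep the fewest components reaching variance_target."""
    mean = genotypes.mean(axis=0)
    # Rows of vt are the principal axes; squared singular values give variances.
    _, s, vt = np.linalg.svd(genotypes - mean, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(explained, variance_target)) + 1, len(s))
    return mean, vt[:k]

def scores_to_genotypes(scores, mean, components):
    """Inverse transform PC scores, then round to allele counts 0, 1, 2.
    The rounding rule is an assumption, not taken from the thesis."""
    reconstructed = scores @ components + mean
    return np.clip(np.rint(reconstructed), 0, 2).astype(int)

def minor_allele_freq(genotypes):
    """Per-SNP minor allele frequency from 0/1/2 allele counts."""
    freq = genotypes.mean(axis=0) / 2.0
    return np.minimum(freq, 1.0 - freq)

def pairwise_ld(genotypes):
    """Squared genotype correlation (r^2) for every SNP pair, upper triangle."""
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.corrcoef(genotypes, rowvar=False)
    upper = np.triu_indices(genotypes.shape[1], k=1)
    return r[upper] ** 2

def pearson(x, y):
    """Pearson correlation, skipping pairs that are NaN (monomorphic SNP)."""
    keep = np.isfinite(x) & np.isfinite(y)
    return float(np.corrcoef(x[keep], y[keep])[0, 1])

rng = np.random.default_rng(0)
n_animals, n_snps = 300, 40  # toy sizes, far smaller than the thesis dataset

def toy_haplotypes(n, p):
    """Hypothetical simulator: each SNP copies its neighbor with prob. 0.9,
    which creates LD that decays with distance."""
    hap = np.empty((n, p), dtype=int)
    hap[:, 0] = rng.random(n) < 0.3
    for j in range(1, p):
        fresh = rng.random(n) < 0.3
        copy = rng.random(n) < 0.9
        hap[:, j] = np.where(copy, hap[:, j - 1], fresh)
    return hap

# Genotype = sum of two haplotypes per animal.
real = toy_haplotypes(n_animals, n_snps) + toy_haplotypes(n_animals, n_snps)

mean, components = fit_pca(real, variance_target=0.90)
real_scores = (real - mean) @ components.T

# Stand-in for the generator: independent Gaussians matched to each PC's
# spread. A trained PC-WGAN would supply these synthetic scores instead.
synthetic_scores = rng.normal(0.0, real_scores.std(axis=0), size=real_scores.shape)
synthetic = scores_to_genotypes(synthetic_scores, mean, components)

maf_corr = pearson(minor_allele_freq(real), minor_allele_freq(synthetic))
ld_corr = pearson(pairwise_ld(real), pairwise_ld(synthetic))
print(f"PCs kept: {components.shape[0]}, MAF r: {maf_corr:.3f}, LD r: {ld_corr:.3f}")
```

Working in PC space keeps the generator's input and output dimension at the number of retained components rather than the number of SNP, which is the reduction the abstract reports (37,540 SNP to 796 PC). The cost is that the inverse transform returns continuous values, so some discretization step such as the rounding assumed here is needed to recover genotype calls.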
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois