Syndromic genetic conditions, in aggregate, affect 8% of the population¹. Many syndromes have recognizable facial features² that are highly informative to clinical geneticists³–⁵. Recent studies show that facial analysis technologies have measured up to the capabilities of expert clinicians in syndrome identification⁶–⁹.

With the advancement of NGS platforms, large numbers of human variations and SNPs have been discovered in human genomes. It is essential to utilize these massive nucleotide variations for the discovery of disease genes and human phenotypic traits. There are new challenges in utilizing such large numbers of nucleotide variants for polygenic disease studies. In recent years, deep-learning-based machine learning approaches have achieved great success in many areas, especially image classification.

In this preliminary study, we explore the use of a deep convolutional neural network (CNN) on genome-wide SNP images for the classification of human populations. We processed the SNP information from more than 2,500 samples of the 1000 Genomes Project. Five major human races were used as classification categories. We first generated SNP image graphs of chromosome 22, which contained about one million SNPs. Using the residual network (ResNet-50) architecture, we successfully obtained classification models that classify the validation dataset. F1 scores of the trained CNN models range from 95% to 99%, and validation with an additional, separate set of 150 samples indicates a 95.8% accuracy for the CNN model. Misclassification was often observed between the American and European categories, which could be attributed to their ancestral origins.

We further attempted to use SNP image graphs in reduced color representations, or images generated in spiral shapes, which also provided good prediction accuracy. We then tried to use the SNP image graphs from chromosome 20; almost all CNN models failed to classify the human race category successfully, except for the African samples. We have developed a human race prediction model with a deep convolutional neural network. It is feasible to use the SNP image graph for the classification of individual genomes.
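The abstract does not say how a SNP image graph is constructed, so the following is only a minimal sketch of one plausible encoding, not the authors' method. It assumes each SNP is coded as an alternate-allele count (0, 1, or 2), maps that count to a grayscale intensity, and lays the SNPs out row by row in a square image. The function name `genotypes_to_image`, the intensity mapping, and the zero-padding are all hypothetical choices made for illustration.

```python
import numpy as np


def genotypes_to_image(genotypes, side=None):
    """Pack one sample's genotype vector into a square grayscale image.

    genotypes: 1-D sequence of alternate-allele counts (0, 1, or 2) per SNP.
    side: image side length; defaults to the smallest square that fits.

    Hypothetical encoding for illustration only; the source does not
    describe the actual SNP image graph layout.
    """
    g = np.asarray(genotypes, dtype=np.uint8)
    if g.ndim != 1:
        raise ValueError("expected a 1-D genotype vector")
    if g.size and g.max() > 2:
        raise ValueError("genotype codes must be 0, 1, or 2")
    if side is None:
        side = int(np.ceil(np.sqrt(g.size)))
    if side * side < g.size:
        raise ValueError("image too small for the number of SNPs")

    # Assumed intensity mapping: 0 -> 0, 1 -> 127, 2 -> 254.
    pixels = g * 127

    # Zero-pad the tail so the vector fills the square, then lay the
    # SNPs out row by row in genomic order.
    padded = np.zeros(side * side, dtype=np.uint8)
    padded[: g.size] = pixels
    return padded.reshape(side, side)
```

For example, `genotypes_to_image([0, 1, 2, 1, 0])` returns a 3 × 3 array whose first row is `[0, 127, 254]`. Under this layout, the roughly one million SNPs of chromosome 22 would fit in an image of about 1000 × 1000 pixels, which could then be resized or tiled to match the input size a CNN expects.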