Genetic Chaos

Wednesday, March 23, 2005

Genetic Structure of Human Populations

We studied human population structure using genotypes at 377 autosomal microsatellite loci in 1056 individuals from 52 populations. Within-population differences among individuals account for 93 to 95% of genetic variation; differences among major groups constitute only 3 to 5%. Nevertheless, without using prior information about the origins of individuals, we identified six main genetic clusters, five of which correspond to major geographic regions, and subclusters that often correspond to individual populations. General agreement of genetic and predefined populations suggests that self-reported ancestry can facilitate assessments of epidemiological risks but does not obviate the need to use genetic information in genetic association studies.
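To give a feel for what "within-population" versus "among-group" variation means, here is a minimal sketch that partitions diversity at a single biallelic locus using Wright's F_ST = (H_T - H_S) / H_T. This is a simpler stand-in, not the analysis the paper ran on its 377 microsatellites. The allele frequencies and the function names are made up for illustration.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(allele_freqs, weights=None):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    allele_freqs: frequency of the same allele in each population
    weights: relative population weights (equal weights if omitted)
    """
    n = len(allele_freqs)
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    weights = [w / total for w in weights]

    # H_T: heterozygosity of the pooled population
    p_bar = sum(w * p for w, p in zip(weights, allele_freqs))
    h_t = expected_heterozygosity(p_bar)

    # H_S: average heterozygosity within populations
    h_s = sum(w * expected_heterozygosity(p)
              for w, p in zip(weights, allele_freqs))

    if h_t == 0.0:
        return 0.0  # locus is monomorphic, nothing to partition
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in three made-up populations
freqs = [0.40, 0.50, 0.60]
among = fst(freqs)
print(f"among-population share:  {among:.3f}")
print(f"within-population share: {1 - among:.3f}")
```

With these invented frequencies, under 3% of the diversity lies among populations and the rest within them, which is the same flavor of split the abstract reports. Frequency differences that small can still be informative for clustering once hundreds of loci are combined.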

PDF file

Supplementary information

DNA Polymorphism in a Worldwide Sample of Human X Chromosomes

DNA sequence data from humans can provide insight into the history of modern humans and the genetic variability in human populations. We report here a study of human DNA sequence variation at an X-linked noncoding region of 10,346 bp. The sample consists of 62 X chromosomes from Africa, Europe, and Asia. Forty-four polymorphic sites were found among the 62 sequences, resulting in 23 different haplotypes. Statistical analyses of the data led to the following inferences. (1) There is strong evidence of human population expansion in the relatively recent past, and this population expansion has had a significant effect on the pattern of polymorphism at this locus. (2) Non-African populations were unlikely to have been derived from a very small number of African lineages. (3) There was considerable geographic subdivision in the ancient human population, which could be an important reason why many studies failed to detect population expansion. (4) The long-term effective population size of humans is between 12,000 and 15,000. And (5) a non-African specific variant was found at a frequency of 35% in non-Africans, an estimate supported by the genotyping of an additional 80 non-African and 106 African X chromosomes. This variant could have arisen in Eurasia more than 140,000 years ago, predating the emergence of modern humans. Moreover, this haplotype and all other haplotypes coalesced to the most recent common ancestor of the sample, which was estimated to be older than 490,000 years. Therefore, this region may have a long history in Eurasia.
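The counts in this abstract (44 polymorphic sites, 62 sequences, 10,346 bp) are enough for a back-of-the-envelope diversity estimate. The sketch below computes Watterson's estimator, theta_W = S / a_n, and converts it to an effective population size using theta = 3 * N_e * mu, the usual relation for an X-linked locus with an equal sex ratio. This is not necessarily how the authors obtained their figure. The mutation rate is an assumed placeholder, and the estimator itself assumes a constant-size, neutrally evolving population, which the abstract argues against.

```python
def harmonic_number(k):
    """Sum of 1/i for i = 1..k."""
    return sum(1.0 / i for i in range(1, k + 1))


def watterson_theta(segregating_sites, n_sequences):
    """Watterson's estimator per locus: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    return segregating_sites / harmonic_number(n_sequences - 1)


# Values taken from the abstract
S = 44        # polymorphic (segregating) sites
n = 62        # sampled X chromosomes
L = 10_346    # length of the region in bp

theta_locus = watterson_theta(S, n)
theta_site = theta_locus / L

# ASSUMPTION: mutation rate per site per generation; not given in the abstract
MU_PER_SITE_PER_GEN = 2e-8

# X-linked locus, equal sex ratio: theta = 3 * N_e * mu
ne_estimate = theta_site / (3 * MU_PER_SITE_PER_GEN)

print(f"theta per locus: {theta_locus:.2f}")
print(f"theta per site:  {theta_site:.2e}")
print(f"N_e under the assumed mutation rate: {ne_estimate:,.0f}")
```

The resulting N_e is on the order of 10^4 with this placeholder rate and scales inversely with whatever mutation rate is assumed, so it should be read as an illustration of the arithmetic rather than a check on the paper's 12,000 to 15,000 range.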

PDF file

X chromosome evidence for ancient human histories

Diverse African and non-African samples of the X-linked PDHA1 (pyruvate dehydrogenase E1 alpha subunit) locus revealed a fixed DNA sequence difference between the two sample groups. The age of onset of population subdivision appears to be about 200 thousand years ago. This predates the earliest modern human fossils, suggesting the transformation to modern humans occurred in a subdivided population. The base of the PDHA1 gene tree is relatively ancient, with an estimated age of 1.86 million years, a late Pliocene time associated with early species of Homo. PDHA1 revealed very low variation among non-Africans, but in other respects the data are consistent with reports from other X-linked and autosomal haplotype data sets. Like these other genes, but in conflict with microsatellite and mitochondrial data, PDHA1 does not show evidence of human population expansion.

PDF file

Genetic Structure, Self-Identified Race/Ethnicity, and Confounding in Case-Control Association Studies

We have analyzed genetic data for 326 microsatellite markers that were typed uniformly in a large multiethnic population-based sample of individuals as part of a study of the genetics of hypertension (Family Blood Pressure Program). Subjects identified themselves as belonging to one of four major racial/ethnic groups (white, African American, East Asian, and Hispanic) and were recruited from 15 different geographic locales within the United States and Taiwan. Genetic cluster analysis of the microsatellite markers produced four major clusters, which showed near-perfect correspondence with the four self-reported race/ethnicity categories. Of 3,636 subjects of varying race/ethnicity, only 5 (0.14%) showed genetic cluster membership different from their self-identified race/ethnicity. On the other hand, we detected only modest genetic differentiation between different current geographic locales within each race/ethnicity group. Thus, ancient geographic ancestry, which is highly correlated with self-identified race/ethnicity — as opposed to current residence — is the major determinant of genetic structure in the U.S. population. Implications of this genetic structure for case-control association studies are discussed.

PDF file

X-chromosome as a marker for population history: linkage disequilibrium and haplotype study in Eurasian populations

Linkage disequilibrium (LD) structure is still unpredictable because the interplay of regional recombination rate and demographic history is poorly understood. We have compared the distribution of LD across two genomic regions differing in crossing-over activity – Xq13 (0.166 cM/Mb) and Xp22 (1.3 cM/Mb) – in 15 Eurasian populations. Demographic events predicted to increase the LD level – genetic drift, bottleneck and admixture – had a very strong impact on extent and patterns of regional LD across Xq13 compared to Xp22. The haplotype distribution of the DXS1225–DXS8082 microsatellites from Xq13 exhibiting strong association in all populations was remarkably influenced by population history. European populations shared one common haplotype with a frequency of 25–40%. The Volga-Ural populations studied, living at the geographic borderline of Europe, showed elevated LD as well as harboring a significant fraction of haplotypes originating from East Asia, thus reflecting their past migrations and admixture. In the young Kuusamo isolate from Finland, a bottleneck has led to allelic associations between loci and shifted the haplotype distribution, but has affected single microsatellite allele frequencies much less, compared to the main Finnish population. The data show that the footprint of a demographic event is preserved longer in the haplotype distribution within a region of low crossing-over rate than in the information content of a single marker or between actively recombining markers. As the knowledge of LD patterns is often chosen to assist association mapping of common disease, our conclusions emphasize the importance of understanding the history, structure and variation of a study population.
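For readers who want to see what "LD between two markers" measures, here is a minimal sketch of the standard pairwise statistics D, |D'| and r^2 computed from haplotype counts. It assumes two biallelic loci, which is a simplification: the microsatellites in the study are multiallelic and would need a multiallelic extension. The counts and the function name are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise LD from counts of the four haplotypes at two biallelic loci.

    Returns (D, |D'|, r^2).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at locus 2

    # D: deviation of the haplotype frequency from independence
    d = p_AB - p_A * p_B

    # D_max: largest |D| possible given the allele frequencies
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between alleles at the two loci
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical haplotype counts from 100 sampled chromosomes
d, d_prime, r2 = ld_stats(n_AB=40, n_Ab=10, n_aB=10, n_ab=40)
print(f"D = {d:.3f}, |D'| = {d_prime:.3f}, r^2 = {r2:.3f}")
```

In a low-recombination region such as Xq13, associations like this decay slowly over generations, which is why the abstract finds that demographic events leave a longer-lasting footprint there than between actively recombining markers.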

PDF file