Genetic Chaos

Friday, May 04, 2007

Lactase Haplotype Diversity in the Old World

Lactase persistence, the genetic trait in which intestinal lactase activity persists at childhood levels into adulthood, varies in frequency in different human populations, being most frequent in northern Europeans and certain African and Arabian nomadic tribes, who have a history of drinking fresh milk. Selection is likely to have played an important role in establishing these different frequencies since the development of agricultural pastoralism ~9,000 years ago. We have previously shown that the element responsible for the lactase persistence/nonpersistence polymorphism in humans is cis-acting to the lactase gene and that lactase persistence is associated, in Europeans, with the most common 70-kb lactase haplotype, A. We report here a study of the 11-site haplotype in 1,338 chromosomes from 11 populations that differ in lactase persistence frequency. Our data show that haplotype diversity was generated both by point mutations and recombinations. The four globally common haplotypes (A, B, C, and U) are not closely related and have different distributions; the A haplotype is at high frequencies only in northern Europeans, where lactase persistence is common; and the U haplotype is virtually absent from Indo-European populations. Much more diversity is seen in sub-Saharan Africans than in non-Africans, consistent with an "Out of Africa" model for peopling of the Old World. Analysis of recent recombinant haplotypes by allele-specific PCR, along with deduction of the root haplotype from chimpanzee sequence, allowed construction of a haplotype network that assisted in evaluation of the relative roles of drift and selection in establishing the haplotype frequencies in the different populations. We suggest that genetic drift was important in shaping the general pattern of non-African haplotype diversity, with recent directional selection in northern Europeans for the haplotype associated with lactase persistence.

PDF file

Absence of the lactase-persistence-associated allele in early Neolithic Europeans

Lactase persistence (LP), the dominant Mendelian trait conferring the ability to digest the milk sugar lactose in adults, has risen to high frequency in central and northern Europeans in the last 20,000 years. This trait is likely to have conferred a selective advantage in individuals who consume appreciable amounts of unfermented milk. Some have argued for the "culture-historical hypothesis," whereby LP alleles were rare until the advent of dairying early in the Neolithic but then rose rapidly in frequency under natural selection. Others favor the "reverse cause hypothesis," whereby dairying was adopted in populations with preadaptive high LP allele frequencies. Analysis based on the conservation of lactase gene haplotypes indicates a recent origin and high selection coefficients for LP, although it has not been possible to say whether early Neolithic European populations were lactase persistent at appreciable frequencies. We developed a stepwise strategy for obtaining reliable nuclear ancient DNA from ancient skeletons, based on (i) the selection of skeletons from archaeological sites that showed excellent biomolecular preservation, (ii) obtaining highly reproducible human mitochondrial DNA sequences, and (iii) reliable short tandem repeat (STR) genotypes from the same specimens. By applying this experimental strategy, we have obtained high-confidence LP-associated genotypes from eight Neolithic and one Mesolithic human remains, using a range of strict criteria for ancient DNA work. We did not observe the allele most commonly associated with LP in Europeans, thus providing evidence for the culture-historical hypothesis, and indicating that LP was rare in early European farmers.
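To get a feel for why failing to see the allele in only eight Neolithic individuals still carries weight, here is a rough back-of-the-envelope sketch. It is not the authors' analysis. It assumes the eight individuals can be treated as independent draws from a single population in Hardy-Weinberg proportions, with both chromosomes typed in each; the candidate allele frequencies in the loop are hypothetical values picked purely for illustration.

```python
def prob_no_allele_observed(allele_freq: float, n_individuals: int) -> float:
    """Probability that none of the 2 * n sampled chromosomes carries the allele.

    Assumes independent sampling of chromosomes from one population
    (an illustrative simplification, not taken from the paper).
    """
    n_chromosomes = 2 * n_individuals
    return (1.0 - allele_freq) ** n_chromosomes


# Hypothetical allele frequencies, chosen only to show the trend.
for freq in (0.05, 0.25, 0.50, 0.75):
    p = prob_no_allele_observed(freq, n_individuals=8)
    print(f"allele frequency {freq:.2f}: P(no copies in 8 individuals) = {p:.2e}")
```

Under these assumptions, the chance of seeing no copies at all falls off quickly as the assumed allele frequency rises, which is the intuition behind reading the result as evidence that LP was rare in early farmers.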

PDF file

Wednesday, May 02, 2007

Global variation in copy number in the human genome

Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.
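The abstract describes CNVRs as regions that "can encompass overlapping or adjacent gains or losses". A minimal sketch of that idea is a plain interval merge on one chromosome, followed by a coverage calculation. The abstract does not spell out the actual merging procedure, so the function names, the `max_gap` parameter and the toy coordinates below are all assumptions made for illustration. As a sanity check on the headline numbers, 360 megabases being 12% of the genome implies a genome length of roughly 3,000 megabases.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), half-open, all on the same chromosome


def merge_into_regions(cnvs: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge overlapping or adjacent CNV calls into regions.

    max_gap is a hypothetical tolerance: calls separated by at most this
    many bases are treated as adjacent and merged.
    """
    regions: List[Interval] = []
    for start, end in sorted(cnvs):
        if regions and start <= regions[-1][1] + max_gap:
            # Overlaps or touches the previous region: extend it.
            prev_start, prev_end = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end))
        else:
            regions.append((start, end))
    return regions


def covered_fraction(regions: List[Interval], genome_length: int) -> float:
    """Fraction of the genome (or chromosome) covered by the merged regions."""
    return sum(end - start for start, end in regions) / genome_length


# Toy coordinates, not real data.
toy_cnvs = [(100, 250), (200, 400), (400, 450), (900, 1000)]
regions = merge_into_regions(toy_cnvs)
print(regions)  # [(100, 450), (900, 1000)]
print(covered_fraction(regions, genome_length=5000))
```

Merging before summing matters: adding up the raw calls would count overlapping bases more than once and overstate the covered fraction.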

PDF file

Geography and genography: prediction of continental origin using randomly selected single nucleotide polymorphisms

Recent studies have shown that when individuals are grouped on the basis of genetic similarity, group membership corresponds closely to continental origin. There has been considerable debate about the implications of these findings in the context of larger debates about race and the extent of genetic variation between groups. Some have argued that clustering according to continental origin demonstrates the existence of significant genetic differences between groups and that these differences may have important implications for differences in health and disease. Others argue that clustering according to continental origin requires the use of large amounts of genetic data or specifically chosen markers and is indicative only of very subtle genetic differences that are unlikely to have biomedical significance.

We used small numbers of randomly selected single nucleotide polymorphisms (SNPs) from the International HapMap Project to train naïve Bayes classifiers for prediction of ancestral continent of origin. Predictive accuracy was tested on two independent data sets. Genetically similar groups should be difficult to distinguish, especially if only a small number of genetic markers are used. The genetic differences between continentally defined groups are sufficiently large that one can accurately predict ancestral continent of origin using only a minute, randomly selected fraction of the genetic variation present in the human genome. Genotype data from only 50 random SNPs was sufficient to predict ancestral continent of origin in our primary test data set with an average accuracy of 95%. Genetic variations informative about ancestry were common and widely distributed throughout the genome.

Accurate characterization of ancestry is possible using small numbers of randomly selected SNPs. The results presented here show how investigators conducting genetic association studies can use small numbers of arbitrarily chosen SNPs to identify stratification in study subjects and avoid false positive genotype-phenotype associations. Our findings also demonstrate the extent of variation between continentally defined groups and argue strongly against the contention that genetic differences between groups are too small to have biomedical significance.
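For readers who want to see what "training a naïve Bayes classifier on SNP genotypes" looks like in practice, here is a minimal sketch in plain Python. It is not the authors' code. It assumes genotypes are coded as 0, 1 or 2 copies of one allele, uses add-one (Laplace) smoothing, and runs on simulated data rather than HapMap genotypes. The group names and allele frequencies are made up, and because the simulated frequencies are drawn independently for each group, the groups are far more differentiated than real populations, so the accuracy printed here says nothing about real data.

```python
import math
import random


def train_naive_bayes(genotypes, labels, alpha=1.0):
    """Fit per-group, per-SNP genotype probabilities with add-alpha smoothing.

    genotypes: list of rows, each a list of 0/1/2 genotype codes.
    labels: list of group names, one per row.
    """
    groups = sorted(set(labels))
    n_snps = len(genotypes[0])
    counts = {g: [[alpha] * 3 for _ in range(n_snps)] for g in groups}
    group_sizes = {g: 0 for g in groups}
    for row, g in zip(genotypes, labels):
        group_sizes[g] += 1
        for j, geno in enumerate(row):
            counts[g][j][geno] += 1
    model = {}
    for g in groups:
        log_prior = math.log(group_sizes[g] / len(labels))
        log_lik = [[math.log(c / sum(site)) for c in site] for site in counts[g]]
        model[g] = (log_prior, log_lik)
    return model


def predict(model, row):
    """Return the group with the highest posterior log-score for one individual."""
    def score(g):
        log_prior, log_lik = model[g]
        return log_prior + sum(log_lik[j][geno] for j, geno in enumerate(row))
    return max(model, key=score)


def simulate(freqs, n, rng):
    """Draw n individuals; each genotype is the sum of two Bernoulli allele draws."""
    return [[(rng.random() < f) + (rng.random() < f) for f in freqs] for _ in range(n)]


rng = random.Random(0)
n_snps = 50  # matches the "50 random SNPs" figure; everything else is made up
# Hypothetical groups with independently drawn allele frequencies.
freqs = {name: [rng.random() for _ in range(n_snps)]
         for name in ("group_A", "group_B", "group_C")}

train_x, train_y, test_x, test_y = [], [], [], []
for name, f in freqs.items():
    train_x += simulate(f, 60, rng)
    train_y += [name] * 60
    test_x += simulate(f, 40, rng)
    test_y += [name] * 40

model = train_naive_bayes(train_x, train_y)
correct = sum(predict(model, row) == y for row, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print(f"accuracy on simulated test set: {accuracy:.2f}")
```

The "naïve" part is the assumption that SNPs are independent given the group, which is why the score is just a sum of per-SNP log-probabilities. Randomly chosen, widely spaced SNPs make that assumption less unreasonable than it would be for tightly linked markers.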

PDF file