Genetic Chaos

Monday, September 27, 2004

Human genetic diversity: Lewontin’s fallacy

In popular articles that play down the genetical differences among human populations, it is often stated that about 85% of the total genetical variation is due to individual differences within populations and only 15% to differences between populations or ethnic groups. It has therefore been proposed that the division of Homo sapiens into these groups is not justified by the genetic data. This conclusion, due to R.C. Lewontin in 1972, is unwarranted because the argument ignores the fact that most of the information that distinguishes populations is hidden in the correlation structure of the data and not simply in the variation of the individual factors. The underlying logic, which was discussed in the early years of the last century, is here discussed using a simple genetical example.
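The point about correlation structure can be made concrete with a toy calculation. This is a sketch under simplifying assumptions, not the example used in the paper. Take two hypothetical haploid populations, A and B. At each of n independent biallelic loci, allele 1 has frequency 0.6 in A and 0.4 in B. At any single locus only 4% of the variance lies between the populations. Yet classifying an individual from all loci together becomes almost error-free as n grows. The function names are illustrative.

```python
from math import comb


def misclassification_rate(n_loci: int, p: float = 0.6) -> float:
    """Chance that an individual from population A is assigned to population B.

    Toy model (assumed): haploid individuals, independent biallelic loci,
    allele 1 at frequency p (> 0.5) in A and 1 - p in B at every locus.
    """
    if n_loci % 2 == 0:
        raise ValueError("use an odd number of loci so that ties cannot occur")
    # With symmetric frequencies the likelihood-ratio rule reduces to a
    # majority vote: assign to A when more than half the loci carry allele 1.
    # An A individual is misassigned when at most n_loci // 2 loci carry it.
    return sum(
        comb(n_loci, k) * p**k * (1 - p) ** (n_loci - k)
        for k in range(n_loci // 2 + 1)
    )


def between_population_share(p: float = 0.6) -> float:
    """Fraction of single-locus variance that lies between A and B."""
    pooled = 0.5  # pooled frequency of allele 1 with equal population sizes
    total = pooled * (1 - pooled)
    within = p * (1 - p)  # identical in both populations by symmetry
    return (total - within) / total


print("between-population share:", round(between_population_share(), 4))
for n in (1, 11, 101):
    print(n, "loci -> misclassification", round(misclassification_rate(n), 4))
```

With one locus, 40% of individuals are misassigned. With 101 loci the error falls to a few percent, although each locus on its own is "96% within-population variation".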

PDF file

Is the Concept of Race Illegitimate?

PDF file

The many species of humanity

Naming new human species may seem to be a harmless endeavor, of little interest to all but a few specialists playing out the consequences of different evolutionary explanations of phyletic variation, but it has significant implications for how humanity is viewed because studies of race and human evolution are inextricably linked. When essentialist approaches are used to interpret variation in the past as taxonomic rather than populational, as has increasingly been the case, they serve to underscore a typological view of modern human variation. In terms of how they are treated in analysis, there often seems to be no difference between the species, subspecies, or paleodemes of the past and the populations or races whose interrelationships and demographic history are discussed today. This is not inconsequential because both history and current practice show that science, especially anthropology, is not isolated from society.

PDF file

Deconstructing the relationship between genetics and race

The success of many strategies for finding genetic variants that underlie complex traits depends on how genetic variation is distributed among human populations. This realization has intensified the investigation of genetic differences among groups, which are often defined by commonly used racial labels. Some scientists argue that race is an adequate proxy of ancestry, whereas others claim that race belies how genetic variation is apportioned. Resolving this controversy depends on understanding the complicated relationship between race, ancestry and the demographic history of humans. Recent discoveries are helping us to deconstruct this relationship, and provide better guidance to scientists and policy makers.

PDF file

Evidence for Gradients of Human Genetic Diversity Within and Among Continents

Genetic variation in humans is sometimes described as being discontinuous among continents or among groups of individuals, and some have interpreted this as genetic support for "races." A recent study of more than 350 microsatellites in a global sample of humans showed that individuals could be grouped according to their continental origin, and this was widely interpreted as evidence for a discrete distribution of human genetic diversity. Here, we investigate how study design can influence such conclusions. Our results show that when individuals are sampled homogeneously from around the globe, the pattern seen is one of gradients of allele frequencies that extend over the entire world, rather than discrete clusters. Therefore, there is no reason to assume that major genetic discontinuities exist between different continents or "races."

PDF file

Thursday, September 09, 2004

A recent shift from polygyny to monogamy in humans is suggested by the analysis of worldwide Y-chromosome diversity

Molecular genetic data contain information on the history of populations. Evidence of prehistoric demographic expansions has been detected in the mitochondrial diversity of most human populations and in a Y-chromosome STR analysis, but not in a previous study of 11 Y-chromosome SNPs in Europeans. In this paper, we show that mismatch distributions and tests of mutation/drift equilibrium based on up to 166 Y-chromosome SNPs, in 46 samples from all continents, also fail to support an increase of the male effective population size. Computer simulations show that the lower mutation rate of nuclear DNA relative to mtDNA cannot explain these results. However, ascertainment bias, i.e., typing only highly variable SNP sites, may be concealing any Y-SNP evidence for a recent, but not an ancient, increase in male effective population size. The results of our SNP analyses can be reconciled with the expansion of male effective population size inferred from STR loci, and with mitochondrial evidence, by assuming that humans were essentially polygynous during much of their history. As a consequence, until recently only a few men may have contributed a large fraction of the Y-chromosome pool in every generation. The number of breeding males may have increased, and the variance of their reproductive success may have decreased, through a recent shift from polygyny to monogamy, which is supported by ethnological data and possibly accompanied the shift from mobile to sedentary communities.
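A mismatch distribution is the histogram of the number of differences between every pair of sequences (or SNP haplotypes) in a sample. Under a recent population expansion it tends to be smooth and unimodal. Under long-term constant size it tends to be ragged or multimodal. The sketch below computes the distribution for aligned haplotypes written as strings. The haplotypes and function name are hypothetical, and SNP data could equally be encoded as 0/1 strings.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(haplotypes):
    """Return {number of pairwise differences: number of pairs}."""
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("haplotypes must be aligned to equal length")
    counts = Counter()
    for a, b in combinations(haplotypes, 2):
        # Count sites at which the two haplotypes differ.
        counts[sum(x != y for x, y in zip(a, b))] += 1
    return dict(sorted(counts.items()))


sample = ["AAAA", "AAAT", "AATT", "TTTT"]  # hypothetical aligned haplotypes
print(mismatch_distribution(sample))
```

Note that ascertainment of only highly variable SNPs, as discussed in the abstract, changes which sites enter this count and therefore the shape of the distribution.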

PDF file

Estimating the strength of sexual selection from Y-chromosome and mitochondrial DNA diversity

We show that a sex difference in the opportunity for selection results in sex differences in the strength of random genetic drift and thus creates different patterns of genetic diversity for maternally and paternally inherited haploid genes. We derive the effective population size Ne for a male-limited or female-limited haploid gene in terms of I, the "opportunity for selection" or the variance in relative fitness. Because the variance in relative fitness of males can be an order of magnitude larger than that of females, Ne is much smaller for males than for females. We derive both nonequilibrium and equilibrium expressions for F_ST in terms of I and show how the portion of I owing to sexual selection, I_mates (the variation among males in mate numbers), is a simple function of the F statistics for cytoplasmic (female-inherited) and Y-linked (male-inherited) genes. Because multiple, transgenerational data are lacking to apply the nonequilibrium expression, we apply only the equilibrium model to published data on Y chromosome and mitochondrial sequence divergence in Homo sapiens to quantify the opportunity for sexual selection. The estimate suggests that sexual selection in humans represents a minimum of 54.8% of total selection, supporting Darwin's proposal that sexual selection has played a significant role in human evolution and the recent proposal regarding a shift from polygamy to monogamy in humans.
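The abstract defines I as the variance in relative fitness, that is, individual fitness divided by the mean fitness of that sex. The sketch below computes I for each sex from offspring counts. The counts are hypothetical, chosen to mimic polygyny, and are not data from the paper. The paper's expressions for Ne and F_ST in terms of I are not reproduced here.

```python
from statistics import mean, pvariance


def opportunity_for_selection(offspring_counts):
    """I: variance in relative fitness (fitness divided by the sex's mean)."""
    w_bar = mean(offspring_counts)
    return pvariance([w / w_bar for w in offspring_counts])


# Hypothetical numbers of offspring for 8 females and 8 males in the same
# population. Both lists sum to 16 because every offspring has one parent
# of each sex. Male success is deliberately skewed (polygyny).
females = [1, 2, 2, 3, 2, 2, 1, 3]
males = [8, 4, 2, 2, 0, 0, 0, 0]

print("I (females):", opportunity_for_selection(females))  # 0.125
print("I (males):  ", opportunity_for_selection(males))    # 1.75
```

Here the male value is 14 times the female value, the kind of order-of-magnitude gap the abstract describes.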

PDF file

Sexual selection favors female-biased sex ratios: the balance between the opposing forces of sex-ratio selection and sexual selection

In a verbal model, Trivers and Willard proposed that, whenever there is sexual selection among males, natural selection should favor mothers that produce sons when in good condition but daughters when in poor condition. The predictions of this model have been the subject of recent debate. We present an explicit population genetic model for the evolution of a maternal-effect gene that biases offspring sex ratio. We show that, like local mate competition, sexual selection favors female-biased sex ratios whenever maternal condition affects the reproductive competitive ability of sons. However, Fisherian sex-ratio selection, which favors a balanced sex ratio, is an opposing force. We show that the evolution of maternal sex-ratio biasing by these opposing selection forces requires a positive covariance across environments between the sex-ratio bias toward sons (b) and the mating success of sons (r). This covariance alone is not a sufficient condition for the evolution of maternal sex-ratio biasing; it must be sufficiently positive to outweigh the opposing sex-ratio selection. To identify the necessary and sufficient conditions, we partition total evolutionary change into three components: (1) maternal sex-ratio bias, (2) sexual selection on sons, and (3) sex-ratio selection. Because the magnitude of the first component asymmetrically affects the strength of the second, biasing broods toward females in a poor environment evolves faster than the same degree of bias toward males in a good environment. Consequently, female-biased sex ratios, rather than male-biased sex ratios, are more likely to evolve. We discuss our findings in the context of the primary sex-ratio biases observed in strongly sexually selected species and indicate how this perspective can assist the experimental study of sex-ratio evolution.

PDF file

Genetic Evidence for Unequal Effective Population Sizes of Human Females and Males

The time to the most recent common ancestor (TMRCA) of human mitochondrial DNA (mtDNA) is estimated to be older than that of the non-recombining portion of the Y chromosome (NRY). Surveys of variation in globally distributed humans typically yield mtDNA TMRCA values just under 200 thousand years ago (kya), while those for the NRY range between 46 and 110 kya. A favored hypothesis for this finding is that natural selection has acted on the NRY, leading to a recent selective sweep. An alternative hypothesis is that sex-biased demographic processes are responsible. Here we re-examine the disparity between NRY and mtDNA TMRCAs using data collected from individual human populations, a sampling strategy that minimizes the confounding influence of population subdivision in global datasets. We survey variation at 782 bp of the mitochondrial cytochrome c oxidase subunit 3 gene as well as at 26.5 kb of non-coding DNA from the NRY in a sample of 25 Khoisan, 24 Mongolians, and 24 Papua New Guineans. Data from both loci in all populations are best described by a model of constant population size, with the exception of Mongolian mtDNA, which appears to be experiencing rapid population growth. Taking these demographic models into account, we estimate the TMRCAs for each locus in each population. A pattern that is remarkably consistent across all three populations is an approximately two-fold deeper coalescence for mtDNA than for the NRY. The oldest TMRCAs are observed for the Khoisan (73.6 kya for the NRY and 176.5 kya for mtDNA), while those in the non-African populations are consistently lower (averaging 47.7 kya for the NRY and 92.8 kya for mtDNA). Our data do not suggest that differential natural selection is the cause of this difference in TMRCAs. Rather, these results are most consistent with a higher female effective population size.
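There is a simple reason why a two-fold deeper mtDNA coalescence points to a larger female effective size. In the standard neutral coalescent for a haploid locus in a constant-size population of Ne gene copies, the expected TMRCA of n lineages is 2 Ne (1 - 1/n) generations. For mtDNA, Ne is the female effective size; for the NRY, it is the male effective size. With similar sample sizes and the same generation time in both sexes (both assumptions), the ratio of the two TMRCAs therefore estimates N_female / N_male. The sketch below applies this ratio to the figures quoted in the abstract. It is an illustration, not the authors' estimation procedure.

```python
def expected_tmrca(ne: float, n: int) -> float:
    """Expected TMRCA (generations) of n lineages at a haploid locus in a
    neutral, constant-size population of ne gene copies (standard coalescent)."""
    # Sum over k = n..2 of ne / C(k, 2), which telescopes to 2 * ne * (1 - 1/n).
    return 2 * ne * (1 - 1 / n)


# TMRCA estimates in kya, as quoted in the abstract above.
tmrca = {
    "Khoisan": {"NRY": 73.6, "mtDNA": 176.5},
    "non-African average": {"NRY": 47.7, "mtDNA": 92.8},
}

# Under the model, the mtDNA/NRY TMRCA ratio approximates N_female / N_male.
ratios = {group: t["mtDNA"] / t["NRY"] for group, t in tmrca.items()}
for group, ratio in ratios.items():
    print(f"{group}: mtDNA/NRY TMRCA ratio = {ratio:.2f}")
```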

PDF file

Reduced Y-Chromosome, but Not Mitochondrial DNA, Diversity in Human Populations from West New Guinea

To investigate the paternal population history of New Guinea, 183 individuals from 11 regional populations of West New Guinea (WNG) and 131 individuals from Papua New Guinea (PNG) were analyzed at 26 binary markers and seven short-tandem-repeat loci from the nonrecombining part of the human Y chromosome and were compared with 14 populations of eastern and southeastern Asia, Polynesia, and Australia. Y-chromosomal diversity was low in WNG compared with PNG and with most other populations from Asia/Oceania; a single haplogroup (M-M4) accounts for 75% of WNG Y chromosomes, and many WNG populations have just one Y haplogroup. Four Y-chromosomal lineages (haplogroups M-M4, C-M208, C-M38, and K-M230) account for 94% of WNG Y chromosomes and 78% of all Melanesian Y chromosomes and were identified as having most likely arisen in Melanesia. Haplogroup C-M208, which in WNG is restricted to the Dani and Lani, two linguistically closely related populations from the central and western highlands of WNG, was identified as the major Polynesian Y-chromosome lineage. A network analysis of associated Y-chromosomal short-tandem-repeat haplotypes suggests two distinct population expansions involving C-M208: one in New Guinea and one in Polynesia. The observed low levels of Y-chromosome diversity in WNG contrast with high levels of mtDNA diversity reported for the same populations. This most likely reflects extreme patrilocality and/or biased male reproductive success (polygyny). Our data further provide evidence for primarily female-mediated gene flow within the highlands of New Guinea but primarily male-mediated gene flow between highland and lowland/coastal regions.
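Why does one haplogroup at 75% imply low diversity? Consider Nei's unbiased gene diversity, H = n/(n - 1) (1 - sum of squared haplogroup frequencies). If one haplogroup has frequency 0.75, then H can never exceed n/(n - 1) (1 - 0.75^2), however the remaining 25% is split. The sketch below applies that bound to the pooled WNG figure (183 Y chromosomes). It is a bound, not the per-population diversity values reported in the paper, and the function names are illustrative.

```python
def gene_diversity(freqs, n):
    """Nei's unbiased gene (haplogroup) diversity from frequencies in a sample of n."""
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def diversity_upper_bound(top_freq, n):
    """Largest possible diversity when one haplogroup has frequency top_freq:
    the remaining chromosomes are spread over ever-rarer haplogroups, so their
    squared frequencies contribute (almost) nothing."""
    return gene_diversity([top_freq], n)


# 75% M-M4 among the 183 WNG Y chromosomes, as reported in the abstract.
print(round(diversity_upper_bound(0.75, 183), 3))
```

The bound is about 0.44. By contrast, a sample spread evenly over many haplogroups would approach 1.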

PDF file

Friday, September 03, 2004

Independent Origins of Indian Caste and Tribal Paternal Lineages

The origins of the nearly one billion people inhabiting the Indian subcontinent and following the customs of the Hindu caste system are controversial: are they largely derived from Indian local populations (i.e. tribal groups) or from recent immigrants to India? Archaeological and linguistic evidence support the latter hypothesis, whereas recent genetic data seem to favor the former hypothesis. Here, we analyze the most extensive dataset of Indian caste and tribal Y chromosomes to date. We find that caste and tribal groups differ significantly in their haplogroup frequency distributions; caste groups are homogeneous for Y chromosome variation and more closely related to each other and to central Asian groups than to Indian tribal or any other Eurasian groups. We conclude that paternal lineages of Indian caste groups are primarily descended from Indo-European speakers who migrated from central Asia ~3,500 years ago. Conversely, paternal lineages of tribal groups are predominantly derived from the original Indian gene pool. We also provide evidence for bidirectional male gene flow between caste and tribal groups. In comparison, caste and tribal groups are homogeneous with respect to mitochondrial DNA variation, which may reflect the sociocultural characteristics of the Indian caste society.
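One generic way to test whether caste and tribal groups differ in haplogroup frequency distributions is a Pearson chi-square test of homogeneity on a groups-by-haplogroups table of counts. This is a sketch, not necessarily the test used by the authors. The counts below are hypothetical. With sparse tables, an exact or permutation test is usually preferable.

```python
def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts
    (rows = groups such as caste vs tribal, columns = haplogroups)."""
    # Drop haplogroups absent from every group; they carry no information
    # and would give zero expected counts.
    keep = [j for j, col in enumerate(zip(*table)) if sum(col) > 0]
    table = [[row[j] for j in keep] for row in table]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical haplogroup counts (not from the paper).
caste = [30, 12, 5, 3]
tribal = [6, 10, 20, 14]
print(chi_square_homogeneity([caste, tribal]))
```

To obtain a p-value, compare the statistic with a chi-square distribution that has the returned degrees of freedom, for example via SciPy's `scipy.stats.chi2.sf`.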

PDF file

The Genetics of Language and Farming Spread in India

PDF file

The Genetic Heritage of the Earliest Settlers Persists Both in Indian Tribal and Caste Populations

Two tribal groups from southern India—the Chenchus and Koyas—were analyzed for variation in mitochondrial DNA (mtDNA), the Y chromosome, and one autosomal locus and were compared with six caste groups from different parts of India, as well as with western and central Asians. In mtDNA phylogenetic analyses, the Chenchus and Koyas coalesce at Indian-specific branches of haplogroups M and N that cover populations of different social rank from all over the subcontinent. Coalescence times suggest early late Pleistocene settlement of southern Asia and suggest that there has not been total replacement of these settlers by later migrations. H, L, and R2 are the major Indian Y-chromosomal haplogroups that occur both in castes and in tribal populations and are rarely found outside the subcontinent. Haplogroup R1a, previously associated with the putative Indo-Aryan invasion, was found at its highest frequency in Punjab but also at a relatively high frequency (26%) in the Chenchu tribe. This finding, together with the higher R1a-associated short tandem repeat diversity in India and Iran compared with Europe and central Asia, suggests that southern and western Asia might be the source of this haplogroup. Haplotype frequencies of the MX1 locus of chromosome 21 distinguish Koyas and Chenchus, along with Indian caste groups, from European and eastern Asian populations. Taken together, these results show that Indian tribal and caste populations derive largely from the same genetic heritage of Pleistocene southern and western Asians and have received limited gene flow from external regions since the Holocene. The phylogeography of the primal mtDNA and Y-chromosome founders suggests that these southern Asian Pleistocene coastal settlers from Africa would have provided the inocula for the subsequent differentiation of the distinctive eastern and western Eurasian gene pools.

PDF file

Negligible Male Gene Flow Across Ethnic Boundaries in India, Revealed by Analysis of Y-Chromosomal DNA Polymorphisms

From the historically prevalent social structure of Indian populations it may be predicted that there has been very little male gene flow across ethnic boundaries. To test this prediction, we have analyzed DNA samples of individuals belonging to 10 ethnic groups, speaking Indo-European or Austroasiatic languages and inhabiting the eastern and northern regions of India. Eight Y-chromosomal markers, two biallelic and six microsatellite, were studied. All populations were monomorphic for the deletion allele at the YAP (DYS287) locus and for the 119-bp allele at the DYS288 locus. Y-chromosomal haplotypes were constructed on the basis of one RFLP locus and five microsatellite loci. The haplotype distribution among the groups showed that different ethnic groups harbor nearly disjoint sets of haplotypes. This indicates that there has been virtually no male gene flow among ethnic groups. Analysis of molecular variance revealed significant haplotypic variation between castes and tribes, but nonsignificant variation among ranked caste clusters. Haplotypic variation attributable to differences in geographical regions of habitat was also nonsignificant.
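"Nearly disjoint sets of haplotypes" can be quantified in a simple way: compute, for each pair of groups, the Jaccard index of the distinct haplotypes they carry. A value of 0 means no shared haplotype and 1 means identical sets. This is an illustrative statistic, not the AMOVA used in the paper. The haplotypes are hypothetical tuples of an RFLP allele and five repeat counts.

```python
from itertools import combinations


def haplotype_overlap(group_a, group_b):
    """Jaccard index of the distinct haplotypes seen in two groups:
    0 means disjoint sets, 1 means identical sets."""
    a, b = set(group_a), set(group_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical haplotypes: (RFLP allele, five microsatellite repeat counts).
groups = {
    "group 1": [("A", 15, 12, 23, 10, 14), ("A", 15, 12, 24, 10, 14)],
    "group 2": [("B", 16, 13, 23, 11, 14), ("A", 15, 12, 24, 10, 14)],
    "group 3": [("B", 17, 13, 22, 11, 15)],
}
for (name_a, hap_a), (name_b, hap_b) in combinations(groups.items(), 2):
    print(name_a, name_b, round(haplotype_overlap(hap_a, hap_b), 2))
```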

PDF file

High-resolution analysis of Y-chromosomal polymorphisms reveals signatures of population movements from Central Asia and West Asia into India

Linguistic evidence suggests that West Asia and Central Asia have been the two major geographical sources of genes in the contemporary Indian gene pool. To test the nature and extent of similarities in the gene pools of these regions, we collected DNA samples from four ethnic populations of northern India and screened these samples for a set of 18 Y-chromosome polymorphic markers (12 unique event polymorphisms and six short tandem repeats). These data from Indian populations have been analysed in conjunction with published data from several West Asian and Central Asian populations. Our analyses have revealed traces of population movement from Central Asia and West Asia into India. Two haplogroups, HG-3 and HG-9, which are known to have arisen in the Central Asian region, are found at reasonably high frequencies (41.7% and 14.3% respectively) in the study populations. The ages estimated for these two haplogroups in the Indian populations are lower than those estimated from data on Middle Eastern populations. A neighbour-joining tree based on Y-haplogroup frequencies shows that the North Indians are genetically placed between the West Asian and Central Asian populations. This is consistent with gene flow from West Asia and Central Asia into India.
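A neighbour-joining tree is built from a matrix of pairwise distances. Haplogroup frequencies must therefore first be converted into population distances, and the choice of distance measure is an assumption not stated in the abstract. The sketch below implements the standard neighbour-joining algorithm (Saitou and Nei). It is tested on a toy additive matrix taken from a known four-leaf tree with total branch length 8, not on the study's data. The function name and the internal node labels "n1", "n2", ... are illustrative. On real data, neighbour-joining can produce small negative branch lengths.

```python
def neighbor_joining(names, dist):
    """Neighbour-joining on a symmetric distance matrix.

    Returns a list of (child, parent, branch_length) edges; internal nodes are
    labelled "n1", "n2", ... (leaf names must not clash with these labels).
    """
    nodes = list(names)
    d = {(a, b): dist[i][j] for i, a in enumerate(nodes) for j, b in enumerate(nodes)}
    edges, counter = [], 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a, b] for b in nodes if b != a) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r_a - r_b.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a, b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        counter += 1
        u = f"n{counter}"
        # Branch lengths from a and b to the new internal node u.
        la = d[a, b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        edges += [(a, u, la), (b, u, d[a, b] - la)]
        rest = [c for c in nodes if c not in (a, b)]
        for c in rest:
            d[u, c] = d[c, u] = (d[a, c] + d[b, c] - d[a, b]) / 2
        nodes = rest + [u]
    a, b = nodes
    edges.append((a, b, d[a, b]))  # join the last two nodes
    return edges


toy = [[0, 3, 3, 5],
       [3, 0, 4, 6],
       [3, 4, 0, 4],
       [5, 6, 4, 0]]
for edge in neighbor_joining(["A", "B", "C", "D"], toy):
    print(edge)
```

On this additive matrix, the algorithm recovers the generating tree: A and B (lengths 1 and 2) join one internal node, C and D (lengths 1 and 3) join the other, and the internal edge has length 1.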

PDF file

Ethnic India: A Genomic View, With Special Reference to Peopling and Structure

We report a comprehensive statistical analysis of data on 58 DNA markers (mitochondrial [mt], Y-chromosomal, and autosomal) and sequence data of the mtHVS1 from a large number of ethnically diverse populations of India. Our results provide genomic evidence that (1) there is an underlying unity of female lineages in India, indicating that the initial number of female settlers may have been small; (2) the tribal and the caste populations are highly differentiated; (3) the Austro-Asiatic tribals are the earliest settlers in India, providing support for one anthropological hypothesis while refuting some others; (4) a major wave of humans entered India through the northeast; (5) the Tibeto-Burman tribals share considerable genetic commonalities with the Austro-Asiatic tribals, supporting the hypothesis that they may have shared a common habitat in southern China, but the two groups of tribals can be differentiated on the basis of Y-chromosomal haplotypes; (6) the Dravidian tribals were possibly widespread throughout India before the arrival of the Indo-European-speaking nomads, but retreated to southern India to avoid dominance; (7) the formation of populations by fission, with the resulting founder and drift effects, has left its imprint on the genetic structures of contemporary populations; (8) the upper castes show closer genetic affinities with Central Asian populations, although those of southern India are more distant than those of northern India; (9) historical gene flow into India has contributed to a considerable obliteration of the genetic histories of contemporary populations, so that there is at present no clear congruence of genetic and geographical or sociocultural affinities.

PDF file

Fundamental genomic unity of ethnic India is revealed by analysis of mitochondrial DNA

Mitochondrial DNA (mtDNA) profiles of 23 ethnic populations of India drawn from diverse cultural, linguistic and geographical backgrounds are presented. There is extensive sharing of a small number of mtDNA haplotypes, reconstructed on the basis of restriction fragment length polymorphisms, among the populations. This indicates that Indian populations were founded by a small number of females, possibly arriving on one of the early waves of out-of-Africa migration of modern humans; ethnic differentiation occurred subsequently through demographic expansions and geographic dispersal. The Asian-specific haplogroup M is in high frequency in most populations, especially tribal populations and Dravidian populations of southern India. Populations in which the frequencies of haplogroup M are relatively lower show higher frequencies of haplogroup U; such populations are primarily caste populations of northern India. This finding is indicative of a higher Caucasoid admixture in northern Indian populations. By examining the sharing of haplotypes between Indian and south-east Asian populations, we have provided evidence that south-east Asia was peopled by two waves of migration, one originating in India and the other originating in southern China. These findings have been examined and interpreted in the light of inferences derived from previous genomic and historical studies.

PDF file

Distinctive KIR and HLA diversity in a panel of north Indian Hindus

HLA and KIR are diverse and rapidly evolving gene complexes that work together in human immunity mediated by cytolytic lymphocytes. Understanding their complex immunogenetic interaction requires study of both HLA and KIR diversity in the same human population. Here a panel of 72 unrelated north Indian Hindus was analyzed. HLA-A, B, C, DRB1, DQA1, and DQB1 alleles and their frequencies were determined by sequencing or high-resolution typing of genomic DNA; KIR genotypes were determined by gene-specific typing and by allele-level DNA typing for KIR2DL1, 2DL3, 2DL5, 3DL1, and 3DL2. HLA analysis shows that the north Indian population shares several characteristics with either Caucasian or East Asian populations, consistent with the demographic history of north India. It also has specific features, including several alleles at high frequency that are rare or absent in other populations. A majority of the north Indian KIR gene profiles have not been seen in Caucasian and Asian populations. Most striking is a higher frequency of the B group of KIR haplotypes, resulting in equal frequencies of A and B group haplotypes in north Indians. All 72 members of the north Indian panel have different HLA genotypes and different KIR genotypes.
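Gene-content typing can identify AA genotypes unambiguously, but individuals carrying at least one B haplotype are usually lumped together as "Bx". A common way to estimate the group A haplotype frequency is therefore the square root of the AA genotype frequency, which assumes Hardy-Weinberg proportions. The abstract does not say which estimator was used. The count of 18 AA individuals out of 72 below is hypothetical. It shows what "equal frequencies of A and B" implies: about a quarter of the panel would be AA.

```python
from math import sqrt


def group_a_haplotype_frequency(n_aa, n_total):
    """Estimate the group A KIR haplotype frequency from the count of AA
    genotypes, assuming Hardy-Weinberg proportions (freq(AA) = a**2)."""
    return sqrt(n_aa / n_total)


# Hypothetical: if 18 of 72 individuals were AA, then a = 0.5, i.e. A and B
# haplotypes at equal frequency.
a = group_a_haplotype_frequency(18, 72)
print(a, 1 - a)
```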

PDF file

Genetic variation of ApoB 3' hypervariable region polymorphism among Brahmins of North India

ApoB 3' hypervariable region (ApoB 3'HVR) is highly polymorphic and hence an informative marker. It could be an ideal candidate for studying genetic heterogeneity among the population groups of the Indian subcontinent, and it is one of the markers for which population data are available. This makes the ApoB 3'HVR an ideal locus for a pilot study of the relationships between populations and of the microevolutionary processes that led to their present-day distribution. In the present investigation, we studied the ApoB 3'HVR in three endogamous groups of North India and compared these populations on the basis of inter- and intra-group diversity. The sub-populations chosen were Bhargavas, Chaturvedis, and non-Bhargava, non-Chaturvedi Brahmins of Uttar Pradesh. Nineteen segregating alleles were detected in our population groups. The average observed heterozygosity was high (0.717), indicating high diversity at the ApoB 3'HVR locus. The low values of average G_ST (0.0126) and F_ST (0.002) reflect non-significant differentiation among the three subgroups. Comparing the three study groups with ApoB 3'HVR data from other Indian and world populations showed that diversity was greatest in Africans, followed by Europeans and Asians, and that there was relative homogeneity among the continental groups. Our study groups showed high heterozygosity, an extended range of allele sizes, and a quasi-unimodal allele size distribution centred on HVE 37. These findings indicate that our populations may be characterized as ancestral, since similar features are observed in the African population. The ApoB 3'HVR polymorphism suggests that, despite practising restricted marital patterns, these groups or castes do not differ significantly from each other at the genetic level. This may be because the time since their divergence has not been long enough to generate genetic differences.
However, it cannot be ruled out that the ApoB 3'HVR polymorphism predates the divergence of these sub-castes. We are further testing this observation, using mtDNA for maternal lineages and Y-chromosome markers for paternal lineages.
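A G_ST near zero means that total heterozygosity barely exceeds the average heterozygosity within subgroups. The sketch below implements Nei's G_ST = (H_T - H_S) / H_T for equally weighted subpopulations, using expected heterozygosity without sample-size corrections. The allele frequencies in the usage line are hypothetical, and the estimator the authors actually used may include corrections not shown here.

```python
def nei_gst(subpop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T for equally weighted subpopulations.

    subpop_freqs: one list of allele frequencies per subpopulation, with the
    alleles listed in the same order in every list.
    """
    k = len(subpop_freqs)
    # H_S: mean expected heterozygosity within subpopulations.
    h_s = sum(1 - sum(p * p for p in f) for f in subpop_freqs) / k
    # H_T: expected heterozygosity of the pooled allele frequencies.
    pooled = [sum(col) / k for col in zip(*subpop_freqs)]
    h_t = 1 - sum(p * p for p in pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical frequencies of three alleles in two similar subgroups.
print(round(nei_gst([[0.5, 0.3, 0.2], [0.45, 0.35, 0.2]]), 4))
```

Subgroups fixed for different alleles give G_ST = 1. Identical subgroups give 0.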

PDF file

Most of the extant mtDNA boundaries in South and Southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans


Recent advances in the understanding of the maternal and paternal heritage of south and southwest Asian populations have highlighted their role in the colonization of Eurasia by anatomically modern humans. Further understanding requires a deeper insight into the topology of the branches of the Indian mtDNA phylogenetic tree, which should be contextualized within the phylogeography of the neighboring regional mtDNA variation. Accordingly, we have analyzed mtDNA control and coding region variation in 796 Indian (including both tribal and caste populations from different parts of India) and 436 Iranian mtDNAs. The results were integrated and analyzed together with published data from South and Southeast Asia and West Eurasia.


Four new Indian-specific haplogroup M sub-clades were defined. These, in combination with two previously described haplogroups, encompass approximately one third of the haplogroup M mtDNAs in India. Their phylogeography and spread among different linguistic phyla and social strata was investigated in detail. Furthermore, the analysis of the Iranian mtDNA pool revealed patterns of limited reciprocal gene flow between Iran and the Indian sub-continent and allowed the identification of different assemblies of shared mtDNA sub-clades.


Since the initial peopling of South and West Asia by anatomically modern humans, when this region may well have provided the initial settlers who colonized much of the rest of Eurasia, gene flow of maternally transmitted mtDNA into and out of India has been surprisingly limited. Specifically, our analysis of the mtDNA haplogroups that are shared between Indian and Iranian populations and exhibit coalescence ages corresponding to around the early Upper Paleolithic indicates that they are present in India largely as Indian-specific sub-lineages. In contrast, other ancient Indian-specific variants of M and R are very rare outside the sub-continent.
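Coalescence ages of mtDNA clades are often estimated with the rho statistic. Rho is the mean number of mutations separating the sampled sequences from the clade's inferred founder haplotype, multiplied by a calibration constant in years per mutation. The abstract does not state which estimator or calibration was used. In the sketch below, the mutation counts are hypothetical and the 10,000-year calibration is a placeholder; a real constant must come from the mutation-rate literature for the region scored.

```python
def rho_age(mutations_from_founder, years_per_mutation):
    """rho = mean number of mutations between sampled sequences and the clade's
    founder haplotype; the age estimate is rho times a calibration constant."""
    rho = sum(mutations_from_founder) / len(mutations_from_founder)
    return rho, rho * years_per_mutation


# Hypothetical clade of four mtDNAs and a placeholder calibration.
print(rho_age([1, 2, 2, 3], years_per_mutation=10_000))
```

A full analysis would also report an uncertainty for rho, which depends on the shape of the phylogeny; that is not shown here.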

PDF file