Informativeness of Genetic Markers for Inference of Ancestry
Inference of individual ancestry is useful in various applications, such as admixture mapping and structured association mapping. Using information-theoretic principles, we introduce a general measure, the informativeness for assignment (I_n), applicable to any number of potential source populations, for determining the amount of information that multiallelic markers provide about individual ancestry. In a worldwide human microsatellite data set, we identify markers of highest informativeness for inference of regional ancestry and for inference of population ancestry within regions; these markers, which are listed in online-only tables in our article, can be useful both in testing for and in controlling the influence of ancestry on case-control genetic association studies. Markers that are informative in one collection of source populations are generally informative in others. Informativeness of random dinucleotides, the most informative class of microsatellites, is five to eight times that of random single-nucleotide polymorphisms (SNPs), but 2%–12% of SNPs have higher informativeness than the median for dinucleotides. Our results can aid in decisions about the type, quantity, and specific choice of markers for use in studies of ancestry.
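As a rough illustration of the kind of quantity the informativeness for assignment describes, the sketch below computes the mutual information between population label and observed allele for a single marker. It assumes that each of the K source populations is equally likely a priori and that natural logarithms are used. The function name `informativeness_for_assignment` and the layout of the frequency table are hypothetical choices made for this example, not taken from the abstract.

```python
import math


def informativeness_for_assignment(freqs):
    """Mutual information between population label and allele at one marker.

    freqs[i][j] is the frequency of allele j in population i (each row sums
    to 1). Assumes a uniform prior over the K populations (an assumption of
    this sketch) and uses natural logarithms.
    """
    k = len(freqs)
    n_alleles = len(freqs[0])
    total = 0.0
    for j in range(n_alleles):
        # Average frequency of allele j across the K populations.
        p_j = sum(freqs[i][j] for i in range(k)) / k
        if p_j > 0:
            total -= p_j * math.log(p_j)
        for i in range(k):
            p_ij = freqs[i][j]
            if p_ij > 0:  # treat 0 * log(0) as 0
                total += (p_ij / k) * math.log(p_ij)
    return total


# Hypothetical two-population, two-allele examples.
uninformative = informativeness_for_assignment([[0.5, 0.5], [0.5, 0.5]])
fixed_difference = informativeness_for_assignment([[1.0, 0.0], [0.0, 1.0]])
```

Under these assumptions the value is 0 when all populations share the same allele frequencies and reaches its maximum, log K, when no allele is found in more than one population.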