|
bioIgnoramus
Nice one.
Email | Homepage | 01.20.08 - 10:05 am | #
|
razib
these recent STRUCTURE-based studies are suggesting that ashkenazi jews are rather distinct from other european populations. is it because of bottleneck effects and/or private alleles or what? or is it middle eastern ancestry? anyone looked closely at this?
Email | Homepage | 01.20.08 - 10:36 am | #
|
RKU
This sort of eigenvector-based quantification of genetic-difference space seems like exactly the correct sort of approach to take. Not being a specialist myself, I don't know how common it is, but this is the first time I've seen such a chart, which is very visually informative.
It would be extremely interesting to extend this sort of presentation-graph with a full range of the world's different populations. For example, I'd be pretty curious to know how closely most Middle Easterners clump with the Ashk Jews.
Email | Homepage | 01.20.08 - 10:49 am | #
|
dodo
Wow, that's impressive.
Email | Homepage | 01.20.08 - 10:52 am | #
|
razib
This sort of eigenvector-based quantification of genetic-difference space seems like exactly the correct sort of approach to take. Not being a specialist myself, I don't know how common it is, but this is the first time I've seen such a chart, which is very visually informative.
well, PC diagrams always show populations as they relate via the first two principal components in stuff like history and geography of human genes. i think that showing all the individuals like this is far more powerful, but it's definitely been enabled by the expanded data sets (cavalli-sforza was working in the pre-PCR & bioinformatics era and all that). but it should look familiar as well....
Email | Homepage | 01.20.08 - 10:57 am | #
|
p-ter
is it because of bottleneck effects and/or private alleles or what? or is it middle eastern ancestry? anyone looked close at this?
I don't think anyone's looked closely at it. my guess is middle eastern ancestry, but Ashkenazis aren't in the panels people have been using for studies of worldwide population structure. should be pretty easy for someone to plug in now, though.
This sort of eigenvector-based quantification of genetic-difference space seems like exactly the correct sort of approach to take. Not being a specialist myself, I don't know how common it is
as razib pointed out, PCA has been in use for a while, but it's really gained popularity among medical geneticists in the last couple years; almost every large-scale association study now has a plot that looks something like this one in the supplement.
In terms of making plots like this for the entire world, the datasets haven't been there (these data were all collected for medical studies, and so contain extremely limited sampling of the world's diversity), but I suspect they will be at some point.
Email | Homepage | 01.20.08 - 11:18 am | #
|
razib
my guess is middle eastern ancestry
yeah, the thing that makes me concerned is how far they are from greeks. but i guess levantines are further from greeks than anatolians....
but I suspect they will be at some point.
really? i bet you do.
Email | Homepage | 01.20.08 - 11:23 am | #
|
RKU
It really would be interesting if Ashkenazi Jews did indeed fall half-way between Europeans and Middle Easterners on such a chart, just as would be predicted from some other ancestry-mix estimates.
BTW, here's another question from an ignorant non-specialist. If people could easily visualize 3-D graphs, would there be much extra value in including a third eigenvector component, or do the first two tend to overwhelmingly dominate the variation? And is the first one much more important than the second, or are they generally of comparable size?
Email | Homepage | 01.20.08 - 2:26 pm | #
|
p-ter
If people could easily visualize 3-D graphs, would there be much extra value in including a third eigenvector component, or do the first two tend to overwhelmingly dominate the variation?
in this case, you probably wouldn't get more from including a third axis--the authors claim the vast majority of the variance is in the first two PCs. In general, though, you probably do get some mileage from additional PCs, it all depends on the actual structure of the data.
Email | Homepage | 01.20.08 - 3:52 pm | #
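A minimal sketch of the variance question discussed above, not the paper's analysis: it simulates genotypes for three hypothetical groups (the group count, the size of the frequency offsets, and the function names are all assumptions made for illustration), runs PCA through an SVD, and reports the share of total variance carried by each leading PC.

```python
import numpy as np


def simulate_genotypes(rng, n_per_group=100, n_snps=2000):
    """Simulate 0/1/2 genotypes for three hypothetical groups.

    Each group's allele frequencies are a shared base frequency plus a
    small group-specific offset. This is an illustrative assumption,
    not a model of any real population.
    """
    base = rng.uniform(0.1, 0.9, size=n_snps)
    groups = []
    for _ in range(3):
        freqs = np.clip(base + rng.normal(0.0, 0.1, size=n_snps), 0.01, 0.99)
        groups.append(rng.binomial(2, freqs, size=(n_per_group, n_snps)))
    return np.vstack(groups)


def pca_variance_fractions(genotypes):
    """Return the fraction of total variance carried by each PC."""
    centered = genotypes - genotypes.mean(axis=0)
    # Squared singular values of the centered matrix are proportional
    # to the variance along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


rng = np.random.default_rng(0)
fractions = pca_variance_fractions(simulate_genotypes(rng))
print("variance fractions, PC1 to PC5:", np.round(fractions[:5], 4))
```

With three simulated groups only two axes carry group structure, so the first two fractions should stand apart from the third and later ones. Data with more layers of structure would spread signal over further PCs, which is the sense in which it depends on the structure of the data.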
|
razib
it all depends on the actual structure of the data.
wut about its use? i.e., seems like the first two PCs would do what you wanted it to do for most attempts to infer population structure if you're interested in demographic history, but perhaps some diseases might benefit from really fine grained sketching out of substructure.
Email | Homepage | 01.20.08 - 4:22 pm | #
|
Rich Lawler
This is an interesting study. I want to comment on something Razib brought up, regarding the link to "Lewontin's fallacy." I looked at Edwards' paper as well as the post on Wikipedia, but my question is this: isn't principal components simply an elaborate correlation analysis (you define a variable, or variables, that are analogous to the regression lines in multivariate regressions, and those variables--components--capture the correlational properties among the variables in the analysis)? If it is, then it seems that pointing out that certain alleles are correlated is pretty simplistic, and to suggest it's a fallacy is not entirely accurate. Lewontin showed that most variation is found among individuals within a particular population or ethnic group, but to suggest that he overlooked the correlation structure of genes is simply not true (he did, in fact, author a statistics textbook--mostly bivariate, but I'm sure he was aware of PCA). SOME amount of correlation structure exists among alleles and PCA can certainly reveal this, but most PCA-based studies of allelic correlations reveal clines (either sharp or shallow), as evidenced in Cavalli-Sforza's tome on human genetic variation. [That some of this correlation structure can be used to build phylogenetic trees of human groups was not the focus of Lewontin's study.] It would seem that calling something a fallacy should apply to analyses that are at least 85% WRONG, not 15% RIGHT. To me it's an empirical question as to how much genetic variation falls within versus between groups but the most complete studies still suggest that Lewontin was basically correct, no? [I could be wrong on this but I imagine if you aligned the genome sequences of 5 (randomly picked) sub-Saharan Africans, 5 (randomly picked) Europeans, and 5 (randomly picked) Asians, most of the genetic variation would be found between individuals and not between the three groups (?)].
It's pretty obvious--even Lewontin would agree--that one can find sets of alleles that pin-point ancestry. We know this because we can not only use alleles to pin-point population/geographic ancestry, but also pin-point paternity. If we utilize alleles with maximal variation between groups and/or individuals, then it only follows that our correlational analysis will divide these groups (for the most part) along these lines. And although most of these types of studies show that clinal variation is the norm (as in C-S's book), programs like STRUCTURE will divide this variation into discrete groups--this is because we a priori tell the program to do so, and it spits back posterior probabilities about which a priori number of groups that we fed into the program is best supported by the data.
Anyways, my sense of Lewontin's analysis was that he was trying to use genetic data to make a case against the traditional tripartite division of humans into three races: black, white, asian. And within these divisions, his analysis seems to hold. However, recent analyses seem to conflate Lewontin's use of "races" with words such as "ancestry", "population cluster", etc. These things should not be conflated. They also seem not to rigorously define "genetic variation", and thus finding that some highly-variable genetic markers track population membership is nothing new, since we can use these same markers to track membership down to the level of families. Suffice to say, interpretations of PCA are sometimes subjective. In the figure that P-ter posted, is that a "definite" division between two groups, or simply a steep cline that reveals a bit of blending?
I'm not up on this literature and mostly I'm posting this to see what folks think...
Email | Homepage | 01.20.08 - 4:30 pm | #
|
razib
rich,
i asked cavalli-sforza about these issues, he said:
Edwards and Lewontin are both right. Lewontin said that the between populations fraction of variance is very small in humans, and this is true, as it should be on the basis of present knowledge from archeology and genetics alike, that the human species is very young. It has in fact been shown later that it is one of the smallest among mammals. Lewontin probably hoped, for political reasons, that it is TRIVIALLY small, and he has never shown to my knowledge any interest for evolutionary trees, at least of humans, so he did not care about their reconstruction. In essence, Edwards has objected that it is NOT trivially small, because it is enough for reconstructing the tree of human evolution, as we did, and he is obviously right.
link.
here is what a.w.f. edwards said in response to a GNXP interview:
I can only speak for myself as to why it took me so long. Others closer to the field will have to explain why the penny did not drop earlier, but the principal cause must be the huge gap in communication that exists between anthropology, especially social anthropology, on the one hand, and the humdrum world of population and statistical genetics on the other. When someone like Lewontin bridges the gap, bearing from genetics a message which the other side wants to hear, it spreads fast - on that side. But there was no feedback. Others might have noticed Lewontin's 1972 paper but I had stopped working in human and population genetics in 1968 on moving to Cambridge because I could not get any support (so I settled down to writing books instead). In the 1990s I began to pick up the message about only 15% of human genetic variation being between, as opposed to within, populations with its non-sequitur that classification was nigh impossible, and started asking my population-genetics colleagues where it came from. Most had not heard of it, and those that had did not know its source. I regret now that in my paper I did not acknowledge the influence of my brother John, Professor of Genetics in Oxford, because he was independently worrying over the question, inventing the phrase 'the death of phylogeny' which spurred me on.
Eventually the argument turned up unchallenged in Nature and the New Scientist and I was able to locate its origin. I only started writing about it after lunch one day in Caius during which I had tried to explain the fallacy across the table to a chemist, a physicist, a physiologist and an experimental psychologist - all Fellows of the Royal Society - and found myself faltering. I like to write to clear my mind. Then I met Adam Wilkins, the editor of BioEssays, and he urged me to work my notes up into a paper.
I have had no adverse reaction to it at all, but plenty of plaudits from geneticists, many of whom told me that they too had been perplexed. Perhaps the communication gap is still too large, or just possibly the point has been taken. After all, Fisher made it in 1925 in Statistical Methods which was written for biologists so it is hardly new.
link.
i assume lewontin always understood the issues at hand. but this is the same man who refuses to accept any sociobiological theories which can not specifically explain his behavior, as opposed to aggregates. as for the point about clines, this is complicated, and i don't think saying "it is clinal" is the last word (though it is a big improvement on the fantastical platonism which it has pushed aside). consider the sharp drop in frequencies for some alleles which run up against geographic barriers, or, the fact that across some geographic expanses allele frequency may drop linearly but the regions are VERY sparsely populated so the density of individuals at various locations on diagrams such as the one above varies a great deal (think of a straight line from the volga to manchuria).
Email | Homepage | 01.20.08 - 4:49 pm | #
|
p-ter
i.e., seems like the first two PCs would do what you wanted it to do for most attempts to infer population structure...
well, sure, if you're looking at European-Americans. If you were to take a sample of all Americans, you'd probably get a couple PCs separating out the people of different continental origin, then some further down the list that look like the plot from this paper. The first two are obvious here only because they've already removed a lot of structure from the population (ie. it's only European-ancestry individuals).
Email | Homepage | 01.20.08 - 5:03 pm | #
|
razib
ah, right. i was thinking of narrowing down populations even further (e.g., sephard vs. ashkenazi jews).
Email | Homepage | 01.20.08 - 5:08 pm | #
|
gcochran
I believe that Lewontin was trying to say that different human populations aren't significantly different. But since we live in phenotype space, not gene space, the argument is untrue. As far as any given trait is concerned, the populations are just as different as the measurements say they are, regardless of how genetic variance is distributed.
70% of genetic variation in dogs is within-breed, yet between-breed variation is considerably larger than within-breed variation for many traits.
As for the idea that human populations can only differ significantly in traits that wouldn't upset a cultural anthropologist - there's no reason for that to be the case and it doesn't look as if it is.
Email | Homepage | 01.20.08 - 5:13 pm | #
|
agnostic
traits that wouldn't upset a cultural anthropologist
When the next anthropology department splits, like they did at Stanford, they should adopt the titles "Upsetting" and "Uplifting."
Email | Homepage | 01.20.08 - 7:38 pm | #
|
yo
A paper worth checking out if you can:
Chakraborty, R. (1982). "Allocation versus variation: The issue of genetic differences between human racial groups." American Naturalist 120: 403-404.
Email | Homepage | 01.20.08 - 7:59 pm | #
|
yo
what the hell -- here's the full text
In developing indicators for measuring genetic variation between and within
populations, Lewontin (1972), Nei and Roychoudhury (1972, 1974, 1981),
Cavalli-Sforza (1974), and Latter (1973, 1980), among others, all used elec-
trophoretic as well as immunologic gene frequency data from major racial groups
to examine the extent of between- versus within-group variation. Even though
they used measures that are quite dissimilar in statistical and interpretive terms,
all concluded that genic variation between populations (in Nei's terminology as
measured by the net codon differences per locus; see Nei [1975]) is smaller than
that within populations. Mitton (1977), Spielman and Smouse (1976), Smouse and
Spielman (1977), and Smouse et al. (1982) disagreed with this conclusion. The
reasoning behind their objections stems mainly from taxonomic implications of
the gene diversity analysis.
The debate resulted from confusing the two different but relevant questions: (1)
What is the average gene difference between races relative to the genic variation
within races? (2) Can a reliable classification of races be made with the spectrum
of known polymorphic genetic markers? While seeking an answer to the second
question, Mitton (1977) claimed that Nei's and Lewontin's measures were insen-
sitive. Nei and Roychoudhury's studies (1972, 1974, 1981), however, were an
attempt to answer the first question. As Nei (1981) points out, Lewontin's con-
cluding remarks (1972, p. 397) are not directly the result of his or Nei's statistical
analysis, but reflect Lewontin's "humanistic view" Nei (1981, p. 88). While
attempting to answer the first question, I demonstrated the statistical fallacy of
Mitton's measure (Chakraborty 1978). Furthermore, there seems to be a logical
relationship between the multilocus and the single-locus measure on theoretical
grounds (Chakraborty 1980). Since the multilocus measure does not refer to average
genic difference on a per-locus basis, it is not the most appropriate vehicle for
seeking an answer to the first question.
Smouse et al. (1982) provide the detailed statistical principles for solving the
second problem. Indeed, in spite of the small per-locus genic differences between
races, the multidimensional phenotypic or gene frequency constellations of differ-
ent races turn out to be largely nonoverlapping when a sufficient number of loci
are considered for analysis. The situation is exemplified in Neel (1981). In terms of
amino acid sequence data or DNA restriction maps, a considerable amount of
observable phenotypic nonoverlap is seen in racial difference studies (Brown
1980). Nevertheless, per-codon nucleotide difference estimates between races still
remain small compared to those within races.
From an evolutionary viewpoint, the history of racial separation is too short for
the accumulation of any larger gene difference between races. Moreover, with the
ongoing magnitude of admixture, this difference will probably diminish relative to
the accumulation of differences between individuals of the same race. Neverthe-
less, the classification of human ethnic or racial groups remains a viable, impor-
tant feature in understanding the nature and mechanism of human evolution.
Email | Homepage | 01.20.08 - 8:06 pm | #
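The two questions separated in the passage above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of any dataset: two hypothetical populations, independent loci, allele frequencies of 0.7 versus 0.3 at every locus (which puts about 16% of the per-locus variance between populations), and a likelihood-ratio classifier that is handed the true frequencies. All function names are made up for the example.

```python
import numpy as np


def simulate_two_populations(rng, n_per_pop=500, n_loci=100, delta=0.4):
    """Draw haploid 0/1 alleles for two hypothetical populations.

    At each locus the frequencies are 0.5 + delta/2 in one population
    and 0.5 - delta/2 in the other; which population gets the higher
    value is random per locus. Loci are independent (an assumption).
    """
    signs = rng.choice([-1.0, 1.0], size=n_loci)
    freq_a = 0.5 + signs * delta / 2
    freq_b = 0.5 - signs * delta / 2
    pop_a = rng.binomial(1, freq_a, size=(n_per_pop, n_loci))
    pop_b = rng.binomial(1, freq_b, size=(n_per_pop, n_loci))
    return pop_a, pop_b, freq_a, freq_b


def between_population_fraction(freq_a, freq_b):
    """Question 1: mean per-locus share of variance between populations."""
    mean_freq = (freq_a + freq_b) / 2
    total = mean_freq * (1 - mean_freq)
    within = (freq_a * (1 - freq_a) + freq_b * (1 - freq_b)) / 2
    return float(np.mean((total - within) / total))


def classification_accuracy(pop_a, pop_b, freq_a, freq_b, n_loci_used):
    """Question 2: share of individuals assigned to the right population.

    Uses a log-likelihood ratio over the first n_loci_used loci with the
    true frequencies; a ratio of exactly zero counts as an error.
    """
    fa, fb = freq_a[:n_loci_used], freq_b[:n_loci_used]

    def log_lik_ratio(alleles):
        x = alleles[:, :n_loci_used]
        terms = x * np.log(fa / fb) + (1 - x) * np.log((1 - fa) / (1 - fb))
        return terms.sum(axis=1)

    correct_a = log_lik_ratio(pop_a) > 0
    correct_b = log_lik_ratio(pop_b) < 0
    return float(np.concatenate([correct_a, correct_b]).mean())


rng = np.random.default_rng(1)
pop_a, pop_b, freq_a, freq_b = simulate_two_populations(rng)
fst = between_population_fraction(freq_a, freq_b)
accuracy = {k: classification_accuracy(pop_a, pop_b, freq_a, freq_b, k)
            for k in (1, 10, 100)}
print("between-population fraction per locus:", round(fst, 3))
print("classification accuracy by number of loci:", accuracy)
```

The per-locus fraction stays fixed no matter how many loci are added, while the classification accuracy climbs as loci are combined. That is how a small answer to the first question and a confident answer to the second can coexist.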
|
Rich Lawler
Razib,
How ironic, I probably read "10 questions for AWF Edwards..." but failed to remember it (or re-read it) when writing what I did above. That said, he seems to be focusing on the issue of using genetic variation for (phylogenetically) classifying human groups (and somehow implying that 15% is more than enough, i.e., non-trivial). Obviously, there is no benchmark percentage of variation that needs to be met in order to classify human groups. If that is Edwards' only point, then obviously he's correct, and his point is pretty canonical. Any amount of diagnostic variation can be used to separate groups. One can even use a single trait to distinguish some major groups (e.g., the petrosal bone of the auditory bulla separates primates from non-primates).
That said, I need to re-read Lewontin's paper both to make sure that I'm not attributing specific points to Lewontin that he didn't make, and also because if Lewontin is stating that one CAN'T classify human groups because 85% of the variation is between individuals, well, that's just not true. All we need to do is to find that 15% of variation that does distinguish "us from them."
It seems, however, that Lewontin wasn't interested in cladograms of human groups but was more interested in asking the question: what social significance should we attribute to the 85% genetic similarity versus 15% genetic difference among human groups? ...A question that obviously still has people talking.
Email | Homepage | 01.20.08 - 8:21 pm | #
|
David B
I hope people are reading the paper and not just looking at the pretty picture. Here is the main conclusion:
"We conclude that the top two principal components of genetic ancestry in the IBD dataset roughly correspond to a continuous cline from northwest to southeast European ancestry and an orthogonal discrete separation between Ashkenazi Jewish and southeast European ancestry (Figure 1E)... Our results are consistent with a previous study in which Ashkenazi Jewish and southeast European samples occupied similar positions on the northwest-southeast axis, although there was insufficient data in that study to separate these two populations [7]. A historical interpretation of this finding is that both Ashkenazi Jewish and southeast European ancestries are derived from migrations/expansions from the Middle East and subsequent admixture with existing European populations [12,13]."
Email | Homepage | 01.21.08 - 2:14 am | #
|
Jason
gcochran's point about phenotypic vs. genotypic space is good. There is little doubt that there is a lot of substructure in human populations. The substructure is caused by a lot of things including geographic barriers to gene flow, cultural barriers to gene flow, differential selection pressures in local environments, genetic drift, the genetic bottleneck associated with the out of africa event, and local founder events.
The substructure caused by many of these processes is only useful for inferring population history because the variation is neutral. However, substructure caused by differential selection pressures is the variation that Lewontin was afraid of. Yet we know it has happened. For example, there is no doubt that genes affecting skin color have been under different selective pressures in different geographical locations.
What about other genetic variation that causes phenotypic variation due to adaptation to local environments? Of course it exists. This is, in a lot of ways, the really interesting variation.
It is worth noting that this variation may be largely independent of other population substructure. Genes under selection may pass between demes with levels of gene flow insufficient to homogenize the populations.
It is also worth noting that the substructure in one gene under selection may not be well correlated with that of another gene under selection. For this reason classifying populations by phenotype (which, again, is really what Lewontin was afraid of) may not work, for there is absolutely no reason to expect that phenotypes will always partition populations the same way. Of course many do, which is why we have the classical divisions of race. Still, the classical divisions of race have shifted over the years when different phenotypes have been emphasized.
In the end Lewontin and Edwards are both arguing about the 85%. Lewontin correctly points out that there is a lot of variation between any two unrelated people. Edwards correctly points out that, despite this, a lot of structure is evident because of differences between populations in the frequencies of the common variation. The really interesting stuff is the 15%, the non-common variation, because some portion of this must be caused by adaptation to local environments. I think this is the direction the study of human genomic diversity is heading.
Email | Homepage | 01.21.08 - 10:30 am | #
|
razib
It is worth noting that this variation may be largely independent of other population substructure. Genes under selection may pass between demes with levels of gene flow insufficient to homogenize the populations.
jason
http://www.gnxp.com/blog/2007/09/new-races-of-man.php
also, have you read loren rieseberg's work?
http://www.bio.indiana.edu/facultyresearch/faculty/Rieseberg.html
Email | Homepage | 01.21.08 - 11:22 am | #
|
Jason
Razib,
I quite like the idea of selected variants introgressing across demes. However, I also think it is worth bearing in mind the ubiquity of homoplasy in phenotypes. Lactase persistence seems to evolve whenever the need arises and the necessary variation can not be found through gene flow. Neanderthals and modern humans independently have evolved a low activity variant of MC1R (modern humans several times). I would bet homoplasy when it comes to phenotypic similarity between Europeans and Ainu.
Email | Homepage | 01.21.08 - 11:48 am | #
|
Jason
A note on phenotypic homoplasy:
Cliff Jolly, in The Seed Eaters, noted the great usefulness in understanding function of phenotypes from analogy rather than homology specifically because ancestry is removed from the equation. We can largely rule out phylogenetic baggage as an explanation for the evolution of light skin color because it has evolved at least twice independently. I guess Felsenstein's independent contrasts formalizes this idea...
Email | Homepage | 01.21.08 - 12:01 pm | #
|
razib
jason, agreed.
I would bet homoplasy when it comes to phenotypic similarity between Europeans and Ainu.
yes, although contrast effect was at work too i think (the ainu looked a lot more european when set next to the typical japanese). also, not sure that the ainu characters are that derived.
Email | Homepage | 01.21.08 - 12:03 pm | #
|