Posts Tagged ‘rarevariant’

Kiezun A, Garimella K, Do R, Stitziel NO, Neale BM, McLaren PJ, Gupta N, Sklar P, Sullivan PF, Moran JL, Hultman CM, Lichtenstein P, Magnusson P, Lehner T, Shugart YY, Price AL, de Bakker PI, Purcell SM, Sunyaev SR. Exome sequencing and the genetic…

Sunday, July 20th, 2014

#Exome sequencing & #genetic basis of complex traits Key pt: amt of rare variants exceeds that from neutral model

Kiezun A, Garimella K, Do R, Stitziel NO, Neale BM, McLaren PJ, Gupta N, Sklar P, Sullivan PF, Moran JL, Hultman CM, Lichtenstein P, Magnusson P, Lehner T, Shugart YY, Price AL, de Bakker PI, Purcell SM, Sunyaev SR. Exome sequencing and the genetic basis of complex traits. Nature Genetics (2012) 44: 623-630


This article is part review and part research article, focusing on the use of exome sequencing to detect associations between variants and complex traits.

An important fact they point out, with a wide range of implications for studying disease, is that the number of rare variants exceeds the number predicted by the neutral model. Figure 1 illustrates nicely this excess of rare variants.
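To make "the number predicted by the neutral model" concrete, here is a minimal sketch of the textbook expectation. It assumes the standard neutral model with constant population size, under which the expected number of sites with derived-allele count i is proportional to 1/i. The function name and the choice of 876 chromosomes (438 diploid individuals, matching the cohort size mentioned below) are my own illustration, not the authors' calculation.

```python
def neutral_sfs_fractions(n_chromosomes):
    """Expected fraction of segregating sites at each derived-allele count
    i = 1..n-1 under the standard neutral model (weights proportional to 1/i)."""
    weights = [1.0 / i for i in range(1, n_chromosomes)]
    total = sum(weights)
    return [w / total for w in weights]


# 438 diploid individuals -> 876 sampled chromosomes (illustrative choice).
fractions = neutral_sfs_fractions(876)
print(f"expected singleton fraction under neutrality: {fractions[0]:.3f}")
print(f"expected doubleton fraction under neutrality: {fractions[1]:.3f}")
```

An observed singleton fraction well above the first number would be the kind of excess the paper describes.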

I agree with their statement that the majority of these mutations are not “neutral”. They attribute this excess to population expansion or purifying selection, but another plausible explanation is linked selection, since the excess is found in all organisms regardless of demographic history.

The authors compare statistics derived before and after filtering exome sequencing data from 438 individuals (HIV and schizophrenia data sets), illustrating the importance of filtering for obtaining high-quality calls. WGS (CGI data on 37 individuals) was used as a benchmark for the number of SNPs called in different categories (silent, missense, nonsense).

They then proceed to analyze the effect of population stratification on significance values by combining different ratios of individuals from the European-American HIV cohort and the Swedish schizophrenia cohort. (Theory predicts that older populations should have more rare variants because recombination has had more time to break up linkage blocks, and because newer populations have most likely gone through homogenizing bottlenecks.) They find that calculating p-values using a permutation test yields fewer type I errors (false positives), and that this technique can competently deal with population stratification when conducting association studies.
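For readers unfamiliar with permutation p-values, here is a rough sketch of the general idea. It is not the authors' exact procedure. The test statistic (number of variant carriers among cases), the optional `strata` argument that shuffles labels only within each population, and all names are assumptions of mine for illustration.

```python
import random
from collections import defaultdict


def permutation_pvalue(carrier, is_case, strata=None, n_perm=2000, seed=0):
    """Empirical one-sided p-value for an excess of carriers among cases.

    carrier: 0/1 flags, one per individual (carries a rare variant or not)
    is_case: 0/1 phenotype labels in the same order
    strata:  optional population labels; phenotypes are shuffled within strata
    """
    rng = random.Random(seed)
    if strata is None:
        strata = [0] * len(carrier)
    groups = defaultdict(list)
    for idx, s in enumerate(strata):
        groups[s].append(idx)

    def carriers_among_cases(labels):
        return sum(c for c, y in zip(carrier, labels) if y)

    observed = carriers_among_cases(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        # Shuffle phenotype labels separately inside each stratum.
        for members in groups.values():
            shuffled = [labels[i] for i in members]
            rng.shuffle(shuffled)
            for i, y in zip(members, shuffled):
                labels[i] = y
        if carriers_among_cases(labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: 20 cases then 20 controls; 8 carriers, all of them cases.
toy_carrier = [1] * 8 + [0] * 32
toy_is_case = [1] * 20 + [0] * 20
print(permutation_pvalue(toy_carrier, toy_is_case))
```

Shuffling within strata keeps each population's case/control ratio fixed, which is one common way to stop ancestry differences from masquerading as association.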

Mapping rare and common causal alleles for complex human diseases

Saturday, February 1st, 2014

Mapping rare & common causal alleles for complex human diseases: great primer, describing yin & yang of #RVAS v #GWAS

Found this a very illuminating primer, particularly relevant to understanding rare variants.

Soumya Raychaudhuri
Cell. 2011 September 30; 147(1): 57-69.

Some particularly useful quoted snippets below.


De novo mutations occurring spontaneously in individuals are constantly and rapidly introduced into any population. …Most of these mutations are quickly filtered out or lost by genetic drift and will never achieve appreciable allele frequencies. I illustrate this concept by a simulation in which de novo neutral mutations (conferring no effect on fitness) are introduced into a population of 2,000 diploid individuals. In 31 generations 95% of these mutations disappear from the general population, and not one of these mutations achieves an allele frequency of >1% in 200 generations (see Figure S1).
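(My own aside, not part of the quoted text: a simulation along these lines can be sketched as below. The snippet does not say which model the paper used, so I assume a simple Wright-Fisher model with binomial resampling of 4,000 chromosomes each generation. The function names, replicate count, and seed are hypothetical, and the exact fractions will vary with them.)

```python
import math
import random


def binomial_draw(rng, n, p):
    """Exact binomial sample, built by skipping ahead with geometric gaps."""
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    log_q = math.log1p(-p)
    successes, pos = 0, 0
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1]
        pos += int(math.log(u) / log_q) + 1  # trials until next success
        if pos > n:
            return successes
        successes += 1


def simulate_new_mutation(rng, n_diploid=2000, generations=200):
    """Follow one neutral mutation that starts as a single copy.

    Returns (generation of loss or None, peak allele frequency reached).
    """
    n_chrom = 2 * n_diploid
    count = peak = 1
    for gen in range(1, generations + 1):
        count = binomial_draw(rng, n_chrom, count / n_chrom)
        peak = max(peak, count)
        if count == 0:
            return gen, peak / n_chrom
    return None, peak / n_chrom


def summarize(n_mutations=2000, seed=1):
    rng = random.Random(seed)
    results = [simulate_new_mutation(rng) for _ in range(n_mutations)]
    lost_by_31 = sum(1 for lost, _ in results if lost is not None and lost <= 31)
    above_1pct = sum(1 for _, peak in results if peak > 0.01)
    return lost_by_31 / n_mutations, above_1pct / n_mutations


frac_lost_31, frac_above_1pct = summarize()
print(f"lost within 31 generations: {frac_lost_31:.1%}")
print(f"ever exceeded 1% frequency: {frac_above_1pct:.1%}")
```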

Common variant associations to phenotype are often facile to find. Their high frequencies allow case-control studies to be adequately powered to detect even modest effects. Their high r^2 to other proximate common variants allows for association signals to be discovered by genotyping the marker directly, or other nearby correlated markers. But mapping those associated variants to the specific variant that functionally influence disease risk can be challenging since the statistical signals invoked by inter-correlated variants are difficult to disentangle.

On the other hand, individual rare variant associations are challenging to find. Their low frequency renders current cohorts underpowered to detect all but the strongest effects, and lack of correlation to other markers often prevents them from being picked up by a standard genotyping marker panels. But, once a rare associated variant is identified, mapping the causal rare variants is relatively facile since recent ancestry is likely to limit the number of inter-correlated markers.

For rare variant associations, the field has not yet defined accepted standards for statistical significance that account for the burden of multiple hypothesis testing. Since there are many more rare variants than common ones, and they are not typically inter-correlated with each other, a more stringent threshold may be necessary than applied for common variants. One conservative approach is to correct for the total number of bases genome-wide, i.e. p = 0.05/3,000,000,000 ~ 10^-11 as a significance threshold.
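(My own aside: the arithmetic behind that threshold is a plain Bonferroni correction. As a quick check, the sketch below also computes the conventional common-variant threshold, which is usually motivated by assuming roughly one million independent tests. That figure is my assumption for comparison, not something stated in the quoted passage.)

```python
alpha = 0.05

# One test per base of a ~3 Gb genome, as in the quoted conservative approach.
genome_bases = 3_000_000_000
rare_variant_threshold = alpha / genome_bases
print(f"{rare_variant_threshold:.2e}")  # prints 1.67e-11

# Assumed ~1 million independent common-variant tests (conventional GWAS level).
independent_common_tests = 1_000_000
gwas_threshold = alpha / independent_common_tests
print(f"{gwas_threshold:.2e}")  # prints 5.00e-08
```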

If a genomic region is critical to disease pathogenesis rare mutations may modulate disease susceptibility. Then many affected individuals may have rare mutations more frequently in that region, though the mutations may be different from and unrelated to one another. This concept has sparked interest in the genetics community, and workers in statistical genetics have devised strategies to examine rare variants in aggregate across a target region (Bansal et al., 2010). These “burden” tests assess if rare variants within a specific region are distributed in a non-random way, suggesting that they might be playing a role in disease pathogenesis (see Figure 3B).
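(My own aside: one of the simplest ways to test rare variants "in aggregate" is to collapse each individual to carrier / non-carrier for the region and compare cases with controls. The sketch below does that with a one-sided Fisher's exact test. It is an illustration of the idea only, not the specific methods reviewed in Bansal et al., and the function name and toy genotypes are hypothetical.)

```python
from math import comb


def burden_fisher_pvalue(case_genotypes, control_genotypes):
    """One-sided Fisher's exact p-value for an excess of carriers among cases.

    Each argument is a list of individuals; each individual is a list of
    rare-allele counts, one entry per variant site in the region.
    """
    # Collapse: an individual is a carrier if any site has a rare allele.
    case_carriers = sum(1 for g in case_genotypes if any(g))
    control_carriers = sum(1 for g in control_genotypes if any(g))
    n_cases, n_controls = len(case_genotypes), len(control_genotypes)
    carriers = case_carriers + control_carriers

    # Hypergeometric tail: P(at least case_carriers of the carriers are cases).
    denom = comb(n_cases + n_controls, carriers)
    upper = min(carriers, n_cases)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denom


# Toy region with 5 sites: five cases each carry a *different* rare variant,
# one case and all six controls carry none.
toy_cases = [[1 if j == i else 0 for j in range(5)] for i in range(5)] + [[0] * 5]
toy_controls = [[0] * 5 for _ in range(6)]
burden_p = burden_fisher_pvalue(toy_cases, toy_controls)
print(f"{burden_p:.4f}")
```

No single site in the toy data is seen more than once, so per-variant tests have nothing to work with, while the collapsed comparison (5 of 6 cases versus 0 of 6 controls) still carries signal.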


HoxB13 in prostate cancer

Friday, December 20th, 2013

A missense change (G84E) in HOXB13 was found in 1.4% of prostate cancer cases overall and in 0.1% of unaffected controls.