Archive for the ‘SciLit’ Category

PARADIGM-SHIFT predicts the function of mutations in multiple cancers using pathway impact analysis

Monday, March 3rd, 2014

PARADIGM-SHIFT predicts… function of mutations in… #cancers using pathway[s]. #Network-based gene prioritization
http://bioinformatics.oxfordjournals.org/content/28/18/i640

Detection and replication of epistasis influencing transcription in humans : Nature : Nature Publishing Group

Sunday, March 2nd, 2014

http://www.nature.com/nature/journal/vaop/ncurrent/full/nature13005.html

Genome Biology | Full text | Identification of fusion genes in breast cancer by paired-end RNA-sequencing

Friday, February 28th, 2014

http://genomebiology.com/2011/12/1/R6
https://code.google.com/p/fusioncatcher

Transcription and translation of pseudogenes

Sunday, February 23rd, 2014

http://www.nature.com/nmeth/journal/v11/n1/full/nmeth.2732.html

From the paper:
QT:{{”
2. Pseudogenes represent less than 0.1% of the total search space, yet a surprisingly large number, 36%, of human novel peptides mapped to pseudogenes (Fig. 2b). These findings are supported by recent peptide-level evidence of pseudogenes in mouse [6]. In humans, the observation of lineage- and cancer-specific expression of pseudogenes at the RNA level indicates biological relevance [17]. Our data suggest that pseudogenes may be not only transcribed but also translated. An interesting particular example was the pseudogene MYH16, identified by 20 peptides (Fig. 3), which were validated by LC-MS using synthetic peptides (Supplementary Fig. 15). The protein-coding capacity of MYH16 was previously shown to have been lost through double base deletion (resulting in a premature stop codon) during divergence of the human lineage from other primates [18]. However, our data show that, in the A431 cell line, the MYH16 gene is actively encoding a shorter protein isoform with its translation initiation site downstream from the aforementioned double base deletion.
“}}

plant phylotypic stage

Friday, February 21st, 2014

http://www.ncbi.nlm.nih.gov/pubmed/22951968

NA12878 high confidence calls

Thursday, February 20th, 2014

Integrating genotype from many callers & indication of where they differ. Might be useful for the personal diploid genome.
http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2835.html

PLOS Biology: Best Practices for Scientific Computing

Monday, February 10th, 2014

Best Practices for Scientific #Computing. Well known but useful pts, eg vers. control, asserts, interface comments…
http://www.plosbiology.org/article/info:doi%2F10.1371%2Fjournal.pbio.1001745

What is a support vector machine?

Wednesday, February 5th, 2014

What is a support vector machine? A nice overview w/o equations, just pictures. Great for #teaching!
http://www.nature.com/nbt/journal/v24/n12/abs/nbt1206-1565.html #SVM .@GenomeNathan YES, but see
http://noble.gs.washington.edu/papers/noble_what.html …, which has an expanded, “free” version.

How Information Theory Handles Cell Signaling and Uncertainty

Tuesday, February 4th, 2014

Matthew D. Brennan, Raymond Cheong, and Andre Levchenko

Science. 2012 October 19; 338(6105). doi: 10.1126/science.1227946
PMCID: PMC3820285
NIHMSID: NIHMS512743

How Information Theory Handles #Cell Signaling & Uncertainty… really well since it’s ideal for noisy communication
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3820285/?report=classic

Mapping rare and common causal alleles for complex human diseases

Saturday, February 1st, 2014

Mapping rare & common causal alleles for complex human diseases: great primer, describing yin & yang of #RVAS v #GWAS
http://www.cell.com/retrieve/pii/S0092867411010695

Found this a very illuminating primer, particularly relevant to understanding rare variants.

Soumya Raychaudhuri
Cell. 2011 September 30; 147(1): 57-69.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3198013/

Some particularly useful quoted snippets below.

QT:{{”

De novo mutations occurring spontaneously in individuals are constantly and rapidly introduced into any population. …Most of these mutations are quickly filtered out or lost by genetic drift and will never achieve appreciable allele frequencies. I illustrate this concept by a simulation in which de novo neutral mutations (conferring no effect on fitness) are introduced into a population of 2,000 diploid individuals. In 31 generations 95% of these mutations disappear from the general population, and not one of these mutations achieves an allele frequency of >1% in 200 generations (see Figure S1).

Common variant associations to phenotype are often facile to find. Their high frequencies allow case-control studies to be adequately powered to detect even modest effects. Their high r² to other proximate common variants allows for association signals to be discovered by genotyping the marker directly, or other nearby correlated markers. But mapping those associated variants to the specific variant that functionally influences disease risk can be challenging since the statistical signals invoked by inter-correlated variants are difficult to disentangle.

On the other hand, individual rare variant associations are challenging to find. Their low frequency renders current cohorts underpowered to detect all but the strongest effects, and lack of correlation to other markers often prevents them from being picked up by standard genotyping marker panels. But, once a rare associated variant is identified, mapping the causal rare variants is relatively facile since recent ancestry is likely to limit the number of inter-correlated markers.

For rare variant associations, the field has not yet defined accepted standards for statistical significance that account for the burden of multiple hypothesis testing. Since there are many more rare variants than common ones, and they are not typically inter-correlated with each other, a more stringent threshold may be necessary than applied for common variants. One conservative approach is to correct for the total number of bases genome-wide, i.e., p = 0.05/3,000,000,000 ~ 10^-11 as a significance threshold.

If a genomic region is critical to disease pathogenesis, rare mutations may modulate disease susceptibility. Then many affected individuals may have rare mutations more frequently in that region, though the mutations may be different from and unrelated to one another. This concept has sparked interest in the genetics community, and workers in statistical genetics have devised strategies to examine rare variants in aggregate across a target region (Bansal et al., 2010). These “burden” tests assess if rare variants within a specific region are distributed in a non-random way, suggesting that they might be playing a role in disease pathogenesis (see Figure 3B).

“}}
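
To get a feel for the drift simulation in the first snippet, here is a minimal sketch of my own (not the paper's code; the details behind Figure S1 aren't reproduced here). It assumes a plain Wright-Fisher model: constant population of 2,000 diploids (4,000 chromosomes), each new neutral mutation starting as a single copy, and binomial resampling every generation. The function names, the number of simulated mutations and the seed are all made up for illustration, so the exact percentages will not necessarily match the 95% and >1% figures quoted above.

```python
import math
import random


def binomial_draw(n, p, rng):
    """Draw from Binomial(n, p) by skipping geometric gaps between successes."""
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    log_q = math.log1p(-p)
    successes = 0
    position = 0
    while True:
        # Number of trials up to and including the next success.
        position += int(math.log(1.0 - rng.random()) / log_q) + 1
        if position > n:
            return successes
        successes += 1


def simulate_new_mutation(n_diploid, generations, rng):
    """Follow one neutral mutation; return (generation lost or None, peak frequency)."""
    chromosomes = 2 * n_diploid
    copies = 1  # assumption: a de novo mutation starts as a single copy
    peak = copies / chromosomes
    for generation in range(1, generations + 1):
        # Wright-Fisher step: the next generation resamples all chromosomes.
        copies = binomial_draw(chromosomes, copies / chromosomes, rng)
        if copies == 0:
            return generation, peak
        peak = max(peak, copies / chromosomes)
    return None, peak


def summarize(n_mutations=2000, n_diploid=2000, generations=200, seed=1):
    """Return (fraction lost within 31 generations, fraction ever above 1%)."""
    rng = random.Random(seed)
    runs = [simulate_new_mutation(n_diploid, generations, rng)
            for _ in range(n_mutations)]
    lost_by_31 = sum(1 for lost, _ in runs if lost is not None and lost <= 31)
    above_1pct = sum(1 for _, peak in runs if peak > 0.01)
    return lost_by_31 / n_mutations, above_1pct / n_mutations


if __name__ == "__main__":
    frac_lost, frac_common = summarize()
    print(f"lost within 31 generations: {frac_lost:.1%}")
    print(f"ever above 1% frequency:    {frac_common:.1%}")
```

The binomial sampler skips ahead by geometric gaps so the cost scales with the number of mutant copies rather than with all 4,000 chromosomes, which keeps a pure standard-library run fast. Either way, the qualitative point of the snippet should come through: the large majority of new neutral mutations vanish within a few dozen generations.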