Archive for the ‘SciLit’ Category

Why Most Published Research Findings Are False

Saturday, February 7th, 2015

Why Most Published Research Findings are False http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124 Evaluating 2×2 confusion matrix, effects of bias & multiple studies

PLoS Medicine | August 2005 | Volume 2 | Issue 8 | e124

QT:{{"
Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies [1–3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key


Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. “Negative” research is also very useful. “Negative” is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings. As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11]. Consider a 2 × 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let R be the ratio of the number of “true relationships” to “no relationships” among those tested in the field. R is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R/(R + 1). The probability of a study finding a true relationship reflects the power 1 − β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets PPV = (1 − β)R/(R − βR + α). A research finding is thus
"}}

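The PPV expression follows directly from the definitions in the quote, and a few lines of arithmetic make it concrete. Below is a minimal Python sketch, assuming the single-study, no-bias setting described above. The expected counts are derived from the pre-study probability R/(R + 1), the power 1 − β and the Type I error α rather than copied from Table 1, and the parameter values in the demo are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExpectedTable:
    """Expected 2x2 counts when c relationships are probed (single study, no bias)."""
    true_positive: float   # true relationship, claimed significant
    false_positive: float  # no relationship, claimed significant
    false_negative: float  # true relationship, not claimed
    true_negative: float   # no relationship, not claimed


def expected_table(c: float, R: float, beta: float, alpha: float) -> ExpectedTable:
    n_true = c * R / (R + 1)  # pre-study probability R/(R + 1) that a relationship is true
    n_null = c / (R + 1)      # the remaining relationships are null
    return ExpectedTable(
        true_positive=(1 - beta) * n_true,
        false_positive=alpha * n_null,
        false_negative=beta * n_true,
        true_negative=(1 - alpha) * n_null,
    )


def ppv(R: float, beta: float, alpha: float) -> float:
    """Post-study probability that a claimed finding is true: (1 - beta)R / (R - beta*R + alpha)."""
    return (1 - beta) * R / (R - beta * R + alpha)


if __name__ == "__main__":
    # Hypothetical settings: power 0.8, alpha 0.05, decreasing pre-study odds R.
    for R in (1.0, 0.1, 0.01):
        t = expected_table(1000, R, beta=0.2, alpha=0.05)
        from_table = t.true_positive / (t.true_positive + t.false_positive)
        print(f"R={R}: PPV={ppv(R, 0.2, 0.05):.3f} (from table: {from_table:.3f})")
```

With power 0.8 and α = 0.05, PPV falls from about 0.94 at R = 1 to about 0.62 at R = 0.1 and about 0.14 at R = 0.01. Rearranging the formula, a claimed finding is more likely true than false (PPV > 1/2) exactly when (1 − β)R > α.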
Google Scholar Wins Raves—But Can It Be Trusted?

Saturday, February 7th, 2015

#Google Scholar Wins Raves—But Can It Be Trusted? http://www.sciencemag.org/content/343/6166/14 #Citation spam possible, fake papers artificially inflating H-index

Science 3 January 2014:
Vol. 343 no. 6166 p. 14
DOI: 10.1126/science.343.6166.14

NEWS & ANALYSIS
SCIENTIFIC PUBLISHING
John Bohannon

Google Scholar is picking up adherents in the scientific community. But the search service’s ascendancy is not going unchallenged.
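The tweet's point about citation spam can be made concrete with the h-index, the largest h such that h papers each have at least h citations. Below is a minimal Python sketch of that standard definition; it says nothing about how Google Scholar itself computes its metrics, and the citation counts are hypothetical.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


if __name__ == "__main__":
    # Hypothetical author with five papers.
    genuine = [10, 8, 5, 4, 3]
    # Three fake papers that each cite all five of the author's papers.
    inflated = [c + 3 for c in genuine]
    print(h_index(genuine), h_index(inflated))  # 4 5
```

Three fabricated citing papers are enough to move this hypothetical author from h = 4 to h = 5, which is why an index that ingests unvetted documents is open to gaming.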

PLOS Genetics: A Massively Parallel Pipeline to Clone DNA Variants and Examine Molecular Phenotypes of Human Disease Mutations

Saturday, February 7th, 2015

Massively Parallel Pipeline to Clone DNA Variants & Examine…Disease Mutations http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004819 CloneSeq leverages NextGen sequencing

With the advance of sequencing technologies, tens of millions of genomic variants have been discovered in the human population. However, no method is yet available that can determine the functional impact of these variants on a large scale, and this has increasingly become a major bottleneck for population genetics and personal genomics. The Clone-seq and comparative interactome-profiling pipeline is the first to address this issue.

The pipeline can be coupled to many readouts.

Price AL, Kryukov GV, de Bakker PI, Purcell SM, Staples J, Wei LJ, Sunyaev SR. Pooled association tests for rare variants in exon-resequencing studies. American Journal of Human Genetics (2010) 86: 832-838.

Sunday, February 1st, 2015

Pooled association tests for rare variants in exon-resequencing http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3032073 Simulation shows advantage of mult. rarity thresholds


SUMMARY

Multiple studies indicate a strong association between rare variants and phenotype. This paper describes a population-genetics simulation framework for studying how variant allele frequency influences the corresponding phenotype. In a prior study, the association between variants and phenotype was tested by pooling the set of variants with allele frequency below a fixed threshold. Here, however, simulations show that an association test based on a variable allele-frequency threshold has more power than the fixed-threshold model. In addition, including the predicted functional effects of variants (PolyPhen-2 scores) further improves the variable-threshold model. Overall, this paper describes a novel methodology that can be used to explore the association between rare variants and various diseases.
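The contrast between a fixed and a variable frequency threshold is easiest to see in code. Below is a simplified Python sketch, assuming genotypes coded as minor-allele counts and a quantitative or 0/1 phenotype. It computes a pooled z-score for each candidate threshold, takes the maximum (the variable-threshold idea), and uses phenotype permutation to get a p-value that accounts for that maximisation. The statistic is a stripped-down burden score, not the exact formula from Price et al., the PolyPhen-2 weighting is omitted, and the function names and toy data are hypothetical.

```python
import random
from math import sqrt


def burden_z(genotypes, phenotypes, freqs, threshold):
    """Simplified pooled z-score over variants with allele frequency <= threshold.

    genotypes: rows = individuals, columns = variants (minor-allele counts 0/1/2)
    phenotypes: one value per individual
    freqs: minor-allele frequency of each variant
    """
    mean_phi = sum(phenotypes) / len(phenotypes)
    num = den = 0.0
    for row, phi in zip(genotypes, phenotypes):
        # Per-individual count of alleles at variants under the threshold.
        count = sum(g for g, f in zip(row, freqs) if f <= threshold)
        num += (phi - mean_phi) * count
        den += count * count
    return num / sqrt(den) if den > 0 else 0.0


def vt_statistic(genotypes, phenotypes, freqs):
    """Variable-threshold statistic: best z-score over every observed frequency cut-off."""
    return max(burden_z(genotypes, phenotypes, freqs, t) for t in sorted(set(freqs)))


def vt_pvalue(genotypes, phenotypes, freqs, n_perm=1000, seed=0):
    """Permutation p-value; permuting phenotypes accounts for maximising over thresholds."""
    rng = random.Random(seed)
    observed = vt_statistic(genotypes, phenotypes, freqs)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if vt_statistic(genotypes, shuffled, freqs) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 6 individuals x 3 variants.
GENOTYPES = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
PHENOTYPES = [1, 1, 1, 0, 0, 0]
FREQS = [0.01, 0.02, 0.2]

if __name__ == "__main__":
    print("fixed T=0.05:", burden_z(GENOTYPES, PHENOTYPES, FREQS, 0.05))
    print("variable T:", vt_statistic(GENOTYPES, PHENOTYPES, FREQS))
    print("permutation p:", vt_pvalue(GENOTYPES, PHENOTYPES, FREQS, n_perm=500))
```

In the toy data the carrier of the variant at frequency 0.2 is also affected, so a fixed 0.05 cut-off that excludes it gives z ≈ 0.71, while the variable-threshold statistic picks the 0.2 cut-off and gives z ≈ 0.87.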

The landscape of long noncoding RNAs in the human transcriptome : Nature Genetics : Nature Publishing Group

Wednesday, January 28th, 2015

Landscape of lncRNAs in the human #transcriptome http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3192.html Derived from RNAseq read assembly; not much overlap w/ @GencodeGenes

Matthew K Iyer, Yashar S Niknafs, Rohit Malik, Udit Singhal, Anirban Sahu, Yasuyuki Hosono, Terrence R Barrette, John R Prensner, Joseph R Evans, Shuang Zhao, Anton Poliakov, Xuhong Cao, Saravana M Dhanasekaran, Yi-Mi Wu, Dan R Robinson, David G Beer, Felix Y Feng, Hariharan K Iyer & Arul M Chinnaiyan

Nature Genetics (2015) doi:10.1038/ng.3192. Received 20 June 2014; accepted 18 December 2014

Reconciling differential gene expression data with molecular interaction networks

Wednesday, January 28th, 2015

Reconciling differential gene expression w/…#networks http://bioinformatics.oxfordjournals.org/content/29/5/622 Propagating this across interactions finds perturbed pathways

This paper propagates the differential-expression scores (−log10 p) of disease-related, highly differentially expressed genes over the human protein interaction network, computes new scores using four algorithms (Vanilla, PageRank, GeneMANIA, and Heat Kernel), re-ranks genes by the new scores, and then finds enriched pathways among the top-ranking genes. Compared with the traditional approach of ranking genes by differential-expression p-value alone, without any network information, this approach not only recovered canonical pathways but also discovered novel ones, such as an insulin-mediated glucose transport pathway in Huntington’s disease. The authors also explored differences among the four algorithms and identified the top-ranking genes found only by particular algorithms. In short, the paper provides a valuable framework for integrating networks with gene expression data, and its comparison of the four algorithms is also helpful.
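The propagate-then-re-rank idea can be sketched with one of the simplest schemes, a random walk with restart (a personalized-PageRank-style iteration). Below is a minimal Python sketch under stated assumptions: the network is an undirected adjacency dict in which every neighbour is also a key, seed scores are −log10 p values, and the restart weight alpha = 0.5 is arbitrary. It is not the paper's code; the four algorithms it compares differ in normalization and parameters not described here, and the gene names in the example are hypothetical.

```python
def propagate(adjacency, seed_scores, alpha=0.5, max_iter=1000, tol=1e-12):
    """Random-walk-with-restart propagation of gene scores over an undirected network.

    adjacency: dict gene -> set of neighbouring genes (every neighbour must also be a key)
    seed_scores: dict gene -> initial score, e.g. -log10(p) from differential expression
    alpha: restart weight, i.e. how much of the original scores is re-injected each step
    """
    genes = sorted(adjacency)
    total = sum(seed_scores.get(g, 0.0) for g in genes)
    if total <= 0:
        raise ValueError("need at least one positive seed score")
    restart = {g: seed_scores.get(g, 0.0) / total for g in genes}  # normalized seeds
    p = dict(restart)
    for _ in range(max_iter):
        new = {g: alpha * restart[g] for g in genes}
        for g in genes:
            flow = (1 - alpha) * p[g]
            nbrs = adjacency[g]
            if nbrs:
                for n in nbrs:
                    new[n] += flow / len(nbrs)  # spread score evenly to neighbours
            else:
                new[g] += flow  # isolated gene keeps its own mass
        delta = max(abs(new[g] - p[g]) for g in genes)
        p = new
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
    scores = propagate(network, {"A": 3.0})  # only A is differentially expressed
    print(sorted(scores, key=scores.get, reverse=True))
```

Here gene C has no differential-expression signal of its own but receives a score of 1/12 (about 0.083) through B, while the isolated gene D stays at zero; re-ranking by the propagated scores gives A, B, C, D.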


Distributed Information Processing in Biological and Computational Systems

Monday, January 26th, 2015

Distributed Info. Processing in Biological & Computational #Systems http://cacm.acm.org/magazines/2015/1/181614-distributed-information-processing-in-biological-and-computational-systems/fulltext Contrasts in strategies to handle node failures

QT:{{"
While both computational and biological systems need to address these similar types of failures, the methods they use to do so differ. In distributed computing, failures have primarily been handled by majority voting methods [37], by using dedicated failure detectors, or via cryptography. In contrast, most biological systems rely on various network topological features to handle failures. Consider for example the use of failure detectors. In distributed computing, these are either implemented in hardware or in dedicated additional software. In contrast, biology implements implicit failure detector mechanisms by relying on backup nodes or alternative pathways. Several proteins have paralogs, that is, structurally similar proteins that in most cases originated from the same ancestral protein (roughly 40% of yeast and human proteins have at least one paralog). In several cases, when one protein fails or is altered, its paralog can automatically take its place [24] or protect the cell against the mutation [26]. Thus, by preserving backup functionality in the protein interaction.


While we discussed some recurring algorithmic strategies used within both types of systems (for example, stochasticity and feedback), there is much more to learn in this regard. From the distributed computing side, new models are needed to address the dynamic aspects of communication (for example, nodes joining and leaving the network, and edges being added and subtracted), which are also relevant in mobile computing scenarios. Further, while the biological systems we discussed all operate without a single centralized controller, there is in fact a continuum in the term “distributed.” For example, hierarchical distributed models, where higher layers “control” lower layers with possible feedback, represent a more structured type of control system than traditional distributed systems without such a hierarchy. Gene regulatory networks and neuronal networks (layered columns) both share such a hierarchical structure, and this structure has been well-conserved across many different species, suggesting their importance to computation. Such models, however, have received less attention in the distributed computing literature.

"}}

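The two failure-handling styles contrasted in the quote can be caricatured in a few lines. Below is a toy Python sketch, not taken from the article: the hypothetical `majority_vote` stands in for replica voting in distributed computing, and the hypothetical `with_backup` stands in for paralog-style redundancy, where a backup component simply takes over when the primary fails instead of the failure being reported to a separate detector.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_vote(replica_outputs: Sequence[str]) -> Optional[str]:
    """Return the value reported by a strict majority of replicas, or None if there is none."""
    if not replica_outputs:
        return None
    value, count = Counter(replica_outputs).most_common(1)[0]
    return value if count > len(replica_outputs) / 2 else None


def with_backup(primary: Callable[[], str], backup: Callable[[], str]) -> str:
    """Paralog-style redundancy: if the primary raises, the backup's result is used instead."""
    try:
        return primary()
    except Exception:
        # The failure is absorbed by the backup rather than escalated.
        return backup()


if __name__ == "__main__":
    print(majority_vote(["ok", "ok", "corrupted"]))  # ok
    print(with_backup(lambda: 1 // 0, lambda: "paralog"))  # paralog
```

Voting needs several replicas doing the same work and an explicit comparison step, whereas the backup pattern only pays for the redundant component when the primary actually fails.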
Pgenes make proteins

Saturday, January 24th, 2015

Bioinformatics (2015) 31 (1): 33-39. doi: 10.1093/bioinformatics/btu615

Making novel proteins from #pseudogenes http://bioinformatics.oxfordjournals.org/content/31/1/33.short Outcomes in 16 cases where one gets stable & functional translated products


WASP: allele-specific software for robust discovery of molecular quantitative trait loci | bioRxiv

Monday, January 19th, 2015

WASP: allele-specific software for robust discovery of molecular quantitative trait loci
Bryce van de Geijn, Graham McVicker, Yoav Gilad, Jonathan Pritchard

doi: http://dx.doi.org/10.1101/011221
http://biorxiv.org/content/early/2014/11/07/011221

QT:{{"
Mapping of reads to a reference genome is biased by sequence polymorphisms [6]. Reads which contain the non-reference allele may fail to map uniquely or map to a different (incorrect) location in the genome [6].
"}}
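The mapping bias described in the quote can be reproduced with a toy mapper. Below is a minimal Python sketch, assuming a hypothetical brute-force mapper that allows at most one mismatch and requires a unique hit; the reference sequence and reads are made up. It also sketches the allele-swap-and-remap check that WASP uses against this bias: swap the allele at the SNP, remap, and keep the read only if it lands in the same place. Real aligners and the actual WASP implementation are far more involved.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, max_mismatches: int = 1):
    """Toy mapper: return the unique start position with <= max_mismatches, else None."""
    hits = [
        i
        for i in range(len(reference) - len(read) + 1)
        if hamming(read, reference[i:i + len(read)]) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def passes_allele_swap_filter(read, snp_offset, ref_allele, alt_allele, reference, max_mismatches=1):
    """Keep a SNP-overlapping read only if swapping its allele leaves the mapped position unchanged."""
    pos = map_read(read, reference, max_mismatches)
    base = read[snp_offset]
    if pos is None or base not in (ref_allele, alt_allele):
        return False
    other = alt_allele if base == ref_allele else ref_allele
    swapped = read[:snp_offset] + other + read[snp_offset + 1:]
    return map_read(swapped, reference, max_mismatches) == pos


# Hypothetical reference with a C/A SNP at position 12 (offset 4 in reads starting at 8).
REFERENCE = "TTTTTTTTGATCCGTATTTTTTTT"

if __name__ == "__main__":
    # The same sequencing error (last base) on reads carrying each allele:
    print(map_read("GATCCGTT", REFERENCE))  # reference allele: maps at 8
    print(map_read("GATCAGTT", REFERENCE))  # non-reference allele: 2 mismatches, unmapped
    print(passes_allele_swap_filter("GATCCGTT", 4, "C", "A", REFERENCE))  # False
```

Because the reference-allele read is discarded as well, both alleles lose reads symmetrically and the allelic imbalance introduced by the mapper is removed, at the cost of some data.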

No evidence that selection has been less effective at removing deleterious mutations in Europeans than in Africans : Nature Genetics : Nature Publishing Group

Sunday, January 18th, 2015

Removing deleterious mutations in Europeans [v] Africans http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3186.html Comparing nonsynonymous freq. betw. populations HT @obahcall