Archive for the ‘SciLit’ Category

NEJM: Record-Breaking Performance in a 70-Year-Old Marathoner

Sunday, April 14th, 2019

We determined the physiological profile of a 70-year-old male marathoner who ran the event in 2:54:23…

LDL 84 mg/dL and HDL 66 mg/dL, quite impressive…

Evaluation of chromatin accessibility in prefrontal cortex of individuals with schizophrenia | Nature Communications

Sunday, April 7th, 2019

Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder. – PubMed – NCBI

Sunday, April 7th, 2019

A Decade of GWAS Results in Lung Cancer | Cancer Epidemiology, Biomarkers & Prevention

Monday, April 1st, 2019

The first GWAS on lung cancer were reported in 2008. Three independent studies identified a susceptibility locus on chromosome 15q. Hung and colleagues (14) found two SNPs strongly associated with lung cancer on chromosome 15q25. Further genotyping in this region revealed many SNPs in tight linkage disequilibrium (LD) showing evidence of association. Six genes are located in this region including three nicotinic acetylcholine receptor subunits (CHRNA5, CHRNA3, and CHRNB4). Interestingly, no appreciable variation in the risk was found across smoking categories or histologic subtypes of lung cancer. In a second GWAS, a SNP within the CHRNA3 gene was strongly associated with smoking quantity and nicotine dependence (15). The same SNP was also strongly associated with lung cancer. The results suggest that the variant on chromosome 15q25 confers risk of lung cancer through its effect on tobacco addiction.

Deep learning and process understanding for data-driven Earth system science | Nature

Tuesday, March 5th, 2019
Perspective | Published: 13 February 2019
Deep learning and process understanding for data-driven Earth system science Markus Reichstein, Gustau Camps-Valls, Bjorn Stevens, Martin Jung, Joachim Denzler, Nuno Carvalhais & Prabhat
Nature volume 566, pages 195–204 (2019)

Figure 3 presents a system-modelling view that seeks to integrate machine learning into a system model. As an alternative perspective, system knowledge can be integrated into a machine learning framework. This may include design of the network architecture (refs 36, 79), physical constraints in the cost function for optimization (ref. 58), or expansion of the training dataset for undersampled domains (that is, physically based data augmentation) (ref. 80).
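To make "physical constraints in the cost function" concrete for myself, here is a minimal sketch in plain Python. It is not from the paper: the function name `physics_guided_loss`, the weight `lam`, and the choice of constraint (the predicted quantity must not be negative) are all assumptions picked for illustration.

```python
def physics_guided_loss(pred, obs, lam=1.0):
    """Data misfit plus a penalty for physically impossible predictions.

    Hypothetical example: the constraint assumed here is that the
    predicted quantity (say, a concentration) can never be negative.
    """
    n = len(obs)
    # Ordinary data term: mean squared error against the observations.
    mse = sum((p - o) ** 2 for p, o in zip(pred, obs)) / n
    # Physics term: zero when every prediction is >= 0, and it grows
    # quadratically with the size of each violation.
    violation = sum(max(0.0, -p) ** 2 for p in pred) / n
    return mse + lam * violation


# Two candidate predictions with the same data misfit; only the second
# one violates the constraint, so only it pays the extra penalty.
ok_loss = physics_guided_loss([1.0, 1.0], [1.0, 0.0])
bad_loss = physics_guided_loss([1.0, -1.0], [1.0, 0.0], lam=2.0)
```

The weight `lam` sets how strongly the optimizer is pushed towards physically consistent solutions relative to fitting the data.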

Surrogate modelling or emulation
See Fig. 3 (circle 5). Emulation of the full (or specific parts of) a physical model can be useful for computational efficiency and tractability reasons. Machine learning emulators, once trained, can achieve simulations orders of magnitude faster than the original physical model without sacrificing much accuracy. This allows for fast sensitivity analysis, model parameter calibration, and derivation of confidence intervals for the estimates.
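A toy sketch of the emulation idea, under loud assumptions: the "physical model" below is just a slow Euler integration of a decay equation, and the "emulator" is a linear-interpolation lookup table standing in for a trained machine learning model. The names `physical_model` and `train_emulator` are mine, not the paper's.

```python
import bisect


def physical_model(k, steps=20000):
    """Stand-in 'expensive' simulator: Euler-integrate dy/dt = -k*y on [0, 1]."""
    dt = 1.0 / steps
    y = 1.0
    for _ in range(steps):
        y += dt * (-k * y)
    return y


def train_emulator(model, lo, hi, n_train):
    """Run the slow model on a grid once, then answer queries by interpolation.

    Linear interpolation is used here in place of a real ML regressor.
    """
    xs = [lo + (hi - lo) * i / (n_train - 1) for i in range(n_train)]
    ys = [model(x) for x in xs]

    def emulator(x):
        # Index of the grid interval containing x, clamped to the grid.
        j = min(max(bisect.bisect_right(xs, x), 1), len(xs) - 1)
        w = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
        return (1.0 - w) * ys[j - 1] + w * ys[j]

    return emulator


# 21 slow runs up front; every later query is a cheap table lookup.
emulator = train_emulator(physical_model, 0.0, 2.0, 21)
approx = emulator(0.73)
exact = physical_model(0.73)
```

Once trained, the emulator can be called many thousands of times for sensitivity analysis or calibration at a small fraction of the cost of rerunning the simulator.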

(2) Replacing a ‘physical’ sub-model with a machine learning model
See Fig. 3 (circle 2). If formulations of a submodel are of semi-empirical nature, where the functional form has little theoretical basis (for example, biological processes), this submodel can be replaced by a machine learning model if a sufficient number of observations are available. This leads to a hybrid model, which combines the strengths of physical modelling (theoretical foundations, interpretable compartments) and machine learning (data-adaptiveness).
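A small sketch of what such a hybrid could look like, with everything in it assumed for illustration: a one-bucket water balance keeps the physically grounded part (mass conservation), while the semi-empirical evaporation term is replaced by a k-nearest-neighbour regressor fitted to made-up observations. `knn_regressor` and `step_bucket` are hypothetical names.

```python
def knn_regressor(samples, k=3):
    """Tiny data-driven sub-model: average the k nearest observations.

    samples is a list of (temperature, evaporation) pairs.
    """
    def predict(x):
        nearest = sorted(samples, key=lambda s: abs(s[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def step_bucket(storage, precip, temp, evap_model):
    """One time step of a bucket model: storage changes by precip minus evap.

    The mass balance is the 'physical' part; evap_model is the learned part.
    Evaporation is clipped so the bucket can never lose more water than it has.
    """
    evap = min(max(evap_model(temp), 0.0), storage + precip)
    return storage + precip - evap, evap


# Synthetic observations (temperature, evaporation), invented for this example.
observations = [(0, 0.0), (5, 0.5), (10, 1.0), (15, 1.5), (20, 2.0), (25, 2.5)]
evap_model = knn_regressor(observations, k=3)
new_storage, evap = step_bucket(10.0, 2.0, 10.0, evap_model)
```

Whatever the learned sub-model predicts, the surrounding physical structure still guarantees that water is conserved, which is the kind of interpretability the excerpt attributes to the hybrid approach.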

Integration with physical modelling
Historically, physical modelling and machine learning have often been treated as two different fields with very different scientific paradigms (theory-driven versus data-driven). Yet, in fact these approaches are complementary, with physical approaches in principle being directly interpretable and offering the potential of extrapolation beyond observed conditions, whereas data-driven approaches are highly flexible in adapting to data and are amenable to finding unexpected patterns (surprises).

A success story in the geosciences is weather prediction, which has greatly improved through the integration of better theory, increased computational power, and established observational systems, which allow for the assimilation of large amounts of data into the modelling system (ref. 2). Nevertheless, we can accurately predict the evolution of the weather on a timescale of days, not months.

# REFs that I liked
ref 80

ref 57
Karpatne, A. et al. Theory-guided data science: a new paradigm for scientific discovery from data. IEEE Trans. Knowl. Data Eng. 29, 2318–2331 (2017).

# some key BULLETS

• Complementarity of physical & ML approaches
–“Physical approaches in principle being directly interpretable and offering the potential of extrapolation beyond observed conditions, whereas data-driven approaches are highly flexible in adapting to data”

• Hybrid #1: Physical knowledge can be integrated into ML framework
–Network architecture
–Physical constraints in the cost function
–Expansion of the training dataset for undersampled domains (i.e., physically based data augmentation)

• Hybrid #2: ML into physical – e.g., emulation of specific parts of a physical model for computational efficiency

Artificial intelligence alone won’t solve the complexity of Earth sciences

Tuesday, March 5th, 2019


Sunday, March 3rd, 2019

A lineage-resolved molecular atlas of C. elegans embryogenesis at #singlecell resolution, w/ @JIsaacMurray, @JunhyongKim, @ColeTrapnell & B Waterston. Compares the known cell lineage of the worm to trees based on UMAP cell-type clusters. Remarkable agreement

A single-cell molecular map of mouse gastrulation and early organogenesis | Nature

Friday, March 1st, 2019

The single-cell transcriptional landscape of mammalian organogenesis

Friday, March 1st, 2019

Using single-cell combinatorial indexing, we profiled the transcriptomes of around 2 million cells derived from 61 embryos staged between 9.5 and 13.5 days of gestation, in a single experiment.

Small research teams ‘disrupt’ science more radically than large ones

Friday, March 1st, 2019

“The authors describe and validate a citation-based index of ‘disruptiveness’ that has previously been proposed for patents6. The intuition behind the index is straightforward: when the papers that cite a given article also reference a substantial proportion of that article’s references, then the article can be seen as consolidating its scientific domain. When the converse is true — that is, when future citations to the article do not also acknowledge the article’s own intellectual forebears — the article can be seen as disrupting its domain.

The disruptiveness index reflects a characteristic of the article’s underlying content that is clearly distinguishable from impact as conventionally captured by overall citation counts. For instance, the index finds that papers that directly contribute to Nobel prizes tend to exhibit high levels of disruptiveness, whereas, at the other extreme, review articles tend to consolidate their fields.”
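The quote above gives only the intuition. As a sketch, here is the index in the form I understand it is usually written, D = (n_i - n_j) / (n_i + n_j + n_k), where n_i counts later papers citing the focal article but none of its references, n_j counts those citing both, and n_k counts those citing only the references. That formula is not spelled out in the excerpt, so treat it as an assumption; the function name and the toy citation data are likewise hypothetical.

```python
def disruption_index(focal_id, focal_refs, later_papers):
    """Citation-based disruptiveness, ranging from -1 (consolidating) to +1 (disrupting).

    focal_refs: set of ids the focal article cites.
    later_papers: dict mapping each later paper's id to the set of ids it cites.
    """
    n_i = n_j = n_k = 0
    for cited in later_papers.values():
        cites_focal = focal_id in cited
        cites_refs = bool(cited & focal_refs)
        if cites_focal and not cites_refs:
            n_i += 1  # builds on the focal article alone
        elif cites_focal and cites_refs:
            n_j += 1  # cites the focal article together with its forebears
        elif cites_refs:
            n_k += 1  # bypasses the focal article entirely
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0


# Toy citation network, invented for illustration.
later = {
    "a": {"focal"},
    "b": {"focal"},
    "c": {"focal", "r1"},
    "d": {"r2"},
}
d_index = disruption_index("focal", {"r1", "r2"}, later)
```

A review article would tend to sit near the negative end, since most papers citing it also cite the work it summarizes.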