Posts Tagged ‘dsg’

Artificial intelligence alone won’t solve the complexity of Earth sciences

Monday, September 2nd, 2019

Artificial intelligence alone won’t solve the complexity of Earth sciences http://www.nature.com/articles/d41586-019-00556-5

Quantifying the impact of public omics data.

Sunday, August 11th, 2019

A similar idea to quantifying the value of data
https://www.ncbi.nlm.nih.gov/pubmed/31383865

A Decade Ago, a Scientist Promised a Brain Simulation in a Decade

Saturday, August 3rd, 2019

QT:{{”
“In a recent paper titled “The Scientific Case for Brain Simulations,” several HBP scientists argue that big simulations “will likely be indispensable for bridging the scales between the neuron and system levels in the brain.” In other words: Scientists can look at the nuts and bolts of how neurons work, and they can study the behavior of entire organisms, but they need simulations to show how the former creates the latter. The paper’s authors draw a comparison to weather forecasts, in which an understanding of physics and chemistry at the scale of neighborhoods allows us to accurately predict temperature, rainfall, and wind across the whole globe.”
“}}

https://www.theatlantic.com/science/archive/2019/07/ten-years-human-brain-project-simulation-markram-ted-talk/594493/

Deep learning and process understanding for data-driven Earth system science | Nature

Tuesday, March 5th, 2019

https://www.nature.com/articles/s41586-019-0912-1
Perspective | Published: 13 February 2019
Deep learning and process understanding for data-driven Earth system science Markus Reichstein, Gustau Camps-Valls, Bjorn Stevens, Martin Jung, Joachim Denzler, Nuno Carvalhais & Prabhat
Nature volume 566, pages 195–204 (2019)

QT:[[”
Figure 3 presents a system-modelling view that seeks to integrate machine learning into a system model. As an alternative perspective, system knowledge can be integrated into a machine learning framework. This may include design of the network architecture36,79, physical constraints in the cost function for optimization58, or expansion of the training dataset for undersampled domains (that is, physically based data augmentation)80.

Surrogate modelling or emulation
See Fig. 3 (circle 5). Emulation of the full (or specific parts of a) physical model can be useful for computational efficiency and tractability reasons. Machine learning emulators, once trained, can achieve simulations orders of magnitude faster than the original physical model without sacrificing much accuracy. This allows for fast sensitivity analysis, model parameter calibration, and derivation of confidence intervals for the estimates.

(2) Replacing a ‘physical’ sub-model with a machine learning model
See Fig. 3 (circle 2). If formulations of a submodel are of semi-empirical nature, where the functional form has little theoretical basis (for example, biological processes), this submodel can be replaced by a machine learning model if a sufficient number of observations are available. This leads to a hybrid model, which combines the strengths of physical modelling (theoretical foundations, interpretable compartments) and machine learning (data-adaptiveness).

Integration with physical modelling
Historically, physical modelling and machine learning have often been treated as two different fields with very different scientific paradigms (theory-driven versus data-driven). Yet, in fact these approaches are complementary, with physical approaches in principle being directly interpretable and offering the potential of extrapolation beyond observed conditions, whereas data-driven approaches are highly flexible in adapting to data and are amenable to finding unexpected patterns (surprises).

A success story in the geosciences is weather prediction, which has greatly improved through the integration of better theory, increased computational power, and established observational systems, which allow for the assimilation of large amounts of data into the modelling system (ref. 2). Nevertheless, we can accurately predict the evolution of the weather on a timescale of days, not months.
“]]
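The emulation idea quoted above can be sketched in a few lines. This is a hypothetical toy, not the paper's setup: an "expensive" physical model (a fine-stepped ODE integration) is sampled on a coarse grid, and a cheap surrogate (here just piecewise-linear interpolation) answers queries in its place.

```python
import bisect

def physical_model(x):
    # "Expensive" physical model: integrate dy/dt = -y from y(0) = 1
    # up to time x with 10,000 forward-Euler steps (approximates exp(-x)).
    y, n = 1.0, 10000
    dt = x / n
    for _ in range(n):
        y += dt * (-y)
    return y

def train_emulator(xs):
    # "Training": precompute the physical model once on a coarse grid.
    return xs, [physical_model(x) for x in xs]

def emulate(model, x):
    # Cheap surrogate: piecewise-linear interpolation on the stored grid,
    # avoiding any further calls to the expensive model.
    xs, ys = model
    i = min(max(bisect.bisect_left(xs, x), 1), len(xs) - 1)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

grid = [i * 0.1 for i in range(21)]   # training inputs on [0, 2]
emu = train_emulator(grid)
exact = physical_model(1.05)          # one slow query to the physical model
approx = emulate(emu, 1.05)           # fast surrogate query
print(abs(exact - approx))            # small interpolation error
```

A real emulator would use a neural network or Gaussian process over a high-dimensional input space, but the workflow is the same: sample the physical model, fit, then query the fit.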

# REFs that I liked
ref 80

ref 57
Karpatne, A. et al. Theory-guided data science: a new paradigm for scientific discovery from data. IEEE Trans. Knowl. Data Eng. 29, 2318–2331 (2017).

# some key BULLETS

• Complementarity of physical & ML approaches
–“Physical approaches in principle being directly interpretable and offering the potential of extrapolation beyond observed conditions, whereas data-driven approaches are highly flexible in adapting to data”

• Hybrid #1: Physical knowledge can be integrated into ML framework –Network architecture
–Physical constraints in the cost function
–Expansion of the training dataset for undersampled domains (ie physically based data augmentation)

• Hybrid #2: ML into physical – eg Emulation of specific parts of a physical model for computational efficiency
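The "physical constraints in the cost function" bullet can be made concrete with a toy sketch (hypothetical data, not the paper's formulation): fit y = w·x + b by gradient descent, but add a penalty term lam·b² encoding an assumed physical law y(0) = 0, so the intercept is driven toward zero even though the noisy data alone would not put it there.

```python
# Hypothetical noisy observations of a process that physically must
# satisfy y(0) = 0 (roughly y = 2x plus noise).
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [1.2, 2.1, 3.2, 3.9, 5.1]

def fit(lam, steps=5000, lr=0.01):
    # Minimize mean-squared error + lam * b**2 by gradient descent.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        gb += 2 * lam * b   # gradient of the physics penalty lam * b**2
        w -= lr * gw
        b -= lr * gb
    return w, b

w0, b0 = fit(lam=0.0)   # purely data-driven: nonzero intercept from noise
w1, b1 = fit(lam=10.0)  # physics-constrained: intercept pushed toward 0
print(b0, b1)
```

The same trick scales up: in a deep network the penalty term would encode, say, energy or mass conservation on the predictions rather than a zero intercept.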

Artificial intelligence alone won’t solve the complexity of Earth sciences

Tuesday, March 5th, 2019

https://www.nature.com/articles/d41586-019-00556-5

Google’s DeepMind aces protein folding | Science | AAAS

Saturday, December 15th, 2018

Thought @RobertFService’s news piece had a good angle on this: AlphaFold won a lot but by small margins
https://www.ScienceMag.org/news/2018/12/google-s-deepmind-aces-protein-folding Google’s DeepMind aces protein folding CC @wgibson

Why “Many-Model Thinkers” Make Better Decisions

Saturday, November 24th, 2018

Why “Many-Model Thinkers” Make Better Decisions
https://HBR.org/2018/11/why-many-model-thinkers-make-better-decisions Intuitive description of #MachineLearning concepts. Focuses on practical business contexts (eg hiring) & explains how #ensemble models & boosting can make better choices

QT:{{”
“The agent-based model is not necessarily better. Its value comes from focusing attention where the standard model does not.

The second guideline borrows the concept of boosting, …Rather than look for trees that predict with high accuracy in isolation, boosting looks for trees that perform well when the forest of current trees does not.

A boosting approach would take data from all past decisions and see where the first model failed. …The idea of boosting is to go searching for models that do best specifically when your other models fail.

To give a second example, several firms I have visited have hired computer scientists to apply techniques from artificial intelligence to identify past hiring mistakes. This is boosting in its purest form. Rather than try to use AI to simply beat their current hiring model, they use AI to build a second model that complements their current hiring model. They look for where their current model fails and build new models to complement it.”
“}}
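The boosting idea in the quote (build a second model that does well exactly where the first fails) can be sketched on hypothetical data: model 1 is a crude baseline, model 2 is fit to model 1's residuals, and the combination outperforms model 1 alone.

```python
# Hypothetical data, roughly y = 2x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 4.0, 6.2, 7.9, 10.1, 11.8]

def mse(pred):
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)

# Model 1: a crude "standard model" that always predicts the mean.
mean_y = sum(ys) / len(ys)
model1 = [mean_y] * len(ys)

# Model 2: a least-squares line fit to model 1's residuals, so it
# specializes in exactly the cases model 1 gets wrong.
res = [y - p for y, p in zip(ys, model1)]
mean_x = sum(xs) / len(xs)
slope = (sum((x - mean_x) * r for x, r in zip(xs, res))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = sum(res) / len(res) - slope * mean_x

# Ensemble: keep model 1 and add model 2's correction on top.
combined = [p + slope * x + intercept for p, x in zip(model1, xs)]
print(mse(model1), mse(combined))   # the ensemble beats model 1 alone
```

Gradient boosting repeats this residual-fitting step many times with small trees instead of a single line, but the principle is the one the article describes: complement the current model rather than replace it.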

Cloud computing for genomic data analysis and collaboration | Nature Reviews Genetics

Tuesday, October 30th, 2018

https://www.nature.com/articles/nrg.2018.8

A brief history of data science

Saturday, September 22nd, 2018

https://twitter.com/YaleData/status/1043196384403443712

overview of the GPCR literature as indexed on Wikidata

Friday, February 9th, 2018

https://twitter.com/EvoMRI/status/960633150115282944
Here’s an overview of the GPCR literature as indexed on Wikidata: https://tools.wmflabs.org/scholia/topic/Q38173