Posts Tagged ‘cbb752’

Midsummer Course Sharpens Skills in Informatics and Data Science | Yale School of Medicine

Sunday, August 11th, 2019

https://medicine.yale.edu/news/article.aspx?id=20962

Excellent review for cbb752 students

Monday, April 1st, 2019

Balanced perspective on history and future of genomic medicine by Jay Shendure https://www.cell.com/cell/fulltext/S0092-8674(19)30152-7

Deep learning and process understanding for data-driven Earth system science | Nature

Tuesday, March 5th, 2019

https://www.nature.com/articles/s41586-019-0912-1
Perspective | Published: 13 February 2019
Deep learning and process understanding for data-driven Earth system science
Markus Reichstein, Gustau Camps-Valls, Bjorn Stevens, Martin Jung, Joachim Denzler, Nuno Carvalhais & Prabhat
Nature volume 566, pages 195–204 (2019)

QT:[[”
Figure 3 presents a system-modelling view that seeks to integrate machine learning into a system model. As an alternative perspective, system knowledge can be integrated into a machine learning framework. This may include design of the network architecture [36,79], physical constraints in the cost function for optimization [58], or expansion of the training dataset for undersampled domains (that is, physically based data augmentation) [80].

Surrogate modelling or emulation
See Fig. 3 (circle 5). Emulation of the full (or specific parts of) a physical model can be useful for computational efficiency and tractability reasons. Machine learning emulators, once trained, can achieve simulations orders of magnitude faster than the original physical model without sacrificing much accuracy. This allows for fast sensitivity analysis, model parameter calibration, and derivation of confidence intervals for the estimates.

(2) Replacing a ‘physical’ sub-model with a machine learning model
See Fig. 3 (circle 2). If formulations of a submodel are of semi-empirical nature, where the functional form has little theoretical basis (for example, biological processes), this submodel can be replaced by a machine learning model if a sufficient number of observations are available. This leads to a hybrid model, which combines the strengths of physical modelling (theoretical foundations, interpretable compartments) and machine learning (data-adaptiveness).

Integration with physical modelling
Historically, physical modelling and machine learning have often been treated as two different fields with very different scientific paradigms (theory-driven versus data-driven). Yet, in fact these approaches are complementary, with physical approaches in principle being directly interpretable and offering the potential of extrapolation beyond observed conditions, whereas data-driven approaches are highly flexible in adapting to data and are amenable to finding unexpected patterns (surprises).

A success story in the geosciences is weather prediction, which has greatly improved through the integration of better theory, increased computational power, and established observational systems, which allow for the assimilation of large amounts of data into the modelling system [2]. Nevertheless, we can accurately predict the evolution of the weather on a timescale of days, not months.
“]]
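
A minimal sketch of what "physical constraints in the cost function" could look like, on a toy problem that is not from the paper. Everything here is an assumption made for illustration: the data points, the helper name `fit_line`, and the assumed physical rule that the response must vanish when the driver is zero, i.e. f(0) = 0. The rule is encoded as a penalty `lam * b**2` on the intercept of a straight-line fit, and the penalised least-squares problem is solved in closed form.

```python
def fit_line(xs, ys, lam=0.0):
    """Minimise mean((w*x + b - y)**2) + lam * b**2 in closed form.

    The penalty term lam * b**2 is the (assumed) physical constraint:
    the fitted response w*x + b should be zero at x = 0.
    """
    n = len(xs)
    sxx = sum(x * x for x in xs) / n
    sx = sum(xs) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    sy = sum(ys) / n
    # Setting both partial derivatives of the cost to zero gives:
    #   w*sxx + b*sx         = sxy
    #   w*sx  + b*(1 + lam)  = sy
    det = sxx * (1.0 + lam) - sx * sx
    w = (sxy * (1.0 + lam) - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return w, b


# Hypothetical observations, sampled only on x in [1, 2], i.e. far from x = 0.
xs = [1.0, 1.25, 1.5, 1.75, 2.0]
noise = [0.3, -0.2, 0.1, -0.3, 0.25]
ys = [2.0 * x + e for x, e in zip(xs, noise)]

w_free, b_free = fit_line(xs, ys, lam=0.0)    # purely data-driven fit
w_phys, b_phys = fit_line(xs, ys, lam=10.0)   # fit with the physics penalty

print("data only     : w=%.3f b=%.3f" % (w_free, b_free))
print("with constraint: w=%.3f b=%.3f" % (w_phys, b_phys))
```

The point of the design is that the data never cover x = 0, so only the penalty term can pull the fit toward physically sensible behaviour in that undersampled region.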

# REFs that I liked
ref 80

ref 57
Karpatne, A. et al. Theory-guided data science: a new paradigm for scientific discovery from data. IEEE Trans. Knowl. Data Eng. 29, 2318–2331 (2017).

# some key BULLETS

• Complementarity of physical & ML approaches
–“Physical approaches in principle being directly interpretable and offering the potential of extrapolation beyond observed conditions, whereas data-driven approaches are highly flexible in adapting to data”

• Hybrid #1: Physical knowledge can be integrated into ML framework
–Network architecture
–Physical constraints in the cost function
–Expansion of the training dataset for undersampled domains (ie physically based data augmentation)

• Hybrid #2: ML into physical – eg Emulation of specific parts of a physical model for computational efficiency
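
The emulation idea in Hybrid #2 can be sketched with a toy example; this is not the paper's method. The assumptions: the "physical model" is a deliberately slow Euler integration of dT/dt = -k*T (made up for illustration), the emulator is a cubic polynomial fitted by ordinary least squares rather than a neural network, and the names `physical_model`, `solve_linear` and `fit_emulator` are hypothetical.

```python
import math


def physical_model(k, n_steps=20_000):
    """Toy 'expensive' simulator: Euler-integrate dT/dt = -k*T from T(0)=1 to t=1."""
    dt = 1.0 / n_steps
    temp = 1.0
    for _ in range(n_steps):
        temp += dt * (-k * temp)
    return temp


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_emulator(ks, ys, degree=3):
    """Least-squares polynomial fit via the normal equations; returns a callable."""
    powers = range(degree + 1)
    ata = [[sum(k ** (i + j) for k in ks) for j in powers] for i in powers]
    aty = [sum((k ** i) * y for k, y in zip(ks, ys)) for i in powers]
    coeffs = solve_linear(ata, aty)
    return lambda k: sum(c * k ** i for i, c in enumerate(coeffs))


# Train on 21 runs of the slow simulator, for k in [0, 2].
train_ks = [0.1 * i for i in range(21)]
train_ys = [physical_model(k) for k in train_ks]
emulator = fit_emulator(train_ks, train_ys)

# Check the cheap emulator on unseen k values against the analytic solution exp(-k).
test_ks = [0.05 + 0.1 * i for i in range(20)]
max_err = max(abs(emulator(k) - math.exp(-k)) for k in test_ks)
print("max emulator error on held-out k:", max_err)
```

Once fitted, each emulator call costs a handful of multiplications instead of 20,000 integration steps, which is what makes repeated use (sensitivity analysis, parameter calibration) cheap. The emulator should only be trusted inside the range of k it was trained on.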

What is the Difference Between a Parameter and a Hyperparameter?

Thursday, February 28th, 2019

https://machinelearningmastery.com/difference-between-a-parameter-and-a-hyperparameter/

This Person Does Not Exist

Thursday, February 28th, 2019

https://thispersondoesnotexist.com/

Vertabelo – Design Your Database Online

Wednesday, January 23rd, 2019

https://www.vertabelo.com/

MIT Computational Biology: Genomes, Networks, Evolution, Health – Fall 2018 – 6.047/6.878/HST.507 – YouTube

Saturday, December 22nd, 2018

https://www.youtube.com/playlist?list=PLypiXJdtIca6GBQwDTo4bIEDV8F4RcAgt

Explaining Odds Ratios

Saturday, November 17th, 2018

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2938757/

Comparing Classifiers · Martin Thoma

Thursday, June 7th, 2018

Great talk today @Yale by @MooreJH. He describes flow of calculations in biomed. #DataScience, including feature construction, machine learning & downstream interpretation.

Great slide on ML derived from
https://martin-thoma.com/comparing-classifiers

Carl Zimmer To Speak At Bio-IT World, Tackle Heredity, Genes, And How Our Understanding Of The Two Is Changing – Bio-IT World

Saturday, May 12th, 2018

http://www.bio-itworld.com/2018/05/09/carl-zimmer-to-speak-at-bio-it-world-tackle-heredity-genes-and-how-our-understanding-of-the-two-is-changing.aspx

QT:{{”
“It was a huge amount of fun watching them take that raw data and put it through their own pipelines,” Zimmer told me, but he also felt uncomfortable pointing out discrepancies to the scientists he worked with. “I still remember, I was sitting down with Chris Mason at Weill Cornell. He and his students were so enthusiastically going through their findings with me… and they showed me, among other things, how many SNPs I had. Not too long beforehand I’d gone through the same experience with Mark Gerstein and his team at Yale, and their numbers for my SNPs were off by hundreds of thousands. … It was a little awkward with Chris, but I just said, ‘Hey, I got a very different number from Mark Gerstein,’ and Chris just shrugged and said, ‘Oh yeah, that happens.’”

It turns out, there’s a lot about our current understanding of our genes and how we pass them on that isn’t perfectly clear cut. “}}