Posts Tagged ‘#datamining’

The Downside of Baseball’s Data Revolution—Long Games, Less Action – WSJ

Tuesday, October 31st, 2017

The Downside of #Baseball’s Data Revolution – Long Games, Less Action. It’s now a game for stat analysis, not thrills

Quick comment on AI for pharma?

Tuesday, July 18th, 2017

Please find the article at link:

Is big pharma really on cusp of AI shake-out?

By: Pharma IQ
Posted: 07/14/2017


The promises of “disruptive technologies” have failed to live up to expectations in the past. For example, the development of ‘high throughput screening’ – a process that employs robotics to conduct millions of chemical, genetic and pharmacological tests in rapid time – in the 1990s failed to significantly reduce R&D inefficiencies and offered sporadic success rates.

“The major cost in drug R&D is last-phase clinical trials,” said Dr Mark Gerstein, professor of biomedical informatics at Yale University. “It is not clear whether AI can be as useful for these as it has been in target selection for the initial phases.”

“One of the first principles of data mining is that history is a good predictor of the future. AI has a track record of not living up to its expectations and therefore caution about how great its impact will be in the healthcare industry is now warranted.”

Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing (The Datasaurus Dozen) | Autodesk Research

Tuesday, May 16th, 2017

great viz
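A toy illustration of the "same stats, different graphs" idea in the title. It is not the paper's simulated-annealing procedure; the two small datasets and the `summary` helper are made up here purely to show that matching summary statistics do not imply matching shapes.

```python
from collections import Counter
from statistics import mean, pvariance

def summary(values):
    """Return (mean, population variance) for a list of numbers."""
    return mean(values), pvariance(values)

# Hypothetical datasets: one peaked at zero with two outliers, one bimodal.
peaked = [-3, 0, 0, 0, 0, 3]
bimodal = [-2, -2, -1, 1, 2, 2]

print("peaked :", summary(peaked), sorted(Counter(peaked).items()))
print("bimodal:", summary(bimodal), sorted(Counter(bimodal).items()))
```

Both lists share the same mean and variance, yet their value counts look nothing alike, which is why plotting the data still matters.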

An Introduction to Statistical Learning: with Applications in R – Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani – Google Books

Saturday, February 13th, 2016

They’re Watching You at Work

Sunday, September 21st, 2014

They’re Watching You at Work. Will HR analytics be a corporate big brother or a personal coach? #Datamining & #Privacy

My public notes from KDD 2014

Sunday, August 31st, 2014 (need password)

PLOS Computational Biology: Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling

Sunday, August 31st, 2014

– apply to METABRIC consortium
– 17K clin. feat. + ~50K gene exp. + ~30K CNVs ==to-predict==> 10yr survival
– uses CI (concordance index) instead of AUC for real-valued predictions
– combines collaboration & competition to beat the baseline (Cox regression on only clinical features)
– mol. feat. on their own don’t work well due to the curse of dimensionality
– features more important than the learning method
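A rough sketch of why a concordance index suits real-valued predictions: it only asks whether pairs of samples are ranked in the right order. This is a simplified version that assumes no censoring and compares predicted with observed survival times directly; the function name and the example numbers are made up for illustration and are not taken from the paper.

```python
from itertools import combinations

def concordance_index(predicted, observed):
    """Fraction of comparable pairs ranked in the same order by both lists.

    Simplifying assumptions: no censored samples, pairs with tied observed
    values are skipped, and tied predictions count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for (p_i, o_i), (p_j, o_j) in combinations(zip(predicted, observed), 2):
        if o_i == o_j:
            continue  # tied outcomes carry no ordering information
        comparable += 1
        if p_i == p_j:
            concordant += 0.5
        elif (p_i < p_j) == (o_i < o_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical survival times in years (observed) and model outputs (predicted).
observed = [2.0, 5.0, 7.5, 10.0]
predicted = [1.0, 6.0, 5.5, 9.0]
print(concordance_index(predicted, observed))
```

A value of 1.0 means every pair is ordered correctly, 0.5 is what random ranking would give on average, and 0.0 means every pair is reversed. Real survival data would also need censoring handled, which this sketch leaves out.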

Pandey mentions: Cancer Survival Analysis through Competition-Based…Modeling, using Human #Ensembles #kdd2014

IEEE Xplore Abstract – A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

Sunday, August 24th, 2014

Pandey mentions: Comparative Analysis of #Ensemble Classifiers [eg mean agg. or stacking]…in Genomics #kdd2014

performance-diversity tradeoff: should one incl. higher-performance but lower-diversity classifiers? Still, adding diversity is good
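A minimal sketch of the two aggregation styles named in the tweet, mean aggregation and stacking. It assumes just two base classifiers and uses a least-squares linear meta-learner without an intercept, which is only one simple way to stack; the function names and scores are hypothetical, not from the paper.

```python
def mean_aggregate(predictions):
    """Average the scores of all base models for each sample."""
    return [sum(scores) / len(scores) for scores in zip(*predictions)]

def fit_stacking_weights(p1, p2, labels):
    """Least-squares weights (no intercept) for two base models.

    Solves the 2x2 normal equations directly. In practice p1 and p2 should
    be held-out or cross-validated predictions, not training-set outputs.
    """
    a11 = sum(x * x for x in p1)
    a12 = sum(x * y for x, y in zip(p1, p2))
    a22 = sum(y * y for y in p2)
    b1 = sum(x * t for x, t in zip(p1, labels))
    b2 = sum(y * t for y, t in zip(p2, labels))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("base predictions are collinear")
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2

def stack(p1, p2, weights):
    """Combine two base models with the learned weights."""
    w1, w2 = weights
    return [w1 * x + w2 * y for x, y in zip(p1, p2)]

# Hypothetical scores from two base classifiers on four samples.
labels = [1, 0, 1, 0]
model_a = [0.9, 0.2, 0.7, 0.4]
model_b = [0.6, 0.5, 0.4, 0.7]

print(mean_aggregate([model_a, model_b]))
weights = fit_stacking_weights(model_a, model_b, labels)
print(weights, stack(model_a, model_b, weights))
```

Mean aggregation treats every base model equally, while stacking learns how much to trust each one, so a weaker but different model can still earn a nonzero weight.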

related to

Ensemble Methods in Machine Learning. Proceedings of the First International Workshop on Multiple Classifier Systems

Sunday, July 13th, 2014

Caruana R, Niculescu-Mizil A, Crew G, Ksikes A (2004) Ensemble selection from libraries of models. Proceedings of the Twenty-First International Conference on Machine Learning. Banff, Alberta, Canada: ACM.

Dietterich TG (2000) Ensemble Methods in Machine Learning. Proceedings of the First International Workshop on Multiple Classifier Systems: Springer-Verlag.

.@deniseOme Good ref is TG Dietterich #Ensemble Methods in #MachineLearning MCS ’00 Not rel. to @ensembl #ismb #afp14

ref 17 & 18

Information Fiduciary: Solution to Facebook digital gerrymandering | New Republic

Saturday, June 14th, 2014

Facebook Could Decide an Election—Without You Ever Finding Out. @zittrain advocates regulating digital gerrymandering