Posts Tagged ‘mining’

Human Mobility Characterization from Cellular Network Data | January 2013 | Communications of the ACM

Thursday, February 14th, 2013

Nice maps of human mobility.

http://cacm.acm.org/magazines/2013/1/158775-human-mobility-characterization-from-cellular-network-data/fulltext

Thoughts on “A few useful things to know about machine learning”

Thursday, February 14th, 2013

Some thoughts on a good paper that builds intuition about machine learning approaches.

http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
http://dl.acm.org/citation.cfm?id=2347755

In particular, the paper gives good intuition about:

– overfitting (e.g., how it relates to multiple testing and to the bias-variance trade-off)
– the curse of dimensionality (in high dimensions, all neighbors look alike)
– the limited practical value of theoretical guarantees
– how different decision frontiers can yield the same predictions
– ensembles (which greatly reduce variance while increasing bias only slightly)
– ensembles vs. Bayesian model averaging (which in practice essentially selects the single best model)
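
The "all neighbors look alike" point is easy to check numerically. Below is a minimal sketch that assumes points are drawn uniformly from the unit hypercube and compared with Euclidean distance; both choices are mine for illustration, not the paper's. It measures the relative contrast (farthest minus nearest distance, divided by the nearest) from one random query point to 500 random points. The function name `distance_contrast` is hypothetical.

```python
import math
import random


def distance_contrast(dim, n_points=500, seed=0):
    """Relative contrast (d_max - d_min) / d_min between one random query point
    and n_points random points, all drawn uniformly from [0, 1]^dim
    (an illustrative assumption). A small value means the nearest and
    farthest neighbors are almost equally far away."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={distance_contrast(dim):.3f}")
```

Under these assumptions the contrast should shrink sharply as `dim` grows, which is why nearest-neighbor reasoning loses its discriminating power in high dimensions.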

A few useful things to know about machine learning

Saturday, February 9th, 2013

http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
http://dl.acm.org/citation.cfm?id=2347755

Digging for Drug Facts | October 2012 | Communications of the ACM

Saturday, February 9th, 2013

http://cacm.acm.org/magazines/2012/10/155549-digging-for-drug-facts/fulltext

Inside the Secret World of the Data Crunchers Who Helped Obama Win

Sunday, November 11th, 2012

http://swampland.time.com/2012/11/07/inside-the-secret-world-of-quants-and-data-crunchers-who-helped-obama-win/

Competing on Analytics – Harvard Business Review

Sunday, November 11th, 2012

http://www2.mccombs.utexas.edu/faculty/Maytal.Saar-Tsechansky/Teaching/Documents/Harvard%20Business%20Review%20Online%20%20Competing%20on%20Analytics.htm
http://hbr.org/2006/01/competing-on-analytics/ar/1

An early paper on big data analytics

Exploring the human genome with functional maps.

Sunday, November 11th, 2012

This paper has: (1) large-scale datasets compiled from the literature and databases, (2) comprehensive gold standards of positive and negative examples, (3) a classification algorithm (a regularized Bayesian classifier), and (4) further analysis beyond “functional prediction”, including an interaction network. It predicts candidate functions for a list of genes, and the authors experimentally validated these predictions.
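
The classifier in point (3) combines evidence from many heterogeneous datasets. The sketch below is a plain naive Bayes classifier with Laplace (pseudocount) smoothing, a simple form of regularization; it is not the authors' regularization scheme, and the class name, dataset names, and bin values are all hypothetical. Each gene pair is described by one discretized evidence bin per dataset, and datasets that did not measure a pair are simply skipped, which is one reason naive Bayes is convenient for integrating incomplete data.

```python
import math
from collections import defaultdict


class SmoothedNaiveBayes:
    """Hypothetical naive Bayes over discretized evidence from several datasets.

    Each example is a dict {dataset_name: bin_index}; a dataset with no
    measurement for an example is simply absent and therefore skipped.
    The pseudocount alpha keeps rare bins from yielding zero probabilities.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = defaultdict(int)
        # counts[label][dataset][bin] -> number of training examples
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        self.bins = defaultdict(set)  # dataset -> bins observed in training

    def fit(self, examples, labels):
        for features, label in zip(examples, labels):
            self.class_counts[label] += 1
            for dataset, b in features.items():
                self.counts[label][dataset][b] += 1
                self.bins[dataset].add(b)
        return self

    def log_joint(self, features):
        """Unnormalized log P(label, evidence) for every label."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)  # class prior
            for dataset, b in features.items():
                if dataset not in self.bins:
                    continue  # dataset never seen in training: no information
                n_bins = len(self.bins[dataset])
                n_obs = sum(self.counts[label][dataset].values())
                count = self.counts[label][dataset][b]
                # smoothed estimate of P(bin | label) for this dataset
                score += math.log((count + self.alpha) / (n_obs + self.alpha * n_bins))
            scores[label] = score
        return scores

    def predict_proba(self, features, positive=1):
        """Posterior probability of the positive label (e.g. 'functionally related')."""
        scores = self.log_joint(features)
        m = max(scores.values())  # subtract the max for numerical stability
        weights = {k: math.exp(v - m) for k, v in scores.items()}
        return weights[positive] / sum(weights.values())


if __name__ == "__main__":
    # Toy, made-up gold standard: 1 = related gene pair, 0 = unrelated.
    X = [{"coexpr": 2, "ppi": 1}, {"coexpr": 2}, {"coexpr": 0, "ppi": 0},
         {"coexpr": 0}, {"coexpr": 1, "ppi": 0}]
    y = [1, 1, 0, 0, 0]
    model = SmoothedNaiveBayes(alpha=1.0).fit(X, y)
    print(model.predict_proba({"coexpr": 2, "ppi": 1}))
    print(model.predict_proba({"coexpr": 0, "ppi": 0}))
```

With no evidence at all, the prediction falls back to the class prior; each additional dataset multiplies in its own smoothed likelihood ratio.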

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2694471/

Genome Res. 2009 Jun;19(6):1093-106. Epub 2009 Feb 26.
Exploring the human genome with functional maps.
Huttenhower C, Haley EM, Hibbs MA, Dumeaux V, Barrett DR, Coller HA, Troyanskaya OG.

Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields.

Monday, November 5th, 2012

This paper introduces a new method for detecting copy number variants in cancer genomes that addresses deficiencies of earlier detection methods. The method, which the authors call HHCRF, additionally exploits sequential correlations when selecting classification features for inferring copy numbers and identifying clinically relevant genes. Compared with previous methods, this yields higher accuracy on noisy data and identifies more clinically relevant genes. These results were obtained by testing HHCRF on simulated array-CGH microarray data and on real breast cancer, uveal melanoma, and bladder tumor datasets.
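
HHCRF itself is a discriminative hidden conditional random field model and is not reproduced here. As a much simpler stand-in that shows what modeling sequential correlation buys, the sketch below decodes a three-state hidden Markov model (loss, neutral, gain) over probe log2 ratios with the Viterbi algorithm. The state means, noise level, and "sticky" stay probability are hypothetical values chosen for illustration; the sticky transitions are what make neighboring probes tend to share a copy-number state, so isolated noise is not called as a change.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_copy_number(log_ratios, means=(-0.5, 0.0, 0.5), sd=0.2, p_stay=0.98):
    """Most likely state path of a Gaussian HMM over ordered probe log2 ratios.

    States: 0 = loss, 1 = neutral, 2 = gain. All parameter defaults are
    hypothetical illustration values, not taken from the paper.
    """
    if not log_ratios:
        return []
    n_states = len(means)
    log_stay = math.log(p_stay)
    log_switch = math.log((1 - p_stay) / (n_states - 1))
    log_start = math.log(1 / n_states)  # uniform initial distribution

    # score[s] = best log-probability of any path ending in state s at this probe
    score = [log_start + gaussian_logpdf(log_ratios[0], m, sd) for m in means]
    backpointers = []
    for x in log_ratios[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # best predecessor of state s under the sticky transition matrix
            candidates = [score[r] + (log_stay if r == s else log_switch)
                          for r in range(n_states)]
            best_prev = max(range(n_states), key=lambda r: candidates[r])
            new_score.append(candidates[best_prev] + gaussian_logpdf(x, means[s], sd))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # trace back from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Made-up signal: neutral, a gained segment, neutral, then a lost segment.
    signal = [0.02, -0.05, 0.01, 0.55, 0.48, 0.6, 0.52,
              0.03, -0.02, 0.04, -0.45, -0.55, -0.5]
    print(viterbi_copy_number(signal))
```

An HMM is generative and scores each probe with a fixed emission model; a conditional random field like HHCRF instead models the labels given the whole observed sequence, which lets it use richer, overlapping features.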

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2677736/
Bioinformatics. 2009 May 15;25(10):1307-13. Epub 2008 Dec 3.
Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields.
Barutcuoglu Z, Airoldi EM, Dumeaux V, Schapire RE, Troyanskaya OG.

Article: Graph startup Neo raises $11M as specialized databases take hold

Sunday, November 4th, 2012

http://gigaom.com/data/graph-startup-neo-raises-11m-as-specialized-databases-take-hold
See also the open-source graph NoSQL database Neo4j: http://neo4j.org/

Article: Graph startup Neo raises $11M as specialized databases take hold

Saturday, November 3rd, 2012

http://gigaom.com/data/graph-startup-neo-raises-11m-as-specialized-databases-take-hold