Posts Tagged ‘data’

Big Data’s Promise and Limitations : The New Yorker

Saturday, May 4th, 2013

http://www.newyorker.com/online/blogs/elements/2013/04/steamrolled-by-big-data.html

Facebook ‘Likes’ reveal more about you than you think | Detroit Free Press | freep.com

Sunday, March 17th, 2013

http://www.freep.com/usatoday/article/1975777

Twitter users forming tribes with own language, tweet analysis shows

Sunday, March 17th, 2013

http://m.guardiannews.com/news/datablog/2013/mar/15/twitter-users-tribes-language-analysis-tweets

Thoughts on “A few useful things to know about machine learning”

Thursday, February 14th, 2013

Some thoughts on a good paper that gives intuition about machine learning approaches:

http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
http://dl.acm.org/citation.cfm?id=2347755

In particular, the paper gives good intuition about:

– overfitting (e.g. how it relates to multiple testing and the bias vs. variance trade-off)
– the curse of dimensionality (in high dimensions, all neighbors look nearly equidistant)
– the limited practical value of theoretical guarantees
– how learners with different frontiers (decision boundaries) can give the same predictions
– ensembles (which greatly reduce variance while only slightly increasing bias)
– ensembles vs. Bayesian model averaging (in practice the latter puts most of its weight on a single model, so it essentially selects the best one)
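
To make the curse-of-dimensionality point concrete, here is a minimal sketch (mine, not from the paper) that measures how much nearer the nearest neighbor is than the farthest one as the dimension grows. The function name `relative_contrast`, the uniform unit-cube data, the sample size and the seed are all assumptions chosen for illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points random points, all drawn uniformly
    from the unit cube [0, 1]^dim (an assumed toy data set)."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


if __name__ == "__main__":
    # A large value means the nearest neighbor is much closer than the
    # farthest; a value near zero means all neighbors look alike.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

On this toy data the contrast should shrink as `dim` grows, which is the sense in which nearest-neighbor distinctions become less informative in high dimensions. Real data usually lies near a lower-dimensional manifold, so the effect is often milder than uniform noise suggests.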

Illumina Platinum Genomes

Sunday, February 10th, 2013

http://www.illumina.com/platinumgenomes/
A family trio (NA12877, NA12878, and NA12882) sequenced on a HiSeq 2000 system. An individual (NA18507) sequenced on a HiSeq 2500 system.

A few useful things to know about machine learning

Saturday, February 9th, 2013

http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
http://dl.acm.org/citation.cfm?id=2347755