PacBio Blog: Data Release: ~54x Long-Read Coverage for PacBio-only De Novo Human Genome Assembly

Sunday, August 31st, 2014

We are pleased to make publicly available a new shotgun sequence dataset of long PacBio® reads from a human DNA sample. We previously released ~10x coverage of this sample, generated with Single Molecule, Real-Time (SMRT®) Sequencing and sufficient for reference-based detection of structural variation. Today we expand on that release with additional data that increases the total sequencing coverage to ~54x. This long-read data has enabled the generation of the first de novo human genome assembly from PacBio-only sequence reads. Download the 54x long-read coverage dataset.

The dataset was generated by sequencing a well-studied human cell line (CHM1htert), which is being used as part of a National Institutes of Health project to sequence and assemble an alternate reference genome (the “platinum genome”). This NIH project is led by Rick Wilson from Washington University in St. Louis and Evan Eichler from the University of Washington, in collaboration with investigators from the National Center for Biotechnology Information.

PacBio MCF-7 transcriptome dataset

Wednesday, July 30th, 2014



Stanford Large Network Dataset Collection

Monday, December 9th, 2013

.@randal_olson @thatdnaguy Lots of interesting #network datasets available from . Thanks for pointing out this site!

CCLE WGS dataset

Monday, December 9th, 2013

This is the URL for CCLE with WGS data:

They have an XML file describing all the datasets, and these include WGS data as well. (This should be publicly available WGS.)


The Cancer Cell Line Encyclopedia (CCLE) project is a collaboration between the Broad Institute, the Novartis Institutes for Biomedical Research, and the Genomics Institute of the Novartis Research Foundation. CCLE will conduct a detailed genetic and pharmacologic characterization of a large panel of human cancer models, develop integrated computational analyses that link distinct pharmacologic vulnerabilities to genomic patterns, and translate cell line integrative genomics into cancer patient stratification. The CCLE provides public access to genomic data, analysis and visualization for about 1000 cell lines.

NOTE: these data sets do NOT require any special access or authorization status. However, you will need to use the public access token/key URL, or your secure key file that you may have downloaded for projects like TCGA that use a secure key file.