Archive for the ‘critsum0mg’ Category

Landscape of somatic retrotransposition in human cancers. – PubMed – NCBI

Friday, May 27th, 2016

Landscape of somatic retrotransposition in human cancers
http://science.sciencemag.org/content/337/6097/967.long 194 insertions in 43 WGS, mostly L1s w. ~50% near genes

Landscape of Somatic Retrotransposition in Human Cancers

Eunjung Lee1,2,
Rebecca Iskow3,
Lixing Yang1,
Omer Gokcumen3,
Psalm Haseley1,2,
Lovelace J. Luquette III1,
Jens G. Lohr4,5,
Christopher C. Harris6,
Li Ding6,
Richard K. Wilson6,
David A. Wheeler7,
Richard A. Gibbs7,
Raju Kucherlapati2,8,
Charles Lee3,
Peter V. Kharchenko1,9,*,
Peter J. Park1,2,9,*,
The Cancer Genome Atlas Research Network

Science 24 Aug 2012:
Vol. 337, Issue 6097, pp. 967-971
DOI: 10.1126/science.1222077

The paper describes the analysis of transposable element (TE) insertions at single-nucleotide resolution in 43 high-coverage whole-genome datasets from five cancer types. The authors developed a computational method that takes as input paired-end whole-genome sequence data from tumor and normal samples, aligned against a reference genome and a custom repeat assembly of TE sequences, to detect the position and mechanism of TE insertion. The method identified 194 TE insertions (183 L1s, 10 Alus and 1 ERV). The diversity in the frequency of TE insertions within the same cancer type (ranging from 45-60 to 106 events per tumour) suggests the presence of tumour subtypes with respect to TE activity.

By intersecting the 194 TEs with genome annotation, the authors found that 64 TEs are in known genes (in UTRs and introns), most of which are implicated in tumour suppressor functions. The TE events also targeted genes that are frequently or recurrently mutated, suggesting that TE insertions can potentially contribute to cancer development. Gene expression analysis showed that TE insertion significantly decreases the expression level of the host gene. TE orientation also has an impact on the expression level, with antisense insertion being less disruptive.
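
The intersection step described above can be sketched in a few lines. This is only an illustration of the idea, not the authors' pipeline: the function names, the coordinates and the gene names below are all hypothetical, and gene spans are assumed to be half-open intervals.

```python
def genes_hit(insertions, genes):
    """Map each insertion to the names of the genes whose span contains it.

    insertions: list of (chrom, pos) tuples (hypothetical toy data).
    genes: list of (chrom, start, end, name), half-open [start, end).
    """
    hits = {}
    for chrom, pos in insertions:
        hits[(chrom, pos)] = [
            name
            for g_chrom, start, end, name in genes
            if g_chrom == chrom and start <= pos < end
        ]
    return hits


def fraction_in_genes(insertions, genes):
    """Fraction of distinct insertion sites that fall inside at least one gene."""
    hits = genes_hit(insertions, genes)
    return sum(1 for names in hits.values() if names) / len(hits)


# Toy annotation and insertion sites, invented for illustration only.
toy_genes = [("chr1", 100, 200, "GENE_A"), ("chr2", 50, 80, "GENE_B")]
toy_insertions = [("chr1", 150), ("chr1", 250), ("chr2", 60)]
toy_hits = genes_hit(toy_insertions, toy_genes)
```

A linear scan is enough for a few hundred insertions; a sorted index or an interval tree would be the natural choice for larger call sets.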

Comparing the germline and somatic insertion sites shows notable differences. Germline L1s are significantly more depleted from genes than somatic L1s. Somatic L1s are significantly overrepresented within regions of DNA hypomethylation, suggesting that DNA hypomethylation promotes L1 integration.

Lalonde E*, Ishkanian AS*, ….P’ng C, Collins CC, Squire JA, Jurisica I, Cooper C, Eeles R, Pintilie M, Dal Pra A, Davicioni E, Lam WL, Milosevic M, Neal DE, van der Kwast T, Boutros PC, Bristow RG (2014) “Tumour genomic and microenvironmental heterogeneity as integrated predictors for prostate cancer recurrence: a retrospective study” Lancet Oncology 15(13):1521-1532 (PMID: 25456371)

Tuesday, May 17th, 2016

Genomic & microenvironmental heterogeneity as integrated predictors for prostate #cancer recurrence
http://www.ncbi.nlm.nih.gov/pubmed/25456371 CNVs & hypoxia

* Lalonde E*, Ishkanian AS*, ….P’ng C, Collins CC, Squire JA, Jurisica I, Cooper C, Eeles R, Pintilie M, Dal Pra A, Davicioni E, Lam WL, Milosevic M, Neal DE, van der Kwast T, Boutros PC, Bristow RG (2014) “Tumour genomic and microenvironmental heterogeneity as integrated predictors for prostate cancer recurrence: a retrospective study” Lancet Oncology 15(13):1521-1532 (PMID: 25456371)

The novelty of the paper is that it is the first study to integrate DNA-based signatures and a microenvironment-based signature for cancer prognosis. The authors found four prognostic indices to be effective in predicting patient survival in different groups: cancer genomic subtype (generated from clusters of CNV profiles), genomic instability (represented by the percentage of the genome altered), a DNA signature (276 genes identified with random forests), and tumor hypoxia (the microenvironment signature). Standard clinical univariate and multivariate analyses were performed.
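
As a rough illustration of the genomic instability index, the percentage of the genome altered can be computed from a list of copy-number segments. The sketch below is an assumption-laden toy, not the paper's method: the segment format, the `threshold` on the absolute log2 ratio and the example numbers are all invented here.

```python
def percent_genome_altered(segments, genome_length, threshold=0.2):
    """Percentage of the genome covered by copy-number-altered segments.

    segments: non-overlapping (start, end, log2_ratio) tuples, half-open.
    threshold: assumed cutoff on |log2 ratio| for calling a segment altered.
    """
    altered = sum(
        end - start
        for start, end, log2_ratio in segments
        if abs(log2_ratio) >= threshold
    )
    return 100.0 * altered / genome_length


# Toy profile: one neutral segment, one gain, one loss (hypothetical values).
toy_segments = [(0, 100, 0.0), (100, 300, 0.5), (300, 400, -0.6)]
toy_pga = percent_genome_altered(toy_segments, genome_length=1000)
```

Here 300 of 1000 bases are altered, so `toy_pga` is 30.0.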

Cell lineage analysis in human brain using endogenous retroelements. – PubMed – NCBI

Saturday, May 7th, 2016

Cell-lineage analysis in human #brain using endogenous retroelements http://www.cell.com/neuron/abstract/S0896-6273(14)01137-4 Tracing L1 insertions w/ #singlecell sequencing

Using single-cell WGS of 16 neuronal cells, the authors investigated two somatic insertions of L1Hs elements in an adult human brain. From these results the authors infer that somatic L1 insertions are infrequent and that somatic retrotransposition of Alu and SVA elements is extremely rare. Assessing the two L1Hs insertions in 32 samples across different regions of this same adult brain, they found that while one insertion was spatially restricted (2x1cm region), the other was found across all samples of the adult brain (but not in other tissues such as heart, lung, etc.). The more restricted one (L1Hs#1) is inferred to have happened during the fetal stage (first trimester), while the broader one happened earlier, approximately 2 weeks post-fertilization. Overall the paper is clear, concise, and simple. It answers an interesting biological question: can retrotransposition be used as a marker of clonal cell expansion? It can, although the retrotransposition frequency is very low and SNVs might give better results for the same analysis due to their higher frequency.

TIGRA: a targeted iterative graph routing assembler for breakpoint assembly. – PubMed – NCBI

Sunday, February 21st, 2016

TIGRA: Targeted Iterative Graph Routing Assembler for breakpoint[s] http://GENOME.CSHLP.org/content/24/2/310.long key steps: read extraction & de Bruijn #assembly

This paper presents a breakpoint assembler used for many projects, including 1000 Genomes. It uses a targeted iterative graph routing approach. The program consists of two steps: read extraction and then assembly. The assembly step uses a de Bruijn graph-based approach to create contigs from the selected reads. A shortcoming of TIGRA is that it depends on the success of the first step, the selection of reads that span breakpoints. TIGRA is therefore sensitive to the accuracy of the input breakpoint annotation. Breakpoints determined from discordant paired-end or split-read alignments by predictors like BreakDancer, DELLY and GenomeSTRiP work well with TIGRA, but those determined only by read depth, as in CNVnator and RDX, perform poorly.

As input TIGRA requires putative breakpoint annotations/predictions (preferably at nucleotide level or at least within 100bp resolution) and BAM files (sequence reads aligned to the reference genome).
In the read extraction step TIGRA tries to select all the reads that are likely associated with the breakpoint, as long as they have at least one end or subsegment that is confidently mapped. For known SV types, TIGRA extracts reads selectively to reduce the overrepresentation of the reference allele. The assembly step uses a de Bruijn graph-based approach to create contigs from the selected reads. For this TIGRA first uses an iterative procedure to explore multiple k-mer sizes and thus increases the chance of assembling low-coverage reads. Next it records alternative paths in the contig graph
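
To make the idea of exploring several k-mer sizes concrete, here is a minimal sketch of a de Bruijn assembly that tries each k in turn and keeps the longest contig it can read off as a single unbranched path. It is not TIGRA's algorithm: the function names, the `k_values` default and the toy read are assumptions made for illustration, and real assemblers handle errors, coverage and branching far more carefully.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_simple_path(graph):
    """Return the contig if the graph is one unbranched path, else None."""
    nodes = set(graph) | {n for succ in graph.values() for n in succ}
    with_pred = {n for succ in graph.values() for n in succ}
    sources = sorted(nodes - with_pred)
    if len(sources) != 1:
        return None
    node = sources[0]
    contig, seen = node, {node}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            return None  # cycle: not a simple path
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    # A branch or an unreachable node leaves some nodes unvisited.
    return contig if seen == nodes else None


def assemble(reads, k_values=(3, 4, 5)):
    """Try each k and keep the longest contig from an unambiguous graph."""
    best = None
    for k in k_values:
        contig = walk_simple_path(build_de_bruijn(reads, k))
        if contig and (best is None or len(contig) > len(best)):
            best = contig
    return best


# Toy read containing the repeat "AGC": small k branches, k=5 resolves it.
toy_contig = assemble(["TAGCAGCT"])
```

In the toy read the repeated `AGC` makes the graph branch at k=3 and k=4, while k=5 spans the repeat and yields a single path, which is the intuition behind trying more than one k.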

Boutros PC…., van der Kwast T, Bristow RG* (2015) “Spatial genomic heterogeneity within localized, multi-focal prostate cancer” Nature Genetics 47(7):736-745 (PMID: 26005866)

Monday, January 25th, 2016

Spatial genomic heterogeneity w/in…prostate #cancer
http://www.nature.com/ng/journal/v47/n7/full/ng.3315.html WGS analysis of many sites suggests divergent tumor evolution

Boutros…, van der Kwast, Bristow (2015) “Spatial genomic heterogeneity within localized, multi-focal prostate cancer” Nature Genetics 47(7):736-745 (PMID: 26005866)

This work represents the first systematic relation of intraprostatic genomic heterogeneity to predicted clinical outcomes at the level of whole-genome sequencing (WGS). Five patients, with index tumors of Gleason score 7, were subjected to a WGS protocol with spatial sampling of 23 distinct tumor regions to assess intraprostatic heterogeneity. In their analysis, Boutros et al. discovered recurrent amplification of MYCL, which is associated with TP53 loss. This finding is one of the first clear functional distinctions between MYC family members in prostate cancer and suggests that MYCL amplification may be preferentially localized in the index lesion. Overall, the authors believe their results are useful in the development of the prognostic biomarkers that are necessary to achieve personalized prostate cancer medicine. It is important to note that diagnostic biopsy protocols can miss regions of more aggressive cancer, resulting in the patient being under-staged.

Ewing AD*, Houlahan KE…..Stuart JM, Boutros PC (2015) “Combining accurate tumour genome simulation with crowd-sourcing to benchmark somatic single nucleotide variant detection” Nature Methods 12(7):623-630 (PMID: 25984700)

Monday, December 28th, 2015

Tumor genome simulation w/ #crowdsourcing to benchmark…SNV detection http://www.nature.com/nmeth/journal/v12/n7/full/nmeth.3407.html Addresses lack of gold standards & privacy

Ewing, Houlahan…..Stuart, Boutros (2015) “Combining accurate
tumour genome simulation with crowd-sourcing to benchmark somatic
single nucleotide variant detection” Nature Methods 12(7):623-630
(PMID: 25984700)

A crowdsourced benchmark of somatic mutation detection algorithms was
introduced for the ICGC-TCGA DREAM challenge. This has the advantage
of dealing with the lack of gold standard data and the issue of
sharing private genomic data. All groups worked on three different
simulated tumor-normal pairs generated with BAMSurgeon, by directly
adding synthetic mutations to existing reads. An ensemble of
pipelines outperforms the best individual pipeline in all cases,
assessed on the basis of recall, precision and F-score.
Parameterization and genomic localization both have an effect on
pipeline performance, while characteristics of prediction errors
differed for most pipelines.
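
The comparison of an ensemble against individual pipelines can be sketched with set operations. The summary does not say how the ensemble was built, so the majority vote below is an assumed rule chosen for illustration; the function names, the variant tuples and the three toy call sets are hypothetical as well.

```python
from collections import Counter


def majority_vote(call_sets):
    """Keep variants reported by more than half of the pipelines (assumed rule)."""
    counts = Counter(v for calls in call_sets for v in set(calls))
    return {v for v, n in counts.items() if n > len(call_sets) / 2}


def score(predicted, truth):
    """Return (precision, recall, F-score) for a set of predicted variants."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy truth set and pipeline calls as (chrom, pos, ref, alt); invented data.
v1, v2, v3 = ("chr1", 100, "A", "T"), ("chr1", 200, "C", "G"), ("chr2", 50, "G", "A")
fp1, fp2 = ("chr3", 10, "T", "C"), ("chr3", 20, "G", "T")
toy_truth = {v1, v2, v3}
toy_pipelines = [{v1, v2, fp1}, {v1, v3, fp2}, {v1, v2, v3}]
toy_ensemble = majority_vote(toy_pipelines)
```

In this toy case the two false positives are each reported by a single pipeline and drop out of the vote, while every true variant is supported by at least two pipelines.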

Bias from removing read duplication in ultra-deep sequencing experiments

Friday, December 25th, 2015

Bias from removing read duplication [eg from PCR amplification] in ultra-deep #sequencing
http://bioinformatics.oxfordjournals.org/content/early/2014/01/02/bioinformatics.btt771 pot. overcorrection issues

Zhou et al.

Bias from removing read duplication in ultra-deep sequencing experiments

Estimating variant allele frequency and copy number variations can be approached by counting reads. In practice, read counting is complicated by bias from PCR amplification and from sampling coincidence (independent fragments that happen to map identically). This paper assesses the overcorrection introduced when removing read duplicates. The overcorrection is a particular concern when the sequencing is ultra-deep and the insert size is short and non-variant.
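
A simple occupancy model shows why sampling coincidence grows with depth. This is a simplified sketch and not the estimator from the paper: it assumes that fragments covering a site are drawn uniformly from a fixed number of distinguishable (start, insert size) configurations, and the function names and numbers are made up for illustration.

```python
def expected_unique(n_reads, n_configs):
    """Expected number of distinct configurations seen among n_reads
    independent fragments drawn uniformly from n_configs possibilities."""
    return n_configs * (1.0 - (1.0 - 1.0 / n_configs) ** n_reads)


def fraction_lost_to_dedup(n_reads, n_configs):
    """Expected fraction of truly independent reads discarded as duplicates."""
    return 1.0 - expected_unique(n_reads, n_configs) / n_reads


# With few possible configurations (short, fixed insert size), deep coverage
# saturates: deduplication can never keep more than n_configs reads.
shallow_loss = fraction_lost_to_dedup(10, 1_000_000)
deep_loss = fraction_lost_to_dedup(10_000, 100)
```

Under these assumptions almost nothing is lost at shallow depth, whereas at 10,000 reads over 100 configurations about 99% of independent reads would be removed, which is the kind of overcorrection the summary refers to.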

Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis. – PubMed – NCBI

Tuesday, July 21st, 2015

http://Oqtans.org: RNAseq…in the cloud by @gxr http://bioinformatics.oxfordjournals.org/content/30/9/1300.long Distributing a tool in many ways: AMI, GIT, Galaxy workflow, &c
Nice illustration of how to distribute a tool in many forms: AMI, Git, Galaxy workflow + more.

* Sreedharan et al. Oqtans: the RNA-seq workbench in the cloud for
complete and reproducible quantitative transcriptome analysis.

The authors describe an open-source transcriptome analysis software
package, Oqtans. The package contains a variety of existing analysis
tools (for short-read alignment, transcript quantification and
expression analysis) assembled into a comprehensive workflow. The
package can be run either locally or as a virtual machine in the cloud
using AWS. One innovative feature is the ability to compare
the efficiency of the integrated tools on the same data set. Oqtans is
a highly modular software package that can be easily extended. It also
offers the possibility to create customized workflows based on the
integrated tools available.

Machine learning applications in genetics and genomics : Nature Reviews Genetics : Nature Publishing Group

Saturday, May 30th, 2015

#Machinelearning applications in…genomics
http://www.nature.com/nrg/journal/v16/n6/full/nrg3920.html Nice overview of key distinctions betw generative & discriminative models

In their review, “Machine learning applications in genetics and genomics”, Libbrecht and Noble give an overview of important aspects of applying machine learning to genomic data. The review presents illustrative classical genomics problems where machine learning techniques have proven useful and describes the differences between supervised, semi-supervised and unsupervised learning, as well as between generative and discriminative models. The authors discuss considerations for selecting the right machine learning approach depending on the biological problem and data at hand, provide general practical guidelines and suggest possible solutions to common challenges.

Extensive evolutionary changes in regulatory element activity during human origins are associated with altered gene expression and positive selection. PLoS Genet. 2012

Sunday, April 12th, 2015

Changes in [DHS] #regulatory element activity…[over 3 primates] associated w/ altered…expression & pos. selection
http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002789

DHS across 3 primates finds species specific sites associated with differential expression & positive selection

Shibata Y, Sheffield NC, Fedrigo O, Babbitt CC, Wortham M, Tewari AK, London D, Song L, Lee BK, Iyer VR, Parker SC, Margulies EH, Wray GA, Furey TS, Crawford GE*. Extensive evolutionary changes in regulatory element activity during human origins are associated with altered gene expression and positive selection. PLoS Genet. 2012 Jun; 8(6):e1002789. doi: 10.1371/journal.pgen.1002789. Epub 2012 Jun 28. PubMed PMID: 22761590; PubMed Central PMCID: PMC3386175

SUMMARY (from csds):

The study focuses on analyzing genotype-phenotype correlation by looking at the evolution of DHS sites across three primate genomes: human, chimp and macaque. By comparing the data the authors were able to identify common DHS sites across the three species (sites that show similar DHS levels) as well as species-specific sites. All the assays were supported by ChIP experiments. The study identified >2000 regulatory elements that were gained or lost since the divergence of human and chimp. Looking at DNase and RNA-seq data, the authors show that regulatory elements are enriched next to genes with species-specific expression, which suggests that the gain or loss of DHS sites impacts transcript abundance. The human DHS sites were enriched for chromatin marks predictive of enhancers, while common regions were preferentially associated with promoters and insulators. Looking at species specificity, they found that species-specific DHS gains are cell-type specific, while both species-specific DHS gains and losses are subject to positive selection. The common DHS sites are conserved and are suggested to have roles involving transcription and general housekeeping.