Alignment-free sequence comparison: benefits, applications, and tools
Monday, November 13th, 2017Might be useful for noncoding comparisons
Alignment-free seq. comparison: benefits, apps & tools
https://GenomeBiology.biomedcentral.com/articles/10.1186/s13059-017-1319-7 Great tidbits, viz: Shannon asked von Neumann what to call his info measure – “Why don’t you call it entropy…no one understands entropy…so in any discussion, you’ll be in a position of advantage.”
QT:{{”
“Reportedly, Claude Shannon, who was a mathematician working at Bell Labs, asked John von Neumann what he should call his newly developed measure of information content; “Why don’t you call it entropy,” said von Neumann, “[…] no one understands entropy very well so in any discussion you will be in a position of advantage […]” []. The concept of Shannon entropy came from the observation that some English words, such as “the” or “a”, are very frequent and thus unsurprising” ….
“The calculation of a distance between sequences using complexity (compression) is relatively straightforward (Fig. ). This procedure takes the sequences being compared (x = ATGTGTG and y = CATGTG) and concatenates them to create one longer sequence (xy = ATGTGTGCATGTG). If x and y are exactly the same, then the complexity (compressed length) of xy will be very close to the complexity of the individual x or y. However, if x and y are dissimilar, then the complexity of xy (length of compressed xy) will tend to the cumulative complexities of x and y.”
…
“Intriguingly, BLOSUM matrices, which are the most commonly used substitution matrix series for protein sequence alignments, were found to have been miscalculated years ago and yet produced significantly better alignments than their corrected modern version (RBLOSUM) []; this paradox remains a mystery.”
“}}