Posts Tagged ‘textmining’

Small research teams ‘disrupt’ science more radically than large ones

Friday, March 1st, 2019

“The authors describe and validate a citation-based index of ‘disruptiveness’ that has previously been proposed for patents (ref. 6). The intuition behind the index is straightforward: when the papers that cite a given article also reference a substantial proportion of that article’s references, then the article can be seen as consolidating its scientific domain. When the converse is true — that is, when future citations to the article do not also acknowledge the article’s own intellectual forebears — the article can be seen as disrupting its domain.

The disruptiveness index reflects a characteristic of the article’s underlying content that is clearly distinguishable from impact as conventionally captured by overall citation counts. For instance, the index finds that papers that directly contribute to Nobel prizes tend to exhibit high levels of disruptiveness, whereas, at the other extreme, review articles tend to consolidate their fields.”
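The index can be sketched in a few lines. This is a minimal Python sketch, assuming the widely used formulation D = (n_i - n_j) / (n_i + n_j + n_k), where n_i counts later papers that cite the article but none of its references, n_j those that cite both the article and at least one of its references, and n_k those that cite the references but not the article. The function name and set-based inputs are hypothetical; a real analysis would also fix a citation window and exclude the focal article itself from the reference citers.

```python
def disruption_index(focal_citers, reference_citers):
    """Disruption index D for one focal article (hypothetical helper).

    focal_citers: IDs of later papers that cite the focal article.
    reference_citers: IDs of later papers that cite at least one of the
        focal article's references (the focal article itself excluded).
    Returns a value in [-1, 1]: positive means disrupting, negative consolidating.
    """
    focal_citers = set(focal_citers)
    reference_citers = set(reference_citers)
    n_i = len(focal_citers - reference_citers)  # cite the article, ignore its forebears
    n_j = len(focal_citers & reference_citers)  # cite the article and its forebears
    n_k = len(reference_citers - focal_citers)  # cite the forebears, bypass the article
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0


# Citers that ignore the article's references: disruptive.
print(disruption_index({"a", "b", "c"}, {"d"}))       # 0.75
# Citers that also cite the article's references: consolidating.
print(disruption_index({"a", "b"}, {"a", "b", "c"}))  # -0.6666666666666666
```

Using sets keeps each citing paper counted once per category, which is what the n_i / n_j / n_k definitions require.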

How to identify anonymous prose – Johnson

Saturday, November 3rd, 2018

How to identify anonymous prose: interesting parallels between #textmining & genome seq. analysis (e.g., finding characteristic k-mers for a bacterial species)
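To make the parallel concrete, here is a minimal Python sketch in the spirit of the genome analogy (not the method described in the article): it treats a text as a sequence of characters, counts overlapping k-mers, and keeps those that recur in one sample but never appear in a comparison set. The function names and the min_count threshold are hypothetical choices for illustration.

```python
from collections import Counter


def kmer_counts(text, k):
    """Count overlapping length-k substrings (k-mers), as one would for a DNA sequence."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def characteristic_kmers(target, others, k, min_count=2):
    """k-mers seen at least min_count times in target but never in any text in others."""
    background = set()
    for other in others:
        background.update(kmer_counts(other, k))
    return {
        kmer: n
        for kmer, n in kmer_counts(target, k).items()
        if n >= min_count and kmer not in background
    }


# The same function works on a genome fragment or on prose.
print(characteristic_kmers("ACGTACGT", ["TTTTTTTT"], k=4))          # {'ACGT': 2}
print(characteristic_kmers("whilst whilst", ["while while"], k=5))  # {'whils': 2, 'hilst': 2}
```

Strict absence from the background is the simplest criterion; practical stylometry usually compares relative frequencies instead, since short texts rarely contain every habitual k-mer.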

Reading by the Numbers: When Big Data Meets Literature

Sunday, November 12th, 2017

Reading by the Numbers: When #BigData Meets Literature. Distant reading as a complement to close reading for literary texts. Perhaps a useful dichotomy for biosequences too!

“Literary criticism typically tends to emphasize the singularity of exceptional works that have stood the test of time. But the canon, Mr. Moretti argues, is a distorted sample. Instead, he says, scholars need to consider the tens of thousands of books that have been forgotten, a task that computer algorithms and enormous digitized databases have now made possible.

“We know how to read texts,” he wrote in a much-quoted essay included in his book “Distant Reading,” which won the 2014 National Book Critics Circle Award for Criticism. “Now let’s learn how to not read them.””


Wikipedia shapes language in scientific papers

Friday, October 27th, 2017

"Wikipedia is one of the world’s most popular websites, but scientists rarely cite it in their papers. Despite this, the online encyclopedia seems to be shaping the language that researchers use in papers, according to an experiment showing that words and phrases in recently published Wikipedia articles subsequently appeared more frequently in scientific papers"

“Thompson and co-author Douglas Hanley, an economist at the University of Pittsburgh in Pennsylvania, commissioned PhD students to write 43 chemistry articles on topics that weren’t yet on Wikipedia. In January 2015, they published a randomized set of half of the articles to the site. The other half, which served as control articles, weren’t uploaded.

Using text-mining techniques to measure the frequency of words, they found that the language in the scientific papers drifted over the study period as new terms were introduced into the field. This natural drift equated to roughly one new term for every 250 words, Thompson told Nature. On top of those natural changes in language over time, the authors found that, on average, another 1 in every 300 words in a scientific paper was influenced by language in the Wikipedia article.”
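The comparison is essentially a treatment-versus-control design. A minimal Python sketch of one way to set it up, assuming papers are already tokenized into word lists and each Wikipedia article is reduced to a set of single-word terms (the study's actual term extraction and statistics may differ): measure how often each group's terms appear in papers before and after publication, then subtract the control group's change from the treated group's. The control change stands in for natural drift; the remainder is attributed to the posted articles. Function names are hypothetical.

```python
def term_rate(terms, papers):
    """Fraction of all tokens across papers that belong to the set terms."""
    total = sum(len(paper) for paper in papers)
    hits = sum(token in terms for paper in papers for token in paper)
    return hits / total if total else 0.0


def wikipedia_effect(treated_terms, control_terms, papers_before, papers_after):
    """Difference-in-differences: change in usage of posted (treated) articles' terms
    minus change in usage of held-back (control) articles' terms."""
    treated_change = term_rate(treated_terms, papers_after) - term_rate(treated_terms, papers_before)
    control_change = term_rate(control_terms, papers_after) - term_rate(control_terms, papers_before)
    return treated_change - control_change


# Toy, made-up tokens purely to show the mechanics.
before = [["catalyst", "yield", "solvent", "ligand"]]
after = [["catalyst", "zeolite", "perovskite", "mof"]]
treated = {"zeolite", "perovskite"}  # terms from articles that were posted
control = {"mof"}                    # terms from articles held back
print(wikipedia_effect(treated, control, before, after))  # 0.25
```

For scale, the reported effects are small: one word in 250 is 0.4% of tokens, and one in 300 is about 0.33%.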


#Wikipedia shapes lang. in science: seeding it with new pages & watching them evolve (vs. ctrls) as a type of soc. expt

What the Enron E-mails Say About Us

Sunday, August 6th, 2017

Mark as Read highlights #Enron email as a canonical corpus for #textmining, w/ >3K academic papers published on this

A scored human protein-protein interaction network to catalyze genomic interpretation : Nature Methods : Nature Research

Friday, December 9th, 2016

Scored…PPI #network to catalyze genomic interpretation: >500k links from lit. mining; up-weights small-scale expts
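One simple way to up-weight small-scale experiments can be sketched as follows. This is a hypothetical scheme for illustration, not the scoring model used in the paper: each publication spreads a total weight of 1 across the distinct interactions it reports, so a focused study reporting one interaction contributes a full point, while a high-throughput screen reporting thousands contributes very little to each.

```python
from collections import defaultdict


def score_interactions(evidence):
    """Score protein pairs from literature evidence (hypothetical weighting).

    evidence: iterable of (protein_a, protein_b, publication_id) tuples.
    Each publication contributes 1 / (number of distinct pairs it reports)
    to every pair it reports, favouring small-scale studies.
    """
    pairs_per_pub = defaultdict(set)
    for a, b, pub in evidence:
        pairs_per_pub[pub].add(frozenset((a, b)))  # undirected: (A, B) == (B, A)
    scores = defaultdict(float)
    for pairs in pairs_per_pub.values():
        weight = 1.0 / len(pairs)
        for pair in pairs:
            scores[pair] += weight
    return dict(scores)


evidence = [
    ("A", "B", "p1"),                      # small study: one interaction
    ("A", "C", "p2"), ("B", "D", "p2"),    # screen reporting four interactions
    ("C", "D", "p2"), ("E", "F", "p2"),
    ("B", "A", "p3"),                      # second small study confirming A-B
]
scores = score_interactions(evidence)
print(scores[frozenset(("A", "B"))])  # 2.0
print(scores[frozenset(("A", "C"))])  # 0.25
```

Real scoring schemes typically also account for experiment type and map raw sums onto a bounded confidence score; this sketch only shows the size-based down-weighting.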

Who’s downloading pirated papers?

Monday, May 2nd, 2016

“Bill Hart-Davidson, MSU’s associate dean for graduate education, suggests that the likely answer is “text-mining,” the use of computer programs to analyze large collections of documents to generate data. When I called Hart-Davidson, I suggested that the East Lansing Sci-Hub scraper might be someone from his own research team. But he laughed and said that he had no idea who it was. But he understands why the scraper goes to Sci-Hub even though MSU subscribes to the downloaded journals.”

Who’s downloading pirated papers? Everyone. Freely available data on @scihub usage.
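Usage logs like these lend themselves to simple aggregation. A minimal Python sketch, assuming a hypothetical tab-separated layout with columns timestamp, DOI, user ID, country, city (the released files may use a different column order): count downloads per location and look for outliers, such as the single East Lansing scraper described above.

```python
import csv
from collections import Counter


def downloads_by_city(log_lines):
    """Count downloads per (country, city).

    Assumes a hypothetical tab-separated layout:
    timestamp, doi, user_id, country, city. Short or malformed rows are skipped.
    """
    reader = csv.reader(log_lines, delimiter="\t")
    return Counter((row[3], row[4]) for row in reader if len(row) >= 5)


# Made-up rows purely to show the mechanics.
sample = [
    "2016-01-01 00:00:01\t10.1000/a\tu1\tUnited States\tEast Lansing",
    "2016-01-01 00:00:02\t10.1000/b\tu1\tUnited States\tEast Lansing",
    "2016-01-01 00:00:03\t10.1000/c\tu1\tUnited States\tEast Lansing",
    "2016-01-01 00:00:04\t10.1000/d\tu2\tIran\tTehran",
]
print(downloads_by_city(sample).most_common(1))
# [(('United States', 'East Lansing'), 3)]
```

`csv.reader` accepts any iterable of lines, so the same function works on an open file handle for the full log.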

Research profiles: A tag of one’s own : Naturejobs

Saturday, October 10th, 2015

A tag of one’s own: a convincing case for signing up for an ORCID identifier & linking it to your papers

Yahoo To Shut Down Qwiki, Yahoo Education And The Yahoo Directory | TechCrunch

Friday, October 3rd, 2014

Yahoo To Shut Down…Directory: total victory for #textmining (i.e., Google) over manual #ontologies for web organization

What Can Article-Level Metrics Do for You?

Monday, September 1st, 2014

What Can Article-Level Metrics Do for You? Wide distribution of #cites for @PLOSBiology papers: median 19, but 10% >50
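A median of 19 alongside a tail with 10% above 50 is typical of skewed citation distributions. A minimal Python sketch of these summary statistics, run on made-up counts (not PLOS Biology data); the threshold of 50 mirrors the figure quoted above, and the function name is hypothetical.

```python
from statistics import mean, median


def citation_summary(cites, threshold=50):
    """Return (median, mean, share of papers cited more than threshold times)."""
    if not cites:
        raise ValueError("need at least one citation count")
    share_above = sum(c > threshold for c in cites) / len(cites)
    return median(cites), mean(cites), share_above


# Made-up counts: most papers modestly cited, one heavily cited outlier.
counts = [2, 8, 12, 15, 19, 21, 25, 30, 40, 120]
med, avg, share = citation_summary(counts)
# Median is 20.0; the single outlier lifts the mean to 29.2; share above 50 is 0.1.
print(med, avg, share)
```

Reporting the median rather than the mean keeps a handful of highly cited papers from dominating the summary.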