Monday, May 16, 2011
Google is not infallible
This blogger uses Google’s Ngram search engine (which looks like a search engine limited to books catalogued by Google) to look for the word “biotechnology”. And he gets an anomalous bump between 1902 and 1910. My natural inclination is to think this is a Y2K issue (e.g., the year 08 improperly recorded as 1908 instead of 2008). That’s probably the big reason.
But I also limited the search to books from 1890-1910, and came up with this search list. The number 3 hit was this book. And one of the readers helpfully noted:
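To make the Y2K guess concrete, here is a minimal Python sketch of how a two-digit year can land in the wrong century. The function names and the pivot value of 30 are made up for illustration; nothing here is known about how Google actually stores or expands dates.

```python
def expand_naive(yy: int) -> int:
    """Assume every two-digit year belongs to the 1900s (the classic Y2K bug)."""
    return 1900 + yy


def expand_windowed(yy: int, pivot: int = 30) -> int:
    """Treat two-digit years below the (hypothetical) pivot as 20xx, the rest as 19xx."""
    return 2000 + yy if yy < pivot else 1900 + yy


print(expand_naive(8))     # a book from '08 gets dated 1908
print(expand_windowed(8))  # with a pivot window, the same book is dated 2008
```

A book catalogued with the year "08" and expanded the naive way would show up exactly in the 1902-1910 range where the bump appears.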
“This volume is from 1993. Some idiot looked at the art on the cover and had assumed it’s the actual date.”
The number 1 hit is a book by “University of Maryland, College Park. Institute for Philosophy and Public Policy”, which Google flagged as 1900. Call me crazy, but that sounds like a name you’d invent in the 21st century. A quick Google search brought me to their current website. It’s a quarterly publication, with Volume 21 in Fall of 2001. Volumes 10-13 would therefore be from 1990 - 1993.
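The volume arithmetic can be checked with a few lines of Python. This is only a sketch and it assumes the publication has put out exactly one volume per calendar year with no gaps; `year_of_volume` is a hypothetical helper, not anything from Google or the publisher.

```python
def year_of_volume(volume: int, ref_volume: int = 21, ref_year: int = 2001) -> int:
    """Estimate a volume's year from one known (volume, year) pair.

    Assumes one volume per calendar year, which is a guess for this journal.
    """
    return ref_year - (ref_volume - volume)


for v in (10, 13):
    print(v, year_of_volume(v))
```

Under that assumption, Volume 10 lands in 1990 and Volume 13 in 1993, nowhere near 1900.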
I’m not sure how much human intervention there is in Google’s dating process, but obviously it’s not enough. Any Ngram hit that comes back from earlier than 1912 should be treated as (far?) more likely to be wrong than a hit from the 21st century.
UPDATE: Google Books responds, via Mike/1:

