Amazon: archiving rare books
The commercial race to digitize the world’s libraries has, for the most part, been a two-horse race, with Microsoft and Google battling to sign up libraries and gain the rights to archive and index their rare, obscure or out-of-print titles.
Yesterday, in what seems like a natural fit, online retailer Amazon (which famously started as an online book store) joined the race. In partnership with high-speed scanning company Kirtas Technologies (the same company that has helped Microsoft), Amazon will begin to archive rare titles from the special collections of public and university libraries.
Unlike Google and Microsoft, which are largely focusing on making the literary content available freely, Amazon will use the archives it creates to offer reproductions for sale.
The project will be administered through Amazon’s print-on-demand publishing service BookSurge, which specializes in printing and selling out-of-print titles. Amazon bought BookSurge in 2005.
The initial partners in the project will be the Emory University libraries, the University of Maine and the public libraries of Cincinnati, Ohio and Toronto (Canada). The libraries will receive an undisclosed share of revenue from any titles sold. Only books that are in the public domain, or for which the libraries own the rights, will be included.
Missing from Amazon’s new effort, and also unavailable to Google’s and Microsoft’s digitization projects, will be many of the papers of Leonardo Da Vinci. In a separate but tangentially related story, Wired reported yesterday on a European Union-funded project to archive and make publicly available as many as 12,000 pages of Leonardo Da Vinci’s papers (which are organized in manuscripts called Codices).
(It’s possible Microsoft may have 72 Da Vinci pages in its collection too, but that’s only if Bill Gates is feeling charitable. Bill Gates is the world’s only private owner of a Da Vinci Codex. Gates bought the Leicester Codex in 1994 for nearly $31m. And absent Microsoft getting an exclusive on it, the Da Vinci papers are one jewel the three companies won’t be able to fight over.)
The Da Vinci archive, which is called the e-Leo project, will eventually offer a searchable digital archive of many of Da Vinci’s known works, including the Madrid Codices, the Codex Atlanticus, the Windsor Folios and notebooks from the Institut de France. All are planned to be available for free search.
The project is a terrific example not just of how much material the three companies are racing to archive, but also of the technical difficulties that can be involved when trying to create modern reproductions of antiquities.
For the e-Leo project, Italian engineers first had to deal with character-recognition scanning of 15th-century Italian documents handwritten in Da Vinci’s famous mirror-image, backwards shorthand. After laboriously making sure the text was properly scanned, text-mining company Synthema then spent months with academics and engineers to create a framework for semantically searching the documents. Expertise in 15th-century Italian was an absolute prerequisite.
Google will face similar challenges with some of the documents it has gained access to through partnerships with Indian universities.
For Da Vinci fans, English indexes are expected in a few months, with an English index of Da Vinci drawings expected in about a year. Eventually, it is hoped that the document archive will also offer translations into multiple languages, but there is no timeline for that.