
Google has announced that it is releasing the source code of Tesseract, an old OCR program, as open source.
"In a nutshell, we are all about making information available to users, and when this information is in a paper document, OCR is the process by which we can convert the pages of this document into text that can then be used for indexing."