July 05, 2010, at 10:23 AM
by ddv -
Added lines 8-9:
June 08, 2010, at 07:22 AM
by ddv -
Added lines 8-11:
March 29, 2010, at 02:56 PM
by ddv -
Added lines 8-11:
February 01, 2010, at 09:56 AM
by 96.61.159.1 -
Changed lines 8-9 from:
to:
February 01, 2010, at 09:37 AM
by 96.61.159.1 -
Added lines 7-11:
Added lines 70-72:
December 03, 2009, at 01:53 PM
by 65.82.99.253 -
Added lines 6-8:
Added line 10:
October 31, 2009, at 09:20 PM
by 96.61.159.1 -
Added lines 6-8:
September 07, 2009, at 02:34 PM
by Del -
Changed lines 6-7 from:
to:
August 15, 2009, at 12:35 PM
by 96.61.159.1 -
Added lines 6-10:
June 05, 2009, at 11:42 AM
by 65.82.99.253 -
Added lines 6-7:
May 03, 2009, at 07:12 AM
by 69.128.207.178 -
Added lines 6-7:
April 01, 2009, at 02:17 PM
by Del -
Added lines 6-7:
March 01, 2009, at 09:29 PM
by Del -
Added lines 6-7:
February 09, 2009, at 01:44 PM
by 170.190.43.192 -
Added lines 6-7:
January 01, 2009, at 10:48 PM
by Del -
Changed lines 6-7 from:
to:
December 29, 2008, at 08:15 AM
by Del -
Added lines 38-39:
December 04, 2008, at 07:49 PM
by 69.128.205.204 -
Added lines 6-8:
November 03, 2008, at 11:18 AM
by Del -
Changed lines 22-23 from:
to:
November 01, 2008, at 12:26 PM
by ddv -
Added lines 6-7:
October 10, 2008, at 03:42 PM
by Del -
Added lines 4-5:
Changed lines 7-8 from:
to:
September 29, 2008, at 08:48 PM
by Del -
Added line 3:
Changed lines 5-6 from:
to:
September 29, 2008, at 06:46 PM
by Del -
Added lines 3-4:
September 26, 2008, at 10:43 AM
by Del -
Changed line 28 from:
to:
September 26, 2008, at 10:42 AM
by Del -
Changed line 28 from:
to:
September 26, 2008, at 10:42 AM
by Del -
Changed line 28 from:
to:
September 26, 2008, at 10:42 AM
by Del -
Added lines 1-2:
(:table border=0 cellpadding=5 cellspacing=0:)
(:cell:)
Deleted lines 8-9:
Deleted lines 20-22:
Changed lines 27-31 from:
to:
(:cell:)
(:Google2:)
(:tableend:)
August 30, 2008, at 05:14 PM
by Del -
Changed lines 21-23 from:
to:
August 10, 2008, at 10:23 PM
by Del -
Deleted lines 14-15:
Changed lines 20-21 from:
to:
August 10, 2008, at 10:22 PM
by Del -
Changed lines 15-16 from:
to:
August 10, 2008, at 10:21 PM
by Del -
Added lines 15-16:
August 02, 2008, at 09:44 AM
by 207.88.181.2 -
Added lines 1-2:
July 08, 2008, at 11:31 AM
by 65.82.99.1 -
Added lines 1-2:
May 19, 2008, at 07:23 AM
by 71.88.206.87 -
Changed lines 1-3 from:
(:include ScholarlyThinker2008-05 lines=4:) ... more ...
to:
May 19, 2008, at 06:45 AM
by 71.88.206.87 -
Changed lines 1-2 from:
to:
(:include ScholarlyThinker2008-05 lines=4:) ... more ...
(:Google2:)
March 06, 2008, at 09:39 AM
by ddv -
Added lines 3-4:
June 18, 2007, at 08:26 AM
by 160.36.183.130 -
Added lines 16-17:
May 21, 2007, at 09:48 AM
by 71.80.38.184 -
Deleted line 0:
NOTES
Changed line 2 from:
to:
Changed line 18 from:
to:
May 21, 2007, at 09:07 AM
by 71.80.38.184 -
Added line 2:
Deleted lines 16-17:
May 21, 2007, at 09:04 AM
by 71.80.38.184 -
Changed lines 16-17 from:
to:
Changed line 20 from:
to:
March 07, 2007, at 02:39 PM
by ddv -
Deleted lines 2-3:
Added lines 15-20:
March 01, 2007, at 01:57 PM
by ddv -
Changed lines 14-16 from:
to:
February 26, 2007, at 09:55 PM
by ddv -
Added lines 12-13:
February 26, 2007, at 12:10 AM
by ddv -
Added lines 10-11:
February 24, 2007, at 09:27 AM
by ddv -
Added line 9:
February 15, 2007, at 05:02 PM
by ddv -
Changed lines 7-8 from:
to:
February 08, 2007, at 10:21 AM
by ddv - move Google content to new page
Changed lines 8-22 from:
Google's progress on book digitization
I had not thought about this project or the CCEL project for quite a while. Looked at the Current Cites newsletter this morning and read 2 articles on digitization. Interesting to see that CCEL has revamped their site into the Drupal content management system and that they are using a load-balanced, two-server setup to handle traffic.
There are major differences between the Google and CCEL approaches, however. Note that Google is simply digitally photographing (scanning??? in a new way!) page after page and then OCRing the text to create a digital archive. A concise timeline story is at the New Yorker:
GOOGLE’S MOON SHOT - The quest for the universal library. by JEFFREY TOOBIN "Every weekday, a truck pulls up to the Cecil H. Green Library, on the campus of Stanford University, and collects at least a thousand books, which are taken to an undisclosed location and scanned, page by page, into an enormous database being created by Google. The company is also retrieving books from libraries at several other leading universities, including Harvard and Oxford, as well as the New York Public Library. At the University of Michigan, Google’s original partner in Google Book Search, tens of thousands of books are processed each week on the company’s custom-made scanning equipment.
Google intends to scan every book ever published, and to make the full texts searchable, in the same way that Web sites can be searched on the company’s engine at google.com. At the books site, which is up and running in a beta (or testing) version, at books.google.com, ..."
(Sidenote: what does Google do for backup purposes? Disaster recovery must be an interesting process. But at the same time - the content is static once created - so a mirror site is easily feasible.)
CCEL, on the other hand, starts where Google finishes. ThML is applied to the documents while the OCR text is hand-edited and corrected. Ideally the OCR is accurate, but I can attest from the CCEL book that I completed between 1999 and 2001 (St. Francis of Sales - Treatise on the Love of God) that numerous editing changes are required. The end result is a fully tagged document that can be searched (intelligently, using the tags), linked, and converted into multiple formats while retaining the meta structure of the document, and that serves as digital content. The scanned image is also available (this is particularly interesting with very old documents that contain artwork or unusual title pages, or even just to look at the fonts and layout used).
2007-02-08
to:
February 08, 2007, at 10:18 AM
by ddv - change format to an index page for notes
Added lines 3-7:
February 08, 2007, at 09:48 AM
by ddv -
Changed line 16 from:
to:
February 08, 2007, at 09:47 AM
by ddv -
Changed lines 12-13 from:
(Sidenote: what do they do for backup purposes? Disaster recovery must be an interesting process. But at the same time - the content is static once created - so a mirror site is easily feasible.)
to:
(Sidenote: what does Google do for backup purposes? Disaster recovery must be an interesting process. But at the same time - the content is static once created - so a mirror site is easily feasible.)
February 08, 2007, at 09:31 AM
by ddv - Google/CCEL digitization
Added lines 1-17:
NOTES
Google's progress on book digitization
I had not thought about this project or the CCEL project for quite a while. Looked at the Current Cites newsletter this morning and read 2 articles on digitization. Interesting to see that CCEL has revamped their site into the Drupal content management system and that they are using a load-balanced, two-server setup to handle traffic.
There are major differences between the Google and CCEL approaches, however. Note that Google is simply digitally photographing (scanning??? in a new way!) page after page and then OCRing the text to create a digital archive. A concise timeline story is at the New Yorker:
GOOGLE’S MOON SHOT - The quest for the universal library. by JEFFREY TOOBIN "Every weekday, a truck pulls up to the Cecil H. Green Library, on the campus of Stanford University, and collects at least a thousand books, which are taken to an undisclosed location and scanned, page by page, into an enormous database being created by Google. The company is also retrieving books from libraries at several other leading universities, including Harvard and Oxford, as well as the New York Public Library. At the University of Michigan, Google’s original partner in Google Book Search, tens of thousands of books are processed each week on the company’s custom-made scanning equipment.
Google intends to scan every book ever published, and to make the full texts searchable, in the same way that Web sites can be searched on the company’s engine at google.com. At the books site, which is up and running in a beta (or testing) version, at books.google.com, ..."
(Sidenote: what do they do for backup purposes? Disaster recovery must be an interesting process. But at the same time - the content is static once created - so a mirror site is easily feasible.)
CCEL, on the other hand, starts where Google finishes. ThML is applied to the documents while the OCR text is hand-edited and corrected. Ideally the OCR is accurate, but I can attest from the CCEL book that I completed between 1999 and 2001 (St. Francis of Sales - Treatise on the Love of God) that numerous editing changes are required. The end result is a fully tagged document that can be searched (intelligently, using the tags), linked, and converted into multiple formats while retaining the meta structure of the document, and that serves as digital content. The scanned image is also available (this is particularly interesting with very old documents that contain artwork or unusual title pages, or even just to look at the fonts and layout used).
'2007-02-08