2006 Education Award Laureate
Internet Archive
Project Overview:
The Internet Archive was founded to build an 'Internet library,' with the purpose of offering permanent access for researchers, historians, and scholars to historical collections that exist in digital format. The Internet Archive is saving knowledge. It puts books, audio files, moving images, software, and billions of web pages into digital format and makes them free and searchable to anyone with an Internet connection. The Internet Archive has also developed high-density storage, where it keeps these vast quantities of data.
Problem Addressed:
Libraries exist to preserve society's cultural artifacts and to provide access to them. If libraries are to continue to foster education and scholarship in this era of digital technology, it's essential for them to extend those functions into the digital world. History shows how easily such artifacts are lost. Many early movies were recycled to recover the silver in the film. The Library of Alexandria - an ancient center of learning said to contain a copy of every book in the world - was eventually burned to the ground. Even now, at the turn of the 21st century, no comprehensive archives of television or radio programs exist. But without cultural artifacts, civilization has no memory and no mechanism to learn from its successes and failures. And paradoxically, with the explosion of the Internet, we live in what Danny Hillis has referred to as our "digital dark age."
Technology Solution:
Technology is used in four basic areas. The Petabox system (a petabyte is 1 million gigabytes) was developed by the Internet Archive to store and provide access to vast amounts of information. It hosts all of the Archive's data and forms the backbone of its system architecture. The Wayback Machine is a user interface to a collection of over 55 billion web pages; it indexes the collection into a searchable database. Heritrix is the Internet Archive's open source, extensible, web-scale, archival-quality web crawler, built specifically for archiving websites and designed to support multiple use cases, including focused and broad crawling. The Internet Archive Books Project uses its own Scribe book-scanning technology to scan books that are in the public domain, producing high-quality, cost-effective, non-destructive reproductions of each book and converting the text and images into formats for the Flip Book Viewer.