The Web-at-Risk: Preserving Our Nation's Cultural Heritage

About Web-at-Risk

Web-at-Risk is a four-and-a-half-year effort led by the California Digital Library (CDL) to develop tools that enable librarians and archivists to capture, curate, preserve, and provide access to web-based government and political information.  The collections focus primarily on state and local government information, but they may also include web documents from federal and international governments and from non-profit organizations.

Web-at-Risk received one of eight grants awarded by the National Digital Information Infrastructure and Preservation Program (NDIIPP).  The work is undertaken by the CDL and its partners, New York University and the University of North Texas, with additional support from Stanford University, the San Diego Supercomputer Center, and the Library of Congress.  The University of California libraries are also participating, with staff from the Berkeley, Davis, Los Angeles, Riverside, San Diego, San Francisco, Santa Barbara, and Santa Cruz campuses lending their domain expertise.

Web Archiving Tools - Filling a Need

The need for web archiving tools stems from the ephemeral nature of web resources, especially local government and political information. Publications once issued in print now often appear only in digital form on the web, where they can disappear as sites are revised or reorganized. Librarians need a new suite of tools to fulfill their historic mission of preserving our cultural and political heritage.

The Web Archiving Service (WAS)

To address these concerns, the CDL has built the Web Archiving Service (WAS), a web application designed to capture, curate, and preserve Internet content.  The Web Archiving Service is being developed in two phases: the first focuses on the curatorial tools needed to build web archives, and the second focuses on providing public access to the completed collections.

The WAS feature set includes the ability to capture sites, to search and browse captured content, to compare sites over time, and to build collections of selected content.

Project Deliverables 2005-2008

While the Web Archiving Service is a major focus of the Web-at-Risk grant work, the project encompasses more than software development.  Below are some of the deliverables completed to date:

The Web at Risk: A Distributed Approach to Preserving our Nation’s Political and Cultural Heritage.  Interim Report from the California Digital Library. [PDF]

A summary of grant work between 2005 and 2008.  This report includes evaluations of each pilot release of the web archiving service, and links to detailed technical documentation.

Web-at-Risk Needs Assessment Summary Report [PDF]

A summary of the needs assessment work that took place at the outset of the grant.  This included several focus groups of librarians and archivists as well as surveys and interviews to determine what librarians need from web archiving tools.

Web Archiving Service Guide [PDF]

An in-depth guide to capturing sites, controlling capture settings, analyzing results, and building collections.

Web Archiving Service Collection Planning Guidelines [DOC]

The Collection Planning Guidelines help Web Archiving Service users plan their collection activity.

Future Work: 2008-2009

Research and development work for phase 2 will continue through the summer of 2009.  As part of this work, the CDL will be conducting interviews to determine how web archives will be used by researchers and scholars.  Faculty and researchers wishing to participate in these interviews can contact washelp@ucop.edu for further information.

Further Information

Web-at-Risk wiki
News and reports from Web-at-Risk grant activity.

National Digital Information Infrastructure and Preservation Program
Information from the project’s funders.

Web-at-Risk Project Partners [PDF]

Web-Based Government Information: Evaluating Solutions for Capture, Curation, and Preservation [PDF]

Web-at-Risk Collections: YouTube Video
An overview of what places web publications at risk, and a glimpse at the collections that the CDL’s work is enabling.