The Sunlight Foundation Blog
 
  • Web Harvest Archive

    POSTED BY
    John Wonderlich

    I’m glad to have just found the archive of old Web sites from members of Congress, maintained by the Center for Legislative Archives under the National Archives and Records Administration (NARA).

    The collection seems well organized and easy to peruse, with solid explanations of their methodology and disclaimers about what’s available based on the crawling.

    My main suggestion is that the archiving happen more frequently, perhaps on a coordinated schedule that captures as much material as possible. Those responsible for the Web Harvest should also coordinate with the CAO, systems administrators, and vendors to be sure that the digital records management practices used in organizing member sites encourage easy crawling and archiving by NARA and the CLA.

    The House has a document laying out best practices for document management in House offices. I wonder whether that guidance on digital materials management should be expanded to cover digital materials availability, perhaps including standards like sitemapping, in order to ensure the preservation of member sites.
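
    To make the sitemapping idea concrete, here is a minimal sketch of how an office's site could publish a sitemap for a crawler to follow. It assumes the public sitemaps.org XML format; the function name and the example.org URLs are hypothetical, and nothing here reflects how the House or NARA actually manage member sites.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the public sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified) pairs.

    last_modified is a date string such as "2007-12-04", or None to omit it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if last_modified:
            ET.SubElement(entry, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages on a member site; real URLs would differ.
    example_pages = [
        ("https://example.org/", "2007-12-04"),
        ("https://example.org/press-releases", None),
    ]
    print(build_sitemap(example_pages))
```

    A file like this, served from the site root, gives an archiving crawler an explicit list of pages rather than leaving it to discover them by following links.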

    My other suggestion is to increase the exposure of the captured sites, perhaps by encouraging links from the bioguides or current member sites, and to ensure that the collection itself can be crawled and indexed by search engines.

    Posted: December 4, 2007 - 12:00 pm.


This work by Sunlight Foundation is licensed under a Creative Commons Attribution 3.0 United States License.