The Sunlight Foundation Blog
 
  • Digital Preservation Under Threat?

    POSTED BY
    John Wonderlich

    Via dotgovwatch, it looks like the National Archives is discontinuing its Web Harvest program:

    For the first time since the Internet began, the National Archives and Records Administration (NARA) will not record a snapshot of Executive Branch websites at the end of a Presidential administration.

    In the article, Coby Logen notes that the valuable work of non-profits like archive.org shouldn’t entirely supplant the work of the government. Federal agencies exist to protect the public interest, through a public mechanism. Our national government has a responsibility to protect and document its history. They are uniquely positioned to do so; no one else has both the reliable public mandate and the public accountability necessary for protecting historical documents.

    Federal Web sites are historical documents, and NARA’s Web Harvest program should be enthusiastically supported. Digital records management should enable easier and cheaper preservation, and it brings the promise of more meaningful disclosure of, and access to, both current and historical documents.

    The fact that digital preservation is done by others outside NARA isn’t an excuse for NARA to abdicate its responsibility, but an argument that it should be capable of fulfilling it.

    As Members of Congress and Federal agencies increasingly move their work online, robust digital archiving will only become more important to our understanding of how our government is performing its duties.

    Posted: April 9th, 2008
  • An Old Report Made New

    POSTED BY
    John Wonderlich

    I’ve been on a mission, since November 14th, to find a digital copy of S.Pub 102-20, a reference document from 1990 giving a very comprehensive analysis of all public congressional information, from an archival perspective. I’ve finally managed to digitize a copy (after some quality time at the scanner). It is a large file. (Click here to download a PDF.)

    The preface describes it as a "study of the archival sources that document the operations of Congress." The "archival sources" described in this document comprise the entire body of public congressional information, from administrative minutiae to legislative substance. Just as we are interested in the public’s capacity to be conscious of its legislature, we should be interested in the legislature’s capacity to take stock of itself and engage in constructive introspection.

    I came across this document being repeatedly cited while reading the yearly reports of the Advisory Committee on the Preservation of the Records of Congress, and I still find rich irony in the fact that the document itself wasn’t available in digital form. That’s not to say anything against the Advisory Committee, which seems to be an outgrowth of the task force that wrote S.Pub 102-20 and also inspired H.R. 5241 from the 101st Congress, a bill reorganizing the National Archives, among other things. The Advisory Committee seems to be among the very best examples of an organization created to meet an emergent need, cutting across jurisdictions and what one of its members recently described to me as "negotiated terrain" (a description I very much liked).

    Coordinating congressional information is difficult, but not for the usual reasons. As far as preservation goes, the administrative coordination is already in place, and the research (and even enforcement) on disclosure mechanisms seems to have been in place for quite some time. What has been lagging is not administrative will, but the digital culture and popular expectations that make IT investment a real priority.

    This is clearly changing: new staffers expect to represent their members of Congress online without encountering arcane restrictions, citizens expect to find government information and services through the same search engines they use for research and shopping, and a new brand of journalism is springing up that depends not on cultivating trusted sources through personal relationships but on careful consideration of primary sources, exactly the "archival sources" this document so comprehensively describes.

    While some disclosure will be resisted for as long as the benefits of secrecy outweigh the outcry over obstruction, and privileged access will always be at odds with the broader public interest, it is good to see that a detailed anatomy of congressional information has already been constructed. The question that remains is how well Congress will adapt to new expectations of information access, a question that necessarily accompanies a digitally empowered citizenry.

    Posted: December 19th, 2007
  • Web Harvest Archive

    POSTED BY
    John Wonderlich

    I’m glad to have just found the archive of old Web sites from members of Congress, maintained by the Center for Legislative Archives under the National Archives and Records Administration (NARA).

    The collection seems well organized and easy to peruse, with solid explanations of their methodology and disclaimers about what’s available based on the crawling.

    My main suggestion is that the archiving happen more frequently, perhaps timed to capture the greatest amount of material possible, and that those responsible for the Web Harvest coordinate with the CAO, systems administrators, and vendors to make sure that the digital records management practices used in organizing member sites encourage easy crawling and archiving by NARA and the CLA.

    The House has a document laying out best practices for document management in House offices; I wonder whether that guidance should be expanded to cover the availability of digital materials, perhaps including standards like sitemaps, in order to ensure the preservation of member sites.

    My other suggestion is to increase the exposure of the captured sites, perhaps encouraging links from the bioguides, or current member sites, and to ensure that the collection itself is crawlable through search engine indexing practices.

    Posted: December 4th, 2007
  • Bibliographic Control, Agile Government

    POSTED BY
    John Wonderlich

    I found this post on the Library of Congress blog yesterday, and it has me thinking about several other things I’ve been intending to write about. The LOC is accepting public comments on the draft report of its Working Group on the Future of Bibliographic Control. The draft is full of noteworthy observations about decentralized information management and the Internet, and I’m going to excerpt from it generously below.

    First, however, I’d like to point to another report I recently came across, called Agile Government: A Provocation Paper. Prepared jointly by Demos and the State of Victoria (an Australian state), the paper applies the concept of agility (as often applied to software development) to public sector planning. Agility focuses on the productively dynamic aspects of management, development, and administration, stressing iteration and flexibility over comparatively static organizational models.

    While I’m personally unconvinced by the idea of agility as a fundamental organizational principle, and prefer to think of it as a helpful rubric or theme, the concept does provide a useful lens through which to view other government documents that are broad in scope.

    For example, when I wrote recently about the National Archives’ public comment period for its partnership plan for digitization, I was most impressed by the public, iterative nature of the project’s planning. A better plan will presumably result if it is skillfully subjected to multiple rounds of public inspection and revision. The constructive aspects of this process are echoed both in the organization of the Open House Project report and in the legislative process itself.

    I’m wondering about the history of the public components of public administration: when did certain plans start to be subjected to public commentary? For how long has the federal regulatory process been open to public comment? For how long have legislative support agencies been publishing 10-year visions and yearly updates? Perhaps most importantly, what is the best way to optimize and institutionalize the benefits of publishing organizational plans? Is a statutory mandate necessary, or are modern expectations of effective management practices sufficient? Does the public benefit from a required level of visionary reporting from its institutions, or should the agencies report on their activities in whatever manner best fits their needs? Does the publication of such documents currently outpace the public demand for such information?

    I hope it doesn’t: I’d prefer to think that the GPO’s or NARA’s visions for digitization don’t go unnoticed, that the LOC’s enormous yearly updates are appreciated for their scope and detail, and that the strategic visions (of roughly 10 years) of GPO, NARA, and the LOC, or the frequent reports from the CAO, CRS, NARA, and various related Inspectors General (GPO, LOC), are at least perused by the parties affected. Surely the information and awareness a government needs in order to be agile must be matched by a community of information consumers who are actually paying attention to that information?

    Back to the topic at hand: the Library of Congress has a report currently open for public comment, seeking feedback on its vision for the future of the Library (especially its bibliographic activities, such as cataloging and metadata) in an age of digital Internet publishing and information dissemination. While the report deals extensively with the minutiae of the LOC’s information management practices, it also provides repeated insight into the Library’s view of a rapidly changing information ecology, reinforcing many observations and concerns shared throughout the Open House Project (and broader) community. (It also builds on a previous LOC plan regarding bibliographic control.)

    From the introduction:

    The future of bibliographic control will be collaborative, decentralized, international in scope, and Web-based. Its realization will occur in cooperation with the private sector, and with the active collaboration of library users.

    Page 2:

    Recognize that people are not the only users of the data that we produce in the name of bibliographic control, but so too are machine applications that interact with those data in a variety of ways.

    Page 3:

    In 1902, the Library of Congress began producing catalog cards for purchase so that libraries that purchased the same book could buy catalog cards from the Library of Congress… The service continues to this day, although now bibliographic data are in machine-readable form and are shared over networks.

    Page 4:

    The economics of creating LC’s products have changed dramatically since the time when the Library was producing cards for library catalogs.

    …it receives no funding specifically directed at providing bibliographic services for U.S. libraries.

    Page 7:

    …it is necessary to embrace a view of bibliographic control as a distributed activity, not a centralized one. Data about collection usage–such as inclusion in curricula or bibliographies, citation links, circulation and sales figures–are all valuable bits of information in the universe of bibliographic control.

    Page 8:

    All possible means of collaboration should be considered. …needs to consider carefully when it is appropriate to distribute effort and when to discontinue it.

    Page 9:

    …the standards landscape in the library field is murky, with many different organizations working on similar standards in a non-coordinated fashion. LC should consider sharing the standards effort within the community and collaborating with other interested institutions to create a rational and efficient means of managing the standards needed for information exchange.

    Page 21:

    2.4.1 LC: Study possibilities for computational access to digital content. Use this information in developing new rules and best practices.

    Page 22:

    The use of language strings such as personal or corporate names as identifiers hinders data exchange across languages and across different information communities.

    3.1.1 Develop a More Flexible, Extensible Metadata Carrier

    Page 24:

    New discovery environments are emerging that extract and merge data from several library systems.

    Posted: December 2nd, 2007
  • Library of Congress Website Upgrade

    POSTED BY
    John Wonderlich

    Via the Library of Congress blog, it looks like the LOC Website will be getting an upgrade in the coming weeks. They make a good point about choosing between providing RSS feeds and email updates, noting that many more people use email than RSS:

    While only a fraction of people on the Web use RSS feeds, something like 100 percent of them use email, and this is just another part of our efforts to get information to people in the way that is most useful to them. You can get a sense for how the email updates will function by looking at the FBI’s Web site.

    Happily, they’re not choosing between the two, and have a pretty broad set of RSS feeds already on offer on their RSS page.

     

    Of particular note among the existing RSS feeds are the LOC blog feed (through which I noticed this update post), the digital preservation feed, a legislation update feed from the Copyright Office (I wonder if other agencies are doing this?), and a feed of Federal Register items relevant to the Copyright Office. (I’m also curious about the degree of automation in gathering agency-specific items to form these feeds. How are they set up, and where are the items gathered from? What would it take to reproduce this in other agencies?) It looks like NARA has three feeds set up: news and events, today’s document, and Federal Register documents on public inspection.

    The GAO has a great offering as well.
    Could the GPO or CAO be close behind?

    Know of any other forward-looking government information sources?

    (Crossposted from the Open House Project blog.)

    Posted: August 30th, 2007



This work by Sunlight Foundation is licensed under a Creative Commons Attribution 3.0 United States License.