Sunlight Foundation

 

Making Government Transparent and Accountable

The Sunlight Foundation uses cutting-edge technology and ideas to make government transparent and accountable. Underlying all of our efforts is a fundamental belief that increased transparency will improve the public's confidence in government.

 

The Sunlight Foundation Blog

  • Improvements Needed For High Value Datasets On Data.gov

    This morning a number of organizations (POGO, OMB Watch, CREW, National Security Archive, the Center for Democracy and Technology and the Open The Government coalition) and Sunlight sent a letter to Vivek Kundra, Federal CIO, about improvements needed to the release of High Value Datasets on Data.gov. Here are the core recommendations included. Please tell us what you think in the comments below.

    As advocates for government openness, we support the Administration’s efforts to provide the public with access to information through Data.gov. We are eager to work with you to ensure the success of Data.gov and, in that spirit, write to raise our concerns with the datasets submitted by agencies to fulfill their requirement under the Open Government Directive to post three high value datasets by January 22, and to offer constructive suggestions for improving their usefulness.

    As an overall recommendation, we urge you to add public representatives to the Open Government Initiative interagency working committee and ask the committee to address the problems and recommendations identified below.

    Release Format and Usability by the Public

    We understand one of the primary purposes of Data.gov is to enable the technology community and transparency advocates to most effectively use the data to make a direct impact on the daily lives of the American people. The format of the data plays a key role in its usability; many within the community of advocates who re-use and repackage government data would prefer data in CSV format, rather than the XML format in which many of the posted databases are provided. Accordingly, we recommend that you strike an appropriate balance between formats (such as XML) that serve the coding community and web-based presentations by agencies that can be used and understood by the general public.
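
    To illustrate the format point above, here is a minimal Python sketch of flattening a simple XML dataset into CSV using only the standard library. The sample XML, the tag names and the function name xml_records_to_csv are hypothetical, invented for illustration; real Data.gov files may be nested more deeply and would need more careful handling.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(xml_text, record_tag):
    """Flatten each <record_tag> element's direct children into one CSV row."""
    root = ET.fromstring(xml_text)
    records = [
        {child.tag: (child.text or "").strip() for child in rec}
        for rec in root.iter(record_tag)
    ]
    # Collect field names in first-seen order so sparse records still fit.
    fieldnames = []
    for rec in records:
        for name in rec:
            if name not in fieldnames:
                fieldnames.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()          # the header row makes the file self-describing
    writer.writerows(records)     # missing fields are written as empty cells
    return out.getvalue()


# Hypothetical sample data, not taken from any real dataset.
SAMPLE_XML = """<recalls>
  <recall><model>SeatA</model><year>2009</year></recall>
  <recall><model>SeatB</model><year>2010</year><note>strap</note></recall>
</recalls>"""

print(xml_records_to_csv(SAMPLE_XML, "recall"))
```

    The sketch only handles flat records; it is meant to show why a tabular export is cheap to offer alongside XML, not to stand in for any agency's actual conversion process.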

    In addition, some of the currently posted files are quite large, ranging upward to several hundred megabytes. Their large size undermines their usefulness for most people or organizations. The large number of currently posted datasets also makes it difficult to find a particular database of interest. We therefore recommend that if a Data.gov dataset is available from an agency through a web-based interface, Data.gov link to that interface on the dataset’s Data.gov landing page. For a consumer looking for information on a car seat, for example, it would be far easier to search the Department of Transportation’s online database rather than scrolling through screen after screen of raw data in XML format. Additionally, as agencies continue to post datasets to Data.gov, efforts should be made to identify those of greatest public interest that lack such interfaces and develop web interfaces that allow the data to be explored online.

    Further, while we agree there is value in aggregating government data in a single site, it is questionable how much the collocation of the currently posted information on Data.gov actually benefits the public. The site is not searchable by topic and does not provide any way to bring together data from different sources on similar topics.

    As an enhancement to the organization of the site, we recommend that you use tagging or metadata to enable the public to bring together information on a topic. The thesaurus that USA.gov uses provides a useful example of the needed vocabulary.
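
    As a rough illustration of the tagging recommendation, the Python sketch below builds a topic index from dataset tags so that data from different agencies can be pulled together under one topic. The dataset titles, agencies and tags are hypothetical, and the function names are made up for this example; a production system would draw its tags from a controlled vocabulary such as the thesaurus mentioned above.

```python
from collections import defaultdict


def build_tag_index(datasets):
    """Map each normalized tag to the set of dataset titles that carry it."""
    index = defaultdict(set)
    for ds in datasets:
        for tag in ds["tags"]:
            index[tag.strip().lower()].add(ds["title"])
    return index


def datasets_for_topic(index, topic):
    """Return the titles tagged with a topic, sorted for stable display."""
    return sorted(index.get(topic.strip().lower(), set()))


# Hypothetical catalog entries, for illustration only.
CATALOG = [
    {"title": "Child Seat Recalls", "agency": "DOT", "tags": ["Safety", "Children"]},
    {"title": "Toy Recalls", "agency": "CPSC", "tags": ["safety", "children", "recalls"]},
    {"title": "Airline On-Time Data", "agency": "DOT", "tags": ["travel"]},
]

index = build_tag_index(CATALOG)
print(datasets_for_topic(index, "children"))
```

    Normalizing the tags (trimming and lowercasing) is a design choice in this sketch so that "Safety" and "safety" land in the same bucket.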

    Value of Data

    The release of the datasets also has prompted discussions about the value and the quality of the released data, and the additional value provided by access to existing data in a new format. We believe repackaging old information is of marginal value, yet that is what many agencies have done with their recent postings on Data.gov. According to the Sunlight Foundation, of 58 datasets posted by major agencies, only 16 were previously unavailable in some format online. This leaves the impression that agencies posted easily available data, the proverbial low-hanging fruit, rather than seriously considering which of their datasets truly are of high value. While these initial postings can be considered a test run, more attention needs to be directed toward ensuring the overall quality and usefulness of the data.

    In addition, sustained attention should be paid to the possibility of making some of the datasets available as feeds that are constantly up to date, rather than as static datasets that are pulled down and then reposted on an occasional basis. We recommend that agencies be required to explain why the data is high value by having them designate which of the “high value criteria” the data meets: information that can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; create economic opportunity; or respond to need and demand as identified through public consultation. Similarly, we recommend requiring agencies to indicate whether a high value dataset was previously unavailable, available only with a FOIA request, available only for purchase, or available, but in a less user-friendly format. Going forward, this will make it much easier to track how agencies are complying with the other requirements of the Open Government Directive. While we appreciate the value of data that furthers the mission of an agency, we believe it is equally important to make available to the public data that holds an agency accountable for its policy and spending decisions. We hope to see more datasets of this type available in the near future.

    Quality

    As is to be expected in efforts of this type, there were a number of glitches: datasets that could not be downloaded or, once downloaded, could not be opened (the Central Contractor Registration FOIA extract from the General Services Administration seems to have caused several users problems). Additionally, some datasets were incomplete (the Hazard Mitigation Grant Program data released by FEMA is missing 23 years of data between 1966 and 1989). Even more troubling, some did not have header rows, and for those that did, their Data.gov pages did not always link to code sheets explaining what those header rows meant. Without this information, the data cannot be used.

    We therefore urge the implementation of a responsive feedback mechanism that allows the public to alert an agency that a specific dataset is not working, lacks information, or is missing explanatory material and provides a response to the concerns within a specified time. One way to address this may be to include an agency contact with the ability to resolve any database problems or provide information about the database. The interagency working group could sample the quality of these agency-specific dialogues to ensure that they are having an impact and to develop recommendations on best practices to improve the responsiveness. Additionally, we strongly recommend that all datasets on Data.gov be directly associated with their code sheets.

    Finally, we are concerned with the current lack of public notice when data is removed from the site. We respectfully urge you to note all raw tools and data that are removed from Data.gov, and to provide an explanation for their removal.

    Many of the concerns outlined above apply across all or many of the agencies’ datasets. Accordingly, we think that standards for handling these types of problems can easily be addressed through the interagency working group and then disseminated amongst the agencies.

  • Hearing on Contractor Database Transparency

    If you’ve ever tried to research federal contracts you’ll find that the databases used to house those contracts online are not so great. Sen. Claire McCaskill held a hearing yesterday titled, “Improving Transparency and Accessibility of Federal Contracting Databases.” Nancy Scola wrote up the hearing and it isn’t pretty:

    All told, there are a million lines of code involved. But there’s really no all told here, because the databases don’t talk to one another. For example, FPDS, the Federal Procurement Data System, doesn’t communicate with EPLS, which stands for Excluded Parties List. Which means that the USASpending.gov website (heralded as the American public’s window into the inner-workings of government, but powered by FPDS) doesn’t even know that contractors contained within it have been banished from government service for defrauding the United States government or otherwise behaving badly. What’s more, on some of these legacy systems, a search for Contractor X, Inc. won’t return results for Contractor X Inc. The shorthand for that particular wrinkle came to be known, during the hearing, as “the comma problem.”

    In fact, GAO’s William Woods explained to the senators, the poor state of those databases meant that when his agency was asked by Congress to detail how many contractors were billing the United States government for work in Afghanistan and Iraq, the government watchdog group was forced by technology to admit its ignorance. “We could not answer those questions,” said Woods. How many KBRs are at work in American war zones, being paid with taxpayer dollars? How many Blackwaters? Dunno.

    The biggest problem, however, didn’t turn out to be the current state of disrepair, but rather the inability to figure out what to do with the whole disclosure regime. To the surprise of almost everyone in the committee room, the General Services Administration (GSA) has been working to create a more sensible contractor disclosure regime with a more accessible public face. It was difficult for federal Chief Information Officer Vivek Kundra to identify who exactly would be overseeing the — yes — contract to revamp the databases. Ultimately that responsibility came down to either the GSA, the Office of Management and Budget or the Office of Federal Procurement.

    As Scola writes, “Senator Robert Bennett spoke for many of us today when he sat up on the dais in room 342 of the Dirksen Senate Office Building and rubbed his temples over, and over, and over, and over again.”
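
    For the curious, “the comma problem” described above is the kind of thing a small normalization step can address. Here is a minimal Python sketch, assuming exact-match search over contractor names; the function names and the sample records are hypothetical and say nothing about how FPDS or EPLS are actually implemented.

```python
import re


def normalize_name(name):
    """Lowercase, drop punctuation and collapse whitespace in a contractor name."""
    lowered = name.lower()
    no_punct = re.sub(r"[^\w\s]", "", lowered)   # removes commas, periods, etc.
    return " ".join(no_punct.split())


def search(names, query):
    """Return every stored name whose normalized form matches the query's."""
    key = normalize_name(query)
    return [n for n in names if normalize_name(n) == key]


# Hypothetical records, for illustration only.
RECORDS = ["Contractor X Inc.", "Contractor X, Inc.", "Contractor Y LLC"]

print(search(RECORDS, "Contractor X, Inc."))
```

    Comparing normalized keys rather than raw strings means both spellings are found from either query. Real entity matching is harder than this (abbreviations, subsidiaries, misspellings), so treat it as a sketch of the idea only.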

  • Gov 2.0 Summit: Ellen Miller and Vivek Kundra

    Sunlight’s Ellen Miller talks to federal CIO Vivek Kundra at the recent Gov 2.0 summit.

  • Real-Time Data Program Wins Innovation Award

    I know this is a couple days old, but it hasn’t been mentioned here yet. The District of Columbia’s real-time online data disclosure project was one of six winners of the Innovations in American Government awards given out by the Harvard Kennedy School’s Ash Institute for Democratic Governance and Innovation. The project was spearheaded by then-D.C. Chief Technology Officer (CTO) and current federal Chief Information Officer (CIO) Vivek Kundra. You can see the two sites singled out for praise below:

    According to the Ash Institute, “this is the first initiative in the country that makes virtually all current district government operational data available to the public in its raw form rather than in static, edited reports.” Real-time data disclosure is becoming far more common in cities across the nation with San Francisco introducing DataSF.org and the New York City legislature examining open data legislation. (Vancouver, Canada has also endorsed the release of city data in raw form.)

    Real-time, raw data disclosure is the cutting edge in transparency and government innovation. While the federal government has released Data.gov, a raw data site similar to D.C.’s, there are countless sets of public data compiled by the federal government that fall into one or more of the following three categories: 1) Not online; 2) Not in a structured format; 3) Not compiled and disclosed in real time. As many public data sets as possible should be online, in a structured format, and disclosed in real time. For some data it is unreasonable to ask for real-time disclosure. These sets should then, at least, be online and in a structured format.

    Side note: It’s great to see my city defy our Rodney Dangerfield-like existence and finally get some respect.

  • This Week in Transparency – July 17, 2009

    Here are a few of the more interesting media mentions of Sunlight and our friends and allies from the week:

    Jeff Jacoby, columnist for The Boston Globe, mentioned ReadTheBill.org in a piece he wrote calling on congressional lawmakers to read legislation before they vote on it. Glenn Reynolds, at his Instapundit blog, linked to Jacoby’s column. Andrew Sullivan’s blog, The Daily Dish, followed by linking to Reynolds.

    In Washington Monthly’s July/August edition, Charles Homans wrote about the Obama administration’s “experiments with data-driven democracy.” The article centers on the work of Vivek Kundra, the White House’s chief information officer, and mentions both the District of Columbia’s Apps for Democracy contest and Sunlight’s Apps for America contest. Homans quotes Clay Johnson, Sunlight Labs’ director, saying Kundra has his work cut out for him. “I have nothing but respect for what he’s trying to do. But it’s a hard job, and it’s going to take some time for this to actually happen right. I mean years.” While discussing Kundra’s launch of Data.gov, Homans again quotes Clay, “The top data source is on the world’s copper smelters, which isn’t going to tell us very much about what’s going on inside of our government.”

    As Ellen Miller, Sunlight’s director, wrote earlier this week, “When it comes to following the money that’s flowing to power on Capitol Hill, no one does it better than the Center for Responsive Politics.” For instance, MAPLight.org used CRP data to show how money watered down the energy bill, the American Clean Energy and Security Act of 2009 (HR 2454). With Congress debating health care reform, Forbes used CRP data to show how America’s Health Insurance Plans, the political advocacy and trade group for the health insurance industry, has spent nearly $10 million on lobbying Congress in the past two years. Robert J. S. Ross, writing at The Huffington Post, quotes CRP about how the insurance industry has contributed $568 million to political campaigns since 1998. CNN’s Jonathan Mann used CRP data in noting how doctors have spent roughly two-thirds of a billion dollars lobbying lawmakers in the last 10 years.

  • Watching Government Opacity Melt Away, “Right before our Eyes!”

    Vivek Kundra, federal CIO, and Macon Phillips, White House new media director, unveiled the Office of Management and Budget’s IT dashboard this morning at the Personal Democracy Forum Conference in New York City. And the PDF attendees gave him a well-deserved standing ovation.

    The dashboard was built to monitor more than $70 billion in government information technology spending, showing all contracts within every agency, and is one of the features of the redesigned USASpending.gov, re-launched early this morning.

    During the presentation, Kundra talked about launching a platform that will allow the government to tap into the best thinking and the best ideas. And Phillips added that it’s just the beginning. Kundra also admitted that announcing that the federal data will be available online to the public has spurred government bureaucrats to start cleaning it up, proving the rule that sunlight is the best disinfectant. The initial dashboard is for IT expenditures only. I’d add, however, that if you want the data on the government investments in General Motors or AIG you’ll need to go to SubsidyScope.com.

    In the question session, Andrew Rasiej, PDF co-founder and Sunlight senior technology advisor, asked Kundra if we should redefine “public” as “searchable, accessible and readable online.” Kundra replied with an emphatic “yes.” As Jay Rosen, N.Y.U. journalism prof, tweeted, “What we’ve been watching with CIO Vivek Kundra at #pdf09 is the undoing of the opacity agenda of the Bush years, right before our eyes!”

    NextGov.com’s Gautham Nagesh noted today that the site’s new visualization tools are a definite improvement. “It’s now possible with just a few clicks to see how much money an agency has invested in IT projects and what percentage of those projects are behind schedule or over budget,” Nagesh wrote.

    We are told that OMB will be holding a press conference this afternoon at 3:30 (Eastern Time) to highlight the redesigned USASpending.gov and the IT dashboard.

    Check it out!

  • Personal Democracy Forum: We.gov

    Personal Democracy Forum kicks off Monday in New York. This will be PDF’s sixth event, with this year’s theme being “We.gov,” as in all the ways that we, the people, are using technology and new media to transform politics, campaigns, media, governance and civic action. This is one conference I never miss willingly (I think I’ve only missed one!) and I’m honestly not that much of a conference-goer. I think of it as my annual “brain food.” I can’t wait.

    A “two-day tech + politics brainfest” is how Tim O’Reilly described PDF last week. PDF will be tracking the state of the art in online politics, exploring government 2.0, looking at the new tools for organizing that are being used, as well as looking at the future of political journalism, blogging and networked media.

    I’m excited to see old and new friends, many of whom are keynote speakers. A radically truncated list includes emerging technology expert (and Sunlight board member) Esther Dyson; senior fellow at Demos and PDF senior editor Allison Fine; now-former Washington Post “White House Watch” blogger Dan Froomkin (Dan posted his last earlier today…A must read!); New York State Senate CIO Andrew Hoppin (I blogged about him earlier today); journalism prof and Buzzmachine.com blogger Jeff Jarvis; Obama administration CIO Vivek Kundra; Craigslist founder (and Sunlight board member) Craig Newmark; law professor Beth Noveck; “Here Comes Everybody” author Clay Shirky; campaign re-inventor Joe Trippi and “The Cluetrain Manifesto” co-author and blogger David Weinberger. Really there are too many good people coming and speaking to mention.

    Congratulations, in advance, to Andrew Rasiej and Micah Sifry, PDF’s co-founders and Sunlight’s senior technology advisors. It’s going to be a very exciting couple of days.

    Maybe you can join at the last minute.

  • Weekly Media Roundup – May 22, 2009

    Here are a few of the more interesting media mentions of Sunlight and our friends and grantees from this week:

    Thursday’s launch by the Obama administration of Data.gov, the repository for all the information the federal government collects, generated a number of good press mentions. Vivek Kundra, President Obama’s new Chief Information Officer, built and manages the Web site, from which developers can access data to create applications for the Web and handheld devices. The Washington Post’s Kim Hart wrote about the launch and quotes Ellen Miller, Sunlight’s executive director, saying it “demonstrates the acceptance of the notion that providing raw data is inherent to establishing trust in agencies.” Ellen said that the administration is redefining public information. “To be truly public, it needs to be available online. That’s a dramatic shift.” Hart also quotes Patrice McDermott, director of OpenTheGovernment.org, saying most federal agencies have not traditionally emphasized openness. “It’s not what Congress has told them to do in the past, and it’s not their culture. There’s going to have to be some real pressure on agencies to do this.” Hart also mentions Sunlight Labs’ Apps for America 2 contest, and writes that it is modeled after the Apps for Democracy contest started by Kundra when he was the District of Columbia’s chief technology officer. Richard Waters at the Financial Times (subscription required) wrote about the launch and the contest, and quotes Ellen saying the launch represents “a sea-change in how government views its information.”

    Wired’s Kim Zetter and Wired Science’s Alexis Madrigal both have articles about Data.gov that mention Sunlight and the Apps for America 2 contest. Madrigal also quotes Ellen, “Data.gov says that our information is your information,” and that “it represents this enormous change in attitude about what public means. It means it’s online. It means it’s available. I think it’s a dramatic breakthrough in the role of government.”

    Federal News Radio’s Jason Miller produced a story on Data.gov, and includes an mp3 of his interview with Kundra who mentions the Apps for America 2 contest. Chris Dorobek, co-anchor of Federal News Radio’s afternoon drive program, interviewed Ellen about the launch and posted the audio. Jon Gordon with American Public Media’s “Future Tense” interviewed Clay Johnson, Sunlight Labs’ director, about Data.gov. Clay said the site represents “a good first step” by the administration.

  • Our One Click Future

    Last Thursday, Richard MacManus, founder and editor of ReadWriteWeb, posted an interesting piece titled “Understanding the New Web Era: Web 3.0, Linked Data, Semantic Web” where he explains how he sees the Web evolving with the three trends converging. And this morning, MacManus posted another more focused piece discussing Linked Data, where the Web allows users to connect related data that wasn’t previously linked. MacManus sums up the concept nicely: “Linked Data allows you to discover, connect to, describe, and re-use all kinds of data. It is to data what the World Wide Web was to documents back in the 90’s.” Data exists to be used, he wrote. “Linked Data enables data to be opened up and connected so that people can build interesting new things from it.”

    MacManus embeds a TED presentation by Tim Berners-Lee, the inventor of the Web and director of the World Wide Web Consortium, and the leading evangelist for Linked Data. Here’s a YouTube video of Berners-Lee’s TED presentation:

    Several weeks ago, Berners-Lee opened “A National Dialogue” discussion on what Open Linked Data is and why it’s important, including the governmental implications.

    Linked Data has great implications for federal government data, but obviously, its promise reaches far beyond the confines of government. But here at Sunlight, making government data open, online and usable is our goal. And so is connecting the dots. Last month, Sunlight Labs envisioned an OpenData.gov, the new central repository for government data and research that new federal CIO Vivek Kundra is working on. We are all eagerly waiting to see what Vivek unveils soon. But as Berners-Lee says in his Linked Open Data mantra: “Raw Data Now!”

    Think of Linked Data as our one click future.

  • “Powerful New Instrument For Change”

    Over the weekend, The Boston Globe published an important op-ed about President Obama’s transparency and the right-to-know agenda, written by Mary Graham, co-director of the Transparency Policy Project at the Harvard Kennedy School. Repairing current yet “broken” transparency policies should be President Obama’s first priority, Graham writes, and by doing so he would create a “powerful new instrument for change.”

    Current transparency policies don’t really work very well. The assumptions that led to them are correct: citizens too often make crucial health care, investment and other decisions without the input of reliable information. Graham argues for more facts to be “presented in standardized, timely, and understandable ways so people can compare mortgage lenders, credit card deals, surgery outcomes, and more.” Transparency policies fail today because they don’t allow accurate comparisons, they’re vulnerable to politics and conflicts of interest, and disclosure rules rarely keep pace with new risks. And I’d add, an awful lot of that information isn’t available online and little is available in real time. It isn’t disclosure if it’s not online.

    She advises the new Administration to communicate transparency policies in common and clear language so they can be understood by ordinary citizens. The Administration should mandate that the people within government designing the policies communicate and collaborate with each other. And the agencies should find ways to track unforeseen risks.

    I would add a few other agenda items for the executive branch that are vital to fostering true transparency. In the Web 2.0 era data must be interoperable. In other words, all government databases must be made to work together. We believe that the administration needs to set up a strong central authority to control information policy, funding and standards. The naming of Aneesh Chopra and Vivek Kundra to the positions of federal CTO and CIO, respectively, is a positive development on this front. And finally, government should allow and encourage citizens to participate in government through collaborative projects, like the successful Peer to Patent Project.

    Graham writes persuasively, “Neither the economy nor health care can be fixed unless transparency policies are fixed…Markets and ordinary citizens can cope with risks as long as they can understand them.”

    That sounds like transparency to me.