The Sunlight Foundation uses cutting-edge technology and ideas to make government transparent and accountable. Underlying all of our efforts is a fundamental belief that increased transparency will improve the public's confidence in government.
This morning Sunlight and a number of other organizations (POGO, OMB Watch, CREW, National Security Archive, the Center for Democracy and Technology and the Open The Government coalition) sent a letter to Vivek Kundra, Federal CIO, about improvements needed to the release of High Value Datasets on Data.gov. Here are the core recommendations included. Please tell us what you think in the comments below.
As advocates for government openness, we support the Administration’s efforts to provide the public with access to information through Data.gov. We are eager to work with you to ensure the success of Data.gov and, in that spirit, write to raise our concerns with the datasets submitted by agencies to fulfill their requirement under the Open Government Directive to post three high value datasets by January 22, and to offer constructive suggestions for improving their usefulness.
As an overall recommendation, we urge you to add public representatives to the Open Government Initiative interagency working committee and ask the committee to address the problems and recommendations identified below.
Release Format and Usability by the Public
We understand one of the primary purposes of Data.gov is to enable the technology community and transparency advocates to most effectively use the data to make a direct impact on the daily lives of the American people. The format of the data plays a key role in its usability; many within the community of advocates who re-use and repackage government data would prefer data in CSV format, rather than the XML format in which many of the posted databases are provided. Accordingly, we recommend that you strike an appropriate balance between formats (such as XML) that serve the coding community and web-based presentations by agencies that can be used and understood by the general public.
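To make the format point concrete, here is a minimal sketch of the kind of conversion that re-users of the data often end up doing themselves. It assumes a simple, flat XML layout; the sample records, the `record` tag name and the `xml_to_csv` helper are all hypothetical, and real agency files use their own schemas.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: real Data.gov XML files use agency-specific schemas.
SAMPLE_XML = """<records>
  <record><model>Seat A</model><maker>Acme</maker><rating>4</rating></record>
  <record><model>Seat B</model><maker>Bolt</maker></record>
</records>"""


def xml_to_csv(xml_text, record_tag="record"):
    """Flatten one level of child elements per record into CSV text."""
    root = ET.fromstring(xml_text)
    records = [
        {child.tag: (child.text or "").strip() for child in rec}
        for rec in root.iter(record_tag)
    ]
    # Collect column names in first-seen order so the header row is stable.
    columns = []
    for rec in records:
        for name in rec:
            if name not in columns:
                columns.append(name)
    out = io.StringIO()
    # restval="" leaves a blank cell when a record lacks a field.
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    print(xml_to_csv(SAMPLE_XML))
```

A sketch like this only works for flat records. Nested XML needs a flattening rule chosen per dataset, which is one reason a published CSV alongside the XML saves re-users effort.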
In addition, some of the currently posted files are quite large, ranging up to several hundred megabytes. Their large size undermines their usefulness for most people or organizations. The large number of currently posted datasets also makes it difficult to find a particular database of interest. We therefore recommend that if a Data.gov dataset is available from an agency through a web-based interface, Data.gov link to that interface on the dataset’s Data.gov landing page. For a consumer looking for information on a car seat, for example, it would be far easier to search the Department of Transportation’s online database than to scroll through screen after screen of raw data in XML format. Additionally, as agencies continue to post datasets to Data.gov, efforts should be made to identify those of greatest public interest that lack such interfaces and develop web interfaces that allow the data to be explored online.
Further, while we agree there is value in aggregating government data in a single site, it is questionable how much the collocation of the currently posted information on Data.gov actually benefits the public. The site is not searchable by topic and does not provide any way to bring together data from different sources on similar topics.
As an enhancement to the organization of the site, we recommend that you use tagging or metadata to enable the public to bring together information on a topic. The thesaurus that USA.gov uses provides a useful example of the needed vocabulary.
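As a rough illustration of the tagging recommendation, the sketch below builds a topic index over a small catalog so that datasets from different agencies can be pulled together by subject. The catalog entries, agency labels and tag names are invented for the example, and the function names are hypothetical. A production system would draw its tags from a controlled vocabulary rather than free text.

```python
from collections import defaultdict

# Hypothetical catalog entries; titles, agencies and tags are invented.
CATALOG = [
    {"title": "Child safety seat ratings", "agency": "Agency A", "tags": ["Safety", "children"]},
    {"title": "Product recall notices", "agency": "Agency B", "tags": ["safety", "recalls"]},
    {"title": "Grant awards", "agency": "Agency C", "tags": ["grants"]},
]


def build_topic_index(catalog):
    """Map each normalized tag to the catalog entries that carry it."""
    index = defaultdict(list)
    for entry in catalog:
        for tag in entry["tags"]:
            # Normalize case and whitespace so "Safety" and "safety" match.
            index[tag.strip().lower()].append(entry)
    return index


def datasets_for(index, topic):
    """Return 'agency: title' strings for every dataset tagged with topic."""
    matches = index.get(topic.strip().lower(), [])
    return [f'{e["agency"]}: {e["title"]}' for e in matches]


if __name__ == "__main__":
    idx = build_topic_index(CATALOG)
    for line in datasets_for(idx, "safety"):
        print(line)
```

Normalizing tags at index time is the simplest way to keep near-duplicate labels from splitting a topic. A shared thesaurus would go further by mapping synonyms to one term.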
Value of Data
The release of the datasets also has prompted discussions about the value and the quality of the released data, and the additional value provided by access to existing data in a new format. We believe repackaging old information is of marginal value, yet that is what many agencies have done with their recent postings on Data.gov. According to the Sunlight Foundation, of 58 datasets posted by major agencies, only 16 were previously unavailable in some format online. This leaves the impression that agencies posted easily available data, the proverbial low-hanging fruit, rather than seriously considering which of their datasets truly are of high value. While these initial postings can be considered a test run, more attention needs to be directed toward ensuring the overall quality and usefulness of the data.
In addition, sustained attention should be paid to the possibility of making some of the datasets available as feeds that are constantly up to date, rather than as static datasets that are pulled down and then reposted on an occasional basis. We recommend that agencies be required to explain why the data is high value by having them designate which of the “high value criteria” the data meets: information that can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; create economic opportunity; or respond to need and demand as identified through public consultation. Similarly, we recommend requiring agencies to indicate whether a high value dataset was previously unavailable, available only with a FOIA request, available only for purchase, or available, but in a less user-friendly format. Going forward, this will make it much easier to track how agencies are complying with the other requirements of the Open Government Directive. While we appreciate the value of data that furthers the mission of an agency, we believe it is equally important to make available to the public data that holds an agency accountable for its policy and spending decisions. We hope to see more datasets of this type available in the near future.
Quality
As is to be expected in efforts of this type, there were a number of glitches: datasets that could not be downloaded or, once downloaded, could not be opened (the Central Contractor Registration FOIA extract from the General Services Administration seems to have caused several users problems). Additionally, some datasets were incomplete (the Hazard Mitigation Grant Program data released by FEMA is missing 23 years of data between 1966 and 1989). Even more troubling, some did not have header rows, and for those that did, their Data.gov pages did not always link to code sheets explaining what those header rows meant. Without this information, the data cannot be used.
We therefore urge the implementation of a responsive feedback mechanism that allows the public to alert an agency that a specific dataset is not working, lacks information, or is missing explanatory material and provides a response to the concerns within a specified time. One way to address this may be to include an agency contact with the ability to resolve any database problems or provide information about the database. The interagency working group could sample the quality of these agency-specific dialogues to ensure that they are having an impact and to develop recommendations on best practices to improve the responsiveness. Additionally, we strongly recommend that all datasets on Data.gov be directly associated with their code sheets.
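One way to picture what associating a dataset with its code sheet could mean in practice is an automated check that every column header has a documented meaning. The sketch below is a minimal example under stated assumptions: the sample CSV, the code sheet contents and the `undocumented_columns` helper are hypothetical, and the check assumes the first row of the file is a header row.

```python
import csv
import io

# Hypothetical dataset and code sheet, invented for illustration.
SAMPLE_CSV = "st,amt,yr\nVA,1200,1990\nMD,800,1991\n"
CODE_SHEET = {
    "st": "State postal abbreviation",
    "amt": "Award amount in dollars",
}


def undocumented_columns(csv_text, code_sheet):
    """Return header names that have no entry in the code sheet."""
    reader = csv.reader(io.StringIO(csv_text))
    # Assumes the first row is the header; an empty file yields no columns.
    header = next(reader, [])
    return [name for name in header if name.strip() not in code_sheet]


if __name__ == "__main__":
    missing = undocumented_columns(SAMPLE_CSV, CODE_SHEET)
    print("Columns without documentation:", missing)
```

A check like this cannot tell whether a file lacks a header row altogether, since the first data row would simply be reported as undocumented columns. That case still needs a person, or a feedback channel, to catch it.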
Finally, we are concerned with the current lack of public notice when data is removed from the site. We respectfully urge you to note all raw tools and data that are removed from Data.gov, and to provide an explanation for their removal.
Many of the concerns outlined above apply across all or many of the agencies’ datasets. Accordingly, we think that standards for handling these types of problems can easily be addressed through the interagency working group and then disseminated amongst the agencies.
Here are some of the more interesting media mentions of Sunlight and our friends and allies over the past week:
Alan Fram with the Associated Press wrote about how the health insurance industry is fighting to prevent the Congress from passing a health care overhaul that includes a government-run plan to compete with private insurers. Fram cites data from the Center for Responsive Politics to show how health insurers have made $41 million in campaign contributions to current congressional lawmakers since 1989, “with more than half going to lawmakers on the five House and Senate panels writing this year’s health bills.” Since the beginning of 2008, insurers have spent $145 million on lobbying.
The New York Times’ Jack Rosenthal, in writing the paper’s “On Language” column, mentioned how Andrew Rasiej, Sunlight’s senior technology advisor and co-director of Personal Democracy Forum, is pushing for a federal law that redefines “public” to mean searchable and readable online. U.S. Rep. Steve Israel (N.Y.) is drafting just such legislation. Rosenthal also noted how the Senate does not disclose campaign-contribution information to the Federal Election Commission in an electronic form. “That means it must be digitized by the commission, by which time the next election may well have come and gone. Transparent? Yes, but also emasculated,” Rosenthal wrote.
Federal Computer Week’s Ben Bain wrote about how the Obama administration is asking federal agencies to gear their spending plans for science and technology in fiscal 2011 toward projects designed to drive economic growth, create energy independence, improve health, and bolster security, according to recently issued general guidance. Peter Orszag, Obama’s OMB director, outlined the new emphasis in an August 4th memo (PDF). Craig Jennings, a senior federal fiscal policy analyst with OMB Watch, said the memo is an indication that science and technology will be high priorities for the administration.
For over a century, the American Association of Law Libraries has been a strong voice on a broad array of information policy issues, including matters related to copyright, access to government information and privacy. The now 5,000-member AALL is hosting its annual meeting in Washington this week.
And I’m honored to announce that Sunlight is this year’s recipient of the Public Access to Government Information Award, given in the spirit of AALL’s principal tenet: the right of equal access to information for all to ensure an informed citizenry and to promote a just and democratic society. For the past decade, the organization has been bestowing the award to recognize persons or organizations that have made significant contributions to protect and promote greater public access to government information. The fact that friend and colleague Gary Bass, executive director of OMB Watch, was last year’s recipient only makes it that much sweeter. Also, Steven Aftergood of the Federation of American Scientists and author of “Secrecy News” was the recipient in 2006, putting Sunlight in very good company. A full listing of their awards can be viewed here.
I would like to send special and heartfelt thanks to the folks at AALL for their work and for recognizing ours.
Here are some of the more interesting media mentions of Sunlight and our friends and allies over the past week:
CQ Weekly’s Maura Reynolds wrote about the Obama administration’s successes and failures in achieving its transparency goals six months into the term. Reynolds quoted Ellen Miller, Sunlight’s director, about how many of their transparency initiatives are still in development and how the kinks are being worked out. “A default position that government data will be accessible to the public in machine-readable format is a huge step forward,” Ellen said. “Is it moving as fast as I’d like? Of course not. But I can be patient while this unfolds.” Ellen also commented on some of the administration’s initiatives, such as “town hall” meetings, that have been tightly controlled. “There is real transparency, and then there is transparency theater,” she said. “I can distinguish between the two.” Reynolds wrote that the more people expect the Internet to deliver the information they want, the more kinds of information they will expect to access that way. “It’s kind of a genie out of the bottle,” Ellen said. “The Internet has raised expectations. I fundamentally believe that the way technology pushes information out to the edges will have a powerful effect on the power structure.” Reynolds reports that open government advocates praise two federal Web sites, USAspending.gov, a site that tracks all federal spending and was set up as a result of a bill co-sponsored by then-Sen. Obama, and Data.gov, the site the new administration designed as a “one-stop shop for number crunchers that consolidates statistics across federal agencies in standard, machine-readable formats.” The article quotes Gary Bass, director of OMB Watch, saying the sites could be vehicles for connecting government performance to spending. “From the point of view of the average user, there has been nothing like this before. That is truly a credit to this administration.” Reynolds notes that it was OMB Watch’s FedSpending.org that served as the technical platform for USAspending.gov.
Despite the existence of rules requiring congressional lawmakers to disclose earmarks they request, rules do not exist requiring them to disclose items classified as “program support.” The Washington Post’s Carol Leonnig illustrates this problem with a report on how $160 million intended to help Mexico’s police buy U.S.-made first-responder radios was tucked into the voluminous congressional plan for U.S. military spending next year. Leonnig quotes Bill Allison, Sunlight’s senior fellow, “It kind of makes a mockery of the disclosure requirements we have. They will disclose the little things, the $1 million projects, but when you have the big-ticket items, you don’t have members willing to take responsibility for those.”
Stephanie Condon, writing at CBS News’ “Political Hotsheet” column, cited a report from Taxpayers for Common Sense that found that lawmakers serving on the House Appropriations Subcommittee on Defense included 1,080 earmarks worth $2.7 billion in the fiscal-year 2010 defense appropriations bill they approved last week. The lawmakers specifically requested more than $1.6 billion in earmarks for their campaign contributors, entities who had donated nearly $1 million to the committee members.
As our colleagues at OMB Watch blogged about yesterday, the Coalition for an Accountable Recovery, of which Sunlight is a member, released more analysis (PDF) they’ve conducted of the Office of Management and Budget’s recent guidance (PDF) on how Recovery Act recipients should report how they used the funds. CAR’s analysis in a nutshell: “While this guidance is a step in the right direction, there is still much room for improvement.”
So far, OMB has provided guidance only for recipients of grants and loans. OMB Watch says that separate guidance for federal contractors is coming soon. OMB has started to flesh out the details of the reporting process, which up until this point have largely been vague and unformed, OMB Watch reports.
CAR lists the good and the bad about OMB’s guidance. First the good:
(T)he guidance provides a useful framework for reporting to a central data collection service, called FederalReporting.gov. The design of the system is scalable to ultimately have all recipients of Recovery Act funds, including multi-tier sub-recipients, report directly. The guidance also creates a distinction between sub-recipients and vendors, which will prove useful. At the same time, OMB allows prime recipients to delegate direct reporting to sub-recipients – except for jobs data – which will likely cause confusion. There is also significant ambiguity about penalties for reporting non-compliance.
And the bad:
(There is) a lack of multi-tier reporting, job quality data, and performance data information; that jobs information is still being reported as undefined full-time equivalents (FTEs); that it is not clear if the information will be publicly accessible with easy to use machine-readable tools; and that OMB requires the use of DUNS numbers and poorly considered identifiers for sub-recipients.
CAR’s full analysis is here (PDF).
The Right-To-Know Network (RTK Net) has completed their Web site redesign, and it looks totally awesome. RTK Net is a project of OMB Watch, and they provide the public easy access to environmental and public health information such as pollution releases, chemical spills, hazardous waste generation, data that we need to know in order to keep ourselves healthy and safe.
RTK Net’s new site includes interactive maps showing pollution data for each state, graphs that chart pollution trends and lists of the top polluting power plants, refineries and other facilities. And the site provides free public access to environmental information from a number of databases managed by the Environmental Protection Agency (EPA) on such things as toxic pollution, hazardous waste and spills and accidents. The site also allows users to identify specific factories and their environmental impacts, as well as what might happen if things go terribly wrong.
RTK Net is a great example of using technology to create government transparency. By giving the public easy access to this data, RTK Net is also making it possible for all of us to be involved in the government’s environmental decision making.
Here are a few of the more interesting media mentions of Sunlight and our friends and grantees from this week:
David Herbert with the National Journal (subscription required) wrote about the grades new media experts from across the political spectrum gave the Obama administration’s Web presence. The experts gave WhiteHouse.gov an average grade of C+. Although they mostly see it as an improvement from the previous administration’s site, many noted that it remained a one-way forum and suggested it be opened to allow comments and other interactive features. Herbert quotes Ellen Miller, Sunlight’s executive director, “This occasional use of interactive tools” is impressive, but “90 percent of the time the site is pretty straightforward, as it was under [George W.] Bush.” Recovery.gov, the administration’s site where citizens can monitor the expenditure and use of recovery funds, fared even worse in the Journal’s poll, averaging a C. The most common gripe about the site, Herbert writes, is that it’s “the view from 30,000 feet,” as Micah Sifry, senior technology advisor for Sunlight and Personal Democracy Forum (PDF) co-founder, told him. Without providing on-the-ground details, Recovery.gov offers taxpayers few tools for staying on top of where their money is going, reviewers said. Recovery.gov has competition in the form of privately-operated Recovery.org, which has “more granular data and a real search tool, which one assumes we’ll eventually see on Recovery.gov,” Micah explains. “I don’t think it’s fair to compare this site to other Web sites yet, as it’s just weeks old,” Micah added. “Let’s take another look in three to six months, OK?”
Chris Lefkow with Agence France-Presse gained a different take by interviewing academics, technology analysts and nonpartisan groups on the administration’s technology efforts. Lefkow writes that they all said the first “tech president” is off to a good start. Lefkow quotes John Wonderlich, Sunlight’s policy director, as saying “their first pronouncements are very encouraging,” and adding that the challenge, however, is going to be the implementation. Andrew Rasiej, Sunlight’s other senior technology advisor and PDF co-founder, said the administration has been doing as much as it can to fulfill its promises with regard to transparency and technological innovation. “However they’ve been constrained by decades of industrial-age rules and regulations and procurement protocols that are handicapping the speed at which they can implement that vision,” he said.
Here are a few of the more interesting media mentions of Sunlight and our friends and grantees from this week:
Sunday evening, BlogTalkRadio posted an episode of “Talking Gov2.0,” where Clay Johnson, Sunlight Labs’ director, discussed Sunlight, Sunlight Labs and the Apps for America contest. Speaking of Apps for America, Clay announced the winners on Monday. And Marshall Kirkpatrick at ReadWriteWeb wrote about the contest, and included a screencast of the winners.
Victoria McGrane with Politico wrote about the lack of online disclosure of campaign finance data by candidates for the U.S. Senate, and the efforts to rectify this through S. 482, the Senate Campaign Disclosure Parity Act. She mentions Sunlight’s Pass S. 482, and extensively quotes Lisa Rosenberg, Sunlight’s government affairs consultant, about the need for the Senate to join the 21st century.
The National Journal reported on data from the Center for Responsive Politics (CRP) that shows last year’s top 20 Political Action Committee contributors to federal candidates poured a combined $22 million into lobbying efforts from January through March — an increase of nearly 20 percent over the same period in 2008.
Anne C. Mulkern with Greenwire (subscription required) used Capitol Words to look at the use of energy- and environment-related words by congressional lawmakers. The New York Times re-posted Mulkern’s piece.
There’s a bit of irony in this story.
House Republican leaders are calling for Democrats to post the stimulus bill, the American Recovery and Reinvestment Act of 2009, online immediately. In a letter sent to Speaker Nancy Pelosi and Majority Leader Steny Hoyer, the GOP leaders write that having the bill online would allow citizens to study its contents before Congress agrees to it and the president signs it into law. The GOP leadership is correct to claim, on behalf of the American people, the right “to see each provision of this legislation and evaluate the merit of each dollar of government spending their children and grandchildren are being required to fund.”
Too bad they haven’t always been for such transparency.
Since inception, Sunlight has been calling for exactly this sort of openness. We think all legislation should be posted online for 72 hours before debate. We’re hoping now that a lot of Republicans will sign onto this measure when it’s reintroduced in this Congress.
In the wake of the Troubled Asset Relief Program (TARP), and while Congress debates the massive stimulus bill, the Coalition for an Accountable Recovery was created to promote accountability for the federal government agencies doling out the trillions of dollars, for the states and for the companies that benefit from recovery funds. The best way to assure taxpayers that the funds are being used responsibly is to provide “radical” transparency on stimulus spending and to make the details of the stimulus available online, in real time.
No great surprise to hear that the Coalition (of which Sunlight is a member) is calling on Congress to require online reporting that allows the public to easily search, sort, track and download data on the use of recovery funds. Each state should be required to report on all funds they receive and all data should be presented in a uniform manner, making sure it is compatible with the USASpending.gov Web site. The Coalition has also stated that the newest technology should be applied to both the Recovery.gov Web site and USASpending.gov to make the information more accessible for everyone.
Sunlight has joined more than 30 groups in the coalition, including the Center for Responsive Politics, Common Cause, National Institute for Money in State Politics, OMB Watch, OpenTheGovernment.org, Project on Government Oversight and Taxpayers for Common Sense.