Who is right, me or the White House?

Last night I wrote a blog post about the launch of data.gov.uk and said it had 3X as many data sets as the US’s data.gov. Today I got an interesting email from the White House (cool!) saying I was wrong. A number of other people disagreed with me as well. It’s a fun little story, but the question comes down to this: do you think the US Geological Survey maps and related entries, the category that so dominates data.gov, should count as equal contributions to the open data ecosystem? Let me know what you think.

Note that I’ve reproduced an email below, and at least one commenter has told me I was out of line to do so. I disagree. I think the email is very straightforward, comes from a public official, and is fine to run alongside a question about whether the assessment of mine that it challenges is correct. No big deal, I think. I’m going to think about it some more, though.

Hi Marshall,

I wanted to reach out to you regarding your piece on Data.gov.

Below is a blog post from WH.gov from Vivek Kundra that includes the latest information about the number of data sets available on Data.gov – 168,000.

Your piece incorrectly states that Data.gov has less than 1,000 data sets.

Your story also mentions “critics” of Data.gov who pointed that “it was filled with relatively non-controversial data sets”—if you are interested in representing both sides of the story I’d be happy to put you in touch with some folks.

Below is the link to Vivek’s blog post:


Please let me know if you have any additional questions.

Jean B. Weinberg
Deputy Press Secretary
White House Office of Management and Budget

Here’s my response. It’s not buttoned-up and respectful, I suppose – but we’re all bloggers now, right?

Hi Jean, thanks for the email. Here’s my take on it: data.gov has 969
records of “machine readable, platform-independent datasets.” It also
has approx. 167k geodata records, almost all maps. That’s a convenient
way to say there are 168k datasets, but a big map dump doesn’t seem
that compelling to me. Maybe I’m wrong – but when I see the UK site
sharing data sets like soldier suicides and number of abortions, that
makes a big dump of geological maps on the US site seem anemic. I’d
be happy to talk to someone who feels otherwise, though. Please do
connect me with someone I can speak to about this. I’ve been very
critical of data.gov since it launched and would be happy to be
persuaded to feel otherwise.

Thanks for sending me that link to Mr. Kundra’s blog post by the way.
I think I’ll write a post in response and see if my readers see it the
way I do. I must say, I found his post rather shocking in tone.
Claiming that the UK is following the US’s lead when the UK is working
with Tim Berners-Lee, who brought us the World Wide Web, and has the
very forward-looking semantic web paradigm in its sights – that
doesn’t make a lot of sense to me.


Marshall Kirkpatrick
VP of Content Development and Lead Blogger

In reality, I imagine that the truth is somewhere in the middle. Maps are hardly worthless and the US Geological Survey data that dominates data.gov isn’t just maps. It mostly is, though, and my point is that data.gov is disappointing so far. What do you think? Am I being unfair? Should I change my perception and coverage of data.gov? I know I’m not the only person who feels critical, but I thought I’d run this numbers discussion past some more people to get some more perspective.

  • df

    I’m with you Marshall.

  • Henry

    Transparency is transparency and this is just sheer manipulation and complacency. I look forward to the day when they share the information with us that we, as members of this free society, have a right to view.

    I think you are spot on!

  • You are 100% wrong when you say that the UK has more data sets. As long as it’s a numbers game, the answer provided to you was that the US site has the numbers. You are 100% right when you say that the US data sets are relatively disappointing. That’s because you are looking for /meaningful/ datasets. A better way to measure is to look at the popularity and utility of the site. In other words, if one site had only 500 really incredibly vital and useful datasets, it would be far more relevant and therefore more effective. But that’s a tough thing to measure on day one. Let’s see what it’s like in one year. JMO

  • I met a representative of the Obama Administration at SemTech 09. He was a very nice man who walked up to the mic during the Linked Open Data Panel. His question was amazing. He asked (I’m paraphrasing): “It’s one of the goals of the Obama Administration to release as much data as we can over the next 4 years. We would like to know how we should give it to you. Also, if any of you are interested in access to this, please come find me.”

    I’m in a rare position as someone with early access to both open government projects, as well as San Francisco’s CivicDB.

    The White House press secretary is just that: the press secretary. Though she probably has a full understanding of why Data.gov is important, she may not have the background required for a debate. She did offer to put you in touch with someone who could respond.

    The data set comparison and the post at RWW could go back and forth a bit, but this post is all wrong.

    I think yes, we are all bloggers, but when you post an email you received from someone, you’re taking something from a one-to-one context and placing it in the public eye. We’re all bloggers, yes, but to me, identifying as a “blogger” implies a social license to exercise freedom of speech in a forum accessible by billions, where the expectation is that you’ll be speaking your mind at the top of your lungs. To me, it doesn’t give us license to be disrespectful. That is, sadly, still our own limb to climb out on. That sounds like a judgement, but I assure you it is not. I love your writing. *And* as a blogger, from time to time I will rant off a cliff myself and have to take it back.

    Remember that you asked for this:

    Taking emails from a White House Press Secretary (or anyone at all) and airing them in public with a technical rebuttal is totally out of line. It makes her less likely to reach out to bloggers and reflects on RWW, and I think you should apologize.

  • Marshall

    Jeez, Sid – now you’re making me feel terrible. To be honest, this email I received was really straightforward and from a public official. I don’t think it’s out of line to publish it.

  • Jon

    Spot on, Marshall. Like you, I would be ‘happy’ to have a different opinion of our govt’s attempts. To me it just seems like their attempts are, sadly, aiming at something less than adequate.

  • Jon

    Oops – no edit here so I have to add another (sorrrreeee!)

    The White House Press Secretary was doing the job. Any time a ‘press secretary’ responds to a public statement, that response should be understood to be a function of the job, which is to release public statements to the press.
    Marshall, you’re the press. You’re doing your job. The WH Press Secretary did her(?) job. Me? I’m the American Public – I’m doing my job, too: Staying Informed.

  • TK

    Marshall, any communication by a U.S. government employee or officer is in the public domain. Don’t feel at all bad for posting it; it’s your right to do so. She knows that and wouldn’t be sending out anything she’d want concealed from the public. If she doesn’t understand this, she shouldn’t be the White House Press Secretary in the first place.

    And don’t worry that the Press Secretary will be less inclined to reach out to bloggers. She doesn’t “reach out” as a favor to you, she does it to help get her message out. Like Jon said, it’s her job.

  • You are not out of line, Marshall. Every e-mail sent from a government computer is archived as public record. The only way to stop people from seeing them is claiming executive privilege (which would be stupid in this case). You are actually on the ball for airing it in the context in which it was written. That press secretary needs to get used to the fact that the modern internet means you are always on the record as a government official.

    The problem I have with the U.S. data portal is that it doesn’t serve citizens well, because they are used to searching for information via web search (like Google). So basically you are right that the data is way less useful to the majority of people. You just can’t get much from geospatial data if you are a common person.

    The portal is great for entities that can mine geodata: Google, Microsoft, independent app developers like Waze, or other government orgs. In this respect the UK portal is totally inferior.

    But who should the site be aimed at really? My take is average people.

  • Ridiculous, pointless and inaccurate data-points are worthless. It’s a tie. 0-0.

  • Marshall,

    Just to be clear I definitely don’t think you should feel bad. I hate it when people do that because of things I say.

    There are people out here that are pretty sharp and read what you write and care what you think. I’d rather you felt successful at engaging your audience and that all this has meaning. I want you to feel empowered and heard by people who aren’t just going to say “right on” but have been invited to think. I enjoy your work.

    Perhaps I was too demanding of an apology. We had a terrible 8 years of war, death, censorship and fear during the Bush Administration, and I don’t like to see the “low bar” illustrated in our dynamic with our government. I’d prefer the bar be higher. I’d prefer we expect better. It influences the results we get.

    I apologize for being too defensive of Obama’s staff. They have a very hard job. It’s like taking care of a dog whose owner used to hit it. It takes time before it stops wincing every time you lift your hand. Eventually you get to a better place.

    I think this thread needs to evaluate its cynicism. Jean Weinberg is a person. I don’t think it matters if she’s a press secretary or if the message was public domain or what: “if you are interested in representing both sides of the story I’d be happy to put you in touch with some folks” is an awesome response to what you wrote. I would love to hear who they are and what they have to say. RWW is a big f*ing blog. You may have a shot at speaking directly with the CTO you referenced in the article.

    Personally, I know large data sets are hard to move around, transform and expose as linked open data. Someone has to do all that work. It would make no sense to release data in the order of how sensational it is in the media. Ideally the order of release is a logistical concern and not a tool of statecraft. It would be big news if you could confirm that either way. In any case, I hope you do get a chance to speak with the CTO, I have no doubt that you’d remain skeptical and penetrating and write what’s worthwhile.

    Take care,

  • I’m with you on this one: that was clearly an e-mail meant to be reposted, at least the relevant paragraph. Legality is important when you are in an adversarial situation, but I can’t imagine your quoting it being the problem. I would have replied with something less of a rebuttal that more openly re-oriented the debate: “My bad: the US does share more data. However, what I’m interested in is the possible relevant use rather than the metrics: 3 KB on soldier suicides or abortions tells me more about the UK or the US than 10 TB of maps. Maps are great, a necessary basis, although… Let me rephrase: I want the same kind of data as data.gov.uk.” And I would have avoided the “can hardly be called”: that’s unnecessary, and you risk having the map geeks hate you. And you can’t hide from those guys. And they are surprisingly cool & geekish, too.

  • I’m a map geek.

  • Well, I don’t know if I should write comments on this post, as I’m not a US citizen, nor do I live anywhere near the US. But I feel that an email sent from the Press Secretary of the White House “can” be published in public. I believe that for honest journalism, communication between one side and the press should be open.

    If one side asked for the contents of an email not to be revealed, we would all suspect that the two parties were making some secret deal.

    Apologies if I’m writing something wrong. But that’s what I think as a technology journalist from my country.

  • Thanks for publishing the email exchange. I think it’s important to have public discourse about government in the US: “government by the people, for the people, and of the people” is something to strive for, constantly IMHO.

    That being said, comparing the numbers of data.gov and data.gov.uk seems like an activity that is bound to create a flamewar of some kind. And sometimes the blogosphere seems to be optimized for that: the purpose is to attract attention, and the best way to attract attention is to create an argument of some kind, even if one isn’t warranted.

    So rather than a discussion about how to count data sets I’d personally be more interested in seeing you write more about the way in which the UK Gov’t has sought guidance from Tim Berners-Lee in articulating what should be done at data.gov.uk.

    It seems to me that the sort of understanding of what the web has to offer in the public data space is somewhat lacking here in the US. The focus on cloud computing in particular seems to be missing the point. I’d personally like to see more leadership in the data.gov space around how to make data available, what the plans are for making new data sets available, and even building new apps around the data services to make sure the approach works.

  • Marshall,

    I think that releasing (or at least making more visible and easy to get at) a large collection of geodata IS pretty important (and very valuable).

    Justin – I disagree – a site like Data.gov is NOT intended for the general public – rather it should be intended for thousands of app developers (and companies & journalists & students etc.) who in turn will manipulate and extrapolate and visualize and otherwise reuse & use the datasets which are being made available.

    i.e. it is a data site – and data is exceptionally valuable and useful – and public datasets should be made electronically available and accessible, and as such I think both the US & the UK’s efforts should be applauded.

    But the number of data sets isn’t really the measure of EITHER initiative – rather, what journalists & critics should be doing is:

    1. Highlighting the great data sets which are available

    2. Highlighting where data sets are lacking, missing or questionable (i.e. are they missing stuff which should be there?)

    3. Continuing to push for the data sets to be made widely available in formats that are as open as possible

    4. Focusing not just on the raw data (which is important) but equally (and I’d say even more) on USES of that data – on the great tools developers (and companies and media businesses and schools) are building on top of this data. Highlight great examples – and point out any which are deceptive (intentionally or not)

    And then push for additional datasets which might address gaps or errors.

    But realize also that government data should ideally (I’d argue) be fairly neutral & meaningless by itself – it should be the raw building blocks of other efforts.

    Geodata is certainly one very crucial (and large) dataset. It literally covers the maps of the US (what is/isn’t the country, for example), but also all the nuances of geopolitics which such maps represent.

    A few:

    – where do Federally protected lands actually end? (and thus related mining, water etc rights)

    – where are special areas such as Indian Reservations/native lands? (many are surprisingly contentious)

    – Where are state borders (esp when/if geological features change – think glaciers melting, rivers flooding etc)

    – where do US Coastal Waters end? (again surprisingly contentious)

    – Where are ALL the various US land claims? (Hint: this could be really, really complicated – technically every embassy is US land…) But even just all the various protectorates, islands, military bases etc. is pretty complex.

    – How have geological features changed over the time that the US government has collected geographic data? (i.e. are glaciers melting? where have coastal lands fallen into the sea? where have rivers changed their boundaries?)

    A random note: there are apparently a few parts of Virginia (and perhaps other states) whose legal status is unclear because they don’t appear on the “official” maps of the state but are, in fact, physically present (map errors). As such, these areas have a somewhat unclear legal & tax status, i.e. if they aren’t on the state’s land rolls, they don’t legally “exist” (at least that is what I read years ago).

    So yes, geodata could be really interesting.

    But an important issue & post – thanks for posting the exchange.

  • How cool (and a compliment) that the White House follows what you write!
    Re: your posting of their email: if they follow you, then they know you’re a journalist. Unless you had an explicit agreement with them in advance, the writer would have no reason at all to believe that anything they wrote you would be confidential.

    A thought, though: could what appears to be a lack of data on data.gov be a side effect of so much being out there already in other forms? Just the turnaround time of getting these data into the overarching data.gov structure could be a factor. And there wouldn’t be a huge rush by bureaucrats to convert it back to “raw” format if it’s already out there.

    I have no objective evidence, just my subjective experience that, in the health statistics arena, the availability of federally collected data is quite formidable.

    I’ve been writing about medicine/health exclusively for ~15 years, and have watched as more and more figures about disease incidence, health trends, etc. have become available for download through various NIH-related sites. While pursuing a statistic I need for an article, I often get sidetracked onto “oh, wow! I never knew that!” tangents because there is so much interesting info there.

    Most times the data are already formatted for usability by the average non-expert, in PDF or Excel format. Graphic info such as bar graphs and pie charts also can be downloaded as PPT slides for direct use in presentations. (Tunnel into http://www.cdc.gov/DataStatistics/ for examples of these.)

    But the raw data can be downloaded too. Here’s one page that lists that sort of thing: http://www.cdc.gov/nchs/data_access/ftp_data.htm

    So maybe **some** of the difference in the total number of U.S. and U.K. data sets reflects the fact that certain federal agencies have been way out front in getting the statistics out there already?