Data: Making a List of the Top 300 Blogs about Data, Who Did We Miss?

Dear friends and neighbors, as part of my ongoing practice of using robots and algorithms to make grandiose claims about topics I know too little about, I have enlisted a small army of said implements of journalistic danger to assemble the above collection of blogs about data. I used a variety of methods to build the first half of the list, then scraped all the suggestions from this Quora discussion to flesh out the second half. Want to see if your blog is on this list? Control-F and search for its name or URL and your browser will find it if it’s there.

Why data? Because we live in a time when the amount of data being produced is exploding and it presents incredible opportunities for software developers and data analysts. Opportunities to build new products and services, but also to discover patterns. Those patterns will represent further opportunities for innovation, or they’ll illuminate injustices, or they’ll simply delight us with a greater sense of self-awareness than we had before. (I was honored to have some of my thoughts on data as a platform cited in this recent Slate write-up on the topic, if you’re interested in a broader discussion.) Data is good, and these are the leading people I’ve found online who are blogging about it.

How the Blogs Are Ranked

I then ran these blogs through my favorite web service, PostRank, which looks at every post on every one of these blogs and scores it by social media engagement: comments left, inbound links from other blogs, times the post’s link was shared on Twitter, times it was bookmarked on Delicious and more. PostRank then ranks all the blogs in a collection by how much social media engagement they have received in recent history. That’s where this ranking came from. Nothing but which sites get included is under my control – so I think I can be objectively proud that my co-workers at ReadWriteCloud have come in at #3. Note that you might find a blog or two here where PostRank’s analysis of the feed hit an error and returned blank results, so it needs a reset. That’s what happened with the primary O’Reilly feed about data, and I’ve emailed PostRank to ask them to reset their scoring machine for it. That’s especially in need of remedy given that O’Reilly is working hard on a forthcoming conference all about data called Strata. (I’ll be there, moderating a panel on data-driven journalism.)
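For the curious, here is a rough sketch of the general idea behind this kind of ranking: score each post by its engagement signals, add up the scores per blog, and sort. To be clear, this is not PostRank’s actual formula, which isn’t described in this post. The weights, field names and sample data below are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical weights for each engagement signal. These are assumptions
# for illustration only, not PostRank's real scoring.
WEIGHTS = {"comments": 1.0, "inbound_links": 3.0, "tweets": 2.0, "bookmarks": 2.0}


def post_score(post):
    """Weighted sum of one post's engagement counts (missing signals count as 0)."""
    return sum(weight * post.get(signal, 0) for signal, weight in WEIGHTS.items())


def rank_blogs(posts):
    """Total the post scores per blog and return (blog, score) pairs, highest first."""
    totals = defaultdict(float)
    for post in posts:
        totals[post["blog"]] += post_score(post)
    # Sort by descending score; break ties alphabetically by blog name.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample posts; the blog names are placeholders.
SAMPLE_POSTS = [
    {"blog": "blog-a", "comments": 4, "tweets": 10},
    {"blog": "blog-a", "inbound_links": 2},
    {"blog": "blog-b", "comments": 1, "bookmarks": 3},
    {"blog": "blog-c", "tweets": 20},
]

if __name__ == "__main__":
    for position, (blog, score) in enumerate(rank_blogs(SAMPLE_POSTS), start=1):
        print(f"#{position} {blog}: {score}")
```

A real service would presumably also weight recent activity more heavily than old activity, since the ranking above is based on engagement "in recent history"; this sketch leaves that out.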

After I ran these through PostRank, I pulled down the data in the shape I wanted using Needlebase, then put it in this Google Spreadsheet and embedded it here.

I did the same thing with 300 blogs about geotechnology last week – and just like I did then, I’ll ask now: who did we miss? I’d love to get these leaderboards built out for several of the top topics ReadWriteWeb covers and turn them into weekly posts, highlighting the leading and ascendant voices in niche blogospheres whose topics will change the future of the web and the world.

I imagine that Data Blogs may be a bigger world than Geo Blogs, so I may have missed more this time. Let me know in comments if you’d like your blog included in the index, or if you know others that ought to be, and I’ll add them. Fun times – and thanks for continuing to blog, folks, in this era of 140-character utterances!

  • Thanks for the comprehensive listing of data blogs. It’s a great starting place for our fledgling venture Visual Data India.

  • ZeroHedge is the number one blog in bigdata!!! He’s far ahead of Research Computing at NYU. Interesting criteria. I follow ZH. Apparently everyone else does too…

  • Marshall Kirkpatrick

    Yeah Ellie, that surprised me too 😉

  • santiago

    Good list. I’ll be happy to see http://VisualJournalism.com included.

  • I wish you had provided an OPML file of these blogs.

  • Hi Marshall,

    We’ve expanded your list to 1000. (I sent you an email.) Is it OK to publish the top 500 in a comment?

  • Marshall, I just found your collection of postrank / extractiv / needlebase / dapper posts and I feel like my head is going to explode.

    I have a client who needs top blog lists (based on social media engagement) like your geo- and bigdata lists in about 8 categories. I have the blog lists and I’d love to evaluate them with PostRank.

    Wondering if you can point me in the right direction to try and make that happen. I am looking at their Data Services > Content Stream, am I in the right place? I’m not expecting a tutorial, but if you don’t mind sharing a starting point, I’d be most grateful!

  • Marshall Kirkpatrick

    Jonah, just sent you an email.

  • Hi, Marshall. Interesting list! I’m at #33, but the URL on the spreadsheet is wrong – I’m actually at http://www.brentozar.com. The feed you’re using is a comment feed for just one of my posts. Not sure if that affects the rankings, but I found it pretty funny. Have a good one!

  • Hi Marshall,

    I’d argue you missed me. I write about unstructured data and big data, was formerly the CEO of MarkLogic and currently sit on the board of Aster Data.

    Thanks!
    Dave

  • Sushant Singh

    Nice Reading your blog it was really good. http://www.techmagnifier.com

  • Marshall, awesome blog. It sucks that PostRank was absorbed into GA. Are you still working on these types of data projects? I see huge implications for this type of data when it comes to people building distribution lists for press releases.
    Thanks a ton!