Data: Making a List of the Top 300 Blogs about Data, Who Did We Miss?

Dear friends and neighbors, as part of my ongoing practice of using robots and algorithms to make grandiose claims about topics I know too little about, I have enlisted a small army of said implements of journalistic danger to assemble the above collection of blogs about data. I used a variety of methods to build the first half of the list, then scraped all the suggestions from this Quora discussion to flesh out the second half. Want to see if your blog is on this list? Control-F and search for its name or URL and your browser will find it if it’s there.

Why data? Because we live in a time when the amount of data being produced is exploding, and it presents incredible opportunities for software developers and data analysts: opportunities to build new products and services, but also to discover patterns. Those patterns will represent further opportunities for innovation, or they’ll illuminate injustices, or they’ll simply delight us with a greater sense of self-awareness than we had before. (I was honored to have some of my thoughts on data as a platform cited in this recent Slate write-up on the topic, if you’re interested in a broader discussion.) Data is good, and these are the leading people I’ve found online who are blogging about it.

How the Blogs Are Ranked

I ran these blogs through my favorite web service, Postrank, which looks at every post across every one of these blogs and scores each in terms of social media engagement: comments left, inbound links from other blogs, the number of times the post was shared on Twitter or bookmarked on Delicious, and more. Postrank then ranks all the blogs in any collection by the amount of social media engagement they have received in recent history. That’s where this ranking came from. The only thing under my control is which sites get included, so I think I can be objectively proud that my co-workers at ReadWriteCloud have come in at #3. Note that you might find a blog or two here where Postrank’s analysis of its feed needs a reset because it hit an error and returned blank results. That’s what happened with the primary O’Reilly feed about data, and I’ve emailed Postrank to ask them to reset their scoring machine for it. That’s especially in need of remedy given that O’Reilly is working hard on a forthcoming conference all about data called Strata. (I’ll be there, moderating a panel on data-driven journalism.)
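For the curious, here is a rough sketch of what an engagement-based ranking like this might look like in Python. To be clear, this is not Postrank's actual formula, which isn't described here: the weights, the field names, and the functions `post_score` and `rank_blogs` are all made up for illustration. The only idea borrowed from above is that each post gets a score from its engagement signals and blogs are ordered by the total.

```python
from collections import defaultdict

# Hypothetical weights per engagement signal. Postrank's real weighting
# is not given in this post; these numbers are placeholders.
WEIGHTS = {
    "comments": 1.0,
    "inbound_links": 2.0,
    "tweets": 0.5,
    "bookmarks": 1.0,
}


def post_score(post):
    """Weighted sum of a post's engagement counts (missing signals count as 0)."""
    return sum(weight * post.get(signal, 0) for signal, weight in WEIGHTS.items())


def rank_blogs(posts):
    """Total the post scores per blog and return (blog, score) pairs, highest first."""
    totals = defaultdict(float)
    for post in posts:
        totals[post["blog"]] += post_score(post)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data, one dict per post.
    sample_posts = [
        {"blog": "Blog A", "comments": 4, "inbound_links": 1, "tweets": 10},
        {"blog": "Blog B", "inbound_links": 3, "tweets": 2, "bookmarks": 1},
        {"blog": "Blog A", "comments": 1},
    ]
    for position, (blog, score) in enumerate(rank_blogs(sample_posts), start=1):
        print(f"#{position} {blog}: {score}")
```

A real service would presumably also limit the posts to a recent time window, since the ranking reflects engagement "in recent history"; this sketch leaves that out.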

After I ran these through Postrank, I pulled down the data the way I wanted it using Needlebase, then put it in this Google Spreadsheet and embedded it here.

I did the same thing with 300 blogs about geotechnology last week – and just like I did then, I’ll ask now: who did we miss? I’d love to get these leaderboards built out for several of the top topics ReadWriteWeb covers and turn them into weekly posts, covering the leading and ascendant voices in niche blogospheres on topics that will change the future of the web and the world.

I imagine that Data Blogs may be a bigger world than Geo Blogs, so I may have missed more this time. Let me know in the comments if you’d like your blog included in the index, or if you know of others that ought to be included, and I’ll add them. Fun times – and thanks for continuing to blog, folks, in this era of 140-character utterances!