Get fed: Comparing 3 RSS feed scraping tools

I wrote a blog post today over at Read/WriteWeb about a small message posting service called CBox that’s being used by a man believed to be the last practicing blogger under Burmese military rule. CBox doesn’t offer an RSS feed, which is a real shame. For my post, I thought it would be nice to be able to offer readers an RSS feed they could subscribe to in order to follow the events there via this blogger.

Just because there’s not an RSS feed where you’d like there to be one is no reason to give up hope! Here are 3 tools you can use, depending on the circumstances, to scrape an RSS feed from a page that doesn’t publish one.

FeedFire

FeedFire was the first of these tools I discovered, several years ago, and I still use it on occasion today. You enter a web page URL and it delivers any new link text appearing there over time in the form of an RSS feed.

There’s not a whole lot you can do with it and it’s a little messy, but it’s really easy and very fast to use. If you’ve got a simple page with headline links, say a company’s old-fashioned “news” page with links to PDF press releases or something – FeedFire is perfect. You’ll get every link on the page in your feed, but once you mark the extra links (like to the home page) as read you’ll forget it ever happened. Still, I’d keep this for personal use only if possible. It’s quick and dirty.
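To make the idea concrete, here is a minimal Python sketch of the "every link on the page becomes a feed item" approach. It is not FeedFire’s actual code, which I have never seen; the names LinkCollector and links_to_rss, and the seen parameter for skipping links already delivered, are my own assumptions for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collects (text, href) for every <a href=...> found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_to_rss(html, feed_title, page_url, seen=frozenset()):
    """Build a bare-bones RSS 2.0 document from every link not in `seen`."""
    parser = LinkCollector()
    parser.feed(html)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = "Links scraped from " + page_url
    for text, href in parser.links:
        if href in seen:
            continue  # hypothetical "already delivered" check
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")
```

Passing the hrefs from the previous run as seen is what would keep the home page link from showing up again, which is roughly the "mark it as read and forget it" experience described above.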

FeedYes

I used FeedYes in assembling an OPML file for a client recently and it worked great. It’s a joy to use, in fact. Like FeedFire, FeedYes picks all the links out of a page, but then it asks you to click on the first useful link in the list and on the last one. That way it tracks only the links that fall in that part of the page, instead of tracking them all. Very nice. It’s a touch harder to use than FeedFire, but not by much.
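The first-link, last-link trick can be sketched in a few lines of Python. This is only my guess at the general idea, not how FeedYes works internally; select_range and the sample links are hypothetical.

```python
def select_range(links, first_href, last_href):
    """Keep only the links from the first useful one through the last.

    `links` is a list of (text, href) tuples in page order.
    Raises ValueError if either href is not on the page.
    """
    hrefs = [href for _, href in links]
    start = hrefs.index(first_href)
    end = hrefs.index(last_href, start)  # look for the end at or after the start
    return links[start:end + 1]


# Hypothetical page: navigation links surround the two we care about.
links = [
    ("Home", "/"),
    ("Press release 1", "/pr1.pdf"),
    ("Press release 2", "/pr2.pdf"),
    ("Contact", "/contact"),
]
useful = select_range(links, "/pr1.pdf", "/pr2.pdf")
```

Here useful holds only the two press release links, and the home page and contact links never reach the feed.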

Feed43

It had been a while since I had used Feed43 when I wrote my post about the Burmese blogger. It’s what I used for that feed, and so far it looks like it has worked great.

Feed43 is awesome. It displays the source code for any page you tell it to look at, then lets you identify any part of that code as the beginning and end of an RSS item. For example, in the Burmese CBox I wanted to scrape, there are no links but the author’s name is in bold. I told Feed43 that items for a feed start when the bold tag closes and end when the next open bold tag appears. That worked great. The other fields are a touch confusing, but I got them figured out. The help pop-ups are only marginally useful. Once you get it down, though, Feed43 is no problem.
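The start-and-end marker approach is easy to picture in code. Below is a rough Python sketch of the rule I described (an item starts where a bold tag closes and ends where the next bold tag opens). It uses a plain regular expression rather than Feed43’s own pattern syntax, and the function name and sample markup are made up for illustration.

```python
import re


def extract_items(source, start="</b>", end="<b>"):
    """Return the text found between each `start` marker and the next `end`."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    # DOTALL lets an item span several lines of page source.
    return [m.strip() for m in re.findall(pattern, source, flags=re.DOTALL)]


# Hypothetical message-board markup: name in bold, then the message.
source = "<b>Ko</b>: hello<br><b>Min</b>: second<br><b>Zaw</b>: third"
items = extract_items(source)
```

Note that with these markers the final message is not picked up, because no opening bold tag follows it. A real setup would need an end marker that also matches whatever closes the message list.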

The advanced features are great. You can export all your feeds in OPML and you can password protect your feeds.

By default the feeds are checked every 6 hours. In this case, that wasn’t frequent enough. I’ve paid $17 to publish up to 20 premium feeds for the next 6 months with Feed43. One of the things I’ll be able to do when I get my registration # (I wish it came immediately after I paid by PayPal) is to increase the frequency with which the feeds are checked for updates to once per hour.

Concluding thoughts

Clearly, different tools here are appropriate in different circumstances. One other note I’d make, though, is that if you are building these feeds for someone else, as I often do, you should run whatever you create through FeedBurner. That way, if anything goes wrong, you can log in to FeedBurner and change it on the backend (change the source if you have to) without bothering the people who are subscribed to the feed. They shouldn’t have to sub to a new feed if, in the worst case, you can just make something else the source feed for FeedBurner.

Speaking of making feeds for other people, I can’t emphasize enough the importance of paying for premium services. This is especially true if you are doing it for work. Just tell whoever’s paying that this is how much it costs and there is no practical free option. The people who build these tools deserve to make a living too, and they rarely do.

I hope that this has been helpful. There are other options available, but these are three I’m most familiar with and I think they illustrate the breadth of options well. Let me know if you have any questions.

Marshall Kirkpatrick

Consultant to green tech, renewables, and sustainability organizations // Motivated by love of learning and the earth
