Category Archives: Search

Word Frequency Visualization of Sarah Palin’s Resignation Speech

Below is a visualization of the most commonly used words in Sarah Palin’s resignation speech today. The full text of the speech is available online and I grabbed this image using Wordle.net – always a good thing to do when a politician gives an important speech. It’s interesting. It might be good to compare this cloud of words with a similar visualization of some of the other Republican governors resigning this summer.

Draw from this what you will. I’ve been reading coverage of the events through Memeorandum, a great source for political news, and the one thing that stands out to me in this visualization is that allegations that Palin addressed the nation rather than the state she was serving seem questionable, given how much she talked about Alaska and Alaskans. It is also interesting to see how many times she used the word “dollars.” She used the word “government” far more than she did “family,” though when watching the video of her press conference it sounded like she was talking about family a lot.

Do you think this kind of analysis can be truly useful? I think that it’s most useful when comparing multiple speeches for content, but even then I’m not sure how to read the meaning of word frequency.
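If you want to try this kind of counting yourself rather than relying on a word cloud, here is a minimal sketch in Python. It is not how Wordle works internally; the stop-word list and the sample text are placeholders made up for illustration, and the sample is not a quotation from any real speech.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "and", "a", "of", "to", "in", "is", "for", "our", "we"}

def word_frequencies(text, top_n=10):
    """Return the top_n most common non-stop words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Placeholder text, not a quotation from any real speech.
sample = "Alaska is for Alaskans. Government dollars serve Alaska, and Alaska serves family."
top_words = word_frequencies(sample, top_n=3)
print(top_words)
```

Running the same function over several speeches and comparing the lists side by side is one simple way to do the multi-speech comparison mentioned above.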

See also a comparison I did in January at ReadWriteWeb of President Obama’s inaugural speech with those of the Bushes and other past presidents.

Data analysis is fascinating and of course much larger opportunities to engage in it are becoming available every day online. I believe we’re going to see a whole lot of innovation making use of the text of conversations as a foundation for analysis in the near future. Not cute little stuff like this, but big, ongoing, ambitious projects. Hopefully for more than just marketing purposes. Here’s a blog post and great audio interview on that topic, if you’re interested.

Click This Button To See Into A Twitter User’s Soul

Twitter isn’t just a short messaging service – it’s a major communication platform that can be sliced and diced for all kinds of competitive and market intelligence research. And news writing. And who knows what else.

Last month I wrote a post at ReadWriteWeb titled “The Inner Circles of 10 Geek Heroes on Twitter.” It was all about a service called Mailana where you can plug in any Twitter user name and get a chart and graph of the other Twitter users that the user in question has had the greatest number of reciprocal public @ conversations with. It’s a way to systematically identify the influencers of the influencers in any field (on Twitter).
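To make the idea of “reciprocal public @ conversations” concrete, here is a rough sketch in Python. I don’t know how Mailana actually computes its graphs, so treat the scoring rule (the smaller of mentions sent and mentions received) as an assumption, and note that the user names and tweets are invented sample data.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def reciprocal_conversations(tweets, user):
    """Rank accounts by reciprocal public @ mentions with `user`.

    tweets: iterable of (author, text) pairs.
    A partner's score is the smaller of (mentions sent by user to partner)
    and (mentions sent by partner to user), so one-sided chatter is dropped.
    """
    user = user.lower()
    sent, received = Counter(), Counter()
    for author, text in tweets:
        author = author.lower()
        for name in MENTION.findall(text.lower()):
            if author == user and name != user:
                sent[name] += 1
            elif name == user and author != user:
                received[author] += 1
    scores = {p: min(sent[p], received[p]) for p in sent if received[p] > 0}
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Invented sample data.
tweets = [
    ("alice", "@bob nice post"),
    ("bob", "@alice thanks!"),
    ("alice", "@bob any time"),
    ("bob", "@alice :)"),
    ("alice", "@carol hello"),   # carol never replies, so she is excluded
    ("dave", "@alice hi"),       # alice never replies to dave, so he is excluded
]
inner_circle = reciprocal_conversations(tweets, "alice")
print(inner_circle)
```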

Just to prove to myself that it works in any field, I did a search of user descriptions in Twellow for the words “veterinary medicine” and found one of the top Twitter users in that field. I then ran her username through Mailana and was able to discover 13 people that she speaks publicly with most regularly on Twitter. It was pretty cool.



Tonight Tantek Çelik helped me figure out how to make a bookmarklet that you can push while on any Twitter user’s page to view their Mailana graph of closest connections. It’s awesome.

And so I present for your drag-to-toolbar pleasure…

Mailana – The Twitter Social Network Analyzer.

Please use it for good and not evil. And don’t let anyone tell you that there aren’t serious use cases for Twitter.

You can join me on Twitter here.

Three Useful Research Tactics I Learned Last Week

I’m always trying to figure out how to get more out of the tools I find online. I spend a lot of time figuring out new ways to discover good sources of information on a wide variety of topics, setting up systems for our writing staff at ReadWriteWeb and for consulting clients through my personal blog. Some of the things I’ve discovered lately I can’t disclose publicly, but here are three I can share. I hope you find them useful.

5 Minute Intro to Yahoo Pipes

I’m in the San Francisco airport flying back from a wonderful Foo Camp where I led a discussion about RSS power user tips. It was a lot of fun. Several of the attendees had never used Yahoo! Pipes, one of the most powerful tools in the RSS toolbox. I told them that I too didn’t really learn to use Pipes for a long, long time after I first discovered it because it seemed too complicated for my poor little non-developer’s head. Once I was shown just two buttons to push in the service, though, I found out that some great results are actually very easy to achieve using Pipes. Just seeing someone do the simplest things there makes it a lot less scary. In that same spirit, I offer the following 5 minute screencast demonstrating 3 simple things you can do with Pipes. I hope it emboldens you to learn how to do even more with the service, but even if you only feel comfortable doing this much – I believe it will still prove very, very useful. Plus it will keep your toes safe (you’ll know what I mean after watching the video below).

The Awesome Potential of the Semantic Web

I just listened to the most amazing podcast about the future of the web and semantic analysis. It was an interview with BYU PhD student Yihong Ding, a researcher in what my ReadWriteWeb co-author Alex Iskold calls “the top-down semantic web.” The first 15 minutes of the hour-long show are about Yihong Ding’s personal background, the next 15 about his research and the last 30 about his very compelling view of the future.

This interview shows just how much untapped potential remains in the world of web applications. Once our software is capable of deriving meaning from web pages it looks at for us, there’s a whole lot of work that will already be done, allowing our human, creative minds to reach new heights.

Download MP3 [50 mins, 23Mb]

Ding’s research combines the application of a manually supplied ontology (set of terms with connections for meaning), automated analysis of the structure of a web page (what’s in h2 tags? that’s probably a section title) and learned meaning after repeated application of the above and correction by the user. It’s fascinating and a prototype should be available in the first half of next year. I hope to get an early look at it so I can write about it on ReadWriteWeb just before public launch.
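The structural-analysis piece is the easiest one to picture in code. Here is a minimal sketch in Python of the “what’s in h2 tags? that’s probably a section title” heuristic. This is only an illustration of that one idea, not Ding’s method or prototype; the class name, function name and sample page are all made up for the example.

```python
from html.parser import HTMLParser

class SectionTitleGuesser(HTMLParser):
    """Collect the text inside <h2> tags as probable section titles."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.titles = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self._buffer = []

    def handle_data(self, data):
        # Keep text only while we are inside an <h2>, including nested tags.
        if self.in_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self.in_h2:
            self.titles.append("".join(self._buffer).strip())
            self.in_h2 = False

def guess_section_titles(html):
    parser = SectionTitleGuesser()
    parser.feed(html)
    parser.close()
    return parser.titles

# Made-up sample page.
page = "<h1>Site</h1><h2>Research</h2><p>Body text</p><h2>Future <em>Vision</em></h2>"
titles = guess_section_titles(page)
print(titles)
```

In the approach described above, guesses like these would then be checked against the supplied ontology and corrected by the user over time.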

The vision of the future described in the interview is beautiful. It’s one of the clearest explanations of the semantic web and what some people call web 3.0 that I’ve heard yet. I’m just starting to dive deep into this, so forgive any excess enthusiasm, but I’m telling you – it’s good stuff.

Ding’s vision of a future web not of sites and pages but of “educated agents of meaning” (smart software applications is what I’m seeing), driven by human beings to serve our needs, is a really interesting one.

His conclusion makes me think of Google Custom Search, Lijit (which I must spend some time with) and I don’t know what else. It’s got me on fire, though.

I found the interview through a path you might find of interest. It was highlighted in the blog of Talis, a vendor in the semantic space, in their This Weeks Semantic Web round up. It’s a very rich resource, not to mention a great marketing asset for the company. I found that via the blog of semantic web rock star Danny Ayers. I was reminded of Ayers’ blog and have picked it back up with a renewed interest after seeing it in a list of 60+ Semantic Web Blogs at Semantic Focus, a fascinating-looking group blog where, coincidentally, interview subject Yihong Ding is a regular contributor. So we come full circle and have found a whole lot of valuable resources along the way.

Prioritizing your reading list and doing rapid niche research using AideRSS

AideRSS is a service I’ve wanted to make creative use of for some time. It’s neat – you supply an RSS feed and it ranks posts in that feed in order of reader engagement. The company is Canadian, too, and Canadian internet stuff is totally hot.

AideRSS scores each post by the number of comments it received, number of times it’s been tagged in del.icio.us, inbound links from a number of blogsearch engines, etc. Thankfully, it scores those posts relative only to other posts in the same feed. So while a post on TechCrunch with 20 comments might score a 5 out of 10, for example, a post on Marshallk.com with 20 comments would score a 10 out of 10! Unfortunately, and this is a big disappointment, AideRSS is just plain wrong far too often – reporting, for example, completely inaccurate numbers for several posts in my feed. Come on AideRSS team, fix these problems. So it’s nothing to bet the bank on, but there’s some real potential here and as a rough guide it could still be useful today. I’ve contacted AideRSS to ask why they are getting things wrong as often as they are.
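To show what “relative only to other posts in the same feed” means in practice, here is a minimal sketch in Python. AideRSS hasn’t published its formula as far as I know, so the rule below (add up the engagement counts, then scale against the busiest post in the feed) is purely an assumption, and the feeds and numbers are invented.

```python
def relative_scores(posts):
    """Score each post from 0 to 10 relative to the busiest post in the same feed.

    posts: dict mapping post title -> dict of engagement counts
    (comments, bookmarks, inbound links and so on).
    """
    totals = {title: sum(counts.values()) for title, counts in posts.items()}
    top = max(totals.values(), default=0)
    if top == 0:
        return {title: 0 for title in totals}
    return {title: round(10 * total / top) for title, total in totals.items()}

# Invented numbers: the same 20 comments land very differently in each feed.
big_feed = {"Post A": {"comments": 40}, "Post B": {"comments": 20}}
small_feed = {"Post C": {"comments": 20}, "Post D": {"comments": 4}}

big_scores = relative_scores(big_feed)
small_scores = relative_scores(small_feed)
print(big_scores, small_scores)
```

Under this assumed rule, a 20-comment post scores 5 in the busy feed and 10 in the quiet one, which matches the TechCrunch versus Marshallk.com example.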

That’s all well and good. It’s a good way to see which of your posts are getting the most reader engagement (at least via these gestures being measured) and the widget that AideRSS provides is a neat way to highlight your most popular posts – but I know there’s a lot more that’s possible here.

Tonight I tried something unusual, at least it seemed that way to me. I plugged the RSS feed for items I’ve tagged “toread” in del.icio.us into AideRSS. It worked! It appears that the service figured out which were the hottest items in my feed. What a handy way to prioritize! I could grab a scored RSS feed from AideRSS, including “good posts,” “great posts” or only the “best posts.” Here’s a widget displaying the best posts currently in my “toread” feed, according to AideRSS.



Isn’t that cool? Obviously it would be nice if users could define the number of characters and items displayed in that widget, and the metrics used don’t capture anything personalized – but nonetheless, I think there’s some real potential here. (The numbers fetched aren’t always accurate, either – hopefully that will improve.)

Here’s an idea I thought of previously: say you’re looking to identify some of the top blogs in real estate. (Woo hoo!?) I would recommend starting at http://technorati.com/blogs/real_estate and sorting by authority. There’s an “export in OPML” link there, which unfortunately will not give you anything other than the top 10 blogs in that category no matter what you try to do, but you can import that OPML into AideRSS. You can then see the hottest posts in each blog, in other words: you can get a feel for what that blog’s community of readers takes interest in. So Technorati+AideRSS = easy identification of the biggest interests of top niche bloggers’ reading communities. Sounds invaluable to me.
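If you’d rather peek inside an OPML export before importing it anywhere, here is a minimal sketch in Python that pulls the feed URLs out of one. The sample OPML is invented, not a real Technorati export, and the function name is just for illustration; the only thing it relies on is that subscription-list OPML files store each feed’s address in an xmlUrl attribute on an outline element.

```python
import xml.etree.ElementTree as ET

def feed_urls_from_opml(opml_text):
    """Return (title, xmlUrl) pairs for every outline that carries a feed URL."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # folders have no xmlUrl and are skipped
            title = outline.get("title") or outline.get("text") or ""
            feeds.append((title, url))
    return feeds

# Invented sample export.
sample_opml = """<opml version="1.1">
  <head><title>Example export</title></head>
  <body>
    <outline text="Example Blog One" xmlUrl="http://example.com/one/feed"/>
    <outline text="Folder">
      <outline title="Example Blog Two" text="Two" xmlUrl="http://example.com/two/feed"/>
    </outline>
  </body>
</opml>"""

feeds = feed_urls_from_opml(sample_opml)
for title, url in feeds:
    print(title, url)
```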

These are the kinds of ideas I help come up with and implement with my consulting clients; though we wouldn’t want to depend too much on a tool that’s as loosely accurate as AideRSS is today.

If this general idea is of interest to you, perhaps more for personal use than marketing purposes, see also Rogers Cadenhead’s recent post on APML – Attention Profiling Markup Language. I tagged it in my blog and shared items feed, which you might like to subscribe to.

Thanks for reading.

The best things about Technorati

Technorati CEO Dave Sifry stepped down yesterday and the news gave cynics another opportunity to talk smack about blog search in general. There are a handful of things I really like about Technorati and I think the company deserves a bit of defense. If Technorati takes a dirt nap, I’ll be bummed for a number of reasons. (I’ve had the phrase “dirt nap” stuck in my head for weeks and am very relieved to have the chance to use it here!)

It’s not the full text search of blog posts that Technorati is really good for. Google Blogsearch is faster if you want to know if anyone has beat you to a story and Ask.com has much better spam control as it only indexes feeds that have a certain number of subscribers in Bloglines (hello, Google Reader and Blogsearch teams). Technorati has created a whole bunch of awesome experimental features, some of which worked and some of which didn’t. I don’t know how many of the people behind much of that innovation are still at the company but I hope things brighten up over there in the future.

What is Technorati good for? First, the Blog Index section of the site is very useful. Go to http://technorati.com/blogs/wtfeveryourelookingfor and you’ll find blogs that have been tagged as a whole, not on the level of a single post, by their own authors. Sort by “authority” (shudder) and you’ll see the ones with the most inbound links. I was talking to a potential client on the phone last week and he asked “are there a lot of real estate blogs?” I knew anecdotally that there were, but quickly visiting http://technorati.com/blogs/real_estate told me there were more than 12,000 in Technorati alone! The Blog Index makes it easy to see which, by one standard, are some of the top blogs in any niche. It’s not perfect but it’s a good start.

Unfortunately, OPML export of anything more than the first 10 results of these searches isn’t possible. That looks to me like broken functionality and as the company slashes staff I have to worry that there’s little hope of the best parts of the service being maintained or improved upon.

The second cool thing about Technorati is the company’s partnerships with large, traditional outside publishers, specifically the kind of relationship they’ve built with the Washington Post. In some sections of the WaPo website, you can see blogs linking to an article displayed in a little box, courtesy of Technorati. If those are sorted a bit for spam and crap then that becomes great stuff. I know that Sphere is providing related functionality on some sites, but it’s not the same. The ins and outs of this sort of service deserve a big blog post in and of themselves.

Finally, the Technorati 100 is a good thing. I know there’s a whole lot of criticism of it and a lot of that is valid. I don’t like the word “authority” and I don’t like measuring authority by links – but linking does mean something and the fact that Technorati shows off a leader board of that metric is worthwhile. FeedBurner ought to too, if the group feels like separating out blogs from the other feeds they publish.

I know that Technorati has been painfully slow at times, the most recent site redesign is awful and the focus on inbound links is overdone – but it’s an important company that deserves support in my opinion.