Help Me Articulate the Potential Of Twitter’s Annotations

At last week’s Chirp developers’ conference, Twitter announced plans to release a new feature called Annotations. As I understand it, it will be a way for any Twitter client program to add a metadata payload to each tweet it publishes, using any namespaces it desires. The potential here is poetic, epic, crazy awesome huge. Kim-Mai Cutler’s coverage of it on VentureBeat has been very good; she quotes one unnamed developer as saying it’s “the most disruptive thing Twitter’s done in two years.”
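
To make that concrete, here is a minimal sketch of what a namespaced annotation payload might look like, expressed in Python. The namespaces, keys, and wire format below are illustrative assumptions on my part; Twitter had not published a final format at the time of writing.

import json

# Illustrative sketch only: each tweet may carry a list of annotations,
# each one a namespace mapping keys to values. The "review" and "geo"
# namespaces and their keys are invented for this example.
annotations = [
    {"review": {"title": "A Clockwork Orange", "rating": "10"}},
    {"geo": {"venue_name": "Powell's Books"}},
]

# A client would attach a JSON blob along these lines when posting a
# status update through the Twitter API.
print(json.dumps(annotations, indent=2))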

I have been trying to wrap my head around it so I can write about what it means for developers and non-developing end users. This deserves the blogging equivalent of a song, belted out with clear notes and a catchy melody. I’ve got librarians on Twitter asking me to write about this, and when librarians call – a writer must answer.

I’m reaching out to some of the smartest people I know to get their thoughts about this, and I consider you among that group. I would love for you to share any quotable thoughts you have about Annotations in the comments here. I will fold your best thoughts into the song I sing as I travel from village to outpost, telling the tale of the epic development Twitter is about to attempt.

OK so really I’ll just blog about it from my bedroom office, but hopefully a lot of people will read it, so please share your thoughts below and make ’em good! Thanks!

  • Teresa (PDXsays) Boze

    We need serious consideration from folks who know their stuff before we create a convention. For example, I have seen coders I deeply respect suggest annotation marks. Problem: those same marks are already used in texting to convey emotion and other contextual cues. It’s important that this doesn’t turn into “iPad” branding.

    Here are some folks you’re gonna like. They are the grooviest, geekiest, and most open bunch you’ll ever find.

    One of Portland’s hidden gems is its strong history of producing world-class indexers. From the Pacific Northwest Chapter of the American Society for Indexing: http://www.pnwasi.org/benefits.htm

    A good index helps users by:

    * identifying information a user might look for
    * distinguishing substantial information and passing mentions
    * providing terminology that may not exist in the text
    * analyzing concepts to produce headings
    * directing a user, through cross-references, to appropriate terminology and concepts
    * grouping together references to the same topic
    * organizing entries systematically, e.g., hierarchically, alphabetically
    * collecting different ways of wording the same concept
    * providing subentries (rather than long strings of unanalyzed page references) to guide researchers directly to a specific aspect of a topic
    * retrieving information for review by students
    * filtering information for the reader in order to prevent burnout
    * facilitating quotation by other authors, by the media, by students, by readers
    * anticipating the reader’s viewpoint; that is, entries are worded to be useful to non-experts asking questions or looking for information
    * filtering the avalanche of information to reduce overload
    * organizing “aboutness” for quick recall

    A good index helps publishers/companies/authors by:

    * reducing the number of calls to a support hotline
    * ensuring your company’s positive, interactive and ongoing presence on a permanent basis in thousands of customer homes and workplaces at minimal cost to the company.
    * permitting a potential buyer to compare books to verify inclusion of secondary topics not mentioned in the chapter headings
    * allowing professors to choose textbooks based on whether they or other known experts or researchers in the field are quoted or discussed in the book
    * focusing the book’s information gateways to a specific audience
    * showing an author’s pride in his or her own work, and a regard for researchers and readers
    * determining whether a university library purchases the book at all

    Thanks to Martha Osgood for her contributions to this information.

  • Todd

    I’d recommend waiting until Chris, @noradio and @apparentlymart iron out ActivityStreams annotations:

    http://twitter.com/chrismessina/status/12301191727

    Once done, THAT will be truly newsworthy.

  • Marshall

    That is certainly an important and interesting part of the story. But Todd, I’d love to hear more of your thoughts than that!

  • This is the equivalent of adding semantic information to posts, giving machines the power to parse and understand the underlying currents. It is in these undercurrents that developers can really take advantage of the platform and monetize. This could basically kill A LOT of companies in the real-time/LBS space. Miso? Check. Foursquare/Gowalla/Brightkite/Whrrl/etc? Check. Yelp? Check.

    Additional ideas are laid out in Kim-Mai Cutler’s article, but the main trend here is that you now have a huge user base as a platform. And I guess this is what it really comes down to: annotations are a platform.

    But I feel there should be some guidance as this could get messy, real quick.

    Scobleizer has some scary thoughts at http://scobleizer.com/2010/04/15/twitter-annotations/ – the scary part is the non-updateability of annotations and the workarounds already being formed. Also scary is the weather section.

    As you well know, there are top-down and bottom-up approaches to aggregating this information. But this is not the web; this is a controlled “protocol” where this metadata can and should be enforced. Maybe an unbiased working group would actually help in this situation.

    I do agree with Todd’s point on the Activity Streams notation. This would make life a lot easier for all and would help Salmon out a bit along the way.

    Twitter has dug a huge hole to be filled, which is a terrible mistake. There is so much monetization potential here. Forget the ads, mine the data.

  • Hello,

    Please see the article by @eugmandel (co-founder of @mustexist) on the subject of Twitter annotations: http://blog.mustexist.com/2010/04/14/tweet-annotations-a-way-to-a-metadata-marketplace/

    Enjoy, and Regards.

    Thank you,
    –Alex (@AlexSherstinsky)

  • Annotations could allow extracting concepts/entities from the text of tweets. “Could,” because you can annotate a tweet only when you publish it. More details on this: http://bit.ly/aAoNsW

  • It will be interesting to see what happens with annotations, and what type of metadata will be put there. I can think of several things that they could be used for:

    – Hashtag replacement. Hashtags are currently a way to embed metadata in tweets, but they could be replaced by a “hashtags” annotation.

    – Link metadata. An annotation could reference the full URL and media type (image, video, etc.) of any links included in the tweet. This would make it easier for search engines to filter by certain types of content (a rough sketch of this follows the list).

    – Integrations with other services. Flickr’s “machine tags” are similar in nature, embedding metadata for machines to use, which powers things like the Flickr -> Upcoming.org integration, among others.

    – Curation. Scoble wrote a good piece on the needs of curators (http://scobleizer.com/2010/03/27/the-seven-needs-of-real-time-curators/) and I could imagine annotations being used to solve several of his 7 listed requirements. But annotations will be write-once, set only when a tweet is initially posted, which might make bundling-after-the-fact difficult or impossible.
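
    Here is a minimal consumer-side sketch of the link-metadata idea above, in Python. The “link” namespace, its keys, and the tweet shape are assumptions for illustration; no official schema exists yet.

    tweets = [
        {"text": "Great talk from Chirp http://example.com/a",
         "annotations": [{"link": {"url": "http://example.com/talk.mp4",
                                   "media_type": "video"}}]},
        {"text": "New blog post http://example.com/b",
         "annotations": [{"link": {"url": "http://example.com/post.html",
                                   "media_type": "article"}}]},
    ]

    def filter_by_media_type(tweets, media_type):
        # Return tweets whose hypothetical "link" annotation declares the
        # requested media type.
        matches = []
        for tweet in tweets:
            for annotation in tweet.get("annotations", []):
                if annotation.get("link", {}).get("media_type") == media_type:
                    matches.append(tweet)
                    break
        return matches

    print(filter_by_media_type(tweets, "video"))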

    One challenge will be whether this devolves into tag soup. Twitter has indicated they are not going to suggest a set of standard metadata, but will leave it up to the ecosystem to develop. The tricky part there is that a big and fractured set of clients would have to agree on what to write and how to treat embedded annotations. Still, hashtags, “RT” and other microsyntax evolved in a similarly organic fashion.

  • Hi Marshall,

    I’ll try to articulate how I personally see this.

    Twitter Annotations change the shape of a “tweet”: they provide the means to transfer packets of data around the internet, together with a short human-readable message.

    Inherently these data packets come with two huge benefits: they already have a simple, short addressing scheme, and, more importantly, they have an inbuilt multi-level publish-subscribe model, whether that be through the conventional “follow” relationship or the more complex Twitter Lists.

    Specifically, with regard to the annotations themselves, they are <2k packets of metadata, so if you consider adding the following key-value pairs to a tweet, you may see one of the many places this can go:

    type: Event
    startTime: 2010-08-12 13:00:00 UTC
    locationGeo: 52.124 12.5364
    moreInfo: http://example.org/my-great-event

    In short, anything whose main properties can be described in under 2k can be published and stored via Twitter Annotations, which in reality covers almost everything from products to events.
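
    A minimal sketch of that event payload in Python, checked against the announced size ceiling (512 bytes initially, reportedly rising to 2k). The "event" namespace name and the JSON encoding are my own assumptions.

    import json

    # Hypothetical "event" annotation built from the key-value pairs above.
    event_annotation = {
        "event": {
            "type": "Event",
            "startTime": "2010-08-12 13:00:00 UTC",
            "locationGeo": "52.124 12.5364",
            "moreInfo": "http://example.org/my-great-event",
        }
    }

    # Check the encoded payload against the 2k ceiling before publishing.
    encoded = json.dumps(event_annotation).encode("utf-8")
    print(len(encoded), "bytes")
    assert len(encoded) <= 2048, "annotation payload exceeds the 2k limit"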

    Other simple uses include tagging tweets, providing provenance data, adding Creative Commons-style licenses, adding further "in reply to" data to build a multi-level discussion model, and quite possibly even encrypted secure messaging using keys.

    Finally, Twitter Annotations make much more sense when you consider them in the context of the bigger move Twitter has announced: identification and authentication via Twitter + OAuth (single sign-on, in many ways). You are already familiar with how simple this makes things from the comments on ReadWriteWeb. Now consider Twitter as a centralized data API with pub-sub and short addressing built in, accessible by a plethora of clients and applications, and one that is already rolled out and easy for everyone to use.

    Best,

    Nathan

  • Well … I was at Chirp. I very briefly talked with @noradio about the subject. My position is that this ought to be a *formal* RFC and a “semantic web” *standard*. Yes, negotiations, discussions, committees, etc. 😉

    Sir Tim Berners-Lee got a significant chunk of funding – has anyone actually asked *him* to weigh in on this? Thomson Reuters’ OpenCalais? Any of the other existing semantic web organizations / companies?

    There is a new Google Group spawned for discussion at http://groups.google.com/group/twitter-meta. And there’s an existing discussion thread on http://groups.google.com/group/twitter-development-talk/browse_thread/thread/0fa5da2608865453

    Now for the quotable sound bite. Somehow, I think we have to stop just creating awesome new innovative and complicated technologies and get back to solving *real* problems for *real* people. The servers powering Internet usage are now a significant contributor to the global carbon footprint, for example. Are we making lives better with *that* use of carbon than we would with other uses? Are scarce resources better allocated to carbon use reduction than to semantic web technologies?

    To be blunt, is self-actualization for Internet technologists coming at the expense of basic needs for non-technologists? Sometimes I really wonder these days.

    http://www.businessballs.com/maslow.htm

  • Talk about Serendipity!

    My post about Data 3.0 [1] is all about the likes of Twitter producing Structured Metadata, in whatever Data Representation works for them.

    The key to metadata structure is the Entity-Attribute-Value model [2] combined with the ingenuity of the HTTP abstraction for Naming and Data Access. Actual “Data Representation” can be in a myriad of different formats. The Model, Naming mechanism, and Data Access matter much more than actual Data Representation Formats.

    The most important thing is this:

    1. Use Generic HTTP Scheme Identifiers for Entity Names

    2. Ditto for Entity Attributes

    3. Optionally, the same for Attribute Values (which can also be Literals).

    Put 1-3 together into a Document (an Entity Descriptor Document) and expose said Document at a Network-accessible Location (e.g., a URL on the World Wide Web).

    Again, the Entity Descriptor Document contents can exist in a variety of Data Representation Formats. The key thing is that said format expresses an Entity-Attribute-Value graph (*the critical constant for structured data*).

    In addition, if possible, you should construct Entity Descriptor Documents that present the EAV-model-based Structured Entity Descriptions in both Human- and Machine-oriented formats, depending on the Document Viewer.
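
    One possible rendering of such an Entity Descriptor Document, sketched here in Python as a JSON document; the entity and attribute identifiers below are illustrative placeholders, not a published vocabulary.

    import json

    # An Entity Descriptor Document in Entity-Attribute-Value form: the entity
    # and its attributes are named with HTTP-scheme identifiers, and values
    # are either literals or further HTTP identifiers.
    entity_descriptor = {
        "entity": "http://example.org/id/tweet/12345",
        "attributes": [
            {"attribute": "http://purl.org/dc/terms/creator",
             "value": "http://example.org/id/person/marshall"},
            {"attribute": "http://purl.org/dc/terms/created",
             "value": "2010-04-19T10:00:00Z"},
            {"attribute": "http://example.org/vocab/text",
             "value": "A short human-readable message"},
        ],
    }

    # Expose this document at a network-accessible location (a URL) and any
    # consumer can reconstruct the entity's EAV graph from it.
    print(json.dumps(entity_descriptor, indent=2))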

    Execute on the above, and Twitter not only delivers Linked Data in a major way, it also solves a critical piece of the “Semantic Web Project” comprehension and bootstrap puzzle.

    Over to you, Twitter!

    One last thing:
    Twitter will not only gain even more credibility, it will actually have a business model that extends way beyond Advertising, etc.

    Links:

    1. http://bit.ly/cA0zxw — Data 3.0 Manifesto
    2. http://en.wikipedia.org/wiki/Entity-attribute-value_model

    Kingsley

  • My fervent hope is that developers will adopt linked data standards for annotations.

    One of the first things that I expect to see once annotations are released is that Twitter web clients will publish RDFa when a tweet is annotated with some known ontology. Google already indexes RDFa, so we could see real-time semantic web search.

    Twitter’s widespread adoption and competitive developers could mean that semantic data finally gets a decent user interface and that users get a good incentive to create semantic data.

  • David Semeria

    According to Twitter, the annotation(s) can occupy up to 512 bytes (later 2k).

    Is this the first system in which the meta-data is larger than what it describes?

    It’s like having a filing cabinet where you can store more stuff on the label than inside.

    Prediction: annotations will evolve into the true payload, and the tweet itself will become the descriptor.

  • As many have already mentioned, semantic web standards and Linked Data are the way to go for Twitter annotations and I can hardly hide my excitement.

    Twitter’s annotation format (tweet – namespace + key – value) fits perfectly within an RDF triple (subject – predicate – object). By embracing semantic web standards, Twitter wouldn’t lose anything, while it could definitely gain a lot.
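
    To make that mapping concrete, here is a minimal sketch in Python using rdflib: the tweet’s URI becomes the subject, the namespace plus key become the predicate, and the value becomes the object. The tweet URI and the annotation shape are assumed for illustration.

    from rdflib import Graph, Namespace, URIRef

    FOAF = Namespace("http://xmlns.com/foaf/0.1/")

    # A hypothetical annotation on a tweet: namespace "foaf", key "homepage".
    tweet_uri = URIRef("http://twitter.com/example_user/status/12345")
    annotation = {"foaf": {"homepage": "http://example.org/"}}

    g = Graph()
    g.bind("foaf", FOAF)

    # Map namespace + key to a predicate and the value to an object. Only the
    # FOAF namespace is handled here; a real mapping would need a registry of
    # namespace prefixes to vocabulary URIs.
    for ns, pairs in annotation.items():
        if ns == "foaf":
            for key, value in pairs.items():
                g.add((tweet_uri, FOAF[key], URIRef(value)))

    print(g.serialize(format="turtle"))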

    Two major points, IMHO, why Twitter would benefit greatly from the Linked Data approach:

    1. It is standardization out-of-the-box. Standard vocabularies can be used, so annotations can be easily reused by other parties, even outside the Twitter context. For example, if a Twitter application annotates tweets with foaf:homepage, other applications can be 100% sure what that means and how they can reuse that piece of information, as FOAF is an open, documented vocabulary. On the other hand, if everyone invents their own undocumented annotation formats (abc:xzy), reuse will be difficult at best.

    2. It is already supported by a large community and the W3C. This could cause a huge influx of fresh ideas into the Twitter API community. There are already many smart people in the Linked Data community that have been testing different ideas for almost a decade now, but have lacked a real-world platform where their ideas could be applied. Twitter could become this platform.

    The standardization of metadata on the web is a problem the W3C has already solved, and Twitter shouldn’t bother trying to solve it all over again. I think they have an opportunity here to push Linked Data into the mainstream and become its first “killer app,” which is something that many of us have been waiting for.

  • The idea of enabling individual applications to bind arbitrary metadata to tweets in a standard, accessible way is outstanding. But I believe the simplest and most elegant way to accomplish this has not been discussed.

    The absolute easiest way to enable metadata to be associated with tweets is to assign unique, transparent, persistent identifiers — URIs — to tweets which would then enable third-party applications and services to persist arbitrary assertions (RDF, accessible as linked data) about those tweets. Today this has only been approximately possible; for example, the URLs we see at the Twitter site for individual tweets (apparently) are not sufficient for identifying the tweets in portable and persistent ways.

    The DOI community started speaking of these possibilities approximately 15 years ago w.r.t. a metadata ecosystem based on binding metadata to content identifiers, but that world never had anything as compelling or pervasive as Twitter.

  • Marshall, the way I read the Annotations announcement:

    Annotations:Twitter::Microformats:Blogging

    I’m not deep enough into this to offer a qualified assessment, but that’s my take from my limited understanding. There may be useful lessons to be learned from studying the evolution of microformats/structured blogging, if that is in fact an accurate analogy.

    Sean

  • Here’s my two cents:

    Allowing this type of metadata will create the largest pool of semi-structured, publicly usable data in the world. Overnight.

    There are many wonderful ways in which this will be a good thing – adding structure and context will improve the findability and curation of real-time information.

    But, the fact that Twitter will not be moderating the metadata that’s allowed means that some nefarious things come with this technology.

    For instance, terrorists and insurgents will have the ability to embed coded messages and wayfinders into the vast river of tweets as metadata. No more hiding in chat rooms, where they can be infiltrated reasonably easily. And nothing to subpoena from Twitter or other social networks, because it can all be default-public – hiding in plain sight.

    Also, I see large potential to add financial or transactional data into the Tweet stream this way – allowing more objects and machines to Tweet (like your credit card) in a way that’s useful and doesn’t clutter the human stream. For example, purchase information could be tagged with a unique code for the store and transaction details, and voila: a Twitter-based receipt.
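
    To make the receipt idea concrete, here is a rough sketch of what such a transactional annotation might look like; the “receipt” namespace and its keys are entirely hypothetical.

    import json

    # A hypothetical "receipt" annotation a point-of-sale system might attach
    # to a machine-generated tweet. The namespace and key names are invented
    # for illustration; no such schema has been published.
    receipt_annotation = {
        "receipt": {
            "store_id": "example-store-0042",
            "transaction_id": "txn-2010-0419-001",
            "total": "23.95",
            "currency": "USD",
        }
    }

    print(json.dumps(receipt_annotation, indent=2))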