At a recent Semantic Web gathering at MIT, Maximilian Schich gave a talk about “Adding Art Research Data to the Giant Global Graph”. It sparked an interesting discussion regarding the valuable assets within some Linked Data systems and how to monetize them. It is a topic of interest to me as I believe a key adoption hurdle for Linked Data and Semantic Web technologies is in clarifying the value proposition to all parties involved (e.g., producers, distributors, and consumers).

Schich talked about his conundrum of what to do with an incredibly rich data set he has access to describing historical art and archeology information. He presented how the data is already well defined within their data model and retrieval scheme. He also outlined how they propose bridging the gap to embrace the current and near-future Linked Data standards. The question, then, was how they would be able to pay for all this work.

There was a lot of discussion around possibly licensing access to the data via SPARQL, but the mechanisms for metering don’t exist yet. Kingsley Idehen and some others discussed possibilities in adding support at the server layer for a formalized data response similar to HTTP 402 (i.e., “payment required”). It was clear, though, that work would need to be done in this area before adopting it as a reasonable direction.
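To make the HTTP 402 idea a little more concrete, here is a minimal sketch of what a metered endpoint might decide before running a query. Everything in it is an assumption for illustration: the `PAID_TOKENS` set, the `respond_to_sparql_request` function, and the token scheme are hypothetical, and nothing like this was specified at the gathering.

```python
from http import HTTPStatus

# Hypothetical set of access tokens that have been paid for.
PAID_TOKENS = {"demo-paid-token"}


def respond_to_sparql_request(query, token=None):
    """Return (status_code, body) for a metered SPARQL request.

    Illustrative only: a real endpoint would evaluate the query
    against a triple store instead of echoing it back.
    """
    if token not in PAID_TOKENS:
        # 402 is the standard "Payment Required" status code.
        return HTTPStatus.PAYMENT_REQUIRED.value, "Payment required to run this query."
    return HTTPStatus.OK.value, f"(results for: {query})"
```

A client without a paid token would get a 402 back and could then be pointed at whatever licensing flow the data owner chooses.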

At some point during the conversation, I asked Schich what he believes is his more valuable asset: the data itself, or the presentation layer. After having seen the GUI of the demo application showing how the user could retrieve the data, it was clear that a lot of domain expertise was required to design it. Further, when he showed us the data schema and example retrieval/traversal modes, it was even more obvious that the average researcher would have to interface via the GUI (even if the data is freely available and fully compliant with SemWeb standards).

With this in mind, my suggestion was that he consider opening up the data entirely, forgoing any programmatic metering, and possibly licensing commercial access to it (allowing for free non-commercial use). My proposal was that they focus on monetizing killer GUI products tuned for each of their specific user groups. In this way, they could service both their institutional and individual users as appropriate.

Fortunately, Tim Berners-Lee jumped in and agreed. He clearly articulated (undoubtedly better than I could) the benefit in separating the data source from the presentation layer. Each could then be treated separately in the context of its use and license model.

Thinking about it later, I was reminded of the scene in Neal Stephenson’s Cryptonomicon when [spoiler alert] the characters find a pile of gold in the middle of the jungle on a remote Pacific island. It had been left there by the retreating Japanese army in World War II, and they were trying to figure out how to retrieve it through the unfriendly territory. Basically, what appears incredibly valuable at first glance (i.e., a pile of gold = a mountain of rich data) is nearly worthless without a way to get it out.

I hope, however, that Schich is able to find a better solution than was presented in Cryptonomicon.

BTW - Since I mentioned them already, it’s worth noting that both Kingsley and Tim are giving keynotes at the Linked Data Planet conference in New York on June 17th. Also, you can listen to an interview I did with Kingsley a couple weeks ago as part of the DataPortability: In-Motion Podcast.

I play around with a lot of new widgets as they hit the scene, and of the zillion or so I’ve encountered, only a couple seem to have lasting value. Generally, I’m not a personal fan of the time-wasting variety of widgets, but I like the ones that seem to add to effective communication. For example, you can check out the about me page to see the LibraryThing widgets I use to show my reading interests (you can actually browse my complete library, if you’re so inclined). Similarly, each page of this site includes a couple other widgets for illustrating my interests as a tag cloud as well as my LastFM info.

I was recently playing around with the wealth of Amazon widgets you can create. If you haven’t already checked ‘em out, I recommend popping in and seeing what they’ve got cooking. There’s just about every type you can possibly imagine slicing-and-dicing any way you’d probably want. While checking it out, I played with their embedded store widget where you can sell anything you want on your site.

In order to try it out, I looked around my office for something I could sell. My eye landed on the one-of-a-kind “J. Trent Adams Modern Man of Action” action figure. My brother made it for me a few years back, and I thought this’d be the perfect thing to sell through this kind of widget. Of course, I’d really rather not let it go, but then again, who’s gonna pay me US$500 for something like this?

Now, the reason for this post is that I found another widget from iTunes I’m playing with and I needed the Amazon real estate for it. Rather than toss out the Modern Man of Action one-of-a-kind opportunity, I moved it into this post. At least now it’ll live somewhere in case I want to refer to it in the future… or someone finds a hefty chunk of change lying around they want to parlay into something hip.

Dose of Reality

I’ve always known that I live in a little bubble of people who think largely like I do. It seems perfectly natural to me that we hang out with people who share some sort of common interest. Even if the majority of your acquaintances are formed by proximity, there’s a high probability that everyone working together or otherwise living in the same neighborhood has some sort of connection.

What I bumped into during my recent holiday vacation, however, was an entirely different group of people than I usually encounter. While visiting my family in Colorado, I attended a few parties with folks I wouldn’t necessarily consider my posse. But when hanging out with the family, on their turf, you go with the flow.

Everyone’s always been interested in my work for the Patriots, and I’ve got a ton of amusing stories about the ten or so years I was working for them. Generally, there’s not a lot of explaining that has to take place prior to an anecdote; almost everyone knows what a football team is and what they do. This year, however, was very different.

When asked what I’m doing now, I would start with the standard matchmine elevator pitch. Over the past couple years that pitch has taken less and less further explanation. It’s hard to walk down the street in Boston today and not bump into a couple dozen companies grappling with the need to personalize their offering. Even outside involvement in the biz, most folks with whom I interact take the elevator pitch and easily start pushing their own question buttons.

Back home, however, I was surrounded by a shockingly large number of people who had only a vague notion of personalized media discovery. In fact, they assumed that the type of work we’re doing already exists under the covers of the sites they visit. Given my familial connections, there was a wide range of people spanning cowboys, doctors, judges, artists, and professional philanthropists. While not necessarily a mark of intelligence, the majority of them, while working all over the map, had advanced degrees of one sort or another. Further, most of them talked easily about using email, the web, and tools like SMS, IM, and RSS feed readers in their daily lives (and not in a “gee whiz” sort of way, but as a matter of fact).

What I’m saying is that these are regular people who are relatively plugged in. While they’re not involved in building the online stuff, they’re actively using it. They’re moderately heavy computer users, tossing photos around with the likes of Flickr and regularly using iChat, etc., so I didn’t detect a learning curve required there. Further, everyone assumed that many sites perform some under-the-covers magic for their “You’re interested in this, so you might like that” features. It’s just that they didn’t realize these systems are all disconnected from each other.

Almost to a person, I finally got them on board when I said something to the effect that, “Using a MatchKey, you won’t have to do anything other than what you do now. Through our partners you’ll simply see an improvement in what they’re already offering you.” They were sold on the concept, then, when they got the fact that the MatchKey can be used as their proxy for their interests and tastes on various sites. It’s just that they don’t think about stuff like this much.

It was while ruminating on these discussions that I realized how important it is to remain focused on customer need as opposed to kewl tech. To that end, I have a renewed appreciation for the mantra of “don’t change the habits of the consumer, just improve their experience.” It’s easy to forget this reality when living in a bubble where everyone speaks the same lingo.

First Use of PowerMouse

I was finally accepted into the PowerSet PowerLabs beta program to give their natural language search technology a whirl. As far as I can tell, they’re not in “beta” or even “alpha” for any specific product line right now. Instead, they’ve released some demoware showcasing some of their capabilities.

Among the demos they’ve put up are structured queries for content related to business, arts, and quotes. Each of these allows the user to select one of a dozen or so canned query formats into which free-form nouns and/or verbs can be placed. The input for each of these canned queries is undoubtedly wrapped in some magic prior to execution against the specified source. By restricting user input and directing the query toward engines tuned for the specific need, they can narrow down the degrees of freedom.

For example, one of the business searches you can run is in the form “Who acquired INSERT:COMPANY”. Entering a company name will bring up a list of results highlighting the assumed relationships as found in articles on Wikipedia (their only current source for information).

Their goal is to home in on the intent of the user, and provide results that are better than standard keyword searching. To allow users to judge the quality of the results, most of the demos reply with side-by-side comparisons between what PowerSet can do next to what is returned by the inputs as keywords.

The demo that provides the most flexible input from the user is their PowerMouse application. Using it, the user is able to build an undirected (i.e., not forced to their “business, arts, or quotes” categories) query in the format “subject-verb-subject”. In fact, you can leave one (or two) of the fields blank to see what it finds. One of their canned examples is “zombies - eat - BLANK”. It is gratifying, then, to see Wikipedia articles returned that include “zombies - eat - brain” (along with eating “body part, boy, chick, debbie, flashback, franklin, galactus, granddaughter, hawkeye, head, man, meat, member, neighbor, people, richards, schoolchild, shell, study, sullivan, team, vet, yoshi”).
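The blank-field behavior can be pictured as simple pattern matching over three-part facts. The sketch below is only a toy under stated assumptions: the `TRIPLES` list and the `match` function are hypothetical, and it says nothing about how PowerSet actually extracts or indexes relationships from Wikipedia.

```python
# Hypothetical facts, standing in for relationships extracted from articles.
TRIPLES = [
    ("zombies", "eat", "brain"),
    ("zombies", "eat", "meat"),
    ("zombies", "chase", "people"),
    ("vampires", "drink", "blood"),
]


def match(subject=None, verb=None, obj=None, triples=TRIPLES):
    """Return the triples matching the pattern; None acts as a blank field."""
    pattern = (subject, verb, obj)
    return [
        triple
        for triple in triples
        # A field matches if it was left blank or equals the stored value.
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]
```

Calling `match("zombies", "eat")` leaves the third field blank, much like the canned “zombies - eat - BLANK” example, and returns every stored triple whose first two fields agree.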

To try it out myself, I wanted to see what it would pick up about my friend Sunita Williams running the Boston Marathon while aboard the International Space Station. I already knew that there was a note about this on her Wikipedia bio page, so I figured it’d be a slow ball for PowerSet to knock out of the park.

I entered the query: “Sunita - ran - BLANK”

The result set included the following:

  1. Sunita - dump - maya
  2. Sunita - tie - maya
  3. Sunita - survive - ordeal
  4. Sunita - seek - reside
  5. Sunita - develop - feeling
  6. Sunita - get - marry
  7. Sunita - go - look
  8. Sunita - set - world record

I’m not sure what synset graph they’re using for “ran” in this context, but the first seven results were clear misses. The last entry, though, made me think PowerMouse hit upon what I expected as it was able to find Sunita’s bio page (as opposed to the unrelated “Sunita Parekh”, a TV soap opera character). Unfortunately, however, expanding the results showed that the blurb it keyed off was actually about her record-breaking space walk (no mention of the marathon).

To make sure I was remembering her bio page correctly, I checked and here’s the paragraph mentioning the marathon:

On April 16, 2007, she ran the first marathon by an astronaut in orbit.[6] Williams finished the Boston Marathon in four hours and 24 minutes.[7][8] The other crew members reportedly cheered her on and gave her oranges during the race. Williams’ sister, Dina Pandya, and fellow astronaut Karen L. Nyberg ran the marathon on Earth, and Williams received updates on their progress from Mission Control.

I’m surprised it didn’t clue into the part of the sentence that reads “she ran the first marathon”. That seems to be about as clear a match for the query as could be expected in many reasonable situations.

It’s too bad that the author of the blurb led the paragraph with a pronoun rather than Sunita (or Williams). It’s possible that the pronoun resolution needed to connect “she” back to its noun failed to detect the association. Possibly compounding the problem is that the nearest previous noun was “Joan Higginbotham”.
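To illustrate why a nearby name could throw things off, here is a deliberately naive sketch that swaps a leading pronoun for the most recently mentioned name. It is pure speculation about the failure mode, not a description of PowerSet’s approach; the `PRONOUNS` set and the `resolve_leading_pronoun` function are hypothetical.

```python
# A tiny, hypothetical pronoun list; real systems handle far more cases.
PRONOUNS = {"she", "he", "they", "it"}


def resolve_leading_pronoun(sentence, previous_names):
    """Replace a leading pronoun with the most recently mentioned name.

    previous_names is ordered oldest to newest, so the last entry is
    the nearest preceding name in the text.
    """
    words = sentence.split()
    if words and words[0].lower() in PRONOUNS and previous_names:
        words[0] = previous_names[-1]
    return " ".join(words)
```

With the names ordered as they appear on the page, `resolve_leading_pronoun("she ran the first marathon", ["Sunita Williams", "Joan Higginbotham"])` attaches the marathon to Joan Higginbotham, which is the kind of mix-up that would hide the sentence from a query on Sunita.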

I wonder, then, if the query would have picked it up had the sentence read “Williams ran the first marathon” (or more specifically, “Sunita Williams ran the first marathon”). Since it’d be better encyclopedia formatting to lead the paragraph with her last name, and updating the page wouldn’t hurt, I have half a mind to edit it and try the query again.

The shining star in the experience, though, was the user interface of their support site. Very nice use of in-situ form editing and feature flow. Not a great knowledge repository, but fun to play with. My hope is they’re engaging with a good group of demo testers during their shakeout cruise. I look forward to watching as the training wheels come off to see how it works in the wild.

I’ve dabbled with just about all the major collaborative bookmarking/tagging/link-sharing tools around. I’ve tried del.icio.us, Digg, Reddit, Technorati, etc. There are, of course, strengths and weaknesses to each one. The trick for me has been to integrate my use of them into my daily work stream.

Enter my new favorite class of tool: in-situ collaborative page annotation. I have no idea what others are calling them, but there ya’ go. I’m sure someone has come up with a clever Web 2.0 term, especially since mine is way too long.

Basically, these tools enable me to quickly and easily add my own notes to a page while reading it. The fewer key clicks a tool needs, the higher it ranks in my book. Further, the more tracking / reporting / sharing functionality they have the better. For example, right after highlighting a page, I like to be able to post my notes as a blog entry or otherwise set the tool to track the page for changes.

The two tools that have floated to the top for me are Fleck and Diigo. My personal opinion, having used both for a couple weeks now, is that Diigo wins. While both have largely the same feature set, the RIA UI of Diigo really keeps me moving without waiting for page redraws. Further, their style guides associated with reading annotations (within their site, in a cross-post to a blog, or forwarded via email) are incredibly well designed.

The only downside I’ve found with Diigo so far is that it doesn’t automatically pick up the tags associated with the pages. It’d be a great feature if they took a best guess based on the meta keywords in the header and let you modify them. Right now, though, it pre-populates the “tags” field with “no-tags” (and changing them requires a few too many key clicks for my tastes).
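The best-guess idea is easy enough to sketch. The example below pulls the keywords out of a page’s meta tag with Python’s standard `html.parser` module and falls back to “no-tags” when there aren’t any. The `MetaKeywordParser` class and `suggest_tags` function are hypothetical names for illustration, and this is in no way how Diigo works under the hood.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collect the comma-separated values of <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None, so guard before calling lower().
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())


def suggest_tags(html):
    """Return a best-guess tag list for a page, or ["no-tags"] if none are found."""
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords or ["no-tags"]
```

The suggested list would then only be a starting point that the user could edit before saving the bookmark.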

As a side note, LibraryThing does a fantastic job of empowering users to quickly and easily add tags to the books in their library. Talk about great human factors (though minimal pretty pictures in their GUI)… in fact I’ll probably spend some time chatting that site up some time soon.