Attribute-Based Messaging and SemWeb Overlap

While sitting in a talk by Peter Neumann about “Identity and Trust in Context” at IDTrust 2009, I heard him mention the use of attribute encryption within Attribute-Based Messaging (ABM). As I was unfamiliar with ABM, I looked it up and found the following description in the paper “Using Attribute-Based Access Control to Enable Attribute-Based Messaging” by Rakesh Bobba, Omid Fatemieh, Fariba Khan, Carl A. Gunter, and Himanshu Khurana:

Attribute-Based Messaging (ABM) is the concept of allowing lists of messaging recipients to be formed dynamically by using an attribute-based recipient address. This approach brings the flexibility of attributes in enabling the sender to send targeted messages, which 1) enhances the relevance of messages to the recipient and 2) allows the sender to send confidential messages knowing that the messages would be delivered only to the intended recipients.

Basically, this means that a user who wants to send a message to recipients they can’t name in advance runs a query against a system, and the message is sent only to people who match the selected attributes. For example, I could use an ABM solution to send a survey of IETF participation to colleagues who are members of at least three IETF discussion lists.

I immediately thought that this is the type of solution that fits squarely in the sweet spot of the Semantic Web. I could easily see that if the attributes are encoded using RDF, an ABM system would seem to be an excellent use case leveraging SPARQL. Looking around, though, I can’t find anyone working on this approach.
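As a rough sketch of the idea, the survey example above can be expressed as a query over attribute triples. Everything here is an assumption made for illustration: the `ex:` and `ietf:` names, the sample data, and the use of a plain Python set in place of a real RDF store. The SPARQL in the comment shows roughly what an ABM system might run if the attributes were published as RDF.

```python
from collections import defaultdict

# Illustrative only: a tiny in-memory triple set standing in for an RDF store.
# All identifiers (ex:alice, ex:memberOf, the list names) are made up.
TRIPLES = {
    ("ietf:oauth", "rdf:type", "ex:IETFList"),
    ("ietf:tls", "rdf:type", "ex:IETFList"),
    ("ietf:httpbis", "rdf:type", "ex:IETFList"),
    ("ietf:dnsop", "rdf:type", "ex:IETFList"),
    ("ex:alice", "ex:memberOf", "ietf:oauth"),
    ("ex:alice", "ex:memberOf", "ietf:tls"),
    ("ex:alice", "ex:memberOf", "ietf:httpbis"),
    ("ex:bob", "ex:memberOf", "ietf:oauth"),
    ("ex:bob", "ex:memberOf", "ietf:tls"),
    ("ex:bob", "ex:memberOf", "ex:book-club"),  # not an IETF list
    ("ex:carol", "ex:memberOf", "ietf:oauth"),
    ("ex:carol", "ex:memberOf", "ietf:tls"),
    ("ex:carol", "ex:memberOf", "ietf:httpbis"),
    ("ex:carol", "ex:memberOf", "ietf:dnsop"),
}

# Roughly the SPARQL such a system might run (assumed vocabulary):
#   SELECT ?person
#   WHERE { ?person ex:memberOf ?list . ?list rdf:type ex:IETFList }
#   GROUP BY ?person
#   HAVING (COUNT(DISTINCT ?list) >= 3)


def recipients_with_min_lists(triples, minimum=3):
    """Return people who belong to at least `minimum` IETF lists."""
    ietf_lists = {s for (s, p, o) in triples
                  if p == "rdf:type" and o == "ex:IETFList"}
    lists_by_person = defaultdict(set)
    for s, p, o in triples:
        if p == "ex:memberOf" and o in ietf_lists:
            lists_by_person[s].add(o)
    return sorted(person for person, lists in lists_by_person.items()
                  if len(lists) >= minimum)


recipients = recipients_with_min_lists(TRIPLES)
```

The sender never names a recipient here; the address is the attribute condition itself, which is the core of the ABM description quoted above.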

Does anyone have any examples of or suggestions for this idea in practice?


Organic Growth of the Semantic Web

I recently had a brief Twitter exchange with @MarkHawker about the term Semantic Web. It started with his tweet:

Would love to see how all these “semantic web” applications are utilising the full SW stack with ontologies, trust and related technologies.

Quickly followed by:

All I fear is Semantic Web will go down same route as Web 2.0 definition. Needs to be clarity and understanding of underlying technology.

To which I responded with:

@markhawker With a lot going on in the SemWeb space that’s not strictly utilizing the “full stack”, I still see the movement as positive.

He followed up with:

@jtrentadams Agree movement positive as achieving full stack is one of toughest computing challenges. Though appreciation of stack needed.

And shortly thereafter with:

@jtrentadams Analogy of me having a steering wheel & engine & claiming to have a car. Devalues contributions in other areas of innovation.

Since I didn’t have a chance to respond quickly enough before the thread went stale (easy to do when I step away from the computer for more than a nanosecond), I thought I might as well follow it up here.

I’m not really one to be hung up on terms, so I don’t really mind the loose application of terms like “Web 2.0”. In my opinion, it’s just a moniker people can use as a placeholder for a grouping of technologies creating something more than what was originally rolled out in 1994. There are endless debates about what it really means, and I’m not sure anyone’s going to agree to a definition any time soon. Perhaps that’s a job best left to the historian class of 2050.

For the sake of this post, assume that Web 1.0 was the “document web” where most links were essentially static. Naturally, what followed was an emerging desire to actively link resources in a way we could consider to be a more “dynamic web”. This more active type of linking opens the way for net-native applications and mashups we could call Web 2.0.

Regarding the term Semantic Web, I see it as a handle for something else again. We could just as easily call it Web 3.0, I guess, as some people do. What I see as the salient difference between the SemWeb and where we are today, however, is “context awareness”. Even in the dynamic linking we see around us today, what’s missing is connections being made due to inherent knowledge of and between the end points.

Returning to the thread with @MarkHawker, I see a major problem with the adoption of the SemWeb “technology stack” (e.g., ontologies, RDF, and SPARQL). Specifically, these technologies are currently tough to roll out on top of existing systems. That being said, I see nothing wrong with easing into them where appropriate to slowly begin to build traction.

In fact, if folks are using any SemWeb tech, I’m happy to hear them crowing about it. For example, if someone’s doing nothing more than using a triple store model for their data so they can move it around with RDF, I give them a SemWeb bonus point. Each step (no matter how trivial) we collectively make toward our end points being able to effectively communicate gets us that much closer to the goal.

Consider a company going to market saying they’re “Fully Semantic Web Enabled” and all they’ve done is add RDFa into their markup. If the market responds favorably to them, more cash will emerge to support further advancement across the board.

In the end, I’m much more interested in success stories around any part of “the stack”, not waiting until someone implements “the full stack”. The fully realized SemWeb is going to grow organically, and I doubt we’ll see a clear line dividing it from its predecessors.


Portability with Linked Data

[Image: Linked Data Chart, 3/31/2008]

There is a lot of focus in the DataPortability Project on making it easier to access user data. Another aspect of data portability, in general, is an analogous set of activities around making other data on the web more machine accessible. A few groups have been approaching this issue in various ways, many of them working under the umbrella of the Semantic Web community. One subset of people focusing their efforts on this is taking what they call the Linked Data approach.

At a recent Cambridge SemWeb Gathering at MIT, Kingsley Idehen, CEO of OpenLink Software and a founder of the DBpedia project, had a great term for where he sees himself within the greater context of people working on these issues:

I like to say that I belong to the Semantic Web Community, but I’m a member of the Linked Data Tribe.

I found this concept of a tiered relationship and allegiance illuminating. When I talked with him about it, he drew a distinction between the community as a whole and the specific set of actionable efforts he focuses on. It has been this sense of “what can be done right now” that has helped him build upon what others are doing to move toward the goals of the community as a whole.

For example, I recently discussed how microformat markup could benefit the Semantic Web with Danny Ayers, an RDF/SemWeb guru working for Talis. Similarly, Ivan Herman gave a talk at the gathering about how to leverage RDFa within the context of an existing XHTML web page. Both examples are stepping stones in the direction of truly portable data on the web, and something that Kingsley considers the “data substrate” upon which Linked Data representations can be built.

To that end, I’m on a mini crusade to encourage developers to take the extra few minutes required to consider how their display layers can expose their content with effective markup. Rather than everyone having to learn OWL, RDF, and SPARQL before any progress can be made, there are some simple steps that will catalyze further steps. It’s really not that hard, and even if you’re not a developer you can mark up your own blogs and pages with microformats to provide search engines with much-needed context to describe your content.
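To give a sense of how little markup is involved, here is a minimal sketch, assuming a hypothetical byline marked up with the hCard class names `vcard`, `fn`, `org`, and `url`. The sample HTML, the person, and the `parse_hcard` helper are all made up for illustration; a real indexer would use a full microformats parser rather than this handful of lines.

```python
from html.parser import HTMLParser

# Hypothetical snippet: a blog byline marked up with hCard class names.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="org">Example Corp</span>,
  <a class="url" href="http://example.com/">home page</a>
</div>
"""


class HCardParser(HTMLParser):
    """Collect a few hCard properties from class-annotated elements."""

    TEXT_PROPERTIES = {"fn", "org"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property whose text is expected next

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # The "url" property lives in the href attribute, not the text.
        if "url" in classes and "href" in attrs:
            self.card["url"] = attrs["href"]
        for name in classes:
            if name in self.TEXT_PROPERTIES:
                self._current = name

    def handle_data(self, data):
        # Store the first non-blank text that follows a property tag.
        if self._current and data.strip():
            self.card[self._current] = data.strip()
            self._current = None


def parse_hcard(html):
    parser = HCardParser()
    parser.feed(html)
    return parser.card


card = parse_hcard(SAMPLE)
```

The page looks the same to a human reader with or without the class attributes, but a machine can now pull out the name, organization, and link without guessing.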

To learn more:

  1. Linked Data Links
  2. Microformats Overview
  3. RDFa Primer

NOTE: I’m purposefully not diving too deep here into the real “meat” of Linked Data. Instead, I hope you’ll spend a couple clicks checking out the simplicity of what can be done to help build the “data substrate”.


Value Struggle: Data, API, or Presentation Layer

At a recent Semantic Web gathering at MIT, Maximilian Schich gave a talk about “Adding Art Research Data to the Giant Global Graph”. It sparked an interesting discussion regarding the valuable assets within some Linked Data systems and how to monetize them. It is a topic of interest to me, as I believe a key adoption hurdle for Linked Data and Semantic Web technologies is clarifying the value proposition to all parties involved (e.g., producers, distributors, and consumers).

Schich talked about his conundrum of what to do with an incredibly rich data set he has access to describing historical art and archeology information. He presented how the data is already well defined within their data model and retrieval scheme. He also outlined how they propose bridging the gap to embrace the current and near-future Linked Data standards. The question, then, was how they would be able to pay for all this work.

There was a lot of discussion around possibly licensing access to the data via SPARQL, but the mechanisms for metering don’t exist yet. Kingsley Idehen and some others discussed possibilities in adding support at the server layer for a formalized data response similar to HTTP 402 (i.e., “payment required”). It was clear, though, that work would need to be done in this area before adopting it as a reasonable direction.
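To make the metering idea concrete, here is a minimal sketch of how a query endpoint might answer with 402 once a free quota is used up. This is not something that was presented at the gathering: the quota, the `handle_sparql_request` function, and the stubbed query execution are all hypothetical. Only the 402 status code itself comes from HTTP.

```python
from http import HTTPStatus

# Hypothetical per-client quota of free queries; the figure is made up.
FREE_QUERIES = 2


def handle_sparql_request(client_id, query, usage):
    """Return (status, body) for a metered query endpoint.

    `usage` maps client ids to the number of queries already served.
    Query execution is stubbed out; only the metering decision is shown.
    """
    used = usage.get(client_id, 0)
    if used >= FREE_QUERIES:
        return (HTTPStatus.PAYMENT_REQUIRED,
                "Free quota exhausted; a license is required.")
    usage[client_id] = used + 1
    return HTTPStatus.OK, f"(results for: {query})"


usage = {}
statuses = [
    handle_sparql_request("lab-42", "SELECT * WHERE { ?s ?p ?o }", usage)[0]
    for _ in range(3)
]
```

The hard parts discussed in the room, such as who counts as a client and how payment is actually settled, are exactly what this sketch leaves out.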

At some point during the conversation, I asked Schich what he believes is his more valuable asset: the data itself, or the presentation layer. After having seen the GUI of the demo application showing how the user could retrieve the data, it was clear that a lot of domain expertise was required to design it. Further, when he showed us the data schema and example retrieval/traversal modes, it was even more obvious that the average researcher would have to interface via the GUI (even if the data is freely available and fully compliant with SemWeb standards).

With this in mind, my suggestion was that he consider opening up the data entirely, forgoing any programmatic metering, and possibly licensing commercial access to it (allowing for free non-commercial use). My proposal was that they focus on monetizing killer GUI products tuned for each of their specific user groups. In this way, they could service both their institutional and individual users as appropriate.

Fortunately, Tim Berners-Lee jumped in and agreed. He clearly articulated (undoubtedly better than I could) the benefit in separating the data source from the presentation layer. Each could then be treated separately in the context of its use and license model.

Thinking about it later, I was reminded of the scene in Neal Stephenson’s Cryptonomicon when [spoiler alert] the characters find a pile of gold in the middle of the jungle on a remote Pacific island. It had been left there by the retreating Japanese army in World War II, and they were trying to figure out how to retrieve it through the unfriendly territory. Basically, what appears incredibly valuable at first glance (i.e., a pile of gold = a mountain of rich data) is nearly worthless without a way to get it out.

I hope, however, that Schich is able to find a better solution than was presented in Cryptonomicon.

BTW – Since I mentioned them already, it’s worth noting that both Kingsley and Tim are giving keynotes at the Linked Data Planet conference in New York on June 17th. Also, you can listen to an interview I did with Kingsley a couple weeks ago as part of the DataPortability: In-Motion Podcast.


Data Portability and Consumer Value

Nitin Borwankar put forth a compelling commentary as it relates to Data Portability vs. a deeper Terms of Service (TOS) discussion on behalf of the consumer:

The real problem – The Elephant In the Room – is whether web app vendors “play fair” with my data when it is IN the web app, not whether they “allow” me to take my data and go play elsewhere. There are two major choices for a web app user here, just as for a dissenter in a social structure – “voice” and “exit”. Data Portability focuses only on “exit” and is not just incomplete but massively disempowering to the user of the web app.

He then called out four points he sees as the consumer’s “voice” within a given service:

  • Data Accessibility (DA)
  • Data Visibility (DV)
  • Data Removal (DR)
  • Data Ownership (DO)

You’ll probably want to read through his entire post for the full meat (there’s much there), but he sums up with:

In summary, incorporating Data Property Rights into the current conversation completes the picture by adding the web app user’s “voice”. This empowers web apps users and it also seeds new viable business models. For-fee services providing strong user rights without a coercive advertising model will emerge and form a new “data infrastructure” layer of the Internet Operating System – it’s a need that is crying out to be fulfilled. If the dominant players do not want to satisfy this need then market forces amplified by user emotion will disrupt them and we will see once again how the net routes around damage – in this case badly damaged Data Property Rights.

I agree with much of what Nitin is saying here. I see the DataPortability Project story as being a strong part of this picture he’s painting. I understand there are a lot of nuances here between “Data Portability” and his four points, and time will tell what consumers latch onto and how the ball is moved forward.

I believe the world’s moving quickly to a point where content units will be quantized to the degree where they will easily flow between distribution/syndication channels. Perhaps it’ll be driven by something like what people are calling the Semantic Web, basically allowing content units to be self-describing so they can be assembled by consumers and their agents (e.g., sites, applications, and feeds).

The value in the relationship with a customer, then, is centered around servicing them. Regardless of the content they’re seeking, companies will want to develop a solid relationship with their consumers. In this model, the long-term value to the consumer could be a function of (DA, DV, DR, DO, DP), where DP stands for Data Portability alongside Nitin’s four points. The trick will be in determining the weighted relationships between the parameters for each consumer/provider pair.

FWIW – My bet is that there won’t be a one-size-fits-all equation, but rather a range of acceptable values based on context.
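One way to picture a weighted relationship like this is as a simple weighted average. The sketch below is only an assumption for illustration: the post doesn’t say what form the function takes, the scores and weights are made up, and DP is read here as Data Portability.

```python
PARAMETERS = ("DA", "DV", "DR", "DO", "DP")


def consumer_value(scores, weights):
    """Weighted average of the five scores, each assumed to lie in [0, 1].

    `weights` is specific to one consumer/provider pair; nothing here
    assumes a single set of weights that fits everyone.
    """
    total_weight = sum(weights[p] for p in PARAMETERS)
    return sum(weights[p] * scores[p] for p in PARAMETERS) / total_weight


# One hypothetical service: strong on access, visibility, and portability,
# but offering no removal or ownership rights.
service = {"DA": 1.0, "DV": 1.0, "DR": 0.0, "DO": 0.0, "DP": 1.0}

# Two hypothetical consumers who care about different things.
casual_user = {"DA": 1, "DV": 1, "DR": 1, "DO": 1, "DP": 1}
privacy_minded = {"DA": 1, "DV": 1, "DR": 3, "DO": 3, "DP": 1}

casual_value = consumer_value(service, casual_user)
privacy_value = consumer_value(service, privacy_minded)
```

The same service scores noticeably lower for the privacy-minded consumer than for the casual one, which is the point about context: the weights, not just the scores, decide what is acceptable.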


Semantic Servant

This may not be a totally revolutionary idea, but it’s something I’d love to see implemented. The end state of the proposed application would be to deploy what I call a “Semantic Servant” that provides guidance for searching and indexing. I’m terming it a “servant” rather than a “server” for the basic reason that I see it as a “helper tool” to existing servers rather than something that serves up content itself.

Without getting into it too deeply, the concept is that the Semantic Servant (via a new “Semantic Servant Index Protocol”) would reply on a specified port to provide a machine-readable summary of the content available from another server. For example, if a web site is available at “http://www.contentsite.com”, the servant would answer for the same host via something like “ssip://www.contentsite.com”. The result would be an XML packet including rules for leveraging the content on the sister site.
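To make the shape of the idea a little more tangible, here is a minimal sketch of the two pieces described above: mapping a site address to its servant address, and building the XML packet. No such protocol exists, so every detail is hypothetical, including the element names (`servant-summary`, `topic`, `rule`), the function names, and the sample values.

```python
from urllib.parse import urlsplit
import xml.etree.ElementTree as ET


def servant_url(site_url):
    """Map a site URL to the hypothetical ssip:// address on the same host."""
    parts = urlsplit(site_url)
    return f"ssip://{parts.netloc}{parts.path}"


def build_summary(site_url, topics, crawl_hint):
    """Build the XML packet a servant might return (invented element names)."""
    root = ET.Element("servant-summary", {"site": site_url})
    for topic in topics:
        ET.SubElement(root, "topic").text = topic
    ET.SubElement(root, "rule", {"name": "crawl"}).text = crawl_hint
    return ET.tostring(root, encoding="unicode")


address = servant_url("http://www.contentsite.com")
packet = build_summary(
    "http://www.contentsite.com",
    topics=["semantic web", "linked data"],
    crawl_hint="weekly",
)
```

An indexer would fetch the packet from the servant address first, then use the topics and rules to decide how to treat the sister site.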

Keep in mind that this is a totally half-baked idea. My goal in this concept would be to empower a website developer with a tool that would, with a few minor configuration clicks, tell spiders/bots/indexers/etc. more about the associated site. In order for this to work, the servant application would have to be incredibly lightweight and easy to use out of the box. Assuming the servant defaults to a standard configuration based on OWL, RDF, and the like, the administrator could select from some pre-canned configurations and let it go.

The more time the administrator spends customizing the configuration, of course, the more fine-tuned it could be to the content of the specific site. Even with the defaults, though, indexers visiting the site would (a) have more information about the content of the site than is currently (easily) available, and (b) be more forgiving of changes to the site.

This is, of course, assuming that producers of web content want their information to be aggregated more freely. If a site producer wants to force all of its users to its front gate, this isn’t the solution for them. As I think we’re moving to an “All Content Everywhere” model, though, whereby there are multiple ways to experience the same content, I see something like this as an eventual must-have.

… then again, I’m a dreamer.
