May 14, 2008 by Trent Adams
There is a lot of focus in the DataPortability Project about making it easier to access user data. Another aspect to data portability, in general, is an analogous set of activities around enabling other data on the web to be more machine accessible. A few groups have been approaching this issue in various ways, many of whom work under the umbrella of the Semantic Web community. One subset of people focusing their efforts on this are taking what they call the Linked Data approach.
At a recent Cambridge SemWeb Gathering at MIT, Kingsley Idehen, CEO of OpenLink Software and a founder of the DBpedia project, had a great term for where he sees himself within the greater context of people working on these issues:
I like to say that I belong to the Semantic Web Community, but I’m a member of the Linked Data Tribe.
I found this concept of a tiered relationship and allegiance illuminating. When I talked with him about it, he drew a distinction between the community as a whole and his own focus on a specific set of actionable efforts. It has been this sense of “what can be done right now” that has helped build upon what others are doing to move toward the goals of the community as a whole.
For example, I recently discussed how microformat markup could benefit the Semantic Web with Danny Ayers, an RDF/SemWeb guru working for Talis. Similarly, Ivan Herman gave a talk at the gathering about how to leverage RDFa within the context of an existing XHTML web page. Both examples are stepping stones in the direction of truly portable data on the web, and something that Kingsley considers the “data substrate” upon which Linked Data representations can be built.
To that end, I’m on a mini crusade to encourage developers to take the extra few minutes required to consider how their display layers can expose their content with effective markup. Rather than everyone having to learn OWL, RDF, and SPARQL before any progress can be made, there are some simple steps that will catalyze further steps. It’s really not that hard, and even if you’re not a developer you can mark up your own blogs and pages with microformats to provide search engines with much-needed context to describe your content.
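To give a sense of how small the extra step can be, here is a minimal sketch of a display-layer helper that wraps a name and link in hCard class names (vcard, fn, url, org). The function name render_hcard and the sample values are my own placeholders, not part of any particular blogging platform.

```python
from html import escape


def render_hcard(name: str, url: str, org: str = "") -> str:
    """Return an HTML fragment marked up with hCard class names."""
    # "fn" marks the formatted name and "url" marks the link target.
    parts = [f'<a class="fn url" href="{escape(url, quote=True)}">{escape(name)}</a>']
    if org:
        # "org" marks the organization the person belongs to.
        parts.append(f'<span class="org">{escape(org)}</span>')
    # "vcard" is the root class that tells a parser an hCard starts here.
    return '<div class="vcard">' + " ".join(parts) + "</div>"


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    print(render_hcard("Jane Example", "http://example.com/", "Example Co."))
```

The page looks the same to a human reader; the only change is a handful of class attributes that a parser can pick up.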
To learn more:
- Linked Data Links
- Microformats Overview
- RDFa Primer
NOTE: I’m purposefully not diving too deep here into the real “meat” of Linked Data. Instead, I hope you’ll spend a couple clicks checking out the simplicity of what can be done to help build the “data substrate”.
Tags: data portability, dataportability, microformats, ontologies, rdf
March 20, 2007 by Trent Adams
I’ve struck up a conversation with Yihong Ding about taking small steps toward a more effective Semantic Web, and an idea popped into my head that seemed worth jotting down. In order to get folks moving in the right direction, perhaps something small like a reader of microformats… we could call it a “MicroReader”… might be interesting. It could be something that sits in your browser (e.g. a Firefox add-on) and does nothing more than display a list of detected microformat tags in the page being read.
While not a lot of folks are leveraging microformats (from what I can tell), it might be a way to increase awareness of their utility. It’s possible, for example, that it’d be a poor man’s way to encapsulate some of the salient points the author wanted to convey. We all know people don’t read every word of an article, but rather glance at bullet points, captions, charts, etc. This might be just the ticket to easily display the “bullet points” of a post.
For example, if the MicroReader detected a known format (e.g. “hreview”) it could automatically generate a summary panel with the information. With only a few sites actively leveraging the format now, it’d be of little value, but if Yahoo Tech is using them, could mom-n-pop be around the corner?
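The detection half of that idea is easy to rough out. Below is a minimal sketch, assuming the reader only needs to spot the root class names of a few known microformats in a page's HTML. The MicroReader class, the detect_microformats helper, and the short list of formats are all my own placeholders; a real browser add-on would walk the live DOM instead.

```python
from html.parser import HTMLParser

# Root class names of a few well-known microformats.
# Illustrative only, not an exhaustive registry.
KNOWN_FORMATS = {
    "hreview": "hReview",
    "vcard": "hCard",
    "vevent": "hCalendar event",
}


class MicroReader(HTMLParser):
    """Collects (format name, tag) pairs for every known root class seen."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty, so fall back to "".
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in KNOWN_FORMATS:
                self.found.append((KNOWN_FORMATS[cls], tag))


def detect_microformats(html: str):
    """Return the microformats detected in an HTML string, in document order."""
    reader = MicroReader()
    reader.feed(html)
    reader.close()
    return reader.found


if __name__ == "__main__":
    # Hypothetical page fragment, for illustration only.
    page = '<div class="hreview"><span class="item fn">Cafe</span></div>'
    print(detect_microformats(page))
```

Turning the detected items into a summary panel would be the next step; this sketch stops at the list.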
I’m not really sure if that gets us closer to semantifying the web, but it’d be kinda’ neat.
Tags: discovery, microformats, ontologies
March 20, 2007 by Trent Adams
I realize I’m showing up (fashionably) late to the semantic web party, but the timing feels ripe. As I mentioned in an earlier post about what I call a “Semantic Servant”, I’ve been thinking a lot about how to (easily) cross-connect online systems. Despite the zealot debates between the Web 2.0 / 3.0 / Semantic Web crowds, there’s a lot to be gained from cooperative growth.
For example, I found this post about “Pinging the Semantic Web” by Harry Chen. In it he mentions there’s a lot to be learned from the blog pinging services:
As the Semantic Web grows, we also need similar services. Ping.SemanticWeb.Org is an experimental service for notifying search engines (or semantic web bots) about changes made in semantic web documents. The present service accepts pings from semantic web documents that describe SIOC, FOAF and DOAP.
He goes on to give some rationale behind his belief in this type of system. My personal favorite is his second point:
Second, a wide adoption of ping services can help to speed up the convergence of standard ontologies. In the blogosphere, we have seen the convergence of few RSS standards, which I believe is due to the wide adoption of ping services, as well as RSS readers and blog publishing software. If Semantic Web ping services are widely used, I believe it’s only nature for SWD publishers to adopt few standard ontologies that are supported by the ping services, and not to create the owner ontologies.
As much as I hate to admit it, the semi-formalization of RSS did for online content sharing what HTML did for Internet content publishing in general. What I mean by that is sometimes it takes an example of technology deployed in a useful context to propel it into mainstream adoption. There’s no reason why we need RSS to share content (we could simply use straight XML, or even straight HTML), but it certainly makes it easier — especially if everyone adopts it.
Now, all we need to do is come up with “an example technology deployed in a useful context.” Piece of cake.
Tags: ontologies, search engines, semantic search, web 2, web 3
March 19, 2007 by Trent Adams
This may not be a totally revolutionary idea, but it’s something I’d love to see implemented. The end state of the proposed application would be to deploy what I call a “Semantic Servant” that provides guidance for searching and indexing. I’m terming it a “servant” rather than a “server” for the basic reason that I see it as a “helper tool” to existing servers rather than serving up content itself.
Without getting into it too deeply, the concept is that the Semantic Servant (via a new “Semantic Servant Index Protocol”) would reply on a specified port to provide a machine readable summary of the content available from another server. For example, if a web site is available at “http://www.contentsite.com”, the servant would answer at the same host name via something like “ssip://www.contentsite.com”. The results would be an XML packet including rules for leveraging the content on the sister site.
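To make the “XML packet” a little more concrete, here is a minimal sketch of how a servant might assemble its summary. Nothing here is a defined format: the element names (servant-summary, section), the attributes, and the build_summary function are all hypothetical, and the sketch leaves out the protocol and port handling entirely.

```python
import xml.etree.ElementTree as ET


def build_summary(site_url: str, sections) -> str:
    """Build a hypothetical servant summary for a site.

    `sections` is an iterable of (path, description) pairs describing
    the content areas an indexer might care about.
    """
    # Hypothetical root element naming the site being described.
    root = ET.Element("servant-summary", {"site": site_url})
    for path, description in sections:
        # One hypothetical <section> element per content area.
        item = ET.SubElement(root, "section", {"path": path})
        item.text = description
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical site layout, for illustration only.
    print(build_summary(
        "http://www.contentsite.com",
        [("/blog", "Weblog posts"), ("/reviews", "Product reviews")],
    ))
```

In practice the vocabulary would want to come from an existing standard rather than made-up element names, which is where the pre-canned configurations would earn their keep.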
Keep in mind that this is a totally half-baked idea. My goal in this concept would be to empower a website developer with a tool that would, with a few minor configuration clicks, tell spiders/bots/indexers/etc. more about the associated site. In order for this to work, the servant application would have to be incredibly lightweight and easy to use out-of-the-box. Assuming the servant defaults to a standard configuration (OWL, RDF, etc.), the administrator could select from some pre-canned configurations and let it go.
The more time the administrator spends customizing the configuration, of course, the more fine-tuned it could be to the content of the specific site. Either way, indexers visiting the site would (a) have more information about the content of the site than is currently (easily) available, and (b) cope more gracefully with changes to the site.
This is, of course, assuming that producers of web content want their information to be aggregated more freely. If a site producer wants to force all of its users to its front gate, this isn’t the solution for them. As I think we’re moving to an “All Content Everywhere” model, though, whereby there are multiple ways to experience the same content, I see something like this as an eventual must-have.
… then again, I’m a dreamer.
Tags: discovery, ontologies, search engines, syndication, web 3
March 12, 2007 by Trent Adams
If we’re all moving toward a more connected set of tools for communication with hopes of a better Web 3.0, how’re we gonna’ get there? Getting everyone to agree on a single standard seems like a pipedream, but what can we do in the meantime? From what I can tell, it seems relatively easy to chat up the concept of Microformats.
I bumped into this post from Tom Johnson which seemed to sum it up well:
The idea of microformats and the semantic web sound cool. And I’m looking forward to the day when microformats are widely adopted. But if microformats are so useful, why hasn’t Google come out with a microformats search yet? Why aren’t microformats being baked into the core structure of WordPress and other blogging platforms?
Not many people are using the structured blogging plugins, and those that do use it mainly to autoformat their posts. I even heard in a recent interview with Matt Mullenweg, the WordPress lead, that there are no current plans to develop structured blogging microformats into the WordPress code.
Oddly enough, Jason Kolb made a similar comment in a recent post:
The only technology that would really be necessary to make this work is to embed microformats in site text itself. I’m really not sure why this hasn’t taken off yet, it seems like a no-brainer to me. What I’m talking about, and I’ve actually posted some working examples of this before, is to surround chunks of text from a weblog post or text published to a public site with microformat markup so that it can be extracted as meaningful data.
It seems like a simple enough first step toward the semantic web thing. Like these two cats, I’m relatively surprised microformatting hasn’t been embraced, but I do believe the value chain still seems to be missing a couple links. There probably need to be a couple of successes (like a popular microformat tagging/retrieval tool) before the masses jump on board.
For my part in this digital village, I’m going to actively explore more microformatting opportunities. More if it develops.
Tags: microformats, natural language processing, nlp, ontologies, search engines, semantic search, web 3