Chris Saad recently posted a succinct clarification of the following questions related to some business issues around data portability:
- Why would a vendor allow users to leave their service?
- Why make it easy for users to take the precious data you have about them and use it on other sites?
- What is the business justification for letting data walk out the door?
He’s got some helpful diagrams that illustrate his point, so I suggest reading his post on “The mythical value of data lockin”. In short, though, it’s this paragraph that seems to sum it up:
Even if you are Google, and you know every search your users do, every document they write, every chat they have – you still don’t know their facebook social graph. You don’t know their tweet stream. You don’t know the books they bought on Amazon.
I wish I could remember where I first heard this quote to attribute the source, but it works as the bumper sticker (or Twitter) version of the same sentiment:
No matter how large a website is, the internet is bigger.
Basically, sites will ultimately learn much more about their users (who are, after all, their customers) when they plug into the sharing network than they’ll give up. Here at matchmine, of course, we’re all about enabling sites to access user interests and tastes (under the user’s control), so we run into these questions (and give the same answers) on a daily basis.
To this point, we’re walking the data portability walk ourselves. We’re not only consuming feeds from various sources, but are also a couple days away from streaming data back out, too. It’s all part of our Openness Roadmap I hope to start talking up in the coming weeks.
In the end, all of us (e.g. users, service providers, destination sites, publishers, etc.) win when we aren’t wasting time constantly reinventing wheels (or filling out yet another form). Instead, we can use that time to focus on the unique values we bring to the collective table.
This is just a quick, off-the-top-of-my-head post about the similarities and differences I see between the terms “specifications” and “standards”, and about our approach to supporting them. It’s by no means an exhaustive treatise on the subject, just some common-sense thoughts on a working framework for their usage.
When talking with people, it seems as if the two terms are often used interchangeably. In non-technical discussions this is probably a reasonable enough conflation. After all, most folks don’t need to make a distinction between the two. I believe, however, that it’s important to draw a line around each concept to illuminate some of the subtleties of meaning in some contexts.
So, in my style of rough-n-ready definitions to clarify discourse:
- Specification: A formalized definition for the way something should operate.
- Standard: A common way in which something operates.
You can look up more complete definitions in your favorite dictionary, but these seem to be reasonable enough to encapsulate the salient points. Barring any niggling nuances, you can see how the two have a significant overlap in utility. More interesting, and I think less understood, is how the two differ.
In a nutshell, there are a lot of specifications floating around that aren’t used widely enough to be considered standards (e.g. most APIs are well specified, but aren’t a standard beyond the walls of a single system). In general, a newly proposed specification for common interoperability can’t really be termed a standard until it’s been widely adopted. On the flip side, just because a technique for interoperability is pervasive enough to have become standardized doesn’t mean it’s backed by a specification (e.g. CSV is a common standard output format, but there’s no universally followed specification governing it: RFC 4180 documents common practice, yet plenty of producers and consumers deviate from it).
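To make the CSV point concrete, here’s a minimal Python sketch using the standard `csv` module. The sample data is made up: it assumes an exporter that writes semicolon-delimited rows with decimal commas, a convention some spreadsheet tools use in certain locales. Two readers that both “speak CSV” come away with different rows, because nothing in the file itself says which convention it follows.

```python
import csv
import io

# Hypothetical export: semicolon-delimited, with a decimal comma in the price.
raw = 'name;price\n"Widget, large";3,50\n'

# A reader assuming the comma delimiter that RFC 4180 describes.
comma_rows = list(csv.reader(io.StringIO(raw)))

# A reader told (out of band) that the producer used semicolons.
semicolon_rows = list(csv.reader(io.StringIO(raw), delimiter=";"))

print(comma_rows)      # header lands in a single field; data row splits on the decimal comma
print(semicolon_rows)  # [['name', 'price'], ['Widget, large', '3,50']]
```

Python’s `csv.Sniffer` can guess a dialect from a sample, but it’s a heuristic, which rather proves the point: the convention is widespread, yet a consumer still has to guess which variant it’s holding.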
Why does this matter? Well, those of us working on interoperability between disparate systems need to make the distinction when evaluating various techniques. In some cases we’ll likely opt to focus on an emerging standard, even if its specification hasn’t solidified yet. In other cases, we’ll sit on the sidelines rather than implement a promising specification until it’s widely adopted.
At the end of the day, the name of the game for us is connecting things together. A great specification without proven utility doesn’t really help reach that goal. An emerging standard method for connecting without an associated formalized definition can easily lead followers astray if the usage shifts. Assuming a higher degree of comfort with formalized approaches, there’s the question of how to encourage wider adoption of a new specification so that it has a chance of becoming a standard.
While I’d like to think we live in a logical world where solutions are selected purely on their merits, this isn’t generally the case. Instead, in this chicken/egg game, it’s a matter of being plugged into the discussions and betting on what can be seen within a reasonable adoption horizon.
For us, this means flexibility. Flexibility, in turn, means being resigned to a lack of assured clarity. We’re still in the early days of defining what it means to connect users with their data (in our case their interests and tastes in media). Fortunately, we’ve built a nifty set of tools that are flexible in supporting both nascent specifications as well as adopted standards without a lot of retooling.
These tools are rapidly coming online, and I look forward to seeing them put to the real-world test supporting the user’s desire to remain in control of their interconnected data across various systems.
WARNING: If you don’t like reading about the natural order of things in which cats chase mice, I strongly urge you to stop reading now.
For those of you who follow me on Twitter or subscribe to my Flickr stream, you’re probably already up to speed with bits and pieces of the Great Mouse War currently underway at House Adams. It started with skittering in the attic (enemy incursion), continued with a few mouse sightings (field scouts), and came to center stage as we found a few of the casualties left by the cats (our sentries).
Well, what began as a few individual skirmishes turned into a full-fledged battle last night. While we were kicking back on the couch, we heard a familiar ruckus upstairs. We generally attribute this to the cats chasing each other around, so we paid it little mind. Then, however, we saw Ebony trotting down the stairs with one of the enemy mice clamped in her jaws, its hind legs and tail dangling. As we’ve been battling these often unseen critters for a while, it was good to see that the cats are on duty.
When I tried to extract the mouse from its captor, however, the battle really began. As it wasn’t moving, I’d assumed the cat had already ended this particular mouse’s career as a soldier. She dropped the mouse to the floor, and as I reached to pick up its still form, it sprang into action. Spinning its little legs like something you’d see in a cartoon, trying to get traction on the hardwood floor, the mouse took off.
Needless to say, it surprised everyone. This is when all hell broke loose, turning the family room, dining room, and den into a battlefield. Both cats darted here and there trying to recapture their prey. It was actually quite amazing to see them apparently working in concert: Ebony would dart left, forcing the mouse right into the path of Noir. Once he had it in his mouth, however, it was difficult to figure out a way to extract it. He’d drop it, only to have the mouse repeat its attempted escape.
During the literal cat and mouse game going on, the human contingent were tossing chairs, books, and toys aside. The goal was to eliminate potential nooks and crannies into which the enemy could retreat. After much shouting back and forth indicating the mouse sightings, the cats closed in using a classic pincer maneuver to trap the mouse in a corner. It only remained, then, for me to reach down and capture the infiltrator.
I’ll spare you the details from there, but needless to say… it’s time to (again) call in the mercenaries. Don’t get me wrong, I’m as much an animal lover as the next guy… we’ve even got a perfectly nice garage / barn for critters to peacefully inhabit. In fact, it’s a veritable zoo out there, as we’ve found signs of habitation by possums, skunks, and raccoons in addition to various smaller rodents of the mouse, chipmunk, and squirrel variety.
That being said, we have to draw the line somewhere. And since I come from the Southwest where field mice spread the hantavirus, I’m not a fan of living too closely with them. So, it’s time to up the ante and call in the big guns.
In the meantime, I’m planning a medal ceremony for Ebony and Noir, the nocturnal sentries who have been protecting our borders. Well done, and keep up the good fight!
Over the weekend I was chatting with some folks totally outside the web tech gravity well. While explaining more about what we do at matchmine, it was interesting to see how they approached the concept of media recommendations. Initially, they didn’t see much differentiation between various techniques, but through some examples started to see some of the nuances.
Here are some example recommendation sets that helped get them to mental escape velocity:
[Recommendation sets: Set One | Set Two | Set Three]
In case it’s not obvious, all three sets were spawned from the film The Sixth Sense, each exploring a different aspect of it. For example, the first set dives deeper into the “supernatural thriller” subgenre. The films in the second set are examples of “turnaround” films (i.e. surprise endings). And the third set runs the gamut, each film touching on a “key element” related to The Sixth Sense (e.g. same director or cast, similar plot structure, same genre, etc.).
In the end, sets like this help illustrate the variety of similarity types available. Figuring out which set makes sense within a specific context, and is most valuable to a particular user, is what we do at matchmine. While this was only a Gedanken exercise, it was useful when explaining aspects of our value proposition.
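To illustrate how one seed title can yield different sets depending on which similarity type you pick, here’s a toy Python sketch. The catalog, titles, and attribute names are hypothetical placeholders (not matchmine’s data or algorithm); each “similarity type” is simply an attribute a candidate must share with the seed.

```python
# Hypothetical catalog: placeholder titles and attributes, not real matchmine data.
CATALOG = {
    "Seed Film": {"subgenre": "supernatural thriller", "ending": "twist", "director": "Director X"},
    "Film A": {"subgenre": "supernatural thriller", "ending": "conventional", "director": "Director Y"},
    "Film B": {"subgenre": "heist", "ending": "twist", "director": "Director Z"},
    "Film C": {"subgenre": "drama", "ending": "conventional", "director": "Director X"},
    "Film D": {"subgenre": "supernatural thriller", "ending": "twist", "director": "Director W"},
}


def recommend_by(seed, similarity_type, catalog):
    """Titles sharing the seed's value for one attribute (one similarity type)."""
    target = catalog[seed][similarity_type]
    return sorted(
        title
        for title, attrs in catalog.items()
        if title != seed and attrs[similarity_type] == target
    )


def key_element_set(seed, catalog):
    """Titles sharing at least one attribute with the seed (a broader, mixed set)."""
    seed_attrs = catalog[seed]
    return sorted(
        title
        for title, attrs in catalog.items()
        if title != seed and any(attrs[k] == v for k, v in seed_attrs.items())
    )


print(recommend_by("Seed Film", "subgenre", CATALOG))  # ['Film A', 'Film D']
print(recommend_by("Seed Film", "ending", CATALOG))    # ['Film B', 'Film D']
print(key_element_set("Seed Film", CATALOG))           # ['Film A', 'Film B', 'Film C', 'Film D']
```

The interesting (and hard) part, which this toy skips entirely, is deciding which of these sets a given user actually wants in a given context.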
There’s a lot of focus in the DataPortability Project on making it easier to access user data. Another aspect of data portability in general is an analogous set of activities around making other data on the web more machine accessible. A few groups have been approaching this issue in various ways, many of them under the umbrella of the Semantic Web community. One subset of people focusing their efforts here is taking what they call the Linked Data approach.
At a recent Cambridge SemWeb Gathering at MIT, Kingsley Idehen, CEO of OpenLink Software and a founder of the DBpedia project, had a great way of describing where he sees himself within the greater context of people working on these issues:
I like to say that I belong to the Semantic Web Community, but I’m a member of the Linked Data Tribe.
I found this concept of tiered relationship and allegiance illuminating. When I talked with him about it, he drew a distinction between the community as a whole and the specific set of actionable efforts he focuses on. It’s this sense of “what can be done right now” that has helped him build upon what others are doing to move toward the goals of the community as a whole.
For example, I recently discussed with Danny Ayers, an RDF/SemWeb guru working for Talis, how microformat markup could benefit the Semantic Web. Similarly, Ivan Herman gave a talk at the gathering about how to leverage RDFa within an existing XHTML web page. Both examples are stepping stones toward truly portable data on the web, and part of what Kingsley considers the “data substrate” upon which Linked Data representations can be built.
To that end, I’m on a mini crusade to encourage developers to take the extra few minutes required to consider how their display layers can expose their content with effective markup. Rather than everyone having to learn OWL, RDF, and SPARQL before any progress can be made, there are some simple steps that will catalyze further steps. It’s really not that hard, and even if you’re not a developer you can mark up your own blogs and pages with microformats to provide search engines with much-needed context to describe your content.
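As a rough illustration of how little markup it takes, here’s a Python sketch that pulls hCard properties out of a snippet of blog HTML using only the standard library’s `html.parser`. The class names `vcard`, `fn`, `org`, and `url` come from the hCard microformat; the page content is made up, and the parser is a deliberately simplified stand-in for a real microformats parser (one card, three properties, no nesting rules).

```python
from html.parser import HTMLParser

# Hypothetical blog sidebar marked up with hCard class names.
PAGE = """
<div class="vcard">
  <a class="url fn" href="https://example.com/">Jane Blogger</a>
  <span class="org">Example Media</span>
</div>
"""

HCARD_PROPERTIES = {"fn", "org", "url"}


class HCardExtractor(HTMLParser):
    """Collects fn/org text and the url href found after a vcard opens.

    Simplified: tracks a single card and does not detect where it ends.
    """

    def __init__(self):
        super().__init__()
        self.card = {}
        self._in_vcard = False
        self._pending = []  # properties waiting for their text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vcard" in classes:
            self._in_vcard = True
        if not self._in_vcard:
            return
        for prop in classes:
            if prop not in HCARD_PROPERTIES:
                continue
            if prop == "url" and "href" in attrs:
                self.card["url"] = attrs["href"]  # a url on a link comes from href
            else:
                self._pending.append(prop)

    def handle_data(self, data):
        text = data.strip()
        if text and self._pending:
            for prop in self._pending:
                self.card[prop] = text
            self._pending = []


parser = HCardExtractor()
parser.feed(PAGE)
print(parser.card)
```

The markup side is the part that matters here: a couple of `class` attributes on elements you’re already writing are enough for a machine to tell a name from an organization.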
To learn more:
- Linked Data Links
- Microformats Overview
- RDFa Primer
NOTE: I’m purposefully not diving too deep here into the real “meat” of Linked Data. Instead, I hope you’ll spend a couple clicks checking out the simplicity of what can be done to help build the “data substrate”.
This morning (Monday, April 21, 2008) at 10:55am EDT, we’re in the back yard getting the garden ready for the season when my phone rings. When I answer it, I’m greeted with heavy breathing and a question about where I am. It takes a beat for me to recognize the breathy voice, but as soon as it clicks I realize it’s my friend Sunita Williams… and I guess the reason she’s out of breath is that she’s running the Boston Marathon (again).
For those of you who followed her exploits, you’ll remember it was exactly one year ago that she last ran this particular race, while orbiting roughly 200 miles above the planet aboard the ISS. So, it’s quite an accomplishment to have her feet back on the ground and to be running another 26 miles (this time with the pull of 1G rather than the simulated effects of the straps holding her to the treadmill).
Anyway, she tells me she and her sister have just crossed into Framingham and wonders if we’re along the route somewhere. I tell her we hadn’t planned on it, but that I’ll bundle up the family and we’ll intercept her somewhere to cheer them on. I shout to the kids to drop their shovels because we’re going on a mission, at which point my daughter starts singing the theme to her favorite show:
We’re going on a trip in our favorite rocket ship, flying through the sky, Little Einsteins.
Climb aboard and get ready to explore, there’s so much to find, Little Einsteins.
We’re going on a mission, start the count down: FIVE-FOUR-THREE-TWO-ONE
Mission Time: 11:05am EDT: Fortunately, my wife’s a marathon runner and could roughly gauge their pace. Assuming Suni called when she first entered Framingham, my wife calculated where Suni would be in about 30 minutes.
Mission Time: 11:20am EDT: We find a parking spot near Natick center, and deploy the troops. Since I used to live with my brother in the area, I had a good sense for where we could park close to the route.
Mission Time: 11:30am EDT: We arrive at the corner of Routes 135 and 27 where I take and upload a snapshot to Twitxr – Twitter – Flickr.
Mission Time: 11:35am EDT: Success! Dina and Suni run by as the Adams Clan hoots as loudly as we can, hoping to be heard over the noise of the crowd as we cheer them on.
Mission Time: 11:45am EDT: Return trip to resume the previous gardening activity.
I’m not entirely sure everyone else will find this as amusing as I did… but I find it hilarious that (a) Suni called while running, (b) she thought we’d be able to get somewhere to see her run by, (c) we were able to calculate the intercept trajectory, and (d) we were accurate to within 10 minutes.
Then again… Suni’s used to much greater odds in a successful rendezvous.
At a recent Semantic Web gathering at MIT, Maximilian Schich gave a talk about “Adding Art Research Data to the Giant Global Graph”. It sparked an interesting discussion about the valuable assets within some Linked Data systems and how to monetize them. The topic interests me because I believe a key adoption hurdle for Linked Data and Semantic Web technologies is clarifying the value proposition for all parties involved (e.g. producers, distributors, and consumers).
Schich talked about his conundrum over what to do with an incredibly rich data set he has access to, describing historical art and archeology information. He showed how the data is already well defined within their data model and retrieval scheme, and outlined how they propose to bridge the gap to the current and near-future Linked Data standards. The question, then, was how they would pay for all this work.
There was a lot of discussion around possibly licensing access to the data via SPARQL, but the mechanisms for metering don’t exist yet. Kingsley Idehen and some others discussed adding support at the server layer for a formalized data response similar to HTTP 402 (i.e. “payment required”). It was clear, though, that more work would be needed in this area before it could be adopted as a reasonable direction.
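Since no such metering mechanism exists yet, the following is purely a hypothetical Python sketch of the idea under discussion: a server-side check that answers a SPARQL request with HTTP 402 once a key’s query allowance runs out. The key name, allowance table, and function are invented for illustration; only `http.HTTPStatus.PAYMENT_REQUIRED` (status 402) is real.

```python
from http import HTTPStatus

# Hypothetical allowance table: free queries per API key before payment is required.
ALLOWANCES = {"research-lab-key": 2}


def meter_sparql_request(api_key, usage):
    """Decide how to answer one SPARQL request under a simple per-key quota.

    usage maps api_key -> number of queries already served; it is updated in place.
    Returns an (HTTPStatus, message) pair.
    """
    allowance = ALLOWANCES.get(api_key, 0)  # unknown keys get no free queries
    used = usage.get(api_key, 0)
    if used >= allowance:
        return HTTPStatus.PAYMENT_REQUIRED, "query allowance exhausted"
    usage[api_key] = used + 1
    return HTTPStatus.OK, "query accepted"


usage = {}
statuses = [meter_sparql_request("research-lab-key", usage)[0] for _ in range(3)]
print([int(s) for s in statuses])                         # [200, 200, 402]
print(int(meter_sparql_request("unknown-key", usage)[0]))  # 402
```

Even this toy shows where the open questions lie: what counts as one billable query, how keys are issued, and how a client learns what to pay. Those are exactly the pieces that would need agreement before 402 responses could be relied on.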
At some point during the conversation, I asked Schich what he believes is his more valuable asset: the data itself, or the presentation layer. After having seen the GUI of the demo application showing how the user could retrieve the data, it was clear that a lot of domain expertise was required to design it. Further, when he showed us the data schema and example retrieval/traversal modes, it was even more obvious that the average researcher would have to interface via the GUI (even if the data is freely available and fully compliant with SemWeb standards).
With this in mind, my suggestion was that he consider opening up the data entirely, forgoing any programmatic metering, and possibly license commercial access to it (allowing for free non-commercial use). My proposal was that they focus on monetizing killer GUI products tuned for each of their specific user groups. In this way, they could service both their institutional and individual users as appropriate.
Fortunately, Tim Berners-Lee jumped in and agreed. He clearly articulated (undoubtedly better than I could) the benefit of separating the data source from the presentation layer. Each, then, could be treated separately in the context of its use and license model.
Thinking about it later, I was reminded of the scene in Neal Stephenson’s Cryptonomicon when [spoiler alert] the characters find a pile of gold in the middle of the jungle on a remote Pacific island. It had been left there by the retreating Japanese army in World War II, and they were trying to figure out how to retrieve it through the unfriendly territory. Basically, what appears incredibly valuable at first glance (i.e. a pile of gold = a mountain of rich data) is nearly worthless without a way to get it out.
I hope, however, that Schich is able to find a better solution than was presented in Cryptonomicon.
BTW – Since I mentioned them already, it’s worth noting that both Kingsley and Tim are giving keynotes at the Linked Data Planet conference in New York on June 17th. Also, you can listen to an interview I did with Kingsley a couple weeks ago as part of the DataPortability: In-Motion Podcast.
If you find yourself walking along Newark Avenue and First Street in Jersey City, NJ, look around. In one of the barren fenced corners you’ll see an interesting installation of guerrilla art. They are glass spheres made by wiring the necks of discarded bottles together in series, each about 2 to 3 feet in diameter.
OK, that’s intriguing enough… but I got some of the back story when talking to Peter Wasinger, the artist responsible. He apparently got the idea for the installation while walking past the space to and from work each day. Using bottles from a local bar, he wired the spheres together in his studio and installed them at night (well, actually at 4am).
During the installation, he said that passersby made some interesting comments. A group of men walked by and said, “You go Pappi! This is SO New York”. In response, Wasinger replied, “It is now Jersey City too.” They then said “Keep it going, Pappi. You rock!”
You can check out more photos of the installation in my Flickr stream. He’s already working on other installations, and trying to figure out how to light them up. While that’s in progress, you can check out some of his other artwork available on his Cafe Press site.
Keep it going, Pappi.
UPDATE: 2/15/2008 – I just heard from Peter Wasinger (the artist) that the installation was removed by the Jersey City Parking Authority. Bummer setback.
Nitin Borwankar put forth a compelling commentary as it relates to Data Portability vs. a deeper Terms of Service (TOS) discussion on behalf of the consumer:
The real problem – The Elephant In the Room – is whether web app vendors “play fair” with my data when it is IN the web app, not whether they “allow” me to take my data and go play elsewhere. There are two major choices for a web app user here, just as for a dissenter in a social structure – “voice” and “exit”. Data Portability focuses only on “exit” and is not just incomplete but massively disempowering to the user of the web app.
He then called out four points he sees as the consumer’s “voice” within a given service:
- Data Accessibility (DA)
- Data Visibility (DV)
- Data Removal (DR)
- Data Ownership (DO)
You’ll probably want to read through his entire post for the full meat (there’s much there), but he sums up with:
In summary, incorporating Data Property Rights into the current conversation completes the picture by adding the web app user’s “voice”. This empowers web apps users and it also seeds new viable business models. For-fee services providing strong user rights without a coercive advertising model will emerge and form a new “data infrastructure” layer of the Internet Operating System – it’s a need that is crying out to be fulfilled. If the dominant players do not want to satisfy this need then market forces amplified by user emotion will disrupt them and we will see once again how the net routes around damage – in this case badly damaged Data Property Rights.
I agree with much of what Nitin is saying here. I see the DataPortability Project story as being a strong part of this picture he’s painting. I understand there are a lot of nuances here between “Data Portability” and his four points, and time will tell what consumers latch onto and how the ball is moved forward.
I believe the world’s moving quickly to a point where content units will be quantized to the degree that they can easily flow between distribution/syndication channels. Perhaps it’ll be driven by something like what people are calling the Semantic Web, basically allowing content units to be self-describing so they can be assembled by consumers and their agents (e.g. sites, applications, feeds, etc.).
The value in the relationship with a customer, then, centers on servicing them. Regardless of the content they’re seeking, companies will want to develop a solid relationship with their consumers. In this model, the long-term value to the consumer could be a function of (DA, DV, DR, DO, DP), where DP is data portability itself. The trick will be in determining the weighted relationships between the parameters (for each consumer/provider pair).
FWIW – My bet is that there won’t be a one-size-fits-all equation, but rather a range of acceptable values based on context.
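One way to picture that “weighted relationship” is a normalized weighted sum. This Python sketch assumes a linear model, per-parameter scores in [0, 1], and weights chosen per consumer/provider pair; all of those are my illustrative assumptions, not something Nitin proposes. The same provider scores differently for two consumers who care about different things, which is the point about a range of values rather than one equation.

```python
PARAMETERS = ("DA", "DV", "DR", "DO", "DP")


def relationship_value(scores, weights):
    """Normalized weighted sum of per-parameter scores (hypothetical linear model).

    scores:  how well a provider does on each parameter, assumed in [0, 1]
    weights: how much one particular consumer cares about each parameter
    """
    missing = [p for p in PARAMETERS if p not in scores or p not in weights]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    total_weight = sum(weights[p] for p in PARAMETERS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[p] * scores[p] for p in PARAMETERS) / total_weight


# One hypothetical provider...
provider = {"DA": 1.0, "DV": 0.5, "DR": 0.0, "DO": 1.0, "DP": 0.5}

# ...seen by two hypothetical consumers with different priorities.
casual_user = {"DA": 2, "DV": 1, "DR": 1, "DO": 2, "DP": 2}
privacy_minded = {"DA": 1, "DV": 1, "DR": 4, "DO": 1, "DP": 1}

print(relationship_value(provider, casual_user))     # 0.6875
print(relationship_value(provider, privacy_minded))  # 0.375
```

A nonlinear function would be just as reasonable; the linear form is only the simplest thing that shows context-dependent weighting.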
I just finished reading a book I really enjoyed, and have been recommending it to various folks. This got me thinking about the elements that prompt me to recommend it, considering how they generalize across other narrative fiction (as opposed to biographies, histories, etc.) I’d recommend.
The book in question is “Jonathan Strange & Mr. Norrell” by Susanna Clarke. It had a good run in hardback, but for one reason or another its paperback distribution didn’t fare as well. Some of the trouble might have come from it being a bit long (around 845 pages), and from its being moderately hard to classify at first blush. In short, it’s a revisionist history of magic in 19th-century England… but that’s not necessarily why I’d recommend it.
I came up with the following criteria I tend to use when considering whether or not to suggest a book as a good fit:
- Plot – The basic genre and story arc, along with how well constructed it is over the course of the book.
- Universe of Discourse – The thoroughness and consistency with which the world is created by the author in relation to the plot and character interactions.
- Character Development – How stock or fully developed the characters are who inhabit the universe, and their believability acting as defined.
- Originality – How much of the overall reading experience brings something new to the table, including all elements from writing style to plot through presentation.
- Writing Style – How the author’s chosen voice for the particular story, as well as the pacing and sentence structure, speak to and support the story.
- Resolution – The effectiveness of the interplay between the climax, denouement, and conclusion.
So, in the case of “Jonathan Strange & Mr. Norrell”, I’d say it’s got an incredibly original plot, a well-defined universe of discourse, and very believable characters. Further, the writing style and presentation were enjoyably original (an homage to 19th-century monographs of the period). Finally, I felt perfectly satisfied with the resolution after investing 845 pages in the story (which, sadly, I can’t say for “Cryptonomicon”, one of my favorite and most recommended books). Thus, in my mind, I’d put this in the “Highly Recommended” category.
There are only a few books like this I’d recommend to everyone I know. I generally try to match the recommendation to people I think would appreciate it. The trick, of course, is to identify the right match…
Other Books I Recommend: