I had no idea what the guys at the office were up to when they camped out in the conference room one evening with their video gear. I figured it was one of those things you just shouldn’t ask any questions about, and quietly moved on. Well, here’s what they were doing:
… and I have to admit I think their take on collaborative filtering is pretty funny.
In his Data Mining blog, Matthew Hurst posted a simple and elegant Gedanken exercise about what drives traffic to specific content on various sites. Based on the readership and interests of each site’s audience, he draws out some clear ideas about the difference between “popularity” and “authority”.
What I’m describing … is the difference between some notion of popularity (which may be called influence) and some other notion of authority (or expertise) and how these issues are related to both the blogger (blog) and the readers of that blog or feed. Measuring readership on topics is key to really modeling this stuff in social media which is why FeedBurner is such an asset to Google. It also captures why metrics for bloggers should capture notions of topic (something which BuzzLogic understands).
Clear and concise. Now the trick is to capitalize on this concept.
I’ve dabbled with just about all the major collaborative bookmarking/tagging/link-sharing tools around. I’ve tried del.icio.us, Digg, Reddit, Technorati, etc. Each one, of course, has its strengths and weaknesses. The trick for me has been to integrate my use of them into my daily work stream.
Enter my new favorite class of tool: in-situ collaborative page annotation. I have no idea what others are calling them, but there ya’ go. I’m sure someone has come up with a clever Web 2.0 term, especially since mine is way too long.
Basically, these tools enable me to quickly and easily add my own notes to a page while reading it. A tool that works fast, with only a few key clicks, earns high marks in my book. Further, the more tracking / reporting / sharing functionality it has, the better. For example, right after highlighting a page I like to be able to post my notes as a blog entry or otherwise set the tool to track the page for changes.
The two tools that have floated to the top for me are Fleck and Diigo. My personal opinion, having used both for a couple of weeks now, is that Diigo wins. While both have largely the same feature set, the RIA UI of Diigo really keeps me moving without waiting for page redraws. Further, their style guides for reading annotations (within their site, in a cross-post to a blog, or forwarded via email) are incredibly well designed.
The only downside I’ve found with Diigo so far is that it doesn’t automatically pick up the tags associated with the pages. It’d be a great feature if they took a best guess based on the meta keywords in the header and let you modify them. Right now, though, it pre-populates the “tags” field with “no-tags” (and changing them requires a few too many key clicks for my tastes).
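To make that wish concrete, here is a minimal sketch of the kind of best guess I have in mind. It is not how Diigo works; it assumes the page carries a `<meta name="keywords">` element, and the names `MetaKeywordParser` and `suggest_tags` are made up for illustration. It uses only Python’s standard library.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the content of every <meta name="keywords"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        if (attr.get("name") or "").lower() == "keywords":
            self.keywords.append(attr.get("content") or "")


def suggest_tags(html, fallback="no-tags", limit=10):
    """Return a best-guess tag list from the page's meta keywords.

    Falls back to a single placeholder tag when the page offers nothing,
    which mirrors the "no-tags" default described above.
    """
    parser = MetaKeywordParser()
    parser.feed(html)
    tags = []
    for raw in ",".join(parser.keywords).split(","):
        tag = raw.strip().lower().replace(" ", "-")
        if tag and tag not in tags:  # skip blanks and duplicates
            tags.append(tag)
    return tags[:limit] or [fallback]
```

The suggestions would only pre-populate the “tags” field; the user would still be free to edit or drop them before saving.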
As a side note, LibraryThing does a fantastic job of empowering users to quickly and easily add tags to the books in their library. Talk about great human factors (though minimal pretty pictures in their GUI)… in fact I’ll probably spend some time chatting that site up soon.
According to their press, ZoomInfo is taking the path toward semantic search by using their patented technologies to pre-scrub crawled data. This approach, rather than relying on linguistic magic at query time, gives them the flexibility to massage the crawled data into searchable indexes. The user can then retrieve the information with ordinary keyword searches.
When searching for themselves using their engine, they say:
ZoomInfo is the best destination for information about people and companies. Our product is a summarization search engine that finds, understands, extracts and summarizes information about people and companies on the Web.
ZoomInfo employs Artificial Intelligence Algorithms to analyze Website pages and to create a human like understanding of their content. With these algorithms, ZoomInfo analyzes the type of Website and the content of the Website based on how it’s constructed. ZoomInfo is able to deduce that a specific paragraph is a company description or that a specific address contains the location of a company’s headquarters to extract the most accurate and relevant information.
ZoomInfo’s semantic search engine continually crawls the Web and reads business information. Using proprietary Natural Language Extraction technology, ZoomInfo analyzes sentences to understand their meaning and to extract relevant information about companies, and people, such as the industry a company is in and its products or services, or the company a person works for and his/her job title.
A search by company for IBM turns up some basic information, and lists Ramon Demper as the company CEO and CTO. As far as I know Sam Palmisano is the IBM CEO and Demper left IBM in 1993. A search for ZDNet in both basic and powersearch (requires registration) and by company and people turned up outdated and grossly incorrect information. Similarly, a search on CNET turned up a lot of erroneous information.
And if you consider using it as a consumer to find a “security software” company:
Searching for security software companies in California with $50 million or less in revenue and fewer than 100 employees turned up Network Associates, which merged with McAfee in 2002, as the first entry.
I’m assuming they’re still working out the kinks in their system. The problem I see, though, is they appear to be relying too heavily on their smart software without a human in the loop. If they’re hoping to court the business community with subscription services, I’d think they’d need to significantly increase their accuracy rate.
While it’s currently hip to be wrong (and to open the doors to social-networking-style corrections), that doesn’t seem to be what they’re doing.