According to their press, ZoomInfo is taking the path toward semantic search by using their patented technologies to pre-scrub crawled data. Rather than relying on linguistic magic at query time, this approach gives them the flexibility to massage the crawled data into searchable indexes. That way, a user’s ordinary keyword searches can retrieve the extracted information.
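To make the index-time idea concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: the sample sentences, the single regular expression standing in for the "pre-scrub" step, and the function names `scrub`, `build_index`, and `search`. It is not ZoomInfo's method, just a toy showing how extraction done before indexing lets a plain keyword query hit structured records.

```python
import re
from collections import defaultdict

# Hypothetical crawled snippets; real crawled pages would be far messier.
CRAWLED = [
    "Jane Doe is the CTO of Example Corp.",
    "John Roe is the CEO of Sample Inc.",
]

# Toy extraction rule standing in for the pre-scrub step (an assumption,
# not anyone's actual extraction technology).
PATTERN = re.compile(
    r"^(?P<person>[A-Z]\w+ [A-Z]\w+) is the (?P<title>[A-Z]+) of (?P<company>.+?)\.$"
)

def scrub(snippets):
    """Turn raw sentences into structured records at crawl time."""
    records = []
    for text in snippets:
        match = PATTERN.match(text)
        if match:
            records.append(match.groupdict())
    return records

def build_index(records):
    """Build a simple inverted index: lowercase token -> set of record ids."""
    index = defaultdict(set)
    for record_id, record in enumerate(records):
        for value in record.values():
            for token in value.lower().split():
                index[token].add(record_id)
    return index

def search(index, records, query):
    """Plain keyword search: return records containing every query token."""
    ids = None
    for token in query.lower().split():
        hits = index.get(token, set())
        ids = hits if ids is None else ids & hits
    return [records[i] for i in sorted(ids or [])]

RECORDS = scrub(CRAWLED)
INDEX = build_index(RECORDS)
```

Calling `search(INDEX, RECORDS, "cto example")` does nothing clever at query time; all of the "understanding" happened when the records were extracted. That also means any extraction mistake is baked into the index until the page is scrubbed again.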

Searching for ZoomInfo on their own engine turns up this description:

ZoomInfo is the best destination for information about people and companies. Our product is a summarization search engine that finds, understands, extracts and summarizes information about people and companies on the Web.

And for a deeper dive, here are a couple of notes from their technology page (annotated):

ZoomInfo employs Artificial Intelligence Algorithms to analyze Website pages and to create a human like understanding of their content. With these algorithms, ZoomInfo analyzes the type of Website and the content of the Website based on how it’s constructed. ZoomInfo is able to deduce that a specific paragraph is a company description or that a specific address contains the location of a company’s headquarters to extract the most accurate and relevant information.

ZoomInfo’s semantic search engine continually crawls the Web and reads business information. Using proprietary Natural Language Extraction technology, ZoomInfo analyzes sentences to understand their meaning and to extract relevant information about companies, and people, such as the industry a company is in and its products or services, or the company a person works for and his/her job title.

That certainly sounds kewl. But what about the reality? Check out this recent ZDNet post (annotated):

A search by company for IBM turns up some basic information, and lists Ramon Demper as the company CEO and CTO. As far as I know Sam Palmisano is the IBM CEO and Demper left IBM in 1993. A search for ZDNet in both basic and powersearch (requires registration) and by company and people turned up outdated and grossly incorrect information. Similarly, a search on CNET turned up a lot of erroneous information.

And if you consider using it as a consumer to find a “security software” company:

Searching for security software companies in California with $50 million or less in revenue and fewer than 100 employees turned up Network Associates, which merged with McAfee in 2002, as the first entry.

I’m assuming they’re still working out the kinks in their system. The problem I see, though, is they appear to be relying too heavily on their smart software without a human in the loop. If they’re hoping to court the business community with subscription services, I’d think they’d need to significantly increase their accuracy rate.

While it’s currently hip to be wrong (and to open the doors to social-networking-style corrections), that doesn’t seem to be what they’re doing.

If we’re all moving toward a more connected set of tools for communication with hopes of a better Web 3.0, how’re we gonna’ get there? Getting everyone to agree on a single standard seems like a pipedream, but what can we do in the meantime? From what I can tell, it seems relatively easy to chat up the concept of .

I bumped into this post from Tom Johnson which seemed to sum it up well:

The idea of microformats and the semantic web sound cool. And I’m looking forward to the day when microformats are widely adopted. But if microformats are so useful, why hasn’t Google come out with a microformats search yet? Why aren’t microformats being baked into the core structure of WordPress and other blogging platforms?

Not many people are using the structured blogging plugins, and those that do use it mainly to autoformat their posts. I even heard in a recent interview with Matt Mullenweg, the WordPress lead, that there are no current plans to develop structured blogging microformats into the WordPress code.

Oddly enough, Jason Kolb made a similar comment in a recent post:

The only technology that would really be necessary to make this work is to embed microformats in site text itself. I’m really not sure why this hasn’t taken off yet, it seems like a no-brainer to me. What I’m talking about, and I’ve actually posted some working examples of this before, is to surround chunks of text from a weblog post or text published to a public site with microformat markup so that it can be extracted as meaningful data.

It seems like a simple enough first step toward the semantic web thing. Like these two cats, I’m relatively surprised microformatting hasn’t been embraced, but I do believe the value chain is still missing a couple of links. There probably need to be a couple of successes (like a popular microformat tagging/retrieval tool) before the masses jump on board.
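To show what "surrounding chunks of text with microformat markup" buys you, here is a rough sketch in Python using only the standard library's `html.parser`. The snippet, the `HCardParser` class, and the `extract_hcards` helper are all hypothetical names I made up for illustration; the class names `vcard`, `fn`, `org`, and `title` come from the hCard microformat. It assumes well-formed markup with every tag closed, so treat it as a toy rather than a real parser.

```python
from html.parser import HTMLParser

# Hypothetical blog snippet marked up with hCard class names.
SNIPPET = """
<p>I had coffee with
  <span class="vcard">
    <span class="fn">Jane Doe</span>,
    <span class="title">Editor</span> at
    <span class="org">Example Corp</span>
  </span> yesterday.
</p>
"""

# The small subset of hCard properties this sketch looks for.
FIELDS = {"fn", "org", "title"}

class HCardParser(HTMLParser):
    """Collect fn/org/title text found inside elements with class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []
        # One entry per open tag: "vcard", a field name, or None.
        # Assumes every start tag has a matching end tag (no bare <br>).
        self._stack = []

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if "vcard" in classes:
            self.cards.append({})
            self._stack.append("vcard")
        else:
            field = next(iter(sorted(classes & FIELDS)), None)
            # Only track fields that sit inside an open vcard.
            self._stack.append(field if "vcard" in self._stack else None)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        field = self._stack[-1] if self._stack else None
        if field in FIELDS and data.strip():
            card = self.cards[-1]
            card[field] = (card.get(field, "") + data).strip()

def extract_hcards(html):
    """Return a list of dicts, one per vcard found in the HTML."""
    parser = HCardParser()
    parser.feed(html)
    return parser.cards
```

Running `extract_hcards(SNIPPET)` pulls the name, job title, and organization out of an otherwise ordinary sentence, which is the kind of "meaningful data" a tagging or retrieval tool could build on.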

For my part in this digital village, I’m going to actively explore more microformatting opportunities. More if it develops.

I found an interesting new (beta launched 2/13/2007) search engine called hakia. From the looks of it, they’re rolling their own solution on semantic-based stuff. Here’s a blurb from their site:

The basic promise is to bring search results by meaning match - similar to the human brain’s cognitive skills - rather than by the mere occurrence (or popularity) of search terms. hakia’s new technology is a radical departure from the conventional indexing approach, because indexing has severe limitations to handle full-scale semantic search.

Interestingly, they purposefully call out specific uses in which they believe their solution is particularly well-suited:

hakia’s capabilities will appeal to all Web searchers - especially those engaged in research on knowledge intensive subjects, such as medicine, law, finance, science, and literature.

I hammered on it for a bit, and it does look like it’s got some good feet under it. I’ll try making it my go-to search site for a while and see how it goes (similar to what I did with AltaVista when I found Google in 1997 - never to look back). More on the experiment - if it develops.

I turned up a short counter-point blog post about their approach by Marc Fawzi and ToxicWave:

We are beginning to see search engines that claim they can semantic-ize arbitrary unstructured “Wild Wild Web” information. Wikipedia pages, constrained to the Wikipedia knowledge management format, may be easier to semantic-ize on the fly. However, at this early stage, a better approach may be to use human-directed crawling that associates the information sources with clearly defined domains/ontologies.

I like that idea… at least until the machines are smart enough to push aside their masters (as anyone who reads science fiction knows they’ll do eventually).