Interview with an Editor: rdkeating25

Recently, there has been some interesting news about web search, in particular human-powered search. Meanwhile, the ODP community has been having serious discussions about how to improve our product. I asked Mr. Robert Keating about his thoughts on the state of web search. Keating is the Editor in Chief of the Open Directory Project.

When did you join the Open Directory Project and what led you to become its Editor in Chief?

I joined the ODP in June 1999 when I came to AOL to work on a directory solution for the new AOL Search service. At that time, a web directory was the primary tool for finding information on the web, with Yahoo the industry leader. We leveraged ODP data, adding value to it by mapping AOL’s unique content within.

After AOL Search launched, I began working exclusively on the ODP, helping the community with editorial and governance issues. Initially, I worked mainly with the Business and Regional editors, and managed a group of staff editors who were similarly helping editors in other areas. In 2000, skrenta appointed me Editor in Chief, a position I’ve held ever since, although only part-time for the past four years.

You have seen the evolution of Internet search from its infancy. What would you identify as the key challenges facing Internet search today?

Web search is still in its infancy in many respects. Understanding context and user intent are two of its biggest challenges. A number of interesting developments in machine learning and clustering have emerged over the past few years, but even these innovations have limitations and are not easily understood by the general public and web users, which adds an even greater challenge: usability. While relevancy in web search is light years from where it was even three years ago, people still have trouble understanding search results.

Your background is in Library Science, right? How can traditional classification techniques be applied to help address the challenges posed by today's Internet?

Traditional classification techniques used in libraries were designed to categorize physical objects, and don’t really work well on the web or in libraries for that matter. In the web’s infancy, many attempts were made to apply systems such as Dewey Decimal, but their complexity and hierarchical nature did not translate well to the web. More importantly, traditional classification systems were not designed to reflect how users search and navigate the web.

Classification systems don’t have as much potential as controlled vocabularies, which are lists of standard terms used to describe content. The ODP is a hybrid of both: it’s a fixed hierarchy that has guidelines for ordering and naming categories. The problem with fixed hierarchies is the assumption that all users navigate information the same way. Content tagging is supposed to make up for this shortfall. By navigating tags, users have multiple pathways to information. However, there is no consistency in how words are assigned to tags, which doesn’t make information any easier to find. When tagging uses consistent terminology, users can drill down on tags and be assured that they are getting all relevant information. Controlled content tagging is widely used in e-commerce and enterprise sites, but has not yet been adopted on a scale as large as the web. Controlled content tagging developed and reviewed by a large-scale web community like the ODP has the greatest potential for addressing the challenges of organizing the web.

Time Magazine recently recognized the contributions of the volunteer web community in its Person of the Year issue. As someone who has led a volunteer web community for so long, what are your thoughts about the article?

It’s as if Time discovered the sky was blue, wasn’t it? All kidding aside, it’s great to see community-contributed projects gain legitimacy and become mainstream. Kudos to Time Magazine for finally giving props to the web community.

Today, anyone with a few dollars can put up a web page about any subject, anyone can create a blog, anyone can contribute to a Wikipedia article, or create a MySpace page. What can be done to help surfers determine whether Internet content is trustworthy?

I think caveat emptor has always been the mantra on the web. “Trustworthiness” is a very subjective concept, and I’m suspicious of anyone who endorses a website as trustworthy. Wikipedia has been unfairly slammed in the press for including biased, inaccurate information, yet allegedly authoritative resources (e.g., government and corporate-produced resources) are even more biased and inaccurate. With Wikipedia, blogs, etc., you are more likely to get uncensored points of view so you can make up your own mind. The groups publishing these resources are pretty transparent, Wikipedia in particular. The same cannot be said for supposedly “trustworthy” resources.

Transparency of authorship is an important criterion in editing the ODP. Health editors, for example, would be doing a disservice to surfers by listing sites on which an authentic author or responsible party can’t be identified by the surfer. Likewise, if a history site gives factually inaccurate information, you’re not helping anyone by providing it as a resource. Evaluating factors such as authorship, authenticity and accuracy can be a slippery slope leading to censoring points of view. However, since no one owns a category, the ODP is able to be objective and inclusive while maintaining editorial integrity.

The ODP community should become more transparent. Opening up the ODP is long overdue. One criticism of the ODP is that it is a closed community, making its own trustworthiness questionable. It would be wise for the editing community to reconsider how it goes about editing and interacting with the web community.

As a pioneer in the field of Internet search, what new search technologies are you most excited about?

Not to sound too cliché, but the industry moves quickly, and my excitement changes almost on a daily basis. Today, I’m excited by the announcement of Wikiasari, and the social search engines that are emerging, such as Decipho, which launched a week or two ago.

- xixtas


