Sunday, March 11, 2007

Can you represent things “how they are”?

I stumbled upon this NYT article about Metaweb Technologies, the latest startup of Danny Hillis (of Thinking Machines fame). His goal is to create Freebase, a database that will describe things “how they are”. He wants, for example, to describe Arnold Schwarzenegger from different views: as a bodybuilder, a movie actor, and a politician.

I wonder whether the approach of having a centralized system to describe "things how they are" will ever work. The point is that the same things can have very different meanings for different people. I think back to the only time I visited East Berlin before the wall came down. I went to a bookstore and looked at schoolbooks covering the Second World War. The books told how the great and wonderful Soviet Union liberated Germany from the Nazi dictatorship. The US was barely mentioned. Since then I have been in the now unified Berlin many times, but I cannot find these books in the regular bookstores anymore.
Depending on society, upbringing, ethical and moral system, etc., the same “facts” can be viewed in diametrically opposite ways. What is “way cool” to one group can be unacceptable to another. How one can capture such divergent viewpoints in a single database, I don’t know.

Of course, Wikipedia as a centralized repository describing “things how they are” comes immediately to mind. It seems, however, that Wikipedia reflects the viewpoint of well-educated, tech-savvy, Western, mostly liberal people: a small elite that is unaware of the real problems of the world, as groups from other parts of the world might say. Also, even in swarm-controlled Wikipedia there are “editing wars” on controversial topics such as “George W. Bush” or “abortion”, where editing access to these pages has to be controlled by editors.

Another famous earlier project that tried to capture the commonsense knowledge of the world in a centralized repository was CYC. It was started in 1984 by Doug Lenat, when artificial intelligence was seen as the holy grail of computer science. It describes knowledge in the form of well-structured rules. But the problem is that knowledge changes so fast that the people capturing it for CYC were never able to keep up.

As a believer in swarm creativity and the “wisdom of crowds”, I think that the decentralized and chaotic approach of the Web at large, with search engines on top to retrieve and access knowledge, is much more flexible than a centralized repository. Searching for controversial topics on the Web will bring up pages discussing them from all possible points of view, from all walks of life and all regions of the globe. I will be really curious to see whether Danny Hillis will succeed in keeping up with change while capturing opposing viewpoints.

