Saturday, January 12, 2008

What makes a search engine biased?

I have been thinking about the sensitivities surrounding China's growing internet culture with regard to censorship and freedom of speech. The Universities of Leiden and Heidelberg archive websites to support Chinese studies; see the Digital Archive for Chinese Studies (DACHS). Access to that collection is tightly controlled, no doubt for good reasons. Why? Political control and freedom-of-speech concerns in China. It might be worth taking a look at Nicolai Volland's PhD thesis, The Control of the Media in the People's Republic of China.

This territory of censorship is not new or exclusive to China, though. Other countries around the world, including France and Australia, are looking to tighten their laws to 'control' the content viewable by their citizens. Admittedly, the reasons for control in China may well differ from those of other countries, but these governmental interventions raise questions. How much government control over publicly available content should be asserted? How much should people be allowed to determine that at their end, i.e. let the user drive?


What I find more intriguing, possibly because it is less obvious, is the programming underlying Baidu, the search engine, and its ability to support effective information retrieval with Chinese characters and search strings. The fact that Baidu indexes websites that search engines like Google don't have access to is an obvious advantage. An article I read in New Scientist (Beyond the great firewall, November 2007) suggests that many Chinese people also use Baidu because it processes Chinese characters more effectively, as well as indexing websites possibly not indexed by Google.

Aside from the content bias (or constraints), I began to think about the fact that a great deal of computer coding is done in English, i.e. the lingua franca of computer programming is English. It made me wonder about the coding in Google: how effectively does Google handle diverse language content in its algorithms, and are those linguistic calculations based mostly on an understanding of English, or of many languages? Is English the lingua franca of coding at Baidu too? Are their linguistic calculations for search different because of the nature of Chinese characters? I don't know, but it pays to look below the surface (i.e. the content) too, to see how the internet is or isn't working for information retrieval, irrespective of the political controls being asserted over the top of it.
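One concrete reason Chinese text might need different handling: written Chinese doesn't put spaces between words, so a tokenizer that simply splits on whitespace (which works passably for English) treats a whole Chinese phrase as a single token. A common workaround in information retrieval is to index overlapping pairs of characters (bigrams). The toy sketch below, with made-up function names, just illustrates that difference. It is an assumption-laden illustration, not a description of how Google or Baidu actually process queries, which I don't know.

```python
def whitespace_tokens(text: str) -> list[str]:
    # Naive tokenizer: split on whitespace, which roughly works for English
    return text.split()


def char_bigrams(text: str) -> list[str]:
    # Hypothetical helper: overlapping two-character windows, ignoring spaces.
    # Character bigrams are one common way to index unsegmented Chinese text.
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


english = "search engine bias"
chinese = "搜索引擎"  # "search engine" in Chinese

print(whitespace_tokens(english))  # ['search', 'engine', 'bias']
print(whitespace_tokens(chinese))  # ['搜索引擎']  (one opaque token)
print(char_bigrams(chinese))       # ['搜索', '索引', '引擎']
```

Bigrams are crude: they recover the real words 搜索 (search) and 引擎 (engine), but also produce the spurious pair 索引, so a real engine would presumably need proper word segmentation on top of something like this.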

How well have the world's Chinese speakers been served by Google? If they prefer Baidu, what is that about? Is it just better indexing and search results, or is there something else? How do Baidu and Google compare, and on what basis are they best compared? Which is biased, in what way, and by what?
