Tuesday, September 07, 2004

The Problem with Wikipedia

Ed Felten gets to the heart of what is problematic about relying on Wikipedia as a reference. [Via Boing Boing]

Overall verdict: Wikipedia's advantage is in having more, longer, and more current entries ... Britannica's advantage is in having lower variance in the quality of its entries.
The key word here is variance: when Wikipedia articles are good, they can be very good indeed, a treasure trove of information that won't easily be found elsewhere, but when they're bad, they can be absolutely awful. If what one is looking for is a reference of first resort, then Wikipedia will not fit the bill, since the key criterion in such a circumstance is that the information one gets meets a certain minimum level of accuracy; in that case the Encyclopedia Britannica is still the premier authority. As an alternative take on a subject whose outlines one already knows, however, Wikipedia often can't be beaten.
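The point about a minimum level of accuracy can be made concrete with a small calculation. The sketch below uses two invented samples of quality scores (hypothetical numbers on a made-up 1 to 10 scale, not measurements of either encyclopedia) that share the same mean but differ in variance, and counts how many articles fall below an assumed minimum acceptable score of 5. The helper name `share_below` and the threshold are illustrative choices of mine, nothing more.

```python
from statistics import mean, pvariance

# Hypothetical quality scores on an invented 1-10 scale. These are NOT
# real measurements of Wikipedia or Britannica; both samples simply have
# the same mean (7) so that only the spread differs.
low_variance = [6, 6, 7, 7, 7, 7, 7, 7, 8, 8]
high_variance = [2, 3, 4, 9, 9, 9, 10, 10, 7, 7]

MINIMUM_ACCEPTABLE = 5  # assumed cut-off for "accurate enough"


def share_below(scores, minimum):
    """Fraction of articles whose score is strictly below `minimum`."""
    return sum(s < minimum for s in scores) / len(scores)


for name, scores in [("low variance", low_variance),
                     ("high variance", high_variance)]:
    print(f"{name}: mean={mean(scores)}, "
          f"variance={pvariance(scores)}, "
          f"below {MINIMUM_ACCEPTABLE}: "
          f"{share_below(scores, MINIMUM_ACCEPTABLE):.0%}")
```

With identical averages, only the high-variance sample ever hands the reader an unacceptable article (3 times in 10 here), which is why variance rather than average quality is what disqualifies a source as a reference of first resort.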

As a practical illustration of what I'm talking about, let us look at the Niger-Congo languages; Yoruba, the language of my ancestors, belongs to the family's Benue-Congo branch. The Wikipedia entry for this extremely large family is disappointingly light on content, as are the entries for Benue-Congo and Yoruba, and what little information there is in the latter two entries is present largely because of my own efforts, though I am no professional linguist. The Britannica entry (pay only) for Niger-Congo, on the other hand, includes not just useful information on the common and distinguishing features of the various branches, but also helpful maps showing the distribution of the member languages. The same discrepancy between the two sources runs throughout subject areas dealing with Africa.

When we turn our attention to mathematics, on the other hand, a very different picture emerges. Take the Wikipedia entry for algebraic geometry, for instance; not only is it accurate and concise, but it also provides references to all of the standard texts on the subject, and it links to entries on topics like Gröbner bases, sheaves and schemes, a depth of coverage that would be unthinkable in a generalist work like the Britannica. As a mathematics reference, Wikipedia is very often better than even a carefully curated resource like Eric Weisstein's World of Mathematics. The Britannica also has an entry for algebraic geometry, but while it again shows that the curators have done their job of keeping errors to a minimum, it does not bear real comparison with the Wikipedia version, not least because one cannot learn much more about the subject from the Britannica entry without going beyond the confines of said encyclopedia.

It is only natural to wonder what other interesting statistical properties Wikipedia's entries may have. In terms of quality, what does the distribution of the articles look like, beyond just having a broader range of variation than one would find in a traditional encyclopedia? Is it normal, bimodal, chi-squared or a member of the beta family of distributions? Is it skewed one way or the other? What is its kurtosis, or roughly speaking, how peaked and heavy-tailed is it compared with a normal distribution? Is it even possible to devise a widely agreed-upon scale on which articles can be rated more finely than "good" and "bad"? The Wikipedia Statistics page provides lots of info on raw data like article and word counts, number of daily edits and the like, but I don't see any evidence that the sorts of questions I'm raising here are being addressed.
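If articles could be given numeric ratings, the questions above about shape would reduce to computing a few sample moments. The sketch below assumes such ratings exist (precisely what the last question doubts); both the ratings and the `moments` helper are hypothetical. It uses population moment estimators: skewness is the third central moment divided by the variance raised to the 3/2 power, and excess kurtosis is the fourth central moment divided by the squared variance, minus 3.

```python
from statistics import fmean


def moments(scores):
    """Return (mean, variance, skewness, excess kurtosis) of a sample,
    using population (biased) moment estimators."""
    n = len(scores)
    mu = fmean(scores)
    m2 = sum((x - mu) ** 2 for x in scores) / n
    if m2 == 0:
        raise ValueError("all scores are identical; shape is undefined")
    m3 = sum((x - mu) ** 3 for x in scores) / n
    m4 = sum((x - mu) ** 4 for x in scores) / n
    skewness = m3 / m2 ** 1.5            # > 0: longer tail toward high scores
    excess_kurtosis = m4 / m2 ** 2 - 3   # 0 for a normal distribution
    return mu, m2, skewness, excess_kurtosis


# Hypothetical ratings, invented purely for illustration.
# A "very good or absolutely awful" pattern: two clusters, no middle.
bimodal = [1, 1, 2, 2, 9, 9, 10, 10]
# Mostly mediocre articles with a few excellent ones.
right_skewed = [2, 2, 3, 3, 3, 4, 4, 9, 10]

for name, scores in [("bimodal", bimodal), ("right-skewed", right_skewed)]:
    mu, var, skew, kurt = moments(scores)
    print(f"{name}: mean={mu:.2f} var={var:.2f} "
          f"skewness={skew:+.2f} excess kurtosis={kurt:+.2f}")
```

The symmetric two-cluster sample has zero skewness but strongly negative excess kurtosis (about -1.94), so kurtosis can hint at bimodality, though a histogram would be the more direct check. For larger samples, `scipy.stats.skew` and `scipy.stats.kurtosis` with their default arguments should compute these same biased estimators.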