In Wiki We Trust

Can computer scientists make Wikipedia more reliable? Hadley Leggett plunges into the weird world of Internet collaboration. Illustrated by Jessica Shult Huppi and Margaret E. Gibson.

Illustration: Jessica Shult Huppi

You
can find almost anything on Wikipedia. Type "hurricane" into the search
box, and you'll get a comprehensive overview of tropical storms,
complete with colorful diagrams and links to weather stations around
the world. On Ludwig van Beethoven's page, you can listen to all
three movements of his Moonlight Sonata. If you're seeking a
destination for your next Mediterranean vacation, you can learn
about Porchesia, a little slice of island paradise off the coast
of Syria. Wait, Porchesia? Let's hope you
didn't book plane tickets based on Wikipedia. Despite the detailed
entry about Porchesia and its 354,897 inhabitants, the island doesn't
exist. Even worse, it took Wikipedia editors 10 months to discover
the bogus page and delete it. Each month,
about 60 million people visit Wikipedia, a free, open-access online
encyclopedia with more than 12 million articles in 260 languages.
Despite its popularity, Wikipedia has major drawbacks. Anyone with
an Internet connection can contribute, so the site is subject to
vandalism, bias, and misinformation. And edits are anonymous, so
there's no way to separate credible information from fake content
created by vandals. Because of these
unpredictable qualities, many college professors have forbidden
students from citing Wikipedia in their papers. At one unfortunate
newspaper, reporters published two wiki-based blunders in a row.
The next day, a fed-up editor added "using Wikipedia" to the list of
offenses that could get someone fired. But
according to computer scientists at UC Santa Cruz, banning Wikipedia
won't solve the problem. "I think it's incredibly short-sighted," says Luca de Alfaro, who runs UCSC's Wiki Lab.

Instead of trying to stop the flood of collaborative
online content, de Alfaro wants to help users know when to trust
Wikipedia, and when to reach for that dusty Encyclopedia Britannica
on the shelf. His group has created a gadget called "WikiTrust," which assigns a color code to newly
edited text based on the reliability of its author. It's a simple
concept: The more often a person's contributions are thrown out,
the less reliable they're likely to be.

"They've hit on the fundamentally Darwinian nature of Wikipedia," says graduate student and Wikipedia enthusiast Virgil Griffith of the California Institute of Technology. "Everyone's injecting random crap into Wikipedia, and what people agree with more often sticks around. Crap that people don't like goes away."

Wikipedia: A happy accident?

Wikipedia began as an
offshoot of another free online encyclopedia, called Nupedia,
but Wikipedia quickly surpassed its parent site in size and popularity.
The difference? Nupedia's articles were written by experts and had
to undergo a long peer-review process. During its three-year
existence, Nupedia produced only 24 finished articles. To speed things along, Nupedia's founders decided
to try an experiment: What if they made a sister site, where anyone
could create and edit pages? They hoped the new site would generate
content that experts could review and eventually add to Nupedia.
They called their idea Wikipedia, after wiki
software, which allows many people to edit the same Web page at
once. The name "wiki" comes from the Hawaiian word meaning "fast."
Indeed, Wikipedia grew faster than anyone expected. Within a year
of its launch in January 2001, the encyclopedia had accumulated
20,000 entries.

"The fact that Wikipedia took off and allowed anyone to edit, I think it was a bit of a historical accident," says Wikipedia expert Joseph Reagle of New York University, who wrote his PhD thesis about the encyclopedia's unique culture.

Initially, anonymous edits
helped the site gain popularity, Reagle says. As the project gained
momentum, it attracted an astonishing number of loyal fans. They
call themselves Wikipedians, and they spend countless hours creating
and editing pages. Reagle traces the Wikipedia craze back to an
innate human desire to collect and archive facts. "I think there's an ancient bibliophilic passion," he says, "and then Wikipedia boosts that up a level, because it's incremental. You can make a change even if you don't have a lot of time that day."

With
millions of eyes scanning its pages and correcting mistakes, Wikipedia
achieved a surprising level of correctness. A 2005 study conducted
by Nature concluded that, at least for science articles,
Wikipedia was nearly as accurate as the Encyclopedia Britannica.
Getting wiki right

But
rapid expansion led to growing pains. Wikipedia has struggled
against a burgeoning tide of spammers, vandals, and folks who just
don't know what they're doing.

"Everyone wants to give it a try," says Ellen Levy Finch of San Jose, California, who edits Wikipedia under the username Elf. Those under the age of 18, or who've had one too many beers, apparently find it hysterical to insert profanities and other random stuff into the middle of articles.

Even just a few
bad pages can erode the public's trust. For example, when journalist
John Seigenthaler discovered a Wikipedia page linking him to the
Kennedy assassinations, he wrote a scathing editorial in USA
Today. In January 2009, Wikipedia made headlines again: After
Senator Ted Kennedy suffered a seizure at President Obama's
inauguration, he met an untimely death on his Wikipedia page, yet recovered nicely in real life.

"Out of the many millions of Wikipedia articles," Reagle says, "it only takes a couple that have something libelous or wrong to be very embarrassing and detrimental."

To keep Wikipedia accurate,
nearly 3,000 volunteers serve on the Recent Changes Patrol. They monitor newly edited pages, and
they've developed tools to check for mistakes and vandalism. For
example, special computer programs called bots use semantic analysis
to pick out words and phrases likely to be spam. Bots flag the
lines of "jerk jerk jerk" that keep popping up on George W. Bush's
page and the intentionally scrambled words that appear almost daily
in the dyslexia entry.

In 2007, Caltech's
Virgil Griffith introduced another innovative way to stymie spammers.
Called the WikiScanner, Griffith's tool traces anonymous Wikipedia edits
to their source's IP address, an identification number assigned to
each computer connected to the Internet. By combining this tracking
with data on which companies own which IP addresses, WikiScanner
created quite a stir. It revealed that Wal-Mart, Starbucks, and
even members of the U.S. Congress had spruced up their own Wikipedia
pages. Folks at The New York Times and BBC had vandalized
Bush's page.

Griffith, 25, calls himself a "disruptive technologist." He created WikiScanner to punish politicians and corporations who whitewash their own pages. But his tool doesn't make bad content easier to spot, and that's why Griffith thinks a gadget like WikiTrust is crucial. "One of the biggest criticisms of Wikipedia is saying, 'Hey, that page you're looking at could have been edited five minutes ago by a crazy person,'" he says. "WikiTrust blows that criticism out of the water."
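The lookup at the heart of WikiScanner, matching an edit's source IP address against records of which organizations own which address blocks, can be sketched in a few lines. The ranges, owners, and function name below are made up for illustration; the real tool drew on large registries of corporate IP allocations.

```python
import ipaddress

# Hypothetical sample of IP-range ownership data; WikiScanner combined
# Wikipedia's logged editor IPs with records like these.
IP_RANGES = {
    "198.51.100.0/24": "ExampleCorp",  # made-up documentation range
    "203.0.113.0/24": "ExampleGov",
}

def owner_of(ip: str) -> str:
    """Return the organization whose registered block contains this IP."""
    addr = ipaddress.ip_address(ip)
    for cidr, org in IP_RANGES.items():
        if addr in ipaddress.ip_network(cidr):
            return org
    return "unknown"

print(owner_of("198.51.100.42"))  # ExampleCorp
```

Applied to a dump of anonymous edit logs, a lookup like this is enough to reveal the pattern the article describes: edits to a company's own page originating from that company's address block.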
Humble beginnings

De
Alfaro's journey into Wikipedia began when he wanted an easier way
to share his favorite recipe for Pasta all'Arrabbiata with friends
and family. For a computer scientist with a background in game
theory, what better way to exchange cooking advice than to create
a wiki? His CookiWiki attracted people who wanted to share recipes, but
it also appealed to vandals. Instead of getting frustrated, de
Alfaro began to brainstorm. "I started to think there has to be some way to give people an incentive to behave in a more productive way," he says. De Alfaro came up with the idea of a reputation
system for Wikipedia. He hoped it would encourage editors to make
worthwhile changes, while also helping readers sort reliable content
from misinformation. Collaborative websites
such as Amazon.com and eBay already have reputation systems based
on user ratings. For instance, if you write a book review on
Amazon.com, other readers can boost your user rank if they liked
what you wrote, or lower your rank if they didn't. Many people
proposed creating a similar reputation system for Wikipedia, but
de Alfaro feared that user-generated ratings might upset Wikipedia's
collaborative atmosphere. He also didn't want to create more work
for editors. "If something works as well as Wikipedia," de Alfaro says, "you think very hard before proposing to modify it in such a way that everybody has to give comments on everybody else."

Since Wikipedia already keeps track of every
revision, de Alfaro realized he could use that data to create a
reputation system independent of human input. "Machines should work for humans and not the other way around," he says. "So if you can get information without bothering people, via clever algorithms, this is much better."

Welcome to WikiTrust

The Wiki Lab built its trust tool around the
principle that Wikipedia pages tend to improve over time, or at
least to move toward consensus. "You can measure an author's trustworthiness by looking at how long his or her edits persist over time," says UCSC graduate student Bo Adler, who worked on WikiTrust with de Alfaro. "When you add something to Wikipedia and it lasts a long time, you did a good job," Adler says. "If it gets erased right away, you did a bad job."

Based
on an editor's past contributions, WikiTrust computes a reputation
score between zero and nine. When someone makes an edit, the
background behind the new text gets shaded orange depending on the
author's reputation: the brighter the orange, the less trust the
text has. Then when another author edits the page, they essentially
vote on the new text. If they like the edit, they'll keep it, and
if not, they'll revert it. Text that persists will become less
orange over time, as more editors give their votes of approval.
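The mechanics described above can be sketched in miniature. The article gives the reputation scale (zero to nine) but not the Wiki Lab's actual formulas, so the class name and update rules below are illustrative stand-ins: new text inherits its author's reputation as its trust score, a later editor who keeps the text raises both the text's trust and the author's reputation, and a revert lowers the author's reputation.

```python
# A toy sketch of the WikiTrust idea; the real algorithm's formulas
# are not given in the article, so these rules are illustrative only.
MAX_REP = 9  # reputation runs from 0 (untrusted) to 9 (most trusted)

class ToyWikiTrust:
    def __init__(self):
        self.reputation = {}  # author -> score in [0, MAX_REP]

    def rep(self, author):
        return self.reputation.get(author, 0)  # newcomers start low

    def new_text_trust(self, author):
        # Fresh text inherits its author's reputation; a low score
        # would be rendered as bright orange on the page.
        return self.rep(author)

    def edit_kept(self, author, reviewer, trust):
        # A later editor leaving the text in place is an implicit vote
        # of approval: the author gains reputation, and the text's
        # trust rises toward the reviewer's reputation (less orange).
        self.reputation[author] = min(MAX_REP, self.rep(author) + 1)
        return min(MAX_REP, max(trust, self.rep(reviewer)))

    def edit_reverted(self, author):
        # Thrown-out contributions cost the author reputation.
        self.reputation[author] = max(0, self.rep(author) - 1)

wt = ToyWikiTrust()
wt.reputation["veteran"] = 9
t = wt.new_text_trust("newbie")        # 0: brightest orange
t = wt.edit_kept("newbie", "veteran", t)  # kept by a trusted editor
print(t, wt.rep("newbie"))             # text trust up, author rep up
```

Even this crude version captures the two feedback loops the article describes: authors earn reputation when their words survive, and surviving words shed their orange shading as trusted editors pass over them.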
[Image: A screenshot illustrates the orange color-coding used by WikiTrust, with the least reliable edits in bright orange. Courtesy of the Wiki Lab, UC Santa Cruz.]

The Wiki Lab spent months optimizing the WikiTrust
algorithm. "We try to predict when things are going to be deleted," Adler says. "We want words that are going to be deleted to have a low trust, and words that are not going to be deleted to have a high trust." They also wanted to balance the need to flag questionable text with the need to keep the page readable. "Too much orange text would turn people off," Adler says.

The team
designed WikiTrust to detract as little as possible from the Wikipedia
experience. The gadget hides in a tab at the top of the screen.
"If you don't want to bother with trust ratings," Adler says, "don't click on the trust info tab." And don't go hunting for your own orange ratings: The team decided not to display user reputation to avoid discouraging new users. "Even if you're a wonderful biologist," de Alfaro says, "if you haven't written very much at all on Wikipedia, your reputation will be low."

Coming to a wiki near you

In November 2008, the Wiki
Lab published WikiTrust 2.0, the latest version of its software.
Now anyone who hosts a wiki can download the program and run WikiTrust
in real time. For instance, de Alfaro installed WikiTrust on his
CookiWiki. The Wiki Lab has also been working closely with
the WikiMedia Foundation, the nonprofit organization that manages
Wikipedia. WikiMedia has been supportive of the work, but the
process of getting WikiTrust on the main site has been slow. For
now, Wikipedia users can only explore a demo version of WikiTrust, but that's about to change.

After years of
collaboration, WikiMedia bigwigs finally decided in April 2009 to
make WikiTrust available for all registered Wikipedia users. The
launch date for the new gadget has not been set, but de Alfaro
thinks it will go live in September or October.

Some critics think there may be hurdles to running the trust tool over the entire site. "This isn't a trivial web architecture design and implementation issue," says Ed Chi of the Palo Alto Research Center, who studies Wikipedia and social cognition.
Since WikiTrust assigns a reputation score to every word in every
article, running the program in real time will demand significant
processing power and several terabytes of extra disk space. "It would require a whole new infrastructure for Wikipedia to handle this," Chi says.

Wiki Lab researchers already
are working on making their gadget more efficient. Using the first
version of WikiTrust, it took a regular computer 20 days to process
five years of Wikipedia revision data. The latest edition cuts
that time to four days, and it can calculate trust ratings for 30
to 40 revisions per second. "That's on a single machine," Adler says. "So it's very practical for us to keep up with Wikipedia."

Measuring truth, or consensus?

WikiTrust can detect most types of questionable
content. But when asked whether his gadget measures "truth"
on Wikipedia, de Alfaro hesitates. WikiTrust determines trustworthiness
based on how many people agree with a particular passage of text.
But majority approval doesn't guarantee truth. "If 20 people are all biased in one way, our tool does not know it," de Alfaro says. "Our tool can simply measure consensus."

Adler
offers a hypothetical example. "What if Wikipedia was dominated by Nazis?" he says. "Whatever you say about the Holocaust, they're going to revert you, and then other people are going to come and support those edits rather than your edits." In that case, WikiTrust would start flagging your Holocaust content as unreliable, no matter how accurate it was.

Trial by consensus
sounds sketchy, but majority opinion has nearly always dictated
society's definition of truth. A 15th century encyclopedia would
have insisted that the sun revolves around the earth. The 1911
edition of the Encyclopedia Britannica asserted that bacteria cause
the flu, since viruses hadn't been discovered yet. So perhaps it's
not a question of whether to trust consensus. Rather, whose consensus
do you want to trust: a handful of experts, or thousands of anonymous
Internet users and a clever computer algorithm?

Sidebar: Will wiki work for health information?

Illustration: Margaret E. Gibson

With four
children under age 5, James Currier had lots of questions about
sniffles, rashes, and fevers. One late night, while holding a sick
child in his lap, he tried searching for answers online. "I was shocked," Currier says. "I
couldn't find anything easy to read, explanatory, and credible,
that wasn't covered with pharmaceutical ads or Botox commercials."
Currier didn't know much about medicine, but
he knew a lot about the Internet. As an expert in user-generated
media and social networking, he wanted a better way to share health
information online. So in February 2009, with support from medical
schools around the country, Currier launched Medpedia, a
wiki-style Internet encyclopedia for medical information. Unlike its famously successful counterpart,
Wikipedia, Medpedia doesn't abide by the motto "Anyone can edit." Only physicians and qualified PhDs may make real-time changes. Others can propose edits using a "suggest changes" button, but the
material won't go live until approved by an editor with the right
credentials. The site's creators hope expert-only editing will
ensure accurate content. At the same time,
restricting the site to experts will limit the number of people who
can contribute, and it may slow the site's growth. "That's
the key question," says public-health physician John Swartzberg
of the University of California, Berkeley, a volunteer adviser for
Medpedia. "Will physicians and PhDs spend their time vetting
this material?" To encourage busy
physicians and researchers to participate, Medpedia offers networking
for health professionals and organizations. It also gives credit
to experts who contribute to the site.

Like Wikipedia fans, supporters of Medpedia say they were drawn to the site's altruistic goals.

"We were quite
taken with the idea that this would be a kind of open system,"
says information technology expert Henry Lowe of the Stanford
University School of Medicine. "All the information in Medpedia
would be high quality, reliable, up-to-date, and obtained by experts.
It would be available to the world at large, free and
unrestricted." Although more than 110
organizations have contributed or pledged to contribute content,
the site is still in its infancy. Users can create a profile and
get updates on topics that interest them, but for now, Medpedia
doesn't offer enough articles to compete with more established
health sites like WebMD or Health Central. "It's going to
take time," Lowe says. "But if it could work for general
knowledge, I don't see why it can't work for medical knowledge."
Story © 2009 by Hadley Leggett. For reproduction requests, contact the Science Communication Program office.

Biographies

Hadley Leggett
B.A. (biochemistry and Spanish) Rice University
M.D., UC San Francisco
Internship: Wired.com, San Francisco

A
generous soul might label my path to science writing indirect.
Others might call it crazy. Indeed, why would any sane person spend
four years in medical school, enduring sleep-deprived call nights
and endless hours of studying, only to retire her stethoscope upon
graduation? I could plead insanity, or I
could tell the truth. I found medicine fascinating, challenging,
at times even exhilarating, but writing has always been my secret
dream, the impractical aspiration I never quite let myself consider.
In science writing, I've discovered my middle path: enough writing
to fulfill my creative spirit, enough science to satisfy my curious
brain. My pen won't touch a prescription pad any time soon, but
beyond the hospital walls, I look forward to blank pages and open
spaces.

Jessica Shult Huppi
B.A. (marine biology and art) UC Santa Cruz
Internships: Smithsonian Institution, Washington D.C., and Scientific American, New York

I have always
pursued both science and art. I completed a double major in marine
biology and art in college and later taught science, math, and art
to middle and high school students. Being able to join the Science
Illustration Program and be surrounded by others like me has been
an amazing opportunity. After completing my internships, I plan
to freelance from my home in San Francisco. Visit my web site.

Margaret E. Gibson
B.A. (environmental studies) Vassar College
Internship: Northwoods Stewardship Center, East Charleston, VT

I was
born in rural northern Vermont and have had a passion for creating
art and for exploring the natural world ever since I can remember.
I graduated from Vassar College with a degree in environmental
studies, an interdisciplinary major that combined environmental
sciences and the arts to study natural history and its impact on
the human experience. Completing the Science Illustration Program
was a perfect way to continue this course of study. My primary
interest lies in creating environmental education material for
children.