COLUMNS: Behind the Screen
The Privilege of Ranking: Google Plays Ball
by Richard Wiggins
Senior Information Technologist, Michigan State University
Here's a riddle: What do undergraduate admissions at
the University of Michigan (UM), the college football
Bowl Championship Series, and Google have in common?
Lately, I've been springing that riddle on people
when we talk about searching. Usually folks scrunch
up their faces a bit before attempting to answer. I
happened to ask this very question during a presentation
at a conference sponsored by the Ann Arbor Computer
Society just after the Supreme Court heard arguments
on the Michigan admissions case. My audience, mostly
hard-core software developers, must have thought a
jab from East Lansing was headed their way. (Michigan
State University and the University of Michigan have
been known to feud, now and again.) On the contrary:
It's a riddle for any audience that uses search engines.
For readers of Searcher, this riddle is probably
pretty easy unless you don't happen to pay attention
to college football. Here's a hint: Before 1998, the
national champion in college football was chosen by
polls. The two polls that attracted the attention of
sports fans were the Associated Press poll of sportswriters
and the USA Today/ESPN poll of leading coaches. The
college bowl system was never a tournament; who played
in what bowl game depended on prestige and tradition
as much as that year's rankings. In many years, the
number-one and number-two teams in the nation never
met. Some years, the polls named different national
champions at the end of it all.
The Bowl Championship Series (BCS) tries to fix all
that. The goal of the BCS is to ensure that the best
two teams in the nation meet in one bowl game to determine
one true champion. The only problem is, unlike NCAA
basketball's famous "Road to the Final Four," the BCS
still isn't a tournament; in effect, there is a single "round" with
one championship game.
So to make this all work, the BCS has to rank the
top college teams so that the best two teams in the
land meet in one winner-takes-the-championship match.
BCS needed a mechanism to pick these two contenders;
if a human committee made the selection, someone would
always cry foul.
So the BCS came up with a formula, an algorithm, a
mathematical scheme for ranking the best college
football teams. It's surprisingly complicated, based
on:
- The average of the coaches' and AP media polls.
- An average of the rankings of leading computer-based ranking services, such as The New York Times computer poll and the Sagarin poll.
- Strength of schedule, as measured by the win-loss record of a team's opponents.1
- Losses: one point demerit for each loss.
- "Quality" wins: bonus points for defeating a top-10 team.
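To make the mechanics concrete, here is a minimal sketch, in Python, of how a composite score might be assembled from those five components. The weights, the function name bcs_style_score, and the sample numbers are my own illustrative assumptions, not the actual BCS coefficients (which, as the footnote shows, are more involved).

```python
# Illustrative only: combine the five listed components into one score.
# Lower is better, matching the convention that rank 1 is best.
def bcs_style_score(poll_avg, computer_avg, schedule_rank, losses, quality_wins):
    schedule_component = schedule_rank / 25.0   # hypothetical scaling of schedule strength
    quality_bonus = 1.0 * quality_wins          # hypothetical credit per win over a top-10 team
    return poll_avg + computer_avg + schedule_component + losses - quality_bonus

# A team ranked 2nd by the polls, 3rd by the computers, with the 10th-toughest
# schedule, one loss, and one "quality" win:
print(bcs_style_score(2, 3, 10, 1, 1))  # 5.4
```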
Now, back to our riddle. Figured it out yet?
The Challenge of Ranking
In the innocent days before 9/11, Southwest Airlines
used to joke during the safety speech, "In the event
of sudden loss of cabin pressure, oxygen masks will
appear. Put your own mask on before assisting others.
If you are traveling with two small children, please
decide now which one you love more."
Sometimes it's hard to make a choice, even among
only a handful of alternatives. Anyone who's had trouble
deciding what career to pursue or where to go on vacation
this summer knows how to do this: You write down all
the pros and cons of each alternative on an index card;
after all the choices are "scored," you sort them
based on your evaluation.
The BCS's challenge is to pick the top two
of 117 or so Division I-A teams. At the end of the
season, only a few teams contend in most people's minds
for the number one spot. Most sports fans, for instance,
would not accept a national champion that played a
weak schedule or that lost more than one game in the
regular season.
So all of the complexity of the BCS formula is brought
to bear on what ought to be a very simple problem:
Among a handful of contenders, choose which two teams
deserve to play in the championship game. The BCS formula
attempts to be "scientific." It appears objective,
even though human judgment does enter into the formula
due to the polling component.
Professional searchers understand that all search
engines, Web or otherwise, use (or at least offer
to use) relevancy ranking to sort items on a
hit list. This is true whether it's a Web search engine
such as Google, a legal search engine such as Westlaw,
or a full-text scholars' resource such as ProQuest.
Each system evaluates each candidate item, assigns
each a score based on a series of criteria, and then
sorts all the items from the highest- to the lowest-scoring.
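That evaluate-score-sort pattern is easy to sketch. The Python fragment below is a toy illustration built on invented criteria (a term-frequency count and a title-match boost), not any particular engine's actual formula.

```python
# Toy relevancy ranking: score each candidate on a few criteria, then sort.
documents = [
    {"title": "Doc A", "term_frequency": 3, "term_in_title": True},
    {"title": "Doc B", "term_frequency": 7, "term_in_title": False},
    {"title": "Doc C", "term_frequency": 5, "term_in_title": True},
]

def relevance(doc):
    score = doc["term_frequency"]        # more occurrences of the query term
    if doc["term_in_title"]:
        score += 5                       # hypothetical boost for a title match
    return score

hit_list = sorted(documents, key=relevance, reverse=True)   # highest score first
for doc in hit_list:
    print(doc["title"], relevance(doc))
```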
Now we're ready to solve our riddle. BCS, UM admissions,
and Google all attempt to sort dissimilar items into
a list starting with the "best" items on top:
- The BCS selection process picks the top two entries from a list of 117 potential candidates.
- The undergraduate admissions process at the University of Michigan uses a mathematical formula to select about 5,000 students for admission out of a pool of over 20,000 applicants.
- The search engine Google uses a mathematical formula to choose the top 10 or so Web pages to present to users out of a pool of over 3 billion URLs.
In all three cases, some people defend the formula
vigorously, especially its owners and stakeholders.
In all three cases, other people criticize the formula
vociferously, especially those who don't
rank at the top. If you like how you're ranked, the
formula works just fine. If you don't like how you're
ranked, you try to pick apart the formula to find its
flaws.
In all three cases, knowledge of how the formula
works causes people to change behavior, trying to "game" their
ranking. The formula doesn't just measure the items
to be ranked; it leads players to try to "improve" themselves
in light of the formula's parameters.
And in all three cases, the owners of the formula
periodically "tweak" the algorithm when it turns out
that the rank-ordered list doesn't yield the desired
results. Sometimes the tweaks yield unintended consequences.
Ranking Michigan Admissions
Jennifer Gratz applied to the University of Michigan
in the fall of 1994, her senior year of high school.
She had a 3.8 grade point average, ranking her 13th
in her high school class of 298. She scored 25 on the
ACT. An honor student, tutor, vice president of the
student council, homecoming queen, and varsity cheerleader,
she thought she presented the kind of credentials that
would guarantee admission to UM.
On April 24, 1995, she opened a rejection letter
from UM. She immediately asked her father, "Can we
sue them?"
If you think it's hard to choose between Lake Louise
and Sanibel for your next vacation, imagine trying
to pick the best 5,000 applicants out of 20,000 candidates.
UM chose a scoring process to ease the burden. Gratz'
case shone a bright light on the details of the UM
undergraduate admissions process. As the case wound
through the courts, UM established a Web site to publish
its side of the story [http://www.umich.edu/~urel/admissions],
which, among other things, explained its scoring process.
UM calls its ranking formula a "selection index," which
assigns up to 150 points to each applicant. Available
points break down this way:
- 80 for high school grades.
- 12 for standardized test scores.
- 10 for strength of high school's academics.
- -4 to +8 points for strength of student's course choices.
- Up to 40 points for "other factors":
  - 10 for being a Michigan resident.
  - 6 for being a resident of a county in Michigan that is underrepresented.
  - 2 for residency in an underrepresented state.
  - 4 for alumni relationships.
  - 5 for leadership and service.
  - 3 for a personal essay.
  - 5 for personal achievement.
  - 20 points for one of these:
    - Membership in an underrepresented minority group.
    - Socioeconomic disadvantage.
    - Attendance at a predominantly minority high school.
    - "Athletics."
    - "Provost's discretion."
UM argues that its formula accomplishes the university's
goals for its undergraduate class. It argues that all
admitted students meet minimum qualifications to succeed
in coursework at the university; the formula yields
a class that is also ethnically and geographically
diverse. Some 100 groups filed amicus curiae briefs
supporting UM before the Supreme Court.
Jennifer Gratz and her sponsors don't like that 20-point
boost given to some applicants. They uncovered evidence
that some minority applicants were admitted who would
have ranked lower than Gratz without that 20-point
bonus.
Our purpose here isn't to weigh the legal arguments
or offer a late brief for the Supreme Court. However,
we do want to expose how much faith we place in mathematical
formulas. In essence, no one is quarrelling with the
effectiveness of UM's ranking algorithm. Instead, the
quarrel is with its premise.
UM's algorithm literally follows the language of
the landmark Bakke decision, in which the Supreme
Court ruled that racial quotas were illegal, but that
race could be used as a "plus factor" in considering
applicants.
Alas, the purity of the premise can be called into
question. UM doesn't just seek academic quality
and applicant diversity. That 20-point boost for "Athletics" is
necessary because Michigan, like Stanford and Duke,
is one of the few schools that manages to be strong
academically and competitive in major sports. If you're
a selective school and also selective about athletes,
you need that 20-point fudge factor to ensure the presence
of talented athletes.
That "Provost's discretion" boost is especially interesting.
Press reports indicate that applicants associated with
major donors receive that bonus. A recent Wall Street
Journal exposé showed that marginal applicants
to Duke can essentially buy their way into the freshman
class.
Critics of the UM algorithm concentrate on the presence
of a racial component in the formula, which is the
issue now before the Supreme Court. These critics conveniently
ignore the other non-academic components. Press reports
say that George W. Bush's SAT score was 1206, which
probably would not suffice for admission to Yale without
his family alumni connections. Critics of the critics
argue that if race can't be a factor, then the fact
that grandpa was an alumnus, or that mom donated to
the development fund, or that Biff is a 350-pound linebacker,
shouldn't count, either.
Other critics attack the very idea of a formula.
If a formula is used to score applicants, isn't the
admissions process, well, formulaic? Some exclusive
small schools have stepped forward with this criticism.
But none of this points to an inherent flaw in using
a scoring algorithm to rank competing candidates. If
you've got 20 different people evaluating 20,000 candidates,
a scoring algorithm is the only way to ensure
a consistent process, certainly for the first pass.
People also tend to evaluate the formula without
fully understanding its implications. That boost of
20 points sounds enormous on a scale of 150, raising
critics' hackles. However, because a much larger number
of non-minorities apply than do minorities, the impact
of minority applications is less than people might
guess. Former university presidents William Bowen and
Derek Bok crunched the numbers and found that eliminating
racial preferences would increase the likelihood for
a white undergraduate to be admitted to a selective
school from 25 percent to 26.5 percent.
Even the choice of scale influences the visceral
reaction to the UM formula. Suppose that instead of a 20-point
boost on a 150-point scale, UM awarded a 2-point boost
on a 15-point scale. Mathematically it's the same outcome,
but emotionally it sounds different. Or suppose UM deducted 20
points for being white; again, mathematically,
there's no difference, but you can just imagine how
much more criticism would ensue.
This dispute is about the premise behind the formula, the
goals implicit in the formula's components, not
about using an algorithm to help make hard choices.
Google, Integrity, and Tweaking
In the UM and BCS examples, a complex formula is
used once a year to build a rank-ordered list, and
the resulting rankings will greatly affect some people's
lives. Over 150 million times a day, Google users invoke
a complex formula to choose among 3 billion Web pages, and
they expect an answer in less than a second.
By any measure, Google is the overwhelmingly dominant
Web search engine. It's pretty uncommon for a single
Google search to change someone's life significantly,
but Google plays an important role in the information
many people rely on for choices in their lives. Cumulatively,
the quality of Google's ranking impacts millions of
choices made daily.
We know a lot about how Google works in general, but
little about the details. Google was conceived when
two Stanford graduate students sought an efficient
algorithm for ranking the "importance" of millions
of Web pages. Larry Page and Sergey Brin observed that
existing search engines such as AltaVista had compromised
their editorial integrity by silently selling the top
of the hit list to advertisers. They sought to build
a search engine whose ranking formula exhibited integrity. "Why
not rank pages based on how important they are to everyone
on the Web?" they thought. So they invented PageRank,
a mathematical formula that, to oversimplify a bit,
scores each candidate Web page by counting the number
of links to it.
Brin and Page described PageRank in detail in a 1998
paper. (The paper is still online at Stanford; see
http://www-db.stanford.edu/~backrub/google.html.)
They invented PageRank to use the link structure of
the Web (which pages link to which other pages) to
decide what makes it to the top of the list. Their
original paper proposed the scheme:
We assume page A has pages T1...Tn which point to
it (i.e., are citations). The parameter d is a damping
factor which can be set between 0 and 1. We usually
set d to 0.85. There are more details about d in the
next section. Also C(A) is defined as the number of
links going out of page A. The PageRank of a page A
is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Note that the PageRanks form a probability distribution
over web pages, so the sum of all Web pages' PageRanks
will be one.
Expressed in simple terms, PageRank causes pages
that are linked to frequently to bubble to the top of the list. If
the pages that link to a candidate page are themselves
linked-to frequently, the candidate bubbles even higher.
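For readers who want to see the excerpt above in action, here is a minimal Python sketch of the iterative calculation. The four-page link graph is invented for illustration; real implementations also handle pages with no outgoing links and operate over billions of URLs.

```python
# Iterative PageRank over a tiny invented link graph, following the
# PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) formula quoted above.
links = {            # page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["A", "C"],
}

d = 0.85                                  # damping factor, as in the paper
pr = {page: 1.0 for page in links}        # initial guess

for _ in range(50):                       # iterate until the scores settle
    pr = {
        page: (1 - d) + d * sum(pr[src] / len(outs)
                                for src, outs in links.items() if page in outs)
        for page in links
    }

for page, score in sorted(pr.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(score, 3))          # heavily linked-to pages rise to the top
```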
PageRank is by no means perfect. In November 2002,
a teacher in a town on Lake Michigan opened a curriculum
guide and gasped when she saw a lesson plan on the
migration of whales in that lake. Of course, there
are no whales in Lake Michigan. A small company in
Utah sells the guide to subscribing teachers. An ill-informed
researcher writing a lesson plan on whales did some
Google searches and found a hoax site published by
a company that gives boat tours on the lake. She swallowed
the whale story whole.
This example underscores the limitations of Google's
link analysis. A Web page with a large number of links
to it is popular for some reason, but popularity does
not equate to authenticity, authoritativeness, accuracy,
or currency. It doesn't even indicate that a source
is believed or trusted; people can link to a
source that they distrust or even scorn. In this example,
Google's ranking algorithm took a page popular for
its comical touch and confused a searcher who assumed
that any item ranked high on a Google hit list must
be true.
Another limitation that Searcher readers understand
implicitly is that Google can only rank what it indexes.
Several days after the deadly disease SARS erupted
on the world scene, a Google search for "sars" still yields
the South African Revenue Service. You have to go to
Google News to find information on the new SARS malady.
Despite all that, Googlenauts not only trust Google's
formula, they leap to defend it. When I observed in
a public forum that AlltheWeb.com did a better job
than Google if you search for "New York City" because
it automatically treated the search as a phrase, several
people flamed me. "Google would never modify my search
like that!" one wrote. Sorry to break the news, Bubba,
but every search engine "modifies" your search with
its own spin on what defines relevancy.
Tweaking the Rankings
Sometimes formulas used for ranking can have unintended
consequences. The original BCS formula gave credit
for the margin of victory in each game. This makes
sense, doesn't it? If you just squeak by and beat someone,
you're not that much better than they are, right? Conversely,
if you clobber them, you must be heaps better.
But this encouraged dominant teams to run up the
score against clearly inferior teams, just to improve
the BCS scoring. It gave incentive to schedule a game
against the College of East Overshoe, just so you could
clobber them for points. So in 2002, the BCS removed
margin of victory from the formula.
Most fans, coaches, and sports columnists believe
that Oregon lost a chance for a national championship
in 2001 because it won a number of close games. The
human rankers (the coaches' poll and the media
poll) placed Oregon at number 2. The BCS formula
placed it at number 4.
Consider the possibility that UM will find it necessary
to tweak its admissions formula, if the Supreme
Court doesn't tweak it out of existence. Suppose, for
instance, that minority test scores and high school
grades begin to rise over time. The day could come
when one or more minority groups are represented out
of proportion to the actual population. Or, suppose
scores fall. UM would be under tremendous pressure
to take action, even though the university claims
its formula isn't a "quota."
Google isn't immune to the need for tweaking, either.
For example, go to Google and search for:
google september 11
The top item on the list is http://www.firstmonday.dk/issues/issue6_10/wiggins, an
article I happened to write about how Google reacted
to 9/11. How did this article make it to the top of
the hit list for this particular search? As it happens,
the article was cited in a number of bibliographic
sites concerning 9/11.
Probably the more important factor is that a number
of influential Web logs also cited the article. Web
logs, or blogs, represent an important phenomenon in
blending traditional media with Web media. Writers
such as Dan Bricklin linked to the article; other blogs
picked up the reference, and suddenly PageRank ranks
this page at the top.
But this is not all good news. Bloggers have learned
that they can "game" the system by organizing a campaign
in which a large number of blogs mention a URL they
wish to boost. This simple technique proved to be an
effective method of "index spamming." Google has tweaked
its algorithm to reduce its effectiveness, but hasn't
solved the problem.
Blogs and other news sources represent a paradox
for Google, allowing content about breaking events
and news to enter the mix and to influence rankings.
In other words, these sources allow Google to grow
and adapt to reflect new stories and new sources of
Web-based information. But blogs can also magnify a
source's apparent trustworthiness merely because other bloggers
find the content worth talking about. Apparently, Google
sees blogs as an opportunity, not a problem: It recently
bought the leading blogging technology company, Pyra
Labs.
Actually, Google tweaks its formula frequently. Many
searchers don't realize that Google launches an entirely
new index once a month. This allows Google to re-sweep
the Web in a way that clears away deadwood and discovers
new content. It also provides Google with an opportunity
to tweak how it ranks pages. It was during one of those
monthly updates that Google tweaked its algorithm to adjust for blog
campaigns.
From Brin and Page's days at Stanford until today,
when Google overwhelmingly dominates the Web search
engine arena, Google has steadfastly maintained that
its choice of ranking is editorially pure, driven by
what is genuinely most relevant to the customer, not
by advertising dollars or any other consideration.
At the same time, Google's revenues from advertising
are on the rise, even in a market where online advertising
budgets are a fraction of those during the halcyon
days of the Internet bubble. Google even applies its
ranking techniques to its successful AdWords program:
If your ad draws more click-throughs, your ad is more
likely to appear the next time someone does a relevant
search.
Maybe I'm naïve. I trust Google to keep its
hit list pure. But I think it can offer an even better,
more intellectually honest form of ranking. I'd like
to see a user control panel for Google, a sort
of graphic equalizer for the search engine. In addition
to a knob for Popularity, there'd be a knob labeled
Authenticity and one labeled Level of User Trust. Throw
in Accuracy and Cynicism and Humorousness and.... Heck,
why not even have a knob labeled Payoff? Let me dial
up that knob if I want to see, à la Overture.com,
the ads of the highest bidders at the top of the hit
list.
But please, let the default setting for that knob
be zero.
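To make the idea concrete, here is a toy Python sketch of what such a control panel might look like under the hood: each knob is a user-set weight on a per-page signal, and the hit list is sorted by the weighted sum. The signal names and values are invented for illustration; nothing like this exists in Google's actual interface.

```python
# Hypothetical "graphic equalizer" ranking: user-adjustable knobs weight
# each per-page signal. Note that the Payoff knob defaults to zero.
knobs = {"popularity": 1.0, "authenticity": 0.8, "trust": 0.5, "payoff": 0.0}

pages = [
    {"url": "http://example.org/whale-tours", "popularity": 0.9,
     "authenticity": 0.2, "trust": 0.3, "payoff": 0.7},
    {"url": "http://example.org/marine-biology", "popularity": 0.5,
     "authenticity": 0.9, "trust": 0.9, "payoff": 0.0},
]

def dial_score(page):
    return sum(weight * page[signal] for signal, weight in knobs.items())

for page in sorted(pages, key=dial_score, reverse=True):
    print(round(dial_score(page), 2), page["url"])
```

With the knobs set as above, the more authentic, more trusted page outranks the merely popular one; dial the Payoff knob up and the ordering can flip.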
Footnote
1 Each of the components of the BCS formula is itself complicated. Strength of schedule, for instance, is calculated according to this formula:
SOS = (2/3) * OpW / (OpW + OpL) + (1/3) * OpOpW / (OpOpW + OpOpL)
where:
- OpW is the sum of all opponents' wins, minus the number of losses by the team in question, minus the number of wins over Division I-AA teams.
- OpL is the sum of all opponents' losses, minus the number of wins by the team in question.
- OpOpW is the sum of all opponents' opponents' wins, calculated as above.
- OpOpL is the sum of all opponents' opponents' losses, calculated as above.
Source: http://www.geocities.com/rtell/FAQ.html.
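For the arithmetically curious, the strength-of-schedule formula transcribes directly into a few lines of Python; the sample win-loss totals below are invented.

```python
# Direct transcription of the SOS formula in the footnote.
def strength_of_schedule(opw, opl, opopw, opopl):
    return (2 / 3) * opw / (opw + opl) + (1 / 3) * opopw / (opopw + opopl)

print(round(strength_of_schedule(opw=70, opl=50, opopw=800, opopl=700), 3))  # 0.567
```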