Advances and Applications in Statistics
Volume 3, Issue 3, Pages 199 - 216
(December 2003)
DOWNWEIGHTING TIGHTLY KNIT COMMUNITIES IN WORLD WIDE WEB RANKINGS
Gareth O. Roberts (U. K.) and Jeffrey S. Rosenthal (Canada)
Abstract: We propose two new algorithms for using World Wide Web link structures to determine authority values of web pages from search queries. Both algorithms postulate an underlying latent cluster structure, in an effort to avoid the Tightly Knit Community (TKC) effect which can occur in the Kleinberg and SALSA algorithms. The first algorithm, Similarity-Downweighting (SD), weights outlinks inversely with apparent cluster size. The second algorithm, Sequential Clustering (SC), first generates an underlying cluster structure consistent with the observed links, and then uses an empirical Bayes approach to compute authority values. We present experiments indicating that both algorithms do a fairly good job of selecting authoritative web pages for a given query, given only the link structure of the Base Set of pages, while largely avoiding the TKC effect. We also consider a fully Bayesian approach, but find that it is too sensitive to prior information to be useful at this point.
Keywords and phrases: tightly knit communities, link analysis, web searching, hubs, authorities, SALSA, Kleinberg’s algorithm, clusters, Bayesian.
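
To make the TKC effect mentioned in the abstract concrete, the following is a minimal sketch of Kleinberg-style hub and authority scoring by power iteration, run on a small invented link graph. The function name `hits_authorities`, the page numbering, and the toy graph are assumptions made for illustration and do not come from the paper. The sketch does not implement SD or SC, whose details the abstract does not give; it only shows the behaviour those algorithms aim to avoid.

```python
def hits_authorities(links, n_pages, iterations=100):
    """Kleinberg-style authority scores via power iteration (illustrative sketch).

    links: list of (source, target) page-index pairs.
    Returns a list of authority scores that sum to 1.
    """
    hub = [1.0] * n_pages
    auth = [1.0] * n_pages
    for _ in range(iterations):
        # Authority update: add up the hub scores of all pages linking in.
        new_auth = [0.0] * n_pages
        for src, dst in links:
            new_auth[dst] += hub[src]
        # Hub update: add up the new authority scores of all pages linked to.
        new_hub = [0.0] * n_pages
        for src, dst in links:
            new_hub[src] += new_auth[dst]
        # Normalize so that each score vector sums to 1.
        a_total = sum(new_auth)
        h_total = sum(new_hub)
        auth = [a / a_total for a in new_auth]
        hub = [h / h_total for h in new_hub]
    return auth


# Hypothetical toy graph (not from the paper):
# - pages 0-2 each link to every one of pages 3-5 (a tightly knit community);
# - pages 6-10 each link only to page 11 (a single popular page).
links = [(h, a) for h in range(3) for a in range(3, 6)]
links += [(h, 11) for h in range(6, 11)]

auth = hits_authorities(links, n_pages=12)

# In-degree of each page, for comparison with the authority scores.
in_degree = [sum(1 for _, dst in links if dst == p) for p in range(12)]

for page in (3, 4, 5, 11):
    print(page, in_degree[page], auth[page])
```

In this toy graph, page 11 has the most inlinks (5, against 3 for each of pages 3 to 5), yet its authority score shrinks toward zero as the iteration proceeds, because the mutually reinforcing community of pages 0 to 5 dominates the principal eigenvector. This is the kind of outcome the abstract's cluster-based downweighting is intended to counteract.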