Bradley–Terry model

The Bradley–Terry model is a probability model that can predict the outcome of a comparison. Given a pair of individuals $i$ and $j$ drawn from some population, it estimates the probability that the pairwise comparison $i > j$ turns out true, as

P(i>j)={\frac {p_{i}}{p_{i}+p_{j}}}

where $p_i$ is a positive real-valued score assigned to individual $i$. The comparison $i > j$ can be read as "$i$ is preferred to $j$", "$i$ ranks higher than $j$", or "$i$ beats $j$", depending on the application.

For example, $p_i$ may represent the skill of a team in a sports tournament, estimated from the number of times $i$ has won a match. $P(i>j)$ then represents the probability that $i$ will win a match against $j$.^[1]^[2] Another example used to explain the model's purpose is that of scoring products in a certain category by quality. While it is hard for a person to draft a direct ranking of many brands of wine, it may be feasible to compare a sample of pairs of wines and say, for each pair, which one is better. The Bradley–Terry model can then be used to derive a full ranking.^[2]
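The basic formula is small enough to sketch directly. The following is a minimal illustration only: the function name `win_probability` and the team scores are made up for the example and do not come from any cited source.

```python
def win_probability(p_i: float, p_j: float) -> float:
    """Bradley-Terry probability that i beats j, given positive scores."""
    if p_i <= 0 or p_j <= 0:
        raise ValueError("scores must be positive")
    return p_i / (p_i + p_j)


# Hypothetical scores for three teams (illustrative values, not real data).
scores = {"A": 3.0, "B": 1.0, "C": 1.0}

print(win_probability(scores["A"], scores["B"]))  # 0.75
print(win_probability(scores["B"], scores["C"]))  # 0.5

# Sorting by score yields the ranking implied by the model.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because only the ratio of scores enters the formula, multiplying every score by the same positive constant leaves all probabilities unchanged.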

History and applications

The model is named after R. A. Bradley and M. E. Terry,^[3] who presented it in 1952,^[4] although it had already been studied by Zermelo in the 1920s.^[1]^[5]^[6]

Real-world applications of the model include estimating the influence of statistical journals and ranking documents by relevance in machine-learned search engines.^[7] In the latter application, $P(i>j)$ may reflect that document $i$ is more relevant to the user's query than document $j$, so it should be displayed earlier in the results list. The individual scores $p_i$ then express the relevance of each document, and can be estimated from the frequency with which users click particular "hits" when presented with a result list.^[8]

Definition

The Bradley–Terry model can be parametrized in various ways. One way to do so is to pick a single parameter per individual, leading to a model of $n$ parameters $p_1, \ldots, p_n$.^[9] Another variant, in fact the version considered by Bradley and Terry,^[2] uses exponential score functions $p_{i}=e^{\beta_{i}}$ so that

P(i>j)={\frac {e^{{\beta _{i}}}}{e^{{\beta _{i}}}+e^{{\beta _{j}}}}}

or, using the logit (and disallowing ties),^[1]

\operatorname {logit} (P(i>j))=\log \left({\frac {P(i>j)}{1-P(i>j)}}\right)=\log \left({\frac {P(i>j)}{P(j>i)}}\right)=\beta _{i}-\beta _{j}

reducing the model to logistic regression on pairs of individuals.
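The equivalence between the two parametrizations can be checked numerically. This is a small sketch under assumed values: the helper names and the log-scores `beta_i`, `beta_j` are hypothetical, chosen only to illustrate the identity.

```python
import math


def win_probability_from_beta(beta_i: float, beta_j: float) -> float:
    """P(i > j) written as the logistic function of beta_i - beta_j."""
    return 1.0 / (1.0 + math.exp(-(beta_i - beta_j)))


def logit(prob: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(prob / (1.0 - prob))


# Hypothetical log-scores (illustrative values only).
beta_i, beta_j = 0.7, -0.4
p_i, p_j = math.exp(beta_i), math.exp(beta_j)

direct = p_i / (p_i + p_j)                        # score form
logistic = win_probability_from_beta(beta_i, beta_j)  # logistic form

assert math.isclose(direct, logistic)
assert math.isclose(logit(direct), beta_i - beta_j)
```

Only the difference $\beta_i - \beta_j$ matters here, so adding the same constant to every $\beta_i$ leaves all predicted probabilities unchanged.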

Estimating the parameters

The following algorithm computes the parameters $p_i$ of the basic version of the model from a sample of observations. Formally, it computes a maximum likelihood estimate, i.e., it maximizes the likelihood of the observed data. The algorithm dates back to the work of Zermelo.^[1]

The observations required are the outcomes of previous comparisons, for example, pairs $(i, j)$ where $i$ beats $j$. Summarizing these outcomes as $w_{ij}$, the number of times $i$ has beaten $j$, we obtain the log-likelihood of the parameter vector $\mathbf{p} = (p_1, \ldots, p_n)$ as^[1]

L(\mathbf{p})=\sum_{i=1}^{n}\sum_{j=1}^{n}\left[w_{ij}\ln p_{i}-w_{ij}\ln(p_{i}+p_{j})\right].
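As a rough sketch, the log-likelihood can be evaluated from a matrix of win counts. The representation below (a nested list `wins` with `wins[i][j]` standing for $w_{ij}$) and the function name are assumptions made for this example. Diagonal entries are skipped, on the assumption that no individual is compared with itself.

```python
import math


def log_likelihood(wins, p):
    """Bradley-Terry log-likelihood.

    wins[i][j] is the number of times i has beaten j (w_ij in the text);
    p is a sequence of positive scores, one per individual.
    """
    n = len(p)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Skip self-comparisons and pairs with no recorded wins.
            if i != j and wins[i][j] > 0:
                total += wins[i][j] * (math.log(p[i]) - math.log(p[i] + p[j]))
    return total


# Hypothetical data: individual 0 beat individual 1 three times, and lost once.
wins = [[0, 3],
        [1, 0]]

print(log_likelihood(wins, [0.5, 0.5]))
print(log_likelihood(wins, [0.75, 0.25]))
```

With these made-up counts, the second score vector matches the observed win fractions and gives the larger log-likelihood of the two.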

Denote the number of comparisons "won" by $i$ as $W_i$, and the number of comparisons made between $i$ and $j$ as $N_{ij}$. Starting from an arbitrary vector $\mathbf{p}$, the algorithm iteratively performs the update

p'_{i}=W_{i}\left(\sum _{{j\neq i}}{\frac {N_{{ij}}}{p_{i}+p_{j}}}\right)^{{-1}}

for all $i$ . After computing all of the new parameters, they should be renormalized,

p_{i}\leftarrow {\frac {p'_{i}}{\sum _{j=1}^{n}p'_{j}}}.
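One informal way to see where the update comes from is to differentiate the log-likelihood with respect to a single $p_i$. The sketch below assumes $w_{ii} = 0$ and uses the definitions of $W_i$ and $N_{ij}$ given above. It is a heuristic reading of the update, not the convergence argument itself.

```latex
W_{i}=\sum_{j\neq i}w_{ij},\qquad N_{ij}=w_{ij}+w_{ji}

\frac{\partial L}{\partial p_{i}}=\frac{W_{i}}{p_{i}}-\sum_{j\neq i}\frac{N_{ij}}{p_{i}+p_{j}}

\frac{\partial L}{\partial p_{i}}=0\iff p_{i}=W_{i}\left(\sum_{j\neq i}\frac{N_{ij}}{p_{i}+p_{j}}\right)^{-1}
```

The stationarity condition is a fixed-point equation in $p_i$. The update evaluates its right-hand side at the current parameter values to obtain the next ones.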

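This estimation procedure never decreases the log-likelihood from one iteration to the next. It converges to a unique maximum under the normalization above, provided the data are sufficiently connected: for every partition of the individuals into two nonempty groups, some individual in the second group must have beaten some individual in the first group at least once.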
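The iteration described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not a reference one: the function name, the `wins[i][j]` matrix layout, the uniform starting vector, and the stopping rule (`iterations`, `tol`) are all choices made for this example. It also assumes every individual has taken part in at least one comparison.

```python
def fit_bradley_terry(wins, iterations=1000, tol=1e-10):
    """Estimate Bradley-Terry scores by the iterative update in the text.

    wins[i][j] is the number of times i has beaten j.
    Returns scores normalized to sum to 1.
    """
    n = len(wins)
    # W_i: total number of comparisons won by i.
    W = [sum(wins[i][j] for j in range(n) if j != i) for i in range(n)]
    p = [1.0 / n] * n  # arbitrary starting point (assumed: uniform)

    for _ in range(iterations):
        new_p = []
        for i in range(n):
            # N_ij = wins[i][j] + wins[j][i]: comparisons between i and j.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n) if j != i
            )
            new_p.append(W[i] / denom)
        # Renormalize so the scores sum to 1.
        total = sum(new_p)
        new_p = [x / total for x in new_p]
        # Stop once the scores have essentially stopped changing.
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


# Hypothetical two-player data: player 0 won 3 of 4 games against player 1.
print(fit_bradley_terry([[0, 3], [1, 0]]))  # [0.75, 0.25]
```

In the two-player case the estimate reduces to the observed win fractions. With more individuals, the result depends on who was compared with whom, and the connectivity condition on the data matters: an individual with no wins at all, for instance, is driven to a score of zero.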

References

1. Hunter, David R. (2004). "MM algorithms for generalized Bradley–Terry models". The Annals of Statistics. 32 (1): 384–406. doi:10.2307/3448514. JSTOR 3448514.
2. Agresti, Alan (2014). Categorical Data Analysis. John Wiley & Sons. pp. 436–439.
3. E.E.M. van Berkum. "Bradley-Terry model". Encyclopedia of Mathematics. Retrieved 18 November 2014.
4. Bradley, Ralph Allan; Terry, Milton E. (1952). "Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons". Biometrika. 39 (3/4): 324. doi:10.2307/2334029. JSTOR 2334029.
5. Zermelo, Ernst (1929). "Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung". Mathematische Zeitschrift. 29 (1): 436–460. doi:10.1007/BF01180541.
6. Heinz-Dieter Ebbinghaus (2007), Ernst Zermelo: An Approach to His Life and Work, pp. 268–269, ISBN 9783540495536
7. Szummer, Martin; Yilmaz, Emine (2011). Semi-supervised learning to rank with preference regularization (PDF). CIKM.
8. Radlinski, Filip; Joachims, Thorsten (2007). Active Exploration for Learning Rankings from Clickthrough Data (PDF). KDD '07 Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 570–579. doi:10.1145/1281192.1281254.
9. Fangzhao Wu; Jun Xu; Hang Li; Xin Jiang (2014). Ranking Optimization with Constraints. CIKM '14 Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. pp. 1049–1058. doi:10.1145/2661829.2661895.

This article is issued from Wikipedia - version of the 11/18/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.
