Ranking (information retrieval)

Ranking of query results is one of the fundamental problems in information retrieval (IR), the scientific/engineering discipline behind search engines. Given a query $q$ and a collection $D$ of documents that match the query, the problem is to rank, that is, sort, the documents in $D$ according to some criterion so that the "best" results appear early in the result list displayed to the user. Classically, ranking criteria are phrased in terms of relevance of documents with respect to an information need expressed in the query.

Ranking is often reduced to the computation of numeric scores on query/document pairs. Baseline score functions for this purpose include the cosine similarity between tf–idf vectors representing the query and the document in a vector space model,^[1] BM25 scores, and probabilities in a probabilistic IR model. A ranking can then be computed by sorting documents by descending score. An alternative approach is to define a score function on pairs of documents $d_1, d_2$ that is positive if and only if $d_1$ is more relevant to the query than $d_2$, and to use this information to sort.
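Both approaches can be sketched in a few lines. The example below is a minimal illustration, not a reference implementation. It assumes whitespace tokenization, raw term counts as tf, and idf(t) = log(N / df(t)), which is only one of several common tf–idf variants. The function names are hypothetical. The pairwise preference function is derived from the pointwise scores purely for illustration. A learned pairwise function need not be consistent in this way.

```python
import math
from collections import Counter
from functools import cmp_to_key


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def idf_table(docs):
    # idf(t) = log(N / df(t)), one common tf-idf variant (an assumption here).
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(len(docs) / count) for term, count in df.items()}


def tfidf_vector(text, idf):
    # Raw term count times idf; terms absent from the collection get weight 0.
    tf = Counter(tokenize(text))
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


def cosine(u, v):
    # Cosine similarity of two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def scores(query, docs):
    # Pointwise scores: one number per (query, document) pair.
    idf = idf_table(docs)
    q = tfidf_vector(query, idf)
    return [cosine(q, tfidf_vector(d, idf)) for d in docs]


def rank_pointwise(query, docs):
    # Sort document indices by descending score.
    s = scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: s[i], reverse=True)


def rank_pairwise(query, docs):
    # pref(i, j) > 0 iff document i is judged more relevant than document j.
    # Here pref is derived from the pointwise scores purely for illustration.
    s = scores(query, docs)

    def pref(i, j):
        return s[i] - s[j]

    # cmp_to_key sorts ascending by the comparator, so negate pref
    # to place preferred documents first.
    return sorted(range(len(docs)), key=cmp_to_key(lambda i, j: -pref(i, j)))


docs = ["the cat sat on the mat", "the dog chased the cat", "stock markets fell today"]
print(rank_pointwise("cat mat", docs))  # [0, 1, 2]
print(rank_pairwise("cat mat", docs))   # [0, 1, 2]
```

With a consistent pairwise function, both orderings agree. The pairwise form becomes useful when a model directly predicts which of two documents is better, rather than predicting an absolute score.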

Ranking functions are evaluated by a variety of means. One of the simplest is precision at k: the precision of the first k top-ranked results for some fixed k, for example the proportion of the top 10 results that are relevant, averaged over many queries.
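A minimal sketch of precision at k, averaged over queries, follows. The document identifiers and relevance judgments are hypothetical. The example uses k = 2 to keep it short. The function divides by k even when fewer than k results are returned. That is one common convention, and it is assumed here.

```python
def precision_at_k(ranked, relevant, k):
    # Fraction of the top-k ranked documents that are relevant.
    # Divides by k even if fewer than k results were returned (assumed convention).
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def mean_precision_at_k(runs, k):
    # runs: one (ranked list, set of relevant ids) pair per query;
    # the result is the average of precision@k over all queries.
    return sum(precision_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical judgments for two queries.
runs = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),  # P@2 = 1/2
    (["d5", "d4", "d9"], {"d4", "d5"}),        # P@2 = 2/2
]
print(mean_precision_at_k(runs, k=2))  # 0.75
```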

Computing a ranking function can often be simplified by exploiting the fact that only the relative order of scores matters, not their absolute values. Hence, terms or factors that are independent of the document may be removed, and terms or factors that are independent of the query may be precomputed and stored with the document.
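Cosine similarity shows both simplifications. The query norm is the same for every document, so dividing by it cannot change the order, and it can be dropped. Each document norm does not depend on the query, so it can be computed once at indexing time. The sketch below assumes hand-written, hypothetical term weights and nonzero document vectors. It checks that the simplified scores produce the same ranking as the full cosine.

```python
import math


def norm(vec):
    # Euclidean length of a sparse vector stored as a dict.
    return math.sqrt(sum(w * w for w in vec.values()))


def dot(u, v):
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def full_cosine_scores(query, doc_vectors):
    # Reference: full cosine similarity, recomputing every norm for each query.
    qn = norm(query)
    return [dot(query, d) / (qn * norm(d)) for d in doc_vectors]


def build_index(doc_vectors):
    # Query-independent factor: each document norm is computed once and stored
    # with the document (assumes no all-zero document vectors).
    return [(d, norm(d)) for d in doc_vectors]


def fast_scores(query, index):
    # Document-independent factor: the query norm is dropped. Every score shrinks
    # or grows by the same constant, so the order is unchanged.
    return [dot(query, d) / dn for d, dn in index]


def ranking(scores):
    # Document indices sorted by descending score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical term weights (e.g. tf-idf) for three documents and one query.
docs = [{"cat": 1.0, "mat": 2.0}, {"cat": 3.0, "dog": 1.0}, {"stock": 2.0}]
query = {"cat": 1.0, "mat": 1.0}

index = build_index(docs)
print(ranking(full_cosine_scores(query, docs)))  # [0, 1, 2]
print(ranking(fast_scores(query, index)))        # [0, 1, 2]
```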

References

1. Computing vector scores.

This article is issued from Wikipedia - version of the 3/21/2014. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.
