Tversky index

The Tversky index, named after Amos Tversky,[1] is an asymmetric similarity measure on sets that compares a variant to a prototype. It can be seen as a generalization of Dice's coefficient and the Tanimoto coefficient.

For sets X and Y, the Tversky index is a number between 0 and 1 given by

$$S(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \alpha\,|X \setminus Y| + \beta\,|Y \setminus X|}.$$

Here, $X \setminus Y$ denotes the relative complement of Y in X.

Further, $\alpha, \beta \ge 0$ are parameters of the Tversky index. Setting $\alpha = \beta = 1$ produces the Tanimoto coefficient; setting $\alpha = \beta = 0.5$ produces Dice's coefficient.
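To see why these two settings recover the familiar coefficients, note that $X \cap Y$, $X \setminus Y$ and $Y \setminus X$ are pairwise disjoint and together make up $X \cup Y$. The short derivation below writes $S_{\alpha,\beta}$ for the index with the given parameters; that subscript notation is introduced here only for illustration.

$$S_{1,1}(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + |X \setminus Y| + |Y \setminus X|} = \frac{|X \cap Y|}{|X \cup Y|}$$

$$S_{1/2,\,1/2}(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \tfrac{1}{2}\left(|X \setminus Y| + |Y \setminus X|\right)} = \frac{2\,|X \cap Y|}{|X| + |Y|}$$

The second equality in the last line uses $|X| + |Y| = 2\,|X \cap Y| + |X \setminus Y| + |Y \setminus X|$.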

If we consider X to be the prototype and Y to be the variant, then $\alpha$ corresponds to the weight of the prototype and $\beta$ corresponds to the weight of the variant. Tversky measures with $\alpha + \beta = 1$ are of special interest.[2]
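The definition can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `tversky_index`, the example sets, and the convention of returning 0.0 when the denominator is zero are all assumptions made here, not part of the original definition.

```python
def tversky_index(x, y, alpha, beta):
    """Tversky index of x (prototype) and y (variant) for alpha, beta >= 0."""
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    x, y = set(x), set(y)
    common = len(x & y)   # |X ∩ Y|
    only_x = len(x - y)   # |X \ Y|, weighted by alpha
    only_y = len(y - x)   # |Y \ X|, weighted by beta
    denominator = common + alpha * only_x + beta * only_y
    if denominator == 0:
        # Assumed convention: the formula is undefined here, so return 0.0.
        return 0.0
    return common / denominator


# Hypothetical example sets: |X ∩ Y| = 2, |X \ Y| = 2, |Y \ X| = 1.
X = {"a", "b", "c", "d"}
Y = {"c", "d", "e"}

tanimoto = tversky_index(X, Y, 1.0, 1.0)  # 2 / (2 + 2 + 1)
dice = tversky_index(X, Y, 0.5, 0.5)      # 2 / (2 + 1 + 0.5)

# Unequal weights make the index depend on which set is the prototype.
forward = tversky_index(X, Y, 0.8, 0.2)
backward = tversky_index(Y, X, 0.8, 0.2)
```

With unequal weights, `forward` and `backward` differ, which is the asymmetry described in the text.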

Because of its inherent asymmetry, the Tversky index does not meet the criteria for a similarity metric. However, if symmetry is needed, a variant of the original formulation that uses max and min functions has been proposed.[3]

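The symmetric variant is

$$S(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \beta\left(\alpha a + (1 - \alpha) b\right)},$$

with

$$a = \min\left(|X \setminus Y|,\ |Y \setminus X|\right),$$

$$b = \max\left(|X \setminus Y|,\ |Y \setminus X|\right).$$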

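This formulation also rearranges the parameters $\alpha$ and $\beta$. Thus, $\alpha$ controls the balance between $|X \setminus Y|$ and $|Y \setminus X|$ in the denominator. Similarly, $\beta$ controls the effect of the symmetric difference $|X \,\triangle\, Y|$ versus $|X \cap Y|$ in the denominator.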
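As a rough sketch of the symmetric variant, the Python below takes a as the smaller and b as the larger of the two set-difference sizes and uses the denominator |X ∩ Y| + β(αa + (1 − α)b). The function name `symmetric_tversky_index`, the example sets, and the zero-denominator convention are assumptions made for illustration.

```python
def symmetric_tversky_index(x, y, alpha, beta):
    """Symmetric Tversky variant built from min/max of the set differences."""
    x, y = set(x), set(y)
    common = len(x & y)
    diff_xy = len(x - y)
    diff_yx = len(y - x)
    a = min(diff_xy, diff_yx)  # smaller one-sided difference
    b = max(diff_xy, diff_yx)  # larger one-sided difference
    denominator = common + beta * (alpha * a + (1 - alpha) * b)
    if denominator == 0:
        # Assumed convention: the formula is undefined here, so return 0.0.
        return 0.0
    return common / denominator


# Hypothetical example sets: |X ∩ Y| = 2, a = 1, b = 2.
X = {"a", "b", "c", "d"}
Y = {"c", "d", "e"}

s_xy = symmetric_tversky_index(X, Y, 0.5, 1.0)
s_yx = symmetric_tversky_index(Y, X, 0.5, 1.0)
```

Because a and b are computed with min and max, swapping the arguments leaves the result unchanged, so `s_xy` and `s_yx` are equal. With α = 0.5 and β = 1 the denominator becomes |X ∩ Y| plus half the symmetric difference, which is the Dice denominator.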

Notes

  1. Tversky, Amos (1977). "Features of Similarity". Psychological Review. 84 (4): 327–352.
  2. http://www.daylight.com/dayhtml/doc/theory/theory.finger.html
  3. Jimenez, S.; Becerra, C.; Gelbukh, A. (2013). "SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for Semantic Textual Similarity". Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pp. 194–201, June 7–8, 2013, Atlanta, Georgia, USA.
This article is issued from Wikipedia - version of the 8/26/2013. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.