V-statistic

V-statistics are a class of statistics named for Richard von Mises, who developed their asymptotic distribution theory in a fundamental paper in 1947.^[1] V-statistics are closely related to U-statistics^[2]^[3] (U for “unbiased”), introduced by Wassily Hoeffding in 1948.^[4] A V-statistic is a statistical function (of a sample) defined by a particular statistical functional of a probability distribution.

Statistical functions

Statistics that can be represented as functionals $T(F_{n})$ of the empirical distribution function $F_{n}$ are called statistical functions.^[5] Differentiability of the functional T plays a key role in the von Mises approach; thus von Mises considers differentiable statistical functionals.^[1]

Examples of statistical functions

The k-th central moment is the functional $T(F)=\int (x-\mu )^{k}\,dF(x)$ , where $\mu =E[X]$ is the expected value of X. The associated statistical function is the sample k-th central moment,

$T_{n}=m_{k}=T(F_{n})={\frac {1}{n}}\sum _{i=1}^{n}(x_{i}-{\overline {x}})^{k}.$

The chi-squared goodness-of-fit statistic is a statistical function T(F_n), corresponding to the statistical functional

$T(F)=\sum _{i=1}^{k}{\frac {(\int _{A_{i}}\,dF-p_{i})^{2}}{p_{i}}},$

where A_i are the k cells and p_i are the specified probabilities of the cells under the null hypothesis.
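As a minimal sketch of the plug-in idea, the following Python snippet evaluates this functional at the empirical distribution, where $\int _{A_{i}}\,dF_{n}$ is simply the fraction of observations falling in cell A_i. The function name `chi_squared_functional`, the encoding of cells as half-open intervals, and the sample data are illustrative assumptions, not part of the source.

```python
def chi_squared_functional(sample, cells, probs):
    """Evaluate T(F_n) = sum_i (F_n(A_i) - p_i)^2 / p_i.

    cells: list of (low, high) half-open intervals [low, high)
           (an assumed encoding of the cells A_i).
    probs: null probabilities p_i of the cells.
    """
    n = len(sample)
    total = 0.0
    for (low, high), p in zip(cells, probs):
        # Empirical mass of the cell: the integral of dF_n over A_i.
        mass = sum(1 for x in sample if low <= x < high) / n
        total += (mass - p) ** 2 / p
    return total


sample = [0.05, 0.10, 0.30, 0.35, 0.40, 0.55, 0.70, 0.80, 0.90, 0.95]
cells = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
probs = [0.25, 0.25, 0.25, 0.25]

t_n = chi_squared_functional(sample, cells, probs)
pearson = len(sample) * t_n  # n * T(F_n) is the usual Pearson statistic
```

Multiplying T(F_n) by n recovers the familiar form $\sum _{i}(O_{i}-np_{i})^{2}/(np_{i})$, where O_i is the observed count in cell A_i.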
The Cramér–von Mises and Anderson–Darling goodness-of-fit statistics are based on the functional

$T(F)=\int (F(x)-F_{0}(x))^{2}\,w(x;F_{0})\,dF_{0}(x),$

where w(x; F₀) is a specified weight function and F₀ is a specified null distribution. If w is identically 1, then T(F_n) is the well-known Cramér–von Mises goodness-of-fit statistic; if $w(x;F_{0})=[F_{0}(x)(1-F_{0}(x))]^{-1}$, then T(F_n) is the Anderson–Darling statistic.
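For the unweighted case (w identically 1) with a continuous F₀, the integral can be evaluated exactly from the order statistics. The sketch below uses the standard computing formula $nT(F_{n})={\frac {1}{12n}}+\sum _{i=1}^{n}\left(F_{0}(x_{(i)})-{\frac {2i-1}{2n}}\right)^{2}$; the function names and the uniform null distribution are illustrative assumptions.

```python
def cramer_von_mises_functional(sample, null_cdf):
    """Evaluate T(F_n) = integral of (F_n - F_0)^2 dF_0 with weight w = 1.

    Uses the standard computing formula
        n * T(F_n) = 1/(12 n) + sum_i (F_0(x_(i)) - (2 i - 1)/(2 n))^2,
    where x_(1) <= ... <= x_(n) are the order statistics and F_0 is
    assumed to be continuous.
    """
    n = len(sample)
    # F_0 is monotone, so sorting its values gives F_0 at the order statistics.
    u = sorted(null_cdf(x) for x in sample)
    total = 1.0 / (12 * n)
    for i, u_i in enumerate(u, start=1):
        total += (u_i - (2 * i - 1) / (2 * n)) ** 2
    return total / n


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (an illustrative null F_0)."""
    return min(max(x, 0.0), 1.0)


sample = [0.12, 0.47, 0.51, 0.83, 0.95]
t_n = cramer_von_mises_functional(sample, uniform_cdf)
```

The Anderson–Darling weight puts more mass in the tails of F₀ and needs a different computing formula, which is not sketched here.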

Representation as a V-statistic

Suppose x₁, ..., x_n is a sample. In typical applications the statistical function has a representation as the V-statistic

$V_{mn}={\frac {1}{n^{m}}}\sum _{i_{1}=1}^{n}\cdots \sum _{i_{m}=1}^{n}h(x_{i_{1}},x_{i_{2}},\dots ,x_{i_{m}}),$

where h is a symmetric kernel function. Serfling^[6] discusses how to find the kernel in practice. V_mn is called a V-statistic of degree m.

A symmetric kernel of degree 2 is a function h(x, y) such that h(x, y) = h(y, x) for all x and y in the domain of h. For samples x₁, ..., x_n, the corresponding V-statistic is defined as

$V_{2,n}={\frac {1}{n^{2}}}\sum _{i=1}^{n}\sum _{j=1}^{n}h(x_{i},x_{j}).$
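The definition translates directly into code. The following is a minimal sketch of a degree-m V-statistic that averages the kernel over all n^m index tuples; the name `v_statistic` and the two example kernels are illustrative assumptions. The cost grows as n^m, so this brute-force form is only practical for small samples or low degree.

```python
from itertools import product


def v_statistic(sample, kernel, degree):
    """Compute V_{mn}: the average of the kernel over all n^m index tuples.

    Tuples with repeated indices (for example i_1 == i_2) are included,
    which is what distinguishes a V-statistic from a U-statistic.
    """
    n = len(sample)
    total = sum(kernel(*args) for args in product(sample, repeat=degree))
    return total / n ** degree


data = [1.0, 2.0, 4.0, 7.0]

# Degree-1 kernel h(x) = x gives the sample mean.
mean = v_statistic(data, lambda x: x, 1)

# Degree-2 symmetric kernel h(x, y) = x * y gives the squared sample mean.
mean_squared = v_statistic(data, lambda x, y: x * y, 2)
```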

Example of a V-statistic

An example of a degree-2 V-statistic is the second central moment m₂.

If h(x, y) = (x − y)²/2, the corresponding V-statistic is

$V_{2,n}={\frac {1}{n^{2}}}\sum _{i=1}^{n}\sum _{j=1}^{n}{\frac {1}{2}}(x_{i}-x_{j})^{2}={\frac {1}{n}}\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2},$

which is the maximum likelihood estimator of variance. With the same kernel, the corresponding U-statistic is the (unbiased) sample variance:

$s^{2}={n \choose 2}^{-1}\sum _{i<j}{\frac {1}{2}}(x_{i}-x_{j})^{2}={\frac {1}{n-1}}\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}.$
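The identity for the V-statistic follows by expanding the square; the steps below are a short derivation using only the definitions above:

$\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{1}{2}(x_{i}-x_{j})^{2}=\frac{1}{2n^{2}}\left(2n\sum_{i=1}^{n}x_{i}^{2}-2\Big(\sum_{i=1}^{n}x_{i}\Big)^{2}\right)=\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2}-\bar{x}^{2}=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}.$

Because h(x, x) = 0 for this kernel, the diagonal terms vanish and the double sum equals $2\sum_{i<j}h(x_{i},x_{j})$, so the two statistics differ only by a factor:

$V_{2,n}=\frac{2}{n^{2}}{n \choose 2}s^{2}=\frac{n-1}{n}s^{2}.$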

Asymptotic distribution

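The three examples of statistical functions above have different asymptotic distributions: the sample central moment is asymptotically normal, the chi-squared goodness-of-fit statistic is asymptotically chi-squared, and the Cramér–von Mises and Anderson–Darling statistics converge to a weighted sum of chi-squared variables.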

Von Mises' approach is a unifying theory that covers all of the cases above.^[1] Informally, the type of asymptotic distribution of a statistical function depends on the order of "degeneracy," which is determined by which term is the first non-vanishing term in the Taylor expansion of the functional T. In case it is the linear term, the limit distribution is normal; otherwise higher order types of distributions arise (under suitable conditions such that a central limit theorem holds).

There is a hierarchy of cases parallel to the asymptotic theory of U-statistics.^[7] Let A(m) be the property defined by:

A(m):

Var(h(X₁, ..., X_k)) = 0 for k < m, and Var(h(X₁, ..., X_k)) > 0 for k = m;
$n^{m/2}R_{mn}$ tends to zero (in probability), where $R_{mn}$ is the remainder term in the Taylor series for T.

Case m = 1 (Non-degenerate kernel):

If A(1) is true, the leading term of the statistic is a sample mean, and the central limit theorem implies that T(F_n) is asymptotically normal.

In the variance example above, m₂ is asymptotically normal with mean $\sigma ^{2}$ and variance $(\mu _{4}-\sigma ^{4})/n$, where $\mu _{4}=E(X-E(X))^{4}$.
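A small Monte Carlo check can make this concrete. The sketch below assumes standard normal data, for which $\sigma ^{2}=1$ and $\mu _{4}=3$, so the asymptotic variance of m₂ is 2/n; the sample size, number of replications, and seed are arbitrary illustrative choices.

```python
import random
import statistics


def second_central_moment(sample):
    """Plug-in estimate m_2 = (1/n) * sum (x_i - mean)^2."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


rng = random.Random(0)  # fixed seed so the run is reproducible
n, replications = 200, 2000

# Standard normal data: sigma^2 = 1 and mu_4 = 3, so (mu_4 - sigma^4)/n = 2/n.
estimates = [
    second_central_moment([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(replications)
]

empirical_mean = statistics.mean(estimates)  # should be close to sigma^2 = 1
scaled_variance = n * statistics.variance(estimates)  # should be close to 2
```

The agreement is only approximate, both because the result is asymptotic and because of Monte Carlo error.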

Case m = 2 (Degenerate kernel):

Suppose A(2) is true, and $E[h^{2}(X_{1},X_{2})]<\infty ,\,E|h(X_{1},X_{1})|<\infty ,$ and $E[h(x,X_{1})]\equiv 0$. Then $nV_{2,n}$ converges in distribution to a weighted sum of independent chi-squared variables:

$nV_{2,n}{\stackrel {d}{\longrightarrow }}\sum _{k=1}^{\infty }\lambda _{k}Z_{k}^{2},$

where $Z_{k}$ are independent standard normal variables and $\lambda _{k}$ are constants that depend on the distribution F and the functional T. In this case the asymptotic distribution is called a quadratic form of centered Gaussian random variables. The statistic $V_{2,n}$ is called a degenerate kernel V-statistic. The V-statistic associated with the Cramér–von Mises functional^[1] (the third example of a statistical function above) is an example of a degenerate kernel V-statistic.^[8]
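A simple degenerate kernel, chosen here for illustration and not taken from the source, is h(x, y) = xy with E[X] = 0 and Var(X) = 1. Then E[h(x, X₁)] = x E[X₁] = 0, the V-statistic is $V_{2,n}={\bar {x}}^{2}$, and $nV_{2,n}=({\sqrt {n}}\,{\bar {x}})^{2}$ converges to $Z_{1}^{2}$, that is, $\lambda _{1}=1$ and all other $\lambda _{k}=0$. The following sketch simulates this case; the function names, sample size, and seed are assumptions.

```python
import random


def v2(sample, kernel):
    """Degree-2 V-statistic: average of kernel(x_i, x_j) over all n^2 pairs."""
    n = len(sample)
    return sum(kernel(x, y) for x in sample for y in sample) / n ** 2


def product_kernel(x, y):
    """Illustrative degenerate kernel: E[h(x, X)] = x * E[X] = 0 when E[X] = 0."""
    return x * y


rng = random.Random(1)  # fixed seed so the run is reproducible
n, replications = 20, 2000

scaled = []
for _ in range(replications):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scaled.append(n * v2(sample, product_kernel))

# A chi-squared variable with 1 degree of freedom has mean 1 and variance 2.
mean_scaled = sum(scaled) / replications
var_scaled = sum((s - mean_scaled) ** 2 for s in scaled) / (replications - 1)
```

Note the scaling by n rather than by the square root of n: in the degenerate case the linear term vanishes, so the fluctuations of $V_{2,n}$ are of order 1/n.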

Notes

1. von Mises (1947)
2. Lee (1990)
3. Koroljuk & Borovskich (1994)
4. Hoeffding (1948)
5. von Mises (1947), p. 309; Serfling (1980), p. 210.
6. Serfling (1980, Section 6.5)
7. Serfling (1980, Ch. 5–6); Lee (1990, Ch. 3)
8. See Lee (1990, p. 160) for the kernel function.

References

Hoeffding, W. (1948). "A class of statistics with asymptotically normal distribution". Annals of Mathematical Statistics. 19 (3): 293–325. doi:10.1214/aoms/1177730196. JSTOR 2235637.
Koroljuk, V.S.; Borovskich, Yu.V. (1994). Theory of U-statistics (English translation by P.V.Malyshev and D.V.Malyshev from the 1989 Ukrainian ed.). Dordrecht: Kluwer Academic Publishers. ISBN 0-7923-2608-3.
Lee, A.J. (1990). U-Statistics: theory and practice. New York: Marcel Dekker, Inc. ISBN 0-8247-8253-4.
Neuhaus, G. (1977). "Functional limit theorems for U-statistics in the degenerate case". Journal of Multivariate Analysis. 7 (3): 424–439. doi:10.1016/0047-259X(77)90083-5.
Rosenblatt, M. (1952). "Limit theorems associated with variants of the von Mises statistic". Annals of Mathematical Statistics. 23 (4): 617–623. doi:10.1214/aoms/1177729341. JSTOR 2236587.
Serfling, R.J. (1980). Approximation theorems of mathematical statistics. New York: John Wiley & Sons. ISBN 0-471-02403-1.
Taylor, R.L.; Daffer, P.Z.; Patterson, R.F. (1985). Limit theorems for sums of exchangeable random variables. New Jersey: Rowman and Allanheld.
von Mises, R. (1947). "On the asymptotic distribution of differentiable statistical functions". Annals of Mathematical Statistics. 18 (2): 309–348. doi:10.1214/aoms/1177730385. JSTOR 2235734.


This article is issued from Wikipedia - version of the 11/25/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.