Cointegration

Cointegration is a statistical property of a collection (X1, X2, ..., Xk) of time series variables. First, all of the series must be integrated of order 1 (see Order of Integration). Next, if a linear combination of this collection is integrated of order zero, then the collection is said to be cointegrated. Formally, if (X, Y, Z) are each integrated of order 1, and there exist coefficients a, b, c, not all zero, such that aX + bY + cZ is integrated of order 0, then X, Y, and Z are cointegrated. Cointegration has become an important property in contemporary time series analysis. Time series often have trends, either deterministic or stochastic. In an influential paper, Charles Nelson and Charles Plosser (1982) provided statistical evidence that many US macroeconomic time series (such as GNP, wages, and employment) have stochastic trends; such series are also called unit root processes, or processes integrated of order 1, written I(1).^[1] They also showed that unit root processes have non-standard statistical properties, so that conventional econometric methods do not apply to them.

Introduction

If two or more series are individually integrated (in the time series sense) but some linear combination of them has a lower order of integration, then the series are said to be cointegrated. A common example is where the individual series are first-order integrated (I(1)) but some (cointegrating) vector of coefficients exists to form a stationary linear combination of them. For instance, a stock market index and the price of its associated futures contract move through time, each roughly following a random walk. The hypothesis that there is a statistically significant connection between the futures price and the spot price can then be tested by testing for the existence of a cointegrated combination of the two series.
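The idea of a stationary combination of non-stationary series can be illustrated with a small simulation. The following is a minimal sketch, not taken from any referenced source: the function name `simulate_cointegrated_pair`, the coefficient 2.0, and the noise scale are assumptions chosen purely for illustration. One series is a random walk, and the other is a multiple of it plus stationary noise, so both wander widely while the combination y − 2x stays close to zero.

```python
import random
import statistics

def simulate_cointegrated_pair(n=500, beta=2.0, seed=0):
    """Return (x, y): x is a random walk, y = beta * x + stationary noise.

    All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    x, y = [], []
    level = 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)                  # x_t = x_{t-1} + e_t, an I(1) series
        x.append(level)
        y.append(beta * level + rng.gauss(0.0, 1.0))  # y shares the stochastic trend of x
    return x, y

x, y = simulate_cointegrated_pair()

# The combination y - 2x removes the shared trend and leaves only the noise.
spread = [yi - 2.0 * xi for xi, yi in zip(x, y)]

var_x = statistics.pvariance(x)
var_spread = statistics.pvariance(spread)
print("sample variance of x:     ", round(var_x, 2))
print("sample variance of spread:", round(var_spread, 2))
```

In this sketch the cointegrating vector is (1, −2) by construction; with real data it is unknown and has to be estimated.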

History

The first to introduce and analyse the concept of spurious (or nonsense) correlations was Udny Yule in 1926.^[2] Before the 1980s many economists used linear regressions on (de-trended) non-stationary time series data, which Nobel laureate Clive Granger and Paul Newbold showed to be a dangerous approach that could produce spurious correlation,^[3]^[4] since standard detrending techniques can result in data that are still non-stationary.^[5] Granger's 1987 paper with Robert Engle formalized the cointegrating vector approach and coined the term.^[6]

For integrated I(1) processes, Granger and Newbold showed that de-trending does not eliminate the problem of spurious correlation, and that the superior alternative is to check for cointegration. Two series with I(1) trends can be cointegrated only if there is a genuine relationship between the two. Thus the standard current methodology for time series regressions is to check all time series involved for integration. If there are I(1) series on both sides of the regression relationship, then it is possible for regressions to give misleading results.

The possible presence of cointegration must be taken into account when choosing a technique to test hypotheses concerning the relationship between two variables having unit roots (i.e. integrated of at least order one).^[3] The usual procedure for testing hypotheses concerning the relationship between non-stationary variables was to run ordinary least squares (OLS) regressions on data which had been differenced. This method is biased if the non-stationary variables are cointegrated.

For example, regressing the consumption series for any country (e.g. Fiji) against the GNP of a randomly selected dissimilar country (e.g. Afghanistan) might give a high R-squared (suggesting that Afghanistan's GNP has high explanatory power for Fiji's consumption). This is called spurious regression. To be more precise, two I(1) series that are statistically independent may nonetheless show a significant correlation; this phenomenon is called spurious correlation.
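The spurious regression effect can be reproduced with simulated data. The sketch below is illustrative only: the helper names (`random_walk`, `white_noise`, `slope_t_stat`, `rejection_rate`), the sample size of 100, and the 500 replications are all assumptions. It regresses one independent series on another and counts how often the slope looks significant at the nominal 5% level, once for random walks and once for white noise.

```python
import math
import random

def random_walk(n, rng):
    """An I(1) series: cumulative sum of independent standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def white_noise(n, rng):
    """A stationary I(0) series of independent standard normal draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def slope_t_stat(x, y):
    """t-statistic of the slope in an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope / math.sqrt(rss / (n - 2) / sxx)

def rejection_rate(make_series, n=100, reps=500, seed=1):
    """Share of independent pairs whose slope has |t| > 1.96."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = make_series(n, rng)
        y = make_series(n, rng)   # generated independently of x
        if abs(slope_t_stat(x, y)) > 1.96:
            hits += 1
    return hits / reps

rw_rate = rejection_rate(random_walk)
wn_rate = rejection_rate(white_noise)
print("independent random walks, share 'significant':", rw_rate)
print("independent white noise,  share 'significant':", wn_rate)
```

For white noise the share should be close to the nominal 5%, whereas for independent random walks it is typically far higher, even though no relationship exists in either case.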

Tests

The three main methods for testing for cointegration are:

Engle–Granger two-step method

If $x_{t}$ and $y_{t}$ are non-stationary and cointegrated, then a linear combination of them must be stationary. In other words:

$y_{t}-\beta x_{t}=u_{t}$

where $u_{t}$ is stationary.

If we knew $u_{t}$, we could simply test it for stationarity with, for example, a Dickey–Fuller test or a Phillips–Perron test. But because we do not know $u_{t}$, we must first estimate it, generally by ordinary least squares, and then run the stationarity test on the estimated series, often denoted $\hat{u}_{t}$.

A second regression (an error correction model) is then run on the first-differenced variables from the first regression, with the lagged residuals $\hat{u}_{t-1}$ included as a regressor.
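The two steps can be sketched on simulated data using only the Python standard library. This is an illustrative sketch under stated assumptions, not a reference implementation: the helper names (`ols_line`, `df_t_stat`, `ols_two_regressors`) and the data-generating values (intercept 1, slope 2, 500 observations) are invented for the example, the Dickey–Fuller regression has no constant and no lagged differences, and the second step uses the simple form $\Delta y_{t}=\alpha \hat{u}_{t-1}+\delta \Delta x_{t}+e_{t}$.

```python
import math
import random

def ols_line(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def df_t_stat(u):
    """t-statistic of rho in  du_t = rho * u_{t-1} + e_t  (no constant, no lags)."""
    lag = u[:-1]
    diff = [b - a for a, b in zip(u[:-1], u[1:])]
    sll = sum(a * a for a in lag)
    rho = sum(a * d for a, d in zip(lag, diff)) / sll
    rss = sum((d - rho * a) ** 2 for a, d in zip(lag, diff))
    return rho / math.sqrt(rss / (len(diff) - 1) / sll)

def ols_two_regressors(z1, z2, w):
    """OLS of w on z1 and z2 without a constant, via the 2x2 normal equations."""
    a11 = sum(a * a for a in z1)
    a12 = sum(a * b for a, b in zip(z1, z2))
    a22 = sum(b * b for b in z2)
    b1 = sum(a * c for a, c in zip(z1, w))
    b2 = sum(b * c for b, c in zip(z2, w))
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

# Simulated data (assumed for illustration): x is a random walk,
# y = 1 + 2 x + stationary noise, so the pair is cointegrated by construction.
rng = random.Random(42)
x, y, level = [], [], 0.0
for _ in range(500):
    level += rng.gauss(0.0, 1.0)
    x.append(level)
    y.append(1.0 + 2.0 * level + rng.gauss(0.0, 1.0))

# Step 1: estimate the long-run relation and test the residuals for a unit root.
intercept_hat, beta_hat = ols_line(x, y)
u_hat = [b - intercept_hat - beta_hat * a for a, b in zip(x, y)]
t_stat = df_t_stat(u_hat)

# Step 2: regress the differenced y on the lagged residual and the differenced x.
dx = [b - a for a, b in zip(x[:-1], x[1:])]
dy = [b - a for a, b in zip(y[:-1], y[1:])]
alpha_hat, delta_hat = ols_two_regressors(u_hat[:-1], dx, dy)

print("estimated beta:", round(beta_hat, 3))
print("residual DF t-statistic:", round(t_stat, 2))
print("adjustment coefficient alpha:", round(alpha_hat, 3))
print("short-run coefficient delta:", round(delta_hat, 3))
```

Because the residuals are estimated rather than observed, the t-statistic from step 1 should be compared with critical values tabulated for residual-based cointegration tests, not with the standard Dickey–Fuller tables. A negative adjustment coefficient indicates that deviations from the long-run relation tend to be corrected in later periods.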

Johansen test

The Johansen test is a test for cointegration that, unlike the Engle–Granger method, allows for more than one cointegrating relationship. However, the test relies on asymptotic (large-sample) properties. If the sample size is too small, the results will not be reliable and one should use autoregressive distributed lag (ARDL) models instead.^[7]^[8]
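As a sketch of the setting in which the test operates, the Johansen approach is usually written in terms of a vector error correction model. The notation below is conventional rather than taken from the text above: $X_{t}$ is a vector of $k$ series that are each I(1), $p$ is the lag order, $T$ is the sample size, and $\hat{\lambda}_{i}$ are the ordered estimated eigenvalues.

$$\Delta X_{t}=\Pi X_{t-1}+\sum_{i=1}^{p-1}\Gamma_{i}\,\Delta X_{t-i}+\varepsilon_{t},\qquad \Pi=\alpha\beta^{\prime}$$

The rank $r$ of $\Pi$ equals the number of cointegrating relationships, and the columns of $\beta$ are the cointegrating vectors. The trace statistic for the null hypothesis of at most $r$ cointegrating relationships is

$$\lambda_{\text{trace}}(r)=-T\sum_{i=r+1}^{k}\ln\left(1-\hat{\lambda}_{i}\right)$$

A rank of zero corresponds to no cointegration, while a rank between 1 and $k-1$ corresponds to that many independent stationary linear combinations.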

Phillips–Ouliaris cointegration test

Peter C. B. Phillips and Sam Ouliaris (1990) show that residual-based unit root tests applied to the estimated cointegrating residuals do not have the usual Dickey–Fuller distributions under the null hypothesis of no cointegration.^[9] Because of the spurious regression phenomenon under the null hypothesis, these tests have asymptotic distributions that depend on (1) the number of deterministic trend terms and (2) the number of variables with which cointegration is being tested. These distributions are known as Phillips–Ouliaris distributions, and critical values have been tabulated. In finite samples, a superior alternative to the use of these asymptotic critical values is to generate critical values from simulations.
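Generating critical values by simulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular paper: two variables, a constant in the cointegrating regression, a Dickey–Fuller regression on the residuals with no constant and no lagged differences, and hypothetical helper names (`random_walk`, `residual_df_stat`, `simulated_critical_value`). Under the null hypothesis of no cointegration the two series are independent random walks, so the test statistic is computed on many such pairs and the chosen lower quantile is taken as the critical value.

```python
import math
import random

def random_walk(n, rng):
    """An I(1) series: cumulative sum of independent standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def residual_df_stat(x, y):
    """Regress y on a constant and x, then return the DF t-statistic of the residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    u = [(b - my) - slope * (a - mx) for a, b in zip(x, y)]

    # Dickey-Fuller regression on the residuals: du_t = rho * u_{t-1} + e_t
    lag = u[:-1]
    diff = [b - a for a, b in zip(u[:-1], u[1:])]
    sll = sum(a * a for a in lag)
    rho = sum(a * d for a, d in zip(lag, diff)) / sll
    rss = sum((d - rho * a) ** 2 for a, d in zip(lag, diff))
    return rho / math.sqrt(rss / (len(diff) - 1) / sll)

def simulated_critical_value(n=100, reps=1000, level=0.05, seed=7):
    """Lower `level` quantile of the statistic under the null of no cointegration."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        x = random_walk(n, rng)
        y = random_walk(n, rng)   # independent of x, so there is no cointegration
        stats.append(residual_df_stat(x, y))
    stats.sort()
    return stats[int(level * reps)]

cv = simulated_critical_value()
print("simulated 5% critical value for n = 100:", round(cv, 2))
```

The simulated value is specific to the sample size and the deterministic terms used, and it is noticeably more negative than the corresponding standard Dickey–Fuller critical value, which is why the standard tables should not be applied to estimated residuals.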

Multicointegration

In practice, cointegration is often used for two I(1) series, but it is more generally applicable and can be used for variables integrated of higher order (to detect correlated accelerations or other second-difference effects). Multicointegration extends the cointegration technique beyond two variables, and occasionally to variables integrated at different orders.

Variable shifts in long time series

Tests for cointegration assume that the cointegrating vector is constant during the period of study. In reality, the long-run relationship between the underlying variables may change (shifts in the cointegrating vector can occur). Possible reasons include technological progress, economic crises, changes in people's preferences and behaviour, policy or regime changes, and organizational or institutional developments. Such shifts are especially likely if the sample period is long. To take this issue into account, tests have been introduced for cointegration with one unknown structural break,^[10] and tests for cointegration with two unknown breaks are also available.^[11]

References

1. Nelson, C. R.; Plosser, C. R. (1982). "Trends and random walks in macroeconomic time series". Journal of Monetary Economics. 10 (2): 139. doi:10.1016/0304-3932(82)90012-5.
2. Yule, U. (1926). "Why do we sometimes get nonsense-correlations between time series? - A study in sampling and the nature of time series". Journal of the Royal Statistical Society. 89 (1): 11–63.
3. Granger, C.; Newbold, P. (1974). "Spurious Regressions in Econometrics". Journal of Econometrics. 2 (2): 111–120. doi:10.1016/0304-4076(74)90034-7.
4. Mahdavi Damghani, Babak; et al. (2012). "The Misleading Value of Measured Correlation". Wilmott. 2012 (1): 64–73. doi:10.1002/wilm.10167.
5. Granger, Clive (1981). "Some Properties of Time Series Data and Their Use in Econometric Model Specification". Journal of Econometrics. 16 (1): 121–130. doi:10.1016/0304-4076(81)90079-8.
6. Engle, Robert F.; Granger, Clive W. J. (1987). "Co-integration and error correction: Representation, estimation and testing". Econometrica. 55 (2): 251–276. JSTOR 1913236.
7. Giles, David. "ARDL Models - Part II - Bounds Tests". Retrieved 4 August 2014.
8. Pesaran, M.H.; Shin, Y.; Smith, R.J. (2001). "Bounds testing approaches to the analysis of level relationships". Journal of Applied Econometrics. 16 (3): 289–326. doi:10.1002/jae.616.
9. Phillips, P. C. B.; Ouliaris, S. (1990). "Asymptotic Properties of Residual Based Tests for Cointegration". Econometrica. 58 (1): 165–193. JSTOR 2938339.
10. Gregory, Allan W.; Hansen, Bruce E. (1996). "Residual-based tests for cointegration in models with regime shifts". Journal of Econometrics. 70 (1): 99–126. doi:10.1016/0304-4076(69)41685-7.
11. Hatemi-J, A. (2008). "Tests for cointegration with two unknown regime shifts with an application to financial market integration". Empirical Economics. 35 (3): 497–505. doi:10.1007/s00181-007-0175-9.

Enders, Walter (2004). "Cointegration and Error-Correction Models". Applied Econometric Time Series (Second ed.). New York: Wiley. pp. 319–386. ISBN 0-471-23065-0.
Hayashi, Fumio (2000). Econometrics. Princeton University Press. pp. 623–669. ISBN 0-691-01018-8.
Maddala, G. S.; Kim, In-Moo (1998). Unit Roots, Cointegration, and Structural Change. Cambridge University Press. pp. 155–248. ISBN 0-521-58782-4.
Murray, Michael P. (1994). "A Drunk and her Dog: An Illustration of Cointegration and Error Correction" (PDF). The American Statistician. 48 (1): 37–39. doi:10.1080/00031305.1994.10476017. An intuitive introduction to cointegration.

This article is issued from Wikipedia - version of the 10/26/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.