Survival function

The survival function is the probability that a patient, device, or other object of interest will survive beyond a specified time.^[1]

The survival function is also known as the survivor function^[2] or reliability function.^[3]

The term reliability function is common in engineering while the term survival function is used in a broader range of applications, including human mortality. Another name for the survival function is the complementary cumulative distribution function.

Definition

Let T be a continuous random variable with cumulative distribution function F(t) on the interval [0,∞). Its survival function or reliability function is:

S(t)=P(\{T>t\})=\int _{t}^{\infty }f(u)\,du=1-F(t).

Parametric survival functions

In some cases, such as the air conditioner example, the distribution of survival times may be approximated well by a function such as the exponential distribution. Several distributions are commonly used in survival analysis, including the exponential, Weibull, gamma, normal, log-normal, and log-logistic.^[3]^[4] These distributions are defined by parameters. The normal (Gaussian) distribution, for example, is defined by the two parameters mean and standard deviation. Survival functions that are defined by parameters are said to be parametric.

In the four survival function graphs shown above, the shape of the survival function is defined by a particular probability distribution: survival function 1 is defined by an exponential distribution, 2 is defined by a Weibull distribution, 3 is defined by a log-logistic distribution, and 4 is defined by another Weibull distribution.

Exponential survival function

For an exponential survival distribution, the probability of failure is the same in every time interval, no matter the age of the individual or device. This fact leads to the "memoryless" property of the exponential survival distribution: the age of a subject has no effect on the probability of failure in the next time interval. The exponential may be a good model for the lifetime of a system where parts are replaced as they fail.^[5] It may also be useful for modeling survival of living organisms over short intervals. It is not likely to be a good model of the complete lifespan of a living organism.^[6] As Efron and Hastie ^[7] (p. 134) note, "If human lifetimes were exponential there wouldn't be old or young people, just lucky or unlucky ones".

Weibull survival function

A key assumption of the exponential survival function is that the hazard rate is constant. In an example given above, the proportion of men dying each year was constant at 10%, meaning that the hazard rate was constant. The assumption of constant hazard may not be appropriate. For example, among most living organisms, the risk of death is greater in old age than in middle age – that is, the hazard rate increases with time. For some diseases, such as breast cancer, the risk of recurrence is lower after 5 years – that is, the hazard rate decreases with time. The Weibull distribution extends the exponential distribution to allow constant, increasing, or decreasing hazard rates.

Other parametric survival functions

There are several other parametric survival functions that may provide a better fit to a particular data set, including normal, lognormal, log-logistic, and gamma. The choice of parametric distribution for a particular application can be made using graphical methods or using formal tests of fit. These distributions and tests are described in textbooks on survival analysis.^[1]^[3] Lawless ^[8] has extensive coverage of parametric models.

Parametric survival functions are commonly used in manufacturing applications, in part because they enable estimation of the survival function beyond the observation period. However, appropriate use of parametric functions requires that data are well modeled by the chosen distribution. If an appropriate distribution is not available, or cannot be specified before a clinical trial or experiment, then non-parametric survival functions offer a useful alternative.

Non-parametric survival functions

A parametric model of survival may not be possible or desirable. In these situations, the most common method to model the survival function is the non-parametric Kaplan–Meier estimator.

Properties

Every survival function S(t) is monotonically decreasing, i.e. $S(u)\leq S(t)$ for all $u>t$ .

It is a property of a random variable that maps a set of events, usually associated with mortality or failure of some system, onto time.

The time, t = 0, represents some origin, typically the beginning of a study or the start of operation of some system. S(0) is commonly unity but can be less to represent the probability that the system fails immediately upon operation.

Since the CDF is a right-continuous function, the survival function $S(t)=1-F(t)$ is also right-continuous.

References

1 2 Kleinbaum, David G.; Klein, Mitchel (2012), Survival analysis: A Self-learning text (Third ed.), Springer, ISBN 978-1441966452
↑ Tableman, Mara; Kim, Jong Sung (2003), Survival Analysis Using S (First ed.), Chapman and Hall/CRC, ISBN 978-1584884088
1 2 3 Ebeling, Charles (2010), An Introduction to Reliability and Maintainability Engineering (Second ed.), Waveland Press, ISBN 978-1577666257
↑ Klein, John; Moeschberger, Melvin (2005), Survival Analysis: Techniques for Censored and Truncated Data (Second ed.), Springer, ISBN 978-0387953991
↑ Mendenhall, William; Terry, Sincich (2007), Statistics for Engineering and the Sciences (Fifth ed.), Pearson / Prentice Hall, ISBN 978-0131877061
↑ Brostrom, Göran (2012), Event History Analysis with R (First ed.), Chapman & Hall/CRC, ISBN 978-1439831649
↑ Efron, Bradley; Hastie, Trevor (2016), Computer Age Statistical Inference: Algorithms, Evidence, and Data Science (First ed.), Cambridge University Press, ISBN 978-1107149892
↑ Lawless, Jerald (2002), Statistical Models and Methods for Lifetime Data (Second ed.), Wiley, ISBN 978-0471372158

Study design	Population Statistic Effect size Statistical power Sample size determination Missing data

Survey methodology	Sampling Standard error stratified cluster Opinion poll Questionnaire

Controlled experiments	Design control optimal Controlled trial Randomized Random assignment Replication Blocking Interaction Factorial experiment

Uncontrolled studies	Observational study Natural experiment Quasi-experiment

Statistical inference

Statistical theory

Frequentist inference

Point estimation	Estimating equations Maximum likelihood Method of moments M-estimator Minimum distance Unbiased estimators Mean-unbiased minimum-variance Rao–Blackwellization Lehmann–Scheffé theorem Median unbiased Plug-in

Interval estimation	Confidence interval Pivot Likelihood interval Prediction interval Tolerance interval Resampling Bootstrap Jackknife

Testing hypotheses	1- & 2-tails Power Uniformly most powerful test Permutation test Randomization test Multiple comparisons

Parametric tests	Likelihood-ratio Wald Score

Specific tests

Z (normal) Student's t-test F

Goodness of fit	Chi-squared Kolmogorov–Smirnov Anderson–Darling Normality (Shapiro–Wilk) Likelihood-ratio test Model selection Cross validation AIC BIC

Rank statistics	Sign Sample median Signed rank (Wilcoxon) Hodges–Lehmann estimator Rank sum (Mann–Whitney) Nonparametric anova 1-way (Kruskal–Wallis) 2-way (Friedman) Ordered alternative (Jonckheere–Terpstra)

Bayesian inference

Correlation	Pearson product–moment Partial correlation Confounding variable Coefficient of determination

Regression analysis	Errors and residuals Regression model validation Mixed effects models Simultaneous equations models Multivariate adaptive regression splines (MARS)

Linear regression	Simple linear regression Ordinary least squares General linear model Bayesian regression

Non-standard predictors	Nonlinear regression Nonparametric Semiparametric Isotonic Robust Heteroscedasticity Homoscedasticity

Generalized linear model	Exponential families Logistic (Bernoulli) / Binomial / Poisson regressions

Partition of variance	Analysis of variance (ANOVA, anova) Analysis of covariance Multivariate ANOVA Degrees of freedom

Categorical / Multivariate / Time-series / Survival analysis

Categorical

Multivariate

Time-series

General	Decomposition Trend Stationarity Seasonal adjustment Exponential smoothing Cointegration Structural break Granger causality

Specific tests	Dickey–Fuller Johansen Q-statistic (Ljung–Box) Durbin–Watson Breusch–Godfrey

Time domain	Autocorrelation (ACF) partial (PACF) Cross-correlation (XCF) ARMA model ARIMA model (Box–Jenkins) Autoregressive conditional heteroskedasticity (ARCH) Vector autoregression (VAR)

Frequency domain	Spectral density estimation Fourier analysis Wavelet

Survival

Survival function	Kaplan–Meier estimator (product limit) Proportional hazards models Accelerated failure time (AFT) model First hitting time

Hazard function	Nelson–Aalen estimator

Test	Log-rank test

Applications

Biostatistics	Bioinformatics Clinical trials / studies Epidemiology Medical statistics

Engineering statistics	Chemometrics Methods engineering Probabilistic design Process / quality control Reliability System identification

Social statistics	Actuarial science Census Crime statistics Demography Econometrics National accounts Official statistics Population statistics Psychometrics

Spatial statistics	Cartography Environmental statistics Geographic information system Geostatistics Kriging

Category
Portal
Commons
WikiProject

This article is issued from Wikipedia - version of the 11/17/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.