Negative multinomial distribution

Notation	$\textrm{NM}(k_0,\,p)$
Parameters	k₀ ∈ N₀ — the number of failures before the experiment is stopped, p ∈ R^m — m-vector of “success” probabilities, p₀ = 1 − (p₁+…+p_m) — the probability of a “failure”.
Support	$k_i \in \{0,1,2,\ldots\}, 1\leq i\leq m$
PDF	$\Gamma\!\left(\sum_{i=0}^m{k_i}\right)\frac{p_0^{k_0}}{\Gamma(k_0)} \prod_{i=1}^m{\frac{p_i^{k_i}}{k_i!}},$ where Γ(x) is the Gamma function.
Mean	$\tfrac{k_0}{p_0}\,p$
Variance	$\tfrac{k_0}{p_0^2}\,pp' + \tfrac{k_0}{p_0}\,\operatorname{diag}(p)$
CF	$\bigg(\frac{p_0}{1 - p'e^{it}}\bigg)^{\!k_0}$

In probability theory and statistics, the negative multinomial distribution is a generalization of the negative binomial distribution (NB(r, p)) to more than two outcomes.^[1]

Suppose we have an experiment that generates m+1≥2 possible outcomes, {X₀,…,X_m}, each occurring with non-negative probabilities {p₀,…,p_m} respectively. If sampling proceeded until n observations were made, then {X₀,…,X_m} would have been multinomially distributed. However, if the experiment is stopped once X₀ reaches the predetermined value k₀, then the distribution of the m-tuple {X₁,…,X_m} is negative multinomial. These variables are not multinomially distributed because their sum X₁+…+X_m is not fixed, being a draw from a negative binomial distribution.
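The stopping rule described above can be simulated directly. The following is a minimal sketch, not a reference implementation: the function name `sample_negative_multinomial` and the parameter values are illustrative assumptions, and the sketch assumes that k₀ is a positive integer.

```python
import random

def sample_negative_multinomial(k0, p, rng):
    """Draw one vector (X_1, ..., X_m) by running trials until outcome 0 has occurred k0 times."""
    p0 = 1.0 - sum(p)                 # probability of the stopping outcome
    weights = [p0] + list(p)
    counts = [0] * len(weights)
    while counts[0] < k0:
        outcome = rng.choices(range(len(weights)), weights=weights)[0]
        counts[outcome] += 1
    return counts[1:]                 # drop X_0, which always equals k0

rng = random.Random(0)                # fixed seed so the run is repeatable
k0, p = 10, [0.2, 0.1, 0.2]           # illustrative values (assumption)
draws = [sample_negative_multinomial(k0, p, rng) for _ in range(5000)]
means = [sum(d[i] for d in draws) / len(draws) for i in range(len(p))]
print(means)                          # sample means, to compare with k0 * p_i / p0
```

With these values the sample means should land near k₀pᵢ/p₀ = (4, 2, 4), the mean given in the summary table at the top of the article.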

Negative multinomial distribution example

The table below shows an example of 400 melanoma (skin cancer) patients where the Type and Site of the cancer are recorded for each subject.

Type	Site: Head and Neck	Site: Trunk	Site: Extremities	Totals
Hutchinson's melanomic freckle	22	2	10	34
Superficial	16	54	115	185
Nodular	19	33	73	125
Indeterminant	11	17	28	56
Column Totals	68	106	226	400

The sites (locations) of the cancer may be independent, but there may be positive dependencies among the types of cancer at a given site. For example, localized exposure to radiation implies that an elevated level of one type of cancer at a given location may indicate a higher level of another cancer type at the same location. The negative multinomial distribution may be used to model the cancer rates at a given site and to help measure some of the cancer-type dependencies within each location.

Let $x_{i,j}$ denote the cancer counts for each site ( $0\leq i \leq 2$ ) and each type of cancer ( $0\leq j \leq 3$ ). For a fixed site ( $i_{0}$ ), the counts are modeled as negative multinomial distributed random variables, independently of the other sites. That is, for each column index (site) the column-vector X has the following distribution:

X=\{X_1, X_2, X_3\} \sim NM(k_0,\{p_1,p_2,p_3\})

Different columns in the table (sites) are considered to be different instances of the negative multinomially distributed random vector, X. Then we have the following estimates of expected counts (frequencies of cancer):

\hat{\mu}_{i,j} = \frac{x_{i,.}\times x_{.,j}}{x_{.,.}}

x_{i,.} = \sum_{j=0}^{3}{x_{i,j}}

x_{.,j} = \sum_{i=0}^{2}{x_{i,j}}

x_{.,.} = \sum_{i=0}^{2}\sum_{j=0}^{3}{{x_{i,j}}}

Example (first site, Head and Neck, and first cancer type, Hutchinson's melanomic freckle):

\hat{\mu}_{0,0} = \frac{x_{0,.}\times x_{.,0}}{x_{.,.}}=\frac{68\times 34}{400}=5.78
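The expected counts can be computed for the whole table in a few lines. This is a small sketch using the counts from the melanoma table above; the variable names are illustrative.

```python
# Counts from the melanoma table: rows are cancer types, columns are sites
# (Head and Neck, Trunk, Extremities).
counts = [
    [22, 2, 10],     # Hutchinson's melanomic freckle
    [16, 54, 115],   # Superficial
    [19, 33, 73],    # Nodular
    [11, 17, 28],    # Indeterminant
]

type_totals = [sum(row) for row in counts]         # totals per cancer type
site_totals = [sum(col) for col in zip(*counts)]   # totals per site
grand_total = sum(type_totals)

# Expected count for each (type, site) cell: product of marginals over the grand total
expected = [[t * s / grand_total for s in site_totals] for t in type_totals]

for row in expected:
    print([round(value, 2) for value in row])
```

The first entry, `expected[0][0]`, is 34 × 68 / 400 = 5.78, matching the worked example.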

For the first site (Head and Neck, i=0), suppose that $X=\left \{X_1=5, X_2=1, X_3=5\right \}$ and $X \sim NM(k_0=10, \{p_1=0.2, p_2=0.1, p_3=0.2 \})$ . Then:

p_0 = 1 - \sum_{i=1}^3{p_i}=0.5

NM(X|k_0,\{p_1, p_2, p_3\})= 0.00465585119998784

cov[X_1,X_3] = \frac{10 \times 0.2 \times 0.2}{0.5^2}=1.6

\mu_2=\frac{k_0 p_2}{p_0} = \frac{10\times 0.1}{0.5}=2.0

\mu_3=\frac{k_0 p_3}{p_0} = \frac{10\times 0.2}{0.5}=4.0

corr[X_2,X_3] = \left (\frac{\mu_2 \times \mu_3}{(k_0+\mu_2)(k_0+\mu_3)} \right )^{\frac{1}{2}}

and therefore,

corr[X_2,X_3] = \left (\frac{2 \times 4}{(10+2)(10+4)} \right )^{\frac{1}{2}} = 0.21821789023599242.
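The numbers in this worked example can be checked numerically. The sketch below evaluates the probability mass function from the summary table on the log scale, together with the mean, covariance and correlation formulas used above; the helper name `nm_pmf` is an illustrative choice, and the sketch assumes every p_i is strictly positive.

```python
from math import exp, lgamma, log, sqrt

def nm_pmf(x, k0, p):
    """Negative multinomial probability mass at the count vector x."""
    p0 = 1.0 - sum(p)  # probability of the stopping outcome
    # Work on the log scale so the gamma terms stay well behaved
    log_mass = lgamma(k0 + sum(x)) - lgamma(k0) + k0 * log(p0)
    for x_i, p_i in zip(x, p):
        log_mass += x_i * log(p_i) - lgamma(x_i + 1)
    return exp(log_mass)

k0, p = 10, [0.2, 0.1, 0.2]
p0 = 1.0 - sum(p)

mass = nm_pmf([5, 1, 5], k0, p)
means = [k0 * p_i / p0 for p_i in p]            # mu_1, mu_2, mu_3
cov_13 = k0 * p[0] * p[2] / p0 ** 2             # cov[X_1, X_3]
corr_23 = sqrt(means[1] * means[2] / ((k0 + means[1]) * (k0 + means[2])))

print(mass, means, cov_13, corr_23)
```

Up to floating-point rounding, this reproduces the values quoted above: a probability of about 0.004656, means of 4, 2 and 4, a covariance of 1.6 and a correlation of about 0.2182.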

Notice that the pairwise NM correlations are always positive, whereas the correlations between multinomial counts are always negative. As the parameter $k_{0}$ increases while the means are held fixed, the pairwise correlations tend to zero. Thus, for large $k_{0}$ , the negative multinomial counts $X_{i}$ behave as independent Poisson random variables with means $\left ( \mu_i= k_0\frac{p_i}{p_0}\right )$ .
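The decay of the correlation can be illustrated by holding the means fixed at the values from the worked example ( $\mu_2=2$ , $\mu_3=4$ ) and increasing $k_{0}$ . This is a small sketch; the function name `nm_corr` and the grid of $k_{0}$ values are illustrative assumptions.

```python
from math import sqrt

def nm_corr(mu_i, mu_j, k0):
    """Pairwise correlation of two negative multinomial counts with means mu_i and mu_j."""
    return sqrt(mu_i * mu_j / ((k0 + mu_i) * (k0 + mu_j)))

k0_grid = [10, 100, 1000, 10000]
correlations = [nm_corr(2.0, 4.0, k0) for k0 in k0_grid]

for k0, rho in zip(k0_grid, correlations):
    print(k0, rho)
```

Each printed correlation is positive and smaller than the one before it, which is the behaviour described above.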

The marginal distribution of each of the $X_{i}$ variables is negative binomial, as the $X_{i}$ count (considered as success) is measured against all the other outcomes (failure). But jointly, the distribution of $X=\{X_1,\cdots,X_m\}$ is negative multinomial, i.e., $X \sim NM(k_0,\{p_1,\cdots,p_m\})$ .

Parameter estimation

Estimation of the mean (expected) frequency counts ( $\mu _{j}$ ) of each outcome ( $X_{j}$ ) using maximum likelihood is possible. If we have a single observation vector $\{x_1, \cdots,x_m\}$ , then $\hat{\mu}_i=x_i.$ If we have several observation vectors, as in this case, where we have the cancer type frequencies for 3 different sites, then the MLE estimates of the mean counts are $\hat{\mu}_j=\frac{x_{.,j}}{I}$ , where $0\leq j \leq J$ is the cancer-type index (here $J=3$ ) and the summation is over the $I$ observed (sampled) vectors (here $I=3$ ). For the cancer data above, we have the following MLE estimates for the expectations of the frequency counts:

Hutchinson's melanomic freckle type of cancer ( $X_{0}$ ): $\hat{\mu}_0 = 34/3=11.33$
Superficial type of cancer ( $X_{1}$ ): $\hat{\mu}_1 = 185/3=61.67$
Nodular type of cancer ( $X_{2}$ ): $\hat{\mu}_2 = 125/3=41.67$
Indeterminant type of cancer ( $X_{3}$ ): $\hat{\mu}_3 = 56/3=18.67$
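These estimates are simply the per-type totals from the table divided by the number of sites. A minimal sketch, with illustrative variable names:

```python
# Totals per cancer type, taken from the "Totals" column of the melanoma table
type_totals = {
    "Hutchinson's melanomic freckle": 34,
    "Superficial": 185,
    "Nodular": 125,
    "Indeterminant": 56,
}
n_sites = 3  # Head and Neck, Trunk, Extremities

# Estimated mean count per site for each cancer type
mu_hat = {name: total / n_sites for name, total in type_totals.items()}

for name, value in mu_hat.items():
    print(f"{name}: {value:.2f}")
```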

There is no MLE estimate for the NM $k_{0}$ parameter.^[1]^[2] However, there are approximate protocols for estimating the $k_{0}$ parameter using the chi-squared goodness of fit statistic. In the usual chi-squared statistic:

\Chi^2 = \sum_i{\frac{(x_i-\mu_i)^2}{\mu_i}}

we can replace the expected means ( $\mu _{i}$ ) by their estimates, $\hat{\mu}_i$ , and replace the denominators by the corresponding negative multinomial variances. Then we get the following test statistic for negative multinomial distributed data:

\Chi^2(k_0) = \sum_{i}{\frac{(x_i-\hat{\mu_i})^2}{\hat{\mu_i} \left (1+ \frac{\hat{\mu_i}}{k_0} \right )}}

Next, we can estimate the $k_{0}$ parameter by varying the values of $k_{0}$ in the expression $\Chi^2(k_0)$ and matching the values of this statistic with the corresponding asymptotic chi-squared distribution. The following protocol summarizes these steps using the cancer data above.

DF: The number of degrees of freedom for the chi-squared distribution in this case is:

df = (# rows – 1)(# columns – 1) = (4-1)*(3-1) = 6

Median: The median of a chi-squared random variable with 6 df is 5.261948.

Mean Counts Estimates: The mean count estimates ( $\hat{\mu}_{j}$ ) for the cancer types used in the calculation below are: $\hat{\mu}_1 = 185/3=61.67$ ; $\hat{\mu}_2 = 125/3=41.67$ ; and $\hat{\mu}_3 = 56/3=18.67$ .

Thus, we can solve the equation $\Chi^2(k_0) = 5.261948$ for the single variable of interest, the unknown parameter $k_{0}$ . In the cancer example, suppose $x=\{x_1=5,x_2=1,x_3=5\}$ . Then the solution of the following equation is an asymptotic chi-squared distribution driven estimate of the parameter $k_{0}$ :

\Chi^2(k_0) = \sum_{i=1}^3{\frac{(x_i-\hat{\mu_i})^2}{\hat{\mu_i} \left (1+ \frac{\hat{\mu_i}}{k_0} \right )}}

\Chi^2(k_0) = \frac{(5-61.67)^2}{61.67(1+61.67/k_0)}+\frac{(1-41.67)^2}{41.67(1+41.67/k_0)}+\frac{(5-18.67)^2}{18.67(1+18.67/k_0)}=5.261948.

Solving this equation for $k_{0}$ provides the desired estimate for the last parameter.

Mathematica provides 3 distinct ( $k_{0}$ ) solutions to this equation: {-50.5466, -21.5204, 2.40461}. Since $k_{0}>0$ , only the positive solution, $k_0 \approx 2.40$ , is admissible.
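A positive root of this equation can also be found without a computer algebra system. The sketch below uses plain bisection and takes the target value 5.261948 and the rounded mean estimates exactly as quoted in the text; the function names and the search bracket are assumptions made for illustration.

```python
def chi2_stat(k0, x, mu):
    """Chi-squared type statistic with negative multinomial variances in the denominators."""
    return sum((x_i - m_i) ** 2 / (m_i * (1 + m_i / k0)) for x_i, m_i in zip(x, mu))

def solve_k0(x, mu, target, lo=1e-6, hi=1e3, tol=1e-10):
    """Bisection for chi2_stat(k0) = target on [lo, hi] (the bracket is an assumption)."""
    f_lo = chi2_stat(lo, x, mu) - target
    f_hi = chi2_stat(hi, x, mu) - target
    if f_lo * f_hi > 0:
        raise ValueError("the bracket [lo, hi] does not contain a sign change")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (chi2_stat(mid, x, mu) - target) * f_lo > 0:
            lo = mid   # same sign as the left end: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

x = [5, 1, 5]
mu = [61.67, 41.67, 18.67]   # rounded estimates quoted in the text
target = 5.261948            # value quoted in the text

k0_hat = solve_k0(x, mu, target)
print(k0_hat)
```

For $k_0 > 0$ every term of the statistic increases with $k_0$ , so the bisection bracket contains a single sign change; the root it returns agrees with the value 2.40461 listed above to the quoted precision.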

Estimates of probabilities: Assume $k_0=2$ and $\frac{\mu_i}{k_0}p_0=p_i$ . Rounding the ratios $\frac{\hat{\mu}_i}{k_0}$ to integers, we get:

\frac{61.67}{k_0}p_0 \approx 31p_0=p_1

20p_0=p_2

9p_0=p_3

Hence, $1-p_0=p_1+p_2+p_3=60p_0$ , and

p_0=\frac{1}{61}, \quad p_1=\frac{31}{61}, \quad p_2=\frac{20}{61}, \quad p_3=\frac{9}{61}

Therefore, the best model distribution for the observed sample $x=\{x_1=5,x_2=1,x_3=5\}$ is

X \sim NM\left (2, \left \{\frac{31}{61}, \frac{20}{61},\frac{9}{61}\right\} \right ).
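The last step is a small linear solve: with $r_i = \mu_i / k_0$ , the relations $p_i = r_i p_0$ and $p_0 + \sum_i p_i = 1$ give $p_0 = 1/(1+\sum_i r_i)$ . The sketch below uses exact fractions with the integer ratios 31, 20 and 9 from the text; the function name is an illustrative assumption.

```python
from fractions import Fraction

def probabilities_from_ratios(ratios):
    """Solve p_i = r_i * p0 together with p0 + sum(p_i) = 1."""
    p0 = 1 / (1 + sum(ratios))
    return p0, [r * p0 for r in ratios]

# Integer ratios mu_i / k0 as used in the text
p0, p = probabilities_from_ratios([Fraction(31), Fraction(20), Fraction(9)])
print(p0, p)

# The same function accepts unrounded ratios, for example mu_i / 2 as floats
p0_float, p_float = probabilities_from_ratios([m / 2 for m in (61.67, 41.67, 18.67)])
print(p0_float, p_float)
```

The exact-fraction call returns $p_0 = 1/61$ and $(31/61, 20/61, 9/61)$ ; the unrounded ratios give slightly different values because the text rounds each ratio to an integer.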

Related distributions

Negative binomial distribution
Multinomial distribution
Inverted Dirichlet distribution, a conjugate prior for the negative multinomial

References

1. Le Gall, F. The modes of a negative multinomial distribution, Statistics & Probability Letters, Volume 76, Issue 6, 15 March 2006, Pages 619-624, ISSN 0167-7152, doi:10.1016/j.spl.2005.09.009.
2. Zelterman, Daniel (2002). Advanced log-linear models using SAS. SAS Publishing. p. 196. ISBN 978-1-59047-080-0.

Waller LA and Zelterman D. (1997). Log-linear modeling with the negative multinomial distribution. Biometrics 53: 971-82.

This article is issued from Wikipedia - version of the 10/10/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.
