

Lecture 5 — Friday, January 20, 2006

What was covered?

  • Negative binomial distribution as used by ecologists
  • Relationship of Poisson to negative binomial
  • Nonhomogeneous Poisson process


Negative Binomial Distribution in Ecology

  • Last time we derived the following formula for the negative binomial probability mass function.

P(X = x) = C(x + r − 1, x) p^r (1 − p)^x,  x = 0, 1, 2, …

where x is the number of failures before the rth success, p is the probability of success on a single trial, and C(x + r − 1, x) denotes the binomial coefficient. I previously remarked that this is not the version that is usually seen in ecology. I derive that version next.
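This definition lends itself to a quick simulation check (a Python sketch using only the standard library; the function names are mine, not from any package):

```python
import math
import random

def nb_pmf(x, r, p):
    """P(X = x): probability of x failures before the r-th success,
    with success probability p on each independent trial."""
    return math.comb(x + r - 1, x) * p**r * (1 - p)**x

def failures_before_rth_success(r, p, rng):
    """Run Bernoulli(p) trials until the r-th success; count the failures."""
    failures = successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

rng = random.Random(42)
r, p, n = 3, 0.4, 50_000
draws = [failures_before_rth_success(r, p, rng) for _ in range(n)]
for x in range(5):
    # empirical relative frequency vs. the pmf formula
    print(x, draws.count(x) / n, round(nb_pmf(x, r, p), 4))
```

The empirical frequencies settle onto the pmf values as the number of simulated draws grows.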
  • The ecological definition of the negative binomial is essentially a reparameterization of the definition we have here.
    • Step 1: The first step in the reparameterization is to express p in terms of the mean μ and use this expression to replace p. Last time I gave the formula for the mean of a negative binomial random variable:

μ = r(1 − p)/p

Solve for p:

p = r/(r + μ)

From which it immediately follows that

1 − p = μ/(r + μ)

    Plugging these two expressions into the expression for the probability mass function above yields the following.

P(X = x) = C(x + r − 1, x) [r/(r + μ)]^r [μ/(r + μ)]^x
    • Step 2: This step is purely cosmetic. Replace the symbol r. There is no universal convention as to what symbol should be used as the replacement. Venables and Ripley (2002) use θ. Krebs (1999) uses k. SAS makes the substitution r = 1/α. I will use the symbol θ, so the probability mass function becomes

P(X = x) = C(x + θ − 1, x) [θ/(θ + μ)]^θ [μ/(θ + μ)]^x
    • Step 3: Write the binomial coefficient using factorials.

C(x + θ − 1, x) = (x + θ − 1)! / (x! (θ − 1)!)
    • Step 4: Rewrite the factorials using gamma functions. This step requires a little bit of explanation.
Gamma Function
  • The gamma function is defined as follows.

Γ(α) = ∫₀^∞ x^(α−1) e^(−x) dx

Although the integrand contains two variables, x and α, x is the variable of integration and will disappear once the integral is evaluated. So the gamma function is solely a function of α.
  • The integral defining the gamma function is called an improper integral because infinity appears as an endpoint of integration. It's defined in the following way.

Γ(α) = lim_{t→∞} ∫₀^t x^(α−1) e^(−x) dx

It turns out this limit is defined for all α > 0.
  • Let's calculate the integral for various choices of α. Start with α = 1.

Γ(1) = ∫₀^∞ e^(−x) dx = [−e^(−x)]₀^∞ = 1
  • Now if α > 1, but still an integer, the integrand in the gamma function will be a polynomial times an exponential function. The standard approach for integrating such integrands is to use integration by parts. Integration by parts is essentially a reduction of order technique: after a finite number of steps the degree of the polynomial is reduced to 0 and the integral that remains to be computed is the same one we calculated for α = 1 (but multiplied by a number of constants).
  • After one round of integration by parts is applied to the gamma function we obtain the following.

Γ(α) = ∫₀^∞ x^(α−1) e^(−x) dx = [−x^(α−1) e^(−x)]₀^∞ + (α − 1) ∫₀^∞ x^(α−2) e^(−x) dx = (α − 1) Γ(α − 1)

where in the last step I recognize that the integral is just the gamma function in which α has been replaced by α − 1. This is an example of a recurrence relation; it allows us to calculate one term in a sequence using the value of a previous term. We can use this recurrence relation to build up a catalog of values for the gamma function.

Γ(1) = 1, Γ(2) = 1·Γ(1) = 1, Γ(3) = 2·Γ(2) = 2, Γ(4) = 3·Γ(3) = 6, and in general Γ(n) = (n − 1)! for a positive integer n.
    • So when α is a positive integer, the gamma function is just the factorial function. But Γ(α) is defined for all positive α. For example, it can be shown that

Γ(1/2) = √π

and then using our recurrence relation we can evaluate others, such as

Γ(3/2) = (1/2)Γ(1/2) = √π/2
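These gamma function facts are easy to verify numerically (a Python sketch using only the standard library; the helper name and the crude midpoint rule are mine, and the improper integral is truncated at an arbitrary large upper limit):

```python
import math

def gamma_by_integration(alpha, upper=50.0, n=100_000):
    """Approximate the gamma function Γ(α) = ∫_0^∞ x^(α-1) e^(-x) dx
    by the midpoint rule, truncating the improper integral at `upper`."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h  # midpoint of cell i; avoids x = 0 when α < 1
        total += x**(alpha - 1) * math.exp(-x)
    return total * h

# Γ(n) = (n-1)! for positive integers
for k in range(1, 6):
    print(k, round(gamma_by_integration(k), 6), math.factorial(k - 1))

# Γ(1/2) = sqrt(pi), and the recurrence Γ(α) = (α-1)Γ(α-1) gives Γ(3/2)
print(round(gamma_by_integration(0.5), 4), round(math.sqrt(math.pi), 4))
print(round(gamma_by_integration(1.5), 4), round(0.5 * math.sqrt(math.pi), 4))
```

The α = 1/2 case is slightly less accurate because the integrand blows up at 0; the midpoint rule sidesteps the singularity but converges slowly there.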
  • Step 4 (continued): So using the gamma function we can rewrite the negative binomial probability mass function as follows.

P(X = x) = [Γ(x + θ) / (Γ(θ) x!)] [θ/(θ + μ)]^θ [μ/(θ + μ)]^x
where I've chosen to leave x! alone just to remind us that x is the value whose probability we are computing.
  • So what's been accomplished in all this? It would seem not very much, but that's not true. The formula we're left with bears little resemblance to the one with which we started. In particular, all reference to r, the number of successes, has been lost, having been replaced by the symbol θ. Having come this far, ecologists then take the next logical step. Since the gamma function does not require integer arguments, why not let θ be any positive number? And so θ is treated solely as a fitting parameter, its original meaning having been lost (but see below).
    • Engineers sometimes follow the convention of reserving the term "negative binomial distribution" for only the first parameterization we've described, the one in which the parameter r takes on only positive integer values. In contrast, they refer to the ecologist's parameterization with the positive continuous parameter θ as the Polya distribution.
    • As if this were not confusing enough, the engineers' "true" negative binomial distribution is sometimes called the Pascal distribution. Thus in this approach the two parameterizations we've described are called the Pascal and Polya distributions respectively, and the term "negative binomial distribution" is not used at all.
  • Thus what we're left with is a pure, two-parameter distribution, i.e., X ~ NB(μ, θ), where the only restriction on μ and θ is that they be positive.
  • With this last change, the original interpretation of the negative binomial distribution has more or less been lost, and it is perhaps best to think of the negative binomial as a probability distribution that can be fit flexibly to discrete data.
    • The flexibility arises from the fact that the negative binomial distribution has two fitting parameters, as compared to the single parameter of the Poisson distribution. Two parameters is a lot. The famous mathematician John von Neumann is reported to have said, "With four parameters I can fit an elephant and with five I can make him wiggle his trunk." So on von Neumann's scale, the negative binomial is half an elephant!
    • In 1975, J. Wei decided to test von Neumann's assertion. He found that he needed 30 parameters to fit an elephant. Here's a scan of the drawing he used for the elephant and the results for various models he tried. (Wei 1975, taken from Burnham & Anderson 2002, p. 30.)
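Returning to the reparameterization itself, here is a sketch (in Python, using only the standard library; the function names are mine) checking that for integer θ = r and μ = r(1 − p)/p the ecological form agrees term by term with the original success/failure form:

```python
import math

def nb_pmf_success(x, r, p):
    """Classic form: probability of x failures before the r-th success."""
    return math.comb(x + r - 1, x) * p**r * (1 - p)**x

def nb_pmf_eco(x, mu, theta):
    """Ecologist's form, written with gamma functions.
    lgamma (log-gamma) is used for numerical stability."""
    logpmf = (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
              + theta * math.log(theta / (theta + mu))
              + x * math.log(mu / (theta + mu)))
    return math.exp(logpmf)

# With integer theta = r and mu = r(1 - p)/p the two forms agree
r, p = 4, 0.3
mu = r * (1 - p) / p
for x in range(6):
    print(x, round(nb_pmf_success(x, r, p), 6), round(nb_pmf_eco(x, mu, r), 6))
```

The gamma function version, of course, also accepts non-integer θ, which is the whole point of the reparameterization.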

A Connection Between the Poisson and the Negative Binomial Distributions

  • It turns out that the Poisson is a special case of the negative binomial distribution. To see this we take the negative binomial probability mass function and explore what happens as θ is allowed to become infinite. I begin by first rewriting the negative binomial probability mass function so that its limiting behavior becomes more apparent. The steps used here should resemble what we did in deriving the probability mass function for a Poisson random variable in Lecture 4.

P(X = x) = [Γ(x + θ)/(Γ(θ) x!)] [θ/(θ + μ)]^θ [μ/(θ + μ)]^x = (μ^x/x!) · [θ(θ + 1)⋯(θ + x − 1)/(θ + μ)^x] · (1 + μ/θ)^(−θ)
  • We're now ready to consider the limit of this last expression as θ → ∞.
  1. Since x and μ are fixed numbers, it follows that terms of the form (θ + k)/(θ + μ), k = 0, 1, …, x − 1, each converge to 1, so the middle factor converges to 1.
  2. μ^x/x! is unchanged in the limit since this term does not depend on θ.
  3. lim_{θ→∞} (1 + μ/θ)^(−θ) = e^(−μ), a result we've used before.
  • Thus we see

lim_{θ→∞} P(X = x) = e^(−μ) μ^x/x!
which we recognize as the probability mass function of a Poisson random variable. Thus a Poisson random variable is a special case of a negative binomial random variable when θ is allowed to become infinite. This is further evidence of the flexibility of the negative binomial distribution since there are infinitely many other choices for θ that yield something other than a Poisson distribution.
  • So in a sense θ is a measure of deviation from a Poisson distribution. For that reason θ is sometimes called the inverse index of aggregation (Krebs 1999)—inverse because small values of θ correspond to more clumping than is typically seen in the Poisson distribution. It is also called the size parameter (documentation for R), but most commonly of all, it is called the dispersion parameter (or overdispersion parameter).
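A quick numerical illustration of this limit (a Python sketch; the function names are mine):

```python
import math

def nb_pmf(x, mu, theta):
    """Ecologist's negative binomial pmf, via log-gamma for stability."""
    logpmf = (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
              + theta * math.log(theta / (theta + mu))
              + x * math.log(mu / (theta + mu)))
    return math.exp(logpmf)

def poisson_pmf(x, mu):
    """Poisson pmf with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

# Hold mu and x fixed and let theta grow: the NB pmf approaches the Poisson pmf
mu, x = 3.0, 2
for theta in (1.0, 10.0, 100.0, 1e6):
    print(theta, round(nb_pmf(x, mu, theta), 6))
print("Poisson limit:", round(poisson_pmf(x, mu), 6))
```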

The Variance of the Negative Binomial Distribution in Terms of μ and θ

  • I next express the variance of the negative binomial distribution in terms of the ecologist's parameterization. Substituting p = θ/(θ + μ) and r = θ into the variance formula r(1 − p)/p² yields

σ² = θ · [μ/(θ + μ)] · [(θ + μ)/θ]² = μ(θ + μ)/θ = μ + μ²/θ
  • Observe that the variance is quadratic in the mean. Since θ > 0, this represents a parabola (in μ) opening up that crosses the μ-axis at the origin and at the point μ = −θ.
  • θ controls how fast the parabola climbs. As θ → ∞, σ² → μ, and we have the variance of a Poisson random variable. For large θ the parabola is very flat, while for small θ the parabola is narrow. Thus θ can be used to describe a whole range of heteroscedastic behavior.
  • Note: In the parameterization of the negative binomial distribution used by SAS, α = 1/θ, so that σ² = μ + αμ². Thus the Poisson distribution corresponds to α = 0 and values of α > 0 correspond to overdispersion.
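The variance formula σ² = μ + μ²/θ can be checked by summing moments directly over the pmf (a Python sketch; the truncation point xmax is an arbitrary choice of mine, large enough that the neglected tail is negligible):

```python
import math

def nb_pmf(x, mu, theta):
    """Ecologist's negative binomial pmf, via log-gamma for stability."""
    logpmf = (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
              + theta * math.log(theta / (theta + mu))
              + x * math.log(mu / (theta + mu)))
    return math.exp(logpmf)

def mean_and_variance(mu, theta, xmax=2000):
    """First two moments by direct summation over the pmf, truncated at xmax."""
    m1 = sum(x * nb_pmf(x, mu, theta) for x in range(xmax))
    m2 = sum(x * x * nb_pmf(x, mu, theta) for x in range(xmax))
    return m1, m2 - m1**2

for mu, theta in [(2.0, 0.5), (2.0, 5.0), (10.0, 2.0)]:
    m, v = mean_and_variance(mu, theta)
    print(mu, theta, round(m, 6), round(v, 6), "predicted:", mu + mu**2 / theta)
```

Note how the small-θ case (θ = 0.5) has a variance far above its mean, while the larger-θ case is closer to the Poisson's σ² = μ.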

Why "Negative" Binomial?

  • The "negative" in negative binomial can be explained in two ways.
    • Explanation 1: Negative Binomial Theorem
      • Because the negative binomial is a probability distribution, if we sum over all possible values we get 1, i.e.,

Σ_{x=0}^∞ C(x + r − 1, x) p^r (1 − p)^x = 1

Make the substitution q = 1 − p. Then this formula becomes

Σ_{x=0}^∞ C(x + r − 1, x) q^x = p^(−r) = (1 − q)^(−r)

The last line states what's called the negative binomial theorem. It is the analog of the ordinary binomial theorem, which for positive integer values of r is the following.

(1 + q)^r = Σ_{x=0}^r C(r, x) q^x
    • Explanation 2: Negative binomial coefficients
      • It turns out there is a way of defining binomial coefficients so that negative numbers are allowed: for any real number a and nonnegative integer x, set C(a, x) = a(a − 1)⋯(a − x + 1)/x!. In particular, using this definition, it can be shown that

C(−r, x) = (−1)^x C(x + r − 1, x)

      • Using this fact the formula for the negative binomial probability mass function can be written as follows.

P(X = x) = C(−r, x) p^r (p − 1)^x

so a binomial coefficient with a negative argument appears explicitly, hence the name "negative binomial".
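Both facts are easy to check numerically (a Python sketch; `gen_binom` is my own helper, not a library function):

```python
import math

def gen_binom(a, x):
    """Generalized binomial coefficient C(a, x) = a(a-1)...(a-x+1)/x!,
    defined for any real a and nonnegative integer x."""
    num = 1.0
    for i in range(x):
        num *= a - i
    return num / math.factorial(x)

r, q = 3, 0.4
# identity: C(-r, x) = (-1)^x C(x + r - 1, x)
for x in range(6):
    print(x, gen_binom(-r, x), (-1)**x * math.comb(x + r - 1, x))

# negative binomial theorem: sum_x C(x + r - 1, x) q^x = (1 - q)^(-r)
partial_sum = sum(math.comb(x + r - 1, x) * q**x for x in range(200))
print(round(partial_sum, 9), round((1 - q)**(-r), 9))
```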

Ecological Interpretations of the Negative Binomial Distribution

  • The extreme flexibility of the negative binomial in fitting heteroscedastic discrete data would be enough to recommend it, but it turns out that it can also be motivated on purely ecological grounds. Recall that two of the assumptions of the homogeneous Poisson process, homogeneity and independence, are unlikely to hold for most ecological data. It turns out that if either one of these assumptions is relaxed, then under certain circumstances the distribution we observe, rather than being Poisson, turns out to be negative binomial. I next try to make this connection more precise.
  • In a homogeneous Poisson process the rate parameter is the same for all observational units. In a nonhomogeneous Poisson process, the rate is allowed to vary according to some distribution. Given a particular realization from this distribution, say λ, the resulting random variable X will have a Poisson distribution with mean λ.
  • We can express this formally using the notion of conditional probability. We write

P(X = x | λ) = e^(−λ) λ^x/x!
  • The fact that λ has a distribution is a bit of a nuisance, because what we want is the unconditional (or marginal) probability P(X = x). But if we knew what the distribution of λ was, we could obtain this marginal probability as follows. Recall from the definition of conditional probability that

P(A ∩ B) = P(A | B) P(B)

  • If our interest is in P(A) we can find it by summing out B in the joint distribution,

P(A) = Σ_B P(A ∩ B) = Σ_B P(A | B) P(B)

or, for continuous distributions, by integration.
  • Thus for the nonhomogeneous Poisson process, if we knew what the distribution of λ was, say a density g(λ), we could calculate P(X = x) as follows.

P(X = x) = ∫₀^∞ P(X = x | λ) g(λ) dλ = ∫₀^∞ [e^(−λ) λ^x/x!] g(λ) dλ
We'll pursue this argument further next time.
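The lecture stops here, but as a numerical preview: if we assume the mixing density g(λ) is a gamma density with shape θ and rate θ/μ (an assumption of mine at this point, not yet established in the text), the mixture integral reproduces the ecological negative binomial pmf. A sketch using crude midpoint-rule integration:

```python
import math

def poisson_pmf(x, lam):
    """Poisson pmf with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def gamma_pdf(lam, shape, rate):
    """Density of a gamma distribution with the given shape and rate."""
    return (rate**shape / math.gamma(shape)) * lam**(shape - 1) * math.exp(-rate * lam)

def nb_pmf(x, mu, theta):
    """Ecologist's negative binomial pmf, via log-gamma for stability."""
    logpmf = (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
              + theta * math.log(theta / (theta + mu))
              + x * math.log(mu / (theta + mu)))
    return math.exp(logpmf)

def mixture_pmf(x, mu, theta, upper=60.0, n=100_000):
    """Midpoint-rule approximation of the marginal ∫ P(X=x|λ) g(λ) dλ,
    with g a gamma density of shape θ and rate θ/μ (so E[λ] = μ)."""
    h = upper / n
    total = 0.0
    for i in range(n):
        lam = (i + 0.5) * h
        total += poisson_pmf(x, lam) * gamma_pdf(lam, theta, theta / mu)
    return total * h

mu, theta = 3.0, 2.0
for x in range(5):
    print(x, round(mixture_pmf(x, mu, theta), 6), round(nb_pmf(x, mu, theta), 6))
```

The two columns agree to the accuracy of the numerical integration, which is the connection pursued next time.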

Cited References

  • Burnham, Kenneth P. and David R. Anderson. 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer-Verlag: New York.
  • Krebs, Charles J. 1999. Ecological Methodology. Addison Wesley Longman: Menlo Park, CA.
  • Venables, W. N. and B. D. Ripley. 2002. Modern Applied Statistics with S, 4th edition. Springer-Verlag: New York.
  • Wei, J. 1975. Least squares fitting of an elephant. Chemtech, February: 128–129.

Jack Weiss
Phone: (919) 962-5930
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2006
Last Revised--August 8, 2008