Moment generating functions, and their close relatives (probability generating
functions and characteristic functions) provide an alternative way of representing
a probability distribution by means of a certain function of a single
variable.
These functions turn out to be useful in many different ways:
(a) They provide an easy way of calculating the moments of a distribution.
(b) They provide some powerful tools for addressing certain counting and combinatorial problems.
(c) They provide an easy way of characterizing the distribution of the sum of independent random variables.
(d) They provide tools for dealing with the distribution of the sum of a random number of independent random variables.
(e) They play a central role in the study of branching processes.
(f) They play a key role in large deviations theory, that is, in studying the asymptotics of tail probabilities of the form P(X ≥ c), when c is a large number.
(g) They provide a bridge between complex analysis and probability, so that complex analysis methods can be brought to bear on probability problems.
(h) They provide powerful tools for proving limit theorems, such as laws of large numbers and the central limit theorem.
MOMENT GENERATING FUNCTIONS
Definition
The moment generating function associated with a random
variable X is a function M_X : R → [0, ∞] defined by
M_X(s) = E[e^{sX}].
The domain D_X of M_X is defined as the set D_X = {s | M_X(s) < ∞}.
If X is a discrete random variable with PMF p_X, then
M_X(s) = ∑_x e^{sx} p_X(x).
If X is a continuous random variable with PDF f_X, then
M_X(s) = ∫ e^{sx} f_X(x) dx.
Note
that this is essentially the same as the definition of the Laplace transform of the function f_X, except that we are using s instead of −s in the exponent.
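As a quick illustration, here is a minimal Python sketch that compares the closed-form transform of a Bernoulli(p) random variable, M_X(s) = 1 − p + p e^s, with a direct Monte Carlo estimate of E[e^{sX}] (the value p = 0.3 and the sample size are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)

    p = 0.3                                    # Bernoulli parameter (illustrative choice)
    x = rng.binomial(1, p, size=500_000)       # i.i.d. samples of X ~ Bernoulli(p)

    for s in [-1.0, 0.0, 0.5, 1.0]:
        exact = 1 - p + p * np.exp(s)          # closed-form M_X(s)
        mc = np.exp(s * x).mean()              # Monte Carlo estimate of E[e^{sX}]
        print(s, exact, mc)
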
The domain of the moment generating function
Note that 0 ∈ D_X, because M_X(0) = E[e^{0·X}] = 1. For a discrete random
variable that takes only a finite number of different values, we have D_X = R.
For example, if X takes the values 1, 2, and 3, with probabilities 1/2, 1/3, and
1/6, respectively, then
M_X(s) = (1/2)e^s + (1/3)e^{2s} + (1/6)e^{3s},        (1)
which is finite for every s ∈ R. On the other hand, for the Cauchy distribution,
f_X(x) = 1/(π(1 + x^2)), for all x,
it is easily seen that M_X(s) = ∞, for all s ≠ 0.
In general, D_X is an interval (possibly infinite or semi-infinite) that contains
zero.
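The divergence in the Cauchy case can be seen numerically: for any fixed s ≠ 0, the truncated integrals ∫_{−T}^{T} e^{sx} f_X(x) dx grow without bound as T increases. A rough Python sketch (the value s = 0.1 and the grid sizes are arbitrary choices):

    import numpy as np

    def truncated_cauchy_mgf(s, T, n=400_001):
        # Riemann-sum approximation of the integral of e^{sx} / (pi (1 + x^2))
        # over the finite window [-T, T].
        x = np.linspace(-T, T, n)
        integrand = np.exp(s * x) / (np.pi * (1.0 + x**2))
        return integrand.sum() * (x[1] - x[0])

    # For a fixed s != 0, the truncated integrals keep growing with T,
    # reflecting the fact that M_X(s) = infinity for every s != 0.
    for T in [10, 50, 100, 200]:
        print(T, truncated_cauchy_mgf(0.1, T))
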
Exercise 1.
Suppose that M_X(s) < ∞ for some s > 0. Show that M_X(t) < ∞ for all
t ∈ [0, s]. Similarly, suppose that M_X(s) < ∞ for some s < 0. Show that M_X(t) < ∞
for all t ∈ [s, 0].
Exercise 2.
Suppose that
lim_{x→∞} (log P(X > x)) / x ≜ −ν < 0.
Establish that M_X(s) < ∞ for all s ∈ [0, ν).
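As a concrete instance of the condition in Exercise 2, consider an exponential random variable with rate λ: then P(X > x) = e^{−λx}, so (log P(X > x))/x = −λ and ν = λ, while M_X(s) = λ/(λ − s) is finite precisely for s < λ. A small Monte Carlo sketch corroborating this (λ = 2 and the sample size are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    lam = 2.0                                      # rate; nu = lam in the notation of Exercise 2
    x = rng.exponential(scale=1.0 / lam, size=1_000_000)

    for s in [0.25, 0.5, 0.75, 0.9]:               # all strictly below nu = 2
        mc = np.exp(s * x).mean()                  # Monte Carlo estimate of E[e^{sX}]
        exact = lam / (lam - s)                    # closed form, finite for s < lam
        print(s, mc, exact)
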
Inversion of transforms
By inspection of the formula for M_X(s) in Eq. (1), it is clear that the distribution
of X is readily determined. The exponents of the various terms e^{sx} indicate the possible
values of the random variable X, and the associated coefficients provide the
corresponding probabilities.
At the other extreme, if we are told that M_X(s) = ∞ for every s ≠ 0, this
is certainly not enough information to determine the distribution of X.
On this subject, there is the following fundamental result. It is intimately
related to the inversion properties of Laplace transforms. Its proof requires sophisticated
analytical machinery and is omitted.
Theorem 1.
Inversion theorem
(a) Suppose that M_X(s) is finite for all s in an interval of the form [−a, a],
where a is a positive number. Then, M_X determines uniquely the CDF of
the random variable X.
(b) If M_X(s) = M_Y(s) < ∞, for all s ∈ [−a, a], where a is a positive number,
then the random variables X and Y have the same CDF.
There are explicit formulas that allow us to recover the PMF or PDF of a random
variable starting from the associated transform, but they are quite difficult
to use (e.g., involving “contour integrals”). In practice, transforms are usually
inverted by “pattern matching,” based on tables of known distribution-transform
pairs.
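For instance, suppose we are handed the transform M(s) = exp(2(e^s − 1)). A table of known pairs matches this to a Poisson distribution with parameter 2, and the match can be corroborated by recomputing E[e^{sX}] directly from the Poisson PMF, as in the following sketch (the parameter value is an arbitrary choice):

    from math import exp, factorial

    # Suppose we are handed the transform M(s) = exp(2 (e^s - 1)).  A table of
    # transform pairs matches this to a Poisson distribution with parameter 2.
    lam = 2.0

    def mgf_from_poisson_pmf(s, terms=200):
        # Recompute E[e^{sX}] directly from the Poisson PMF as a check.
        return sum(exp(s * k) * exp(-lam) * lam**k / factorial(k) for k in range(terms))

    for s in [-1.0, 0.0, 0.5, 1.0]:
        print(s, mgf_from_poisson_pmf(s), exp(lam * (exp(s) - 1.0)))
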
Moment generating properties
There is a reason why M_X is called a moment generating function. Let us consider
the derivatives of M_X at zero. Assuming for a moment that we can interchange
the order of integration and differentiation, we obtain
dM_X(s)/ds |_{s=0} = dE[e^{sX}]/ds |_{s=0} = E[X e^{sX}] |_{s=0} = E[X],
and, more generally,
d^m M_X(s)/ds^m |_{s=0} = d^m E[e^{sX}]/ds^m |_{s=0} = E[X^m e^{sX}] |_{s=0} = E[X^m].
Thus, knowledge of the transform M_X allows for an easy calculation of the
moments of a random variable X.
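For example, for the distribution of Eq. (1), exact differentiation of M_X at zero recovers the moments; a short sympy sketch:

    import sympy as sp

    s = sp.symbols('s')
    # Transform of the distribution in Eq. (1).
    M = sp.Rational(1, 2) * sp.exp(s) + sp.Rational(1, 3) * sp.exp(2 * s) + sp.Rational(1, 6) * sp.exp(3 * s)

    EX  = sp.diff(M, s, 1).subs(s, 0)    # E[X]   = dM_X/ds at s = 0      -> 5/3
    EX2 = sp.diff(M, s, 2).subs(s, 0)    # E[X^2] = d^2 M_X/ds^2 at s = 0 -> 10/3
    print(EX, EX2, EX2 - EX**2)          # mean, second moment, variance (5/9)
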
Justifying the interchange of the expectation and the differentiation does
require some work. The steps are outlined in the following exercise. For simplicity,
we restrict attention to the case of nonnegative random variables.
Exercise 3.
Suppose that X is a non negative random variable and that MX(s) < ∞
for all s ∈ (−∞, a], where a is a positive number.
(a) Show that E[Xk] < ∞, for every k.
(b) Show that E[Xk esk] < ∞, for every s < a.
(c) Show that (ehX − 1)/h ≤ XehX.
(d) Use the DCT to argue that
E[ E[X] = E [lim h↓0 (ehX − 1)/h] = lim h↓0 (E[ehX] -1) / h
The probability generating function
For discrete random variables, the following probability generating function
is sometimes useful. It is defined by
g_X(s) = E[s^X],
with s usually restricted to positive values. It is of course closely related to the moment generating function in that, for s > 0, we have g_X(s) = M_X(log s). One difference is that when X is a positive integer random variable, we can define g_X(s), as well as its derivatives, at s = 0. So, suppose that X has a PMF p_X(m), for m = 1, 2, . . ..
Then, g_X(s) = ∑_{m=1}^{∞} s^m p_X(m),
resulting in
d^m g_X(s)/ds^m |_{s=0} = m! p_X(m).
(The interchange of the summation and the differentiation needs justification, but is indeed legitimate for small s.) Thus, we can use g_X to easily recover the PMF p_X, when X is a positive integer random variable.
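For example, for the distribution of Eq. (1) we have g_X(s) = (1/2)s + (1/3)s^2 + (1/6)s^3, and repeated differentiation at s = 0 recovers the PMF, as in the following sympy sketch:

    import sympy as sp

    s = sp.symbols('s')
    # PGF of the distribution in Eq. (1): g_X(s) = E[s^X].
    g = sp.Rational(1, 2) * s + sp.Rational(1, 3) * s**2 + sp.Rational(1, 6) * s**3

    for m in range(1, 4):
        # p_X(m) = (d^m g_X / ds^m)(0) / m!
        print(m, sp.diff(g, s, m).subs(s, 0) / sp.factorial(m))
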
Properties of moment generating functions
We record some useful properties of moment generating functions.
Theorem 2.
(a) If Y = aX + b, then M_Y(s) = e^{sb} M_X(as).
(b) If X and Y are independent, then M_{X+Y}(s) = M_X(s) M_Y(s).
(c) Let X and Y be independent random variables. Let Z be equal to X, with probability p, and equal to Y, with probability 1 − p. Then, M_Z(s) = p M_X(s) + (1 − p) M_Y(s).
Proof:
For part (a), we have M_Y(s) = E[exp(saX + sb)] = exp(sb) E[exp(saX)] = exp(sb) M_X(as).
For part (b), we have M_{X+Y}(s) = E[exp(sX + sY)] = E[exp(sX)] E[exp(sY)] = M_X(s) M_Y(s), where the second equality uses the independence of X and Y.
For part (c), by conditioning on the random choice between X and Y, we have M_Z(s) = E[e^{sZ}] = p E[e^{sX}] + (1 − p) E[e^{sY}] = p M_X(s) + (1 − p) M_Y(s).
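Part (b) is easy to check numerically for any pair of independent random variables: a Monte Carlo estimate of M_{X+Y}(s) should agree with the product of the individual estimates. A small sketch (the choice of distributions and parameters is arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 500_000
    x = rng.exponential(scale=1.0, size=n)      # X ~ Exponential(1)
    y = rng.uniform(0.0, 1.0, size=n)           # Y ~ Uniform(0, 1), independent of X

    for s in [-0.5, 0.2, 0.4]:
        lhs = np.exp(s * (x + y)).mean()                    # estimate of M_{X+Y}(s)
        rhs = np.exp(s * x).mean() * np.exp(s * y).mean()   # estimate of M_X(s) M_Y(s)
        print(s, lhs, rhs)
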
Example: (Normal random variables)
(a) Let X be a standard normal random variable, whose transform is M_X(s) = e^{s^2/2}, and let Y = σX + µ, which we know to have a N(µ, σ^2) distribution. Using Theorem 2(a), we find that M_Y(s) = exp(sµ + (1/2)s^2σ^2).
(b) Let X and Y be independent, with X =_d N(µ_1, σ_1^2) and Y =_d N(µ_2, σ_2^2). Then,
M_{X+Y}(s) = M_X(s) M_Y(s) = exp( s(µ_1 + µ_2) + (1/2)s^2(σ_1^2 + σ_2^2) ).
Using the inversion property of transforms, we conclude that X + Y =_d N(µ_1 + µ_2, σ_1^2 + σ_2^2), thus corroborating a result we first obtained using convolutions.
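The formula M_Y(s) = exp(sµ + (1/2)s^2σ^2) is also easy to check by simulation (the values of µ and σ below are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(3)
    mu, sigma = 1.0, 2.0                          # illustrative choices
    y = rng.normal(loc=mu, scale=sigma, size=1_000_000)

    for s in [-0.3, 0.1, 0.3]:
        mc = np.exp(s * y).mean()                            # Monte Carlo estimate of E[e^{sY}]
        exact = np.exp(s * mu + 0.5 * s**2 * sigma**2)       # exp(s mu + s^2 sigma^2 / 2)
        print(s, mc, exact)
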