Conditional Probability - NayiPathshala


1/22/2018

Conditional Probability


Let us reconsider the difference between choosing an item at random from a lot with or without replacement. The lot under consideration has the following makeup: 80 non-defective and 20 defective items. Suppose that we choose two items from this lot,
(a) with replacement;
(b) without replacement.
We define the following two events
A = {the first item is defective},          B = {the second item is defective}.
If we are choosing with replacement, P(A) = P(B) = 20/100 = 1/5. For each time we choose from the lot there are 20 defective items among the total of 100. However, if we are choosing without replacement, the results are not quite immediate. It is still true, of course, that P(A) = 1/5. But what about P(B)? It is clear that in order to compute P(B) we should know the composition of the lot at the time the second item is chosen. That is, we should know whether A did or did not occur. This example indicates the need to introduce the following important concept.
Let A and B be two events associated with an experiment ε. We denote by P(B | A) the conditional probability of the event B, given that A has occurred. In the above example, P(B | A) = 19/99. For, if A has occurred, then on the second drawing there are only 99 items left, 19 of which are defective. Whenever we compute P(B | A) we are essentially computing P(B) with respect to the reduced sample space A, rather than with respect to the original sample space S. Consider the associated Venn diagram: when we evaluate P(B) we are asking ourselves how probable it is that we shall be in B, knowing that we must be in S. And when we compute P(B | A) we are asking ourselves how probable it is that we shall be in B, knowing that we must be in A. (That is, the sample space has been reduced from S to A.)
Shortly we shall make a formal definition of P(B | A). For the moment, however, we shall proceed with our intuitive notion of conditional probability and consider an example.
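The reduced-sample-space argument can also be checked empirically. The following sketch (Python; the function name is ours, not from the text) draws two items without replacement from the 20-defective/80-non-defective lot many times and computes the conditional relative frequency of B among the trials where A occurred; it should come out near 19/99 ≈ 0.192.

```python
import random

def estimate_p_b_given_a(trials=100_000, seed=0):
    """Estimate P(B | A): draw two items without replacement from a lot
    of 20 defective (1) and 80 non-defective (0) items, and compute the
    relative frequency of 'second defective' among trials where the
    first item was defective."""
    rng = random.Random(seed)
    lot = [1] * 20 + [0] * 80
    n_a = n_ab = 0
    for _ in range(trials):
        first, second = rng.sample(lot, 2)   # sampling without replacement
        if first == 1:
            n_a += 1
            if second == 1:
                n_ab += 1
    return n_ab / n_a

print(estimate_p_b_given_a())   # close to 19/99 ≈ 0.192
```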

 EXAMPLE 1 

Two fair dice are tossed, the outcome being recorded as (x1, x2), where xi is the outcome of the ith die, i = 1, 2. Hence the sample space S may be represented by the following array of 36 equally likely outcomes.
S = {  (1, 1)   (1, 2)   ...   (1, 6)
       (2, 1)   (2, 2)   ...   (2, 6)
         .        .               .
         .        .               .
       (6, 1)   (6, 2)   ...   (6, 6)  }

Consider the following two events:

A = {(x1, x2) | x1 + x2 = 10}                                B = {(x1, x2) | x1 > x2}

Thus A = {(5, 5), (4, 6), (6, 4)}, and B = {(2, 1), (3, 1), (3, 2), . . . , (6, 5)}. Hence P(A) = 3/36 and P(B) = 15/36. And P(B | A) = 1/3, since the sample space now consists of A (that is, three outcomes), and only one of these three outcomes is consistent with the event B. In a similar way we may compute P(A | B) = 1/15. Finally, let us compute P(A ∩ B). The event A ∩ B occurs if and only if the sum of the two dice is 10 and the first die shows a larger value than the second die. There is only one such outcome, and hence P(A ∩ B) = 1/36. If we take a long careful look at the various numbers we have computed above, we note the following:
 P(A | B) = P(A  ∩ B) / P(B)                  and                   P(B | A) = P(A ∩ B) / P(A)
These relationships did not just happen to arise in the particular example we considered. Rather, they are quite general and give us a means of formally defining conditional probability.
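These numbers are small enough to verify by brute-force enumeration. A minimal sketch in Python (variable names are ours):

```python
from fractions import Fraction

# All 36 equally likely outcomes of tossing two fair dice.
S = [(x1, x2) for x1 in range(1, 7) for x2 in range(1, 7)]
A = [s for s in S if s[0] + s[1] == 10]        # sum equals 10
B = [s for s in S if s[0] > s[1]]              # first die exceeds second
AB = [s for s in A if s in B]                  # intersection A ∩ B

P = lambda E: Fraction(len(E), len(S))         # equally likely outcomes

print(P(A), P(B), P(AB))         # 1/12 5/12 1/36
print(P(AB) / P(B))              # P(A | B) = 1/15
print(P(AB) / P(A))              # P(B | A) = 1/3
```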
To motivate this definition let us return to the concept of relative frequency. Suppose that an experiment ε has been repeated n times. Let nA, nB, and nA∩B be the number of times the events A, B, and A ∩ B, respectively, have occurred among the n repetitions. What is the meaning of nA∩B / nA? It represents the relative frequency of B among those outcomes in which A occurred. That is, nA∩B / nA is the conditional relative frequency of B, given that A occurred. We may write nA∩B / nA as follows:

            nA∩B / nA = (nA∩B / n) / (nA / n) = fA∩B / fA

where fA∩B and fA are the relative frequencies of the events A ∩ B and A, respectively. As we have already indicated (and as we shall show later), if n, the number of repetitions, is large, fA∩B will be close to P(A ∩ B) and fA will be close to P(A). Hence the above relation suggests that nA∩B / nA will be close to P(B | A). Thus we make the following formal definition.

Definition 

P(B | A) = P(A ∩ B) / P(A),                       provided that P(A) > 0. (3.1)

Notes: 

(a) It is important to realize that the above is not a theorem (we did not prove anything), nor is it an axiom. We have simply introduced the intuitive notion of conditional probability and then made the formal definition of what we mean by this notion. The fact that our formal definition corresponds to our intuitive notion is substantiated by the paragraph preceding the definition.
(b) It is a simple matter to verify that P(B | A), for fixed A, satisfies the various postulates of probability. That is, we have
(1) 0 ≤ P(B | A) ≤ 1,
(2) P(S | A) = 1,
(3) P(B1 ∪ B2 | A) = P(B1 | A) + P(B2 | A) if B1 ∩ B2 = ∅,
(4) P(B1 ∪ B2 ∪ · · · | A) = P(B1 | A) + P(B2 | A) + · · · if Bi ∩ Bj = ∅ for i ≠ j.
(c) If A = S, P(B|S) = P(B ∩ S)/P(S) = P(B).
(d) With every event B ⊂ S we can associate two numbers, P(B), the (unconditional) probability of B, and P(B | A), the conditional probability of B, given that some event A (for which P(A) > 0) has occurred. In general, these two probability measures will assign different probabilities to the event B, as the preceding examples indicated. Shortly we shall study an important special case for which P(B) and P(B | A) are the same.
(e) Observe that the conditional probability is defined in terms of the unconditional probability measure P. That is, if we know P(B) for every B ⊂ S, we can compute P(B | A) for every B ⊂ S. Thus we have two ways of computing the conditional probability P(B | A):
(a) Directly, by considering the probability of B with respect to the reduced sample space A.
(b) Using the above definition, where P(A ∩ B) and P(A) are computed with respect to the original sample space S. Note: If A = S, we obtain P(B | S) = P(B ∩ S)/ P(S) = P(B), since P(S) = 1 and B ∩ S = B. This is as it should be, for saying that S has occurred is only saying that the experiment has been performed.
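Returning to the relative-frequency motivation above: a quick simulation (a Python sketch, not from the text) of the dice experiment of Example 1 shows the conditional relative frequency nA∩B / nA settling near P(B | A) = 1/3.

```python
import random

def conditional_relative_frequency(n=100_000, seed=1):
    """Toss two fair dice n times and return n_{A∩B} / n_A for
    A = {sum is 10} and B = {first die > second die}."""
    rng = random.Random(seed)
    n_a = n_ab = 0
    for _ in range(n):
        x1, x2 = rng.randint(1, 6), rng.randint(1, 6)
        if x1 + x2 == 10:          # A occurred
            n_a += 1
            if x1 > x2:            # B occurred as well
                n_ab += 1
    return n_ab / n_a

print(conditional_relative_frequency())   # near 1/3
```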

EXAMPLE 2. 

Suppose that an office has 100 calculating machines. Some of these machines are electric (E) while others are manual (M), and some of the machines are new (N) while others are used (U). Table 3.1 gives the number of machines in each category. A person enters the office, picks a machine at random,
          E     M
   N     40    30     70
   U     20    10     30
         60    40    100
and discovers that it is new. What is the probability that it is electric? In terms of the notation introduced, we wish to compute P(E | N). Simply considering the reduced sample space N (that is, the 70 new machines), we have P(E | N) = 40/70 = 4/7. Using the definition of conditional probability, we have that

P(E | N) = P(E ∩ N) / P(N) = (40/100) / (70/100) = 4/7
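Both routes to P(E | N) can be traced directly from Table 3.1. A short Python sketch (the dictionary layout is ours):

```python
from fractions import Fraction

# Table 3.1 as counts; keys are (condition, type) pairs.
counts = {("N", "E"): 40, ("N", "M"): 30,
          ("U", "E"): 20, ("U", "M"): 10}
total = sum(counts.values())                         # 100 machines

# (a) Reduced sample space: look only at the 70 new machines.
n_new = counts[("N", "E")] + counts[("N", "M")]
p_reduced = Fraction(counts[("N", "E")], n_new)

# (b) Definition: P(E | N) = P(E ∩ N) / P(N) over all 100 machines.
p_definition = Fraction(counts[("N", "E")], total) / Fraction(n_new, total)

print(p_reduced, p_definition)    # 4/7 4/7
```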

The most important consequence of the above definition of conditional probability is obtained by writing it in the following form:

P(A ∩ B) = P(B | A)P(A)

or, equivalently,

P(A∩  B) = P(A | B)P(B).

This is sometimes known as the multiplication theorem of probability. We may apply this theorem to compute the probability of the simultaneous occurrence of two events A and B.

EXAMPLE 3  

Consider again the lot consisting of 20 defective and 80 non-defective items discussed at the beginning of Section 3.1. If we choose two items at random, without replacement, what is the probability that both items are defective? As before, we define the events A and B as follows:
         A = {the first item is defective}           ,          B = {the second item is defective}.
Hence we require P(A ∩ B), which we may compute, according to the above formula, as P(B | A)P(A). But P(B | A) = 19/99 while P(A) = 1/5. Hence
         
P(A ∩ B) = (19/99)(1/5) = 19/495.
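The value 19/495 can be confirmed by counting ordered pairs directly (a sketch; letting items 0 through 19 play the defectives):

```python
from fractions import Fraction
from itertools import permutations

# Ordered draws of two distinct items from a 100-item lot;
# items 0..19 are the defective ones.
pairs = list(permutations(range(100), 2))            # 100 * 99 = 9900 draws
both_defective = [p for p in pairs if p[0] < 20 and p[1] < 20]

print(Fraction(len(both_defective), len(pairs)))     # 19/495
```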

Note: The above multiplication theorem (3.3a) may be generalized to more than two events in the following way: P(A1 ∩ A2 ∩ ... ∩ An) = P(A1)P(A2 | A1)P(A3 | A1 ∩ A2) ... P(An | A1 ∩ ... ∩ An−1).
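For instance, with the same lot, the probability that the first three items drawn (without replacement) are all defective follows from the generalized theorem, and agrees with a direct count of 3-element subsets (a Python sketch):

```python
from fractions import Fraction
from math import comb

# Chain rule: P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) for three defective draws
# from 20 defective / 80 non-defective items.
chain = Fraction(20, 100) * Fraction(19, 99) * Fraction(18, 98)

# Direct count: 3-subsets of the 20 defectives over all 3-subsets of 100.
direct = Fraction(comb(20, 3), comb(100, 3))

print(chain, direct)    # 19/2695 19/2695
```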
Let us consider for a moment whether we can make a general statement about the relative magnitude of P(A | B) and P(A). We shall consider four cases, which may be illustrated by Venn diagrams. We have
(a) If A ∩ B = ∅, then P(A | B) = 0 ≤ P(A), since A cannot occur if B has occurred.
(b) If A ⊂ B, then P(A | B) = P(A ∩ B)/P(B) = P(A)/P(B) ≥ P(A), since 0 ≤ P(B) ≤ 1.
(c) If B ⊂ A, then P(A | B) = P(A ∩ B)/P(B) = P(B)/P(B) = 1 ≥ P(A).
(d) If none of the above holds, we cannot make any statement about the relative magnitude of P(A | B) and P(A).

Note that in two of the above cases, P(A) ≤  P(A | B), in one case, P(A) ≥ P(A | B), and in the fourth case, we cannot make any comparison at all. Above, we used the concept of conditional probability in order to evaluate the probability of the simultaneous occurrence of two events. We can apply this concept in another way to compute the probability of a single event A. We need the following definition.

Definition

We say that the events B1, B2, ..., Bk represent a partition of the sample space S if
(a) Bi ∩ Bj = ∅ for all i ≠ j;
(b) \[\bigcup\limits_{i=1}^{k}{{{B}_{i}}}\] = S;
(c) P(Bi) > 0 for all i.
In words: When the experiment ε is performed, one and only one of the events Bi occurs. (For example, for the tossing of a die, B1 = {1, 2}, B2 = {3, 4, 5}, and B3 = {6} would represent a partition of the sample space, while C1 = {1, 2, 3, 4} and C2 = {4, 5, 6} would not.)
Let A be some event with respect to S and let B1, B2, ..., Bk be a partition of S. (The Venn diagram illustrates this for k = 8.) Hence we may write
A = (A ∩ B1) ∪ (A ∩ B2) ∪ ... ∪ (A ∩ Bk).
Of course, some of the sets A ∩ Bi may be empty, but this does not invalidate the above decomposition of A. The important point is that all the events A ∩ B1, ..., A ∩ Bk are pairwise mutually exclusive. Hence we may apply the addition property for mutually exclusive events and write

P(A) = P(A ∩ B1) + P(A ∩ B2) + ... + P(A ∩ Bk).

However, each term P(A ∩ Bi) may be expressed as P(A | Bi)P(Bi), and hence we obtain what is called the theorem on total probability:

P(A) = P(A | B1)P(B1) + P(A | B2)P(B2) + ... + P(A | Bk)P(Bk).

This result represents an extremely useful relationship, for often when P(A) is required it may be difficult to compute it directly. However, with the additional information that Bi has occurred, we may be able to evaluate P(A | Bi) and then use the above formula.

EXAMPLE 4 

Consider (for the last time) the lot of 20 defective and 80 non-defective items from which we choose two items without replacement. Again defining A and B as
A ={the first chosen item is defective},
B = {the second chosen item is defective},
we may now compute P(B) as follows:
P(B) = P(B | A)P(A) + P(B | \[\bar{A}\])P(\[\bar{A}\]).
Using some of the calculations performed in Example 3, we find that
P(B) = (19/99)(1/5) + (20/99)(4/5) = 1/5.
This result may be a bit startling, particularly if the reader recalls that at the beginning we found that P(B) = 1/5 when we chose the items with replacement.
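The exact computation behind this result, with P(B | Ā) = 20/99 (all 20 defectives remain among the 99 items when the first item drawn was non-defective), can be written out with exact fractions (a sketch):

```python
from fractions import Fraction

p_a = Fraction(20, 100)                  # P(A): first item defective
p_b_given_a = Fraction(19, 99)           # P(B | A)
p_b_given_not_a = Fraction(20, 99)       # P(B | Ā)

# Theorem on total probability with the partition {A, Ā}.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
print(p_b)    # 1/5
```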

EXAMPLE 5. 

A certain item is manufactured by three factories, say 1, 2, and 3. It is known that 1 turns out twice as many items as 2, and that 2 and 3 turn out the same number of items (during a specified production period). It is also known that 2 percent of the items produced by 1 and by 2 are defective, while 4 percent of those manufactured by 3 are defective. All the items produced are put into one stockpile, and then one item is chosen at random. What is the probability that this item is defective?
 Let us introduce the following events:
A = {the item is defective},
B1 = {the item came from 1},
B2 = {the item came from 2},
B3 = {the item came from 3}.
We require P(A) and, using the above result, we may write
P(A) = P(A | B1)P(B1) + P(A | B2)P(B2) + P(A | B3)P(B3).
Now P(B1) = 1/2, while P(B2) = P(B3) = 1/4. Also P(A | B1) = P(A | B2) = 0.02, while P(A | B3) = 0.04. Inserting these values into the above expression, we obtain P(A) = 0.025.
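The arithmetic of Example 5 written as a total-probability sum (the dictionary keys are ours):

```python
# Production shares P(Bi) and defect rates P(A | Bi) for factories 1-3.
priors = {"B1": 0.5, "B2": 0.25, "B3": 0.25}
defect_rates = {"B1": 0.02, "B2": 0.02, "B3": 0.04}

# Theorem on total probability: P(A) = sum of P(A | Bi) P(Bi).
p_defective = sum(defect_rates[b] * priors[b] for b in priors)
print(p_defective)    # approximately 0.025
```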

Note: 

The following analogy to the theorem on total probability has been observed in chemistry: Suppose that we have k beakers containing different solutions of the same salt, totaling, say, one liter. Let P(Bi) be the volume of the ith beaker and let P(A | Bi) be the concentration of the solution in the ith beaker. If we combine all the solutions into one beaker and let P(A) denote the concentration of the resulting solution, we obtain P(A) = P(A | B1)P(B1) + ... + P(A | Bk)P(Bk).
