Chapter 17

Probability: Axioms and Fundaments

Chapter 13, The Meaning of Probability: Theories of Probability, discussed how the mathematical theory of probability is connected to the world through philosophical theories of probability. Chapter 14, Set Theory: The Language of Probability, reviewed the basic tool needed to discuss probability mathematically, Set Theory. This chapter introduces the mathematical theory of probability, in which probability is a function that assigns numbers between 0 and 100% to events (subsets of the outcome space). Starting with just three axioms and a few definitions, the mathematical theory develops powerful and beautiful consequences. The chapter presents the axioms of probability and some consequences of the axioms. Conditional probability is then defined, which leads to two useful formulae—the Multiplication Rule and Bayes' Rule—and to the definition of independence. All these concepts and formulae play important roles in the sequel.

The Axioms of Probability

The axioms of probability are mathematical rules that probability must satisfy. Let A and B be events. Let P(A) denote the probability of the event A. The axioms of probability are these three conditions on the function P:

  1. The probability of every event is at least zero. (For every event A, P(A) ≥ 0. There is no such thing as a negative probability.)
  2. The probability of the entire outcome space is 100%. (P(S) = 100%. The chance that something in the outcome space occurs is 100%, because the outcome space contains every possible outcome.)
  3. If two events are disjoint, the probability that either of the events happens is the sum of the probabilities that each happens. (If AB = {}, P(A ∪ B) = P(A) + P(B).)

In place of axiom 3, the following axiom sometimes is used:

3') If {A₁, A₂, A₃, … } is a partition of the set A, then P(A) = P(A₁) + P(A₂) + P(A₃) + … .

Axiom 3' is more restrictive than axiom 3: if A and B are disjoint, then {A, B} is a partition of A∪B, so axiom 3' implies that P(A∪B) = P(A) + P(B); but axiom 3' also covers countably infinite collections of disjoint events, which axiom 3 does not.

Both axiom 3 and axiom 3' hold for every probability function used in this book. Any function P that assigns numbers to subsets of the outcome space S and satisfies the Axioms of Probability is called a probability distribution on S.
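
For readers who like to compute, the axioms can be checked mechanically for any distribution on a small, finite outcome space. The following Python sketch (illustrative only; the names weights, prob, and subsets are ours, and probabilities are written as numbers between 0 and 1 rather than percentages) represents a distribution by the probabilities of the individual outcomes and verifies the three axioms by brute force:

    from itertools import chain, combinations

    S = {1, 2, 3}                            # a small outcome space
    weights = {1: 0.5, 2: 0.25, 3: 0.25}     # hypothetical probabilities of the outcomes

    def prob(event):
        """P(A): the sum of the probabilities of the outcomes in A."""
        return sum(weights[o] for o in event)

    def subsets(s):
        """All subsets (events) of the set s."""
        s = list(s)
        return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

    # Axiom 1: every event has probability at least zero.
    assert all(prob(A) >= 0 for A in subsets(S))

    # Axiom 2: the probability of the entire outcome space is 100%.
    assert abs(prob(S) - 1.0) < 1e-12

    # Axiom 3: if AB = {}, then P(A∪B) = P(A) + P(B).
    for A in map(set, subsets(S)):
        for B in map(set, subsets(S)):
            if not (A & B):                  # A and B are disjoint
                assert abs(prob(A | B) - (prob(A) + prob(B))) < 1e-12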

Example 17-1: The Uniform Probability Distribution.

Let S be a set containing n > 0 elements, for example, S = {1, 2, … , n}. For any subset A of S, define #A to be the number of elements of A. For example, #{} = 0, #{1, 2} = 2, and (provided n ≥ 3) #{n, n−1, n−2} = 3. The function # is called the cardinality function and #A is called the cardinality of A.

The cardinality of a finite set is the number of elements it contains, so in this example, where S = {1, 2, 3, … , n}, #S = n.

Let P(A) = #A/n, the number of elements in the subset A, divided by the total number of elements in S. Then the function P is called the uniform probability distribution on S. The function P satisfies the axioms of probability. Let us see why.

  1. The number of elements in any subset A of S is at least zero (#A≥0), so P(A) ≥ 0/n = 0. Thus P satisfies Axiom 1.

  2. P(S) = #S/n = n/n = 100%. Thus P satisfies Axiom 2.

  3. If A and B are disjoint, then the number of elements in the union A∪B is the number of elements in A plus the number of elements in B:

    #(A∪B) = #A + #B.

    Therefore,

    P(A∪B) = #(A∪B)/n = (#A + #B)/n = #A/n + #B/n = P(A) + P(B).

    Thus P satisfies Axiom 3.

We shall use the uniform probability distribution very often. For example, we shall use the uniform probability distribution on the outcome space S = {0, 1} to model the number of heads in a single toss of a fair coin. We shall use the uniform probability distribution on the outcome space S = {1, 2, … , 6} to model the number of spots that show on the top face of a fair die when it is rolled. We shall use the uniform probability distribution on the outcome space S of the 36 pairs

{(i, j): i = 1, 2, … , 6 and j = 1, 2, … , 6}

to model rolls of a fair pair of dice. We shall use the uniform probability distribution on the outcome space S of all 52! permutations of a deck of cards to model shuffling the deck well. We shall use the uniform probability distribution to model drawing a ticket from a well-stirred box of numbered tickets; in that case, the outcome space S is the collection of numbers written on the tickets (including duplicates as often as they occur on the tickets). The uniform probability distribution is the same as the distribution postulated by the Theory of Equally Likely Outcomes (if the outcomes are defined suitably).
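
As a concrete illustration, here is a minimal Python sketch of the uniform probability distribution on the outcomes of a fair die (the function name uniform_prob is ours; exact fractions avoid rounding):

    from fractions import Fraction

    S = {1, 2, 3, 4, 5, 6}              # outcome space for one roll of a fair die

    def uniform_prob(A):
        """Uniform distribution on S: P(A) = #A/#S."""
        assert A <= S, "A must be a subset of the outcome space"
        return Fraction(len(A), len(S))

    print(uniform_prob({2, 4, 6}))      # P(even) = 3/6, printed as 1/2 (i.e., 50%)
    print(uniform_prob({6}))            # P(six spots) = 1/6
    print(uniform_prob(set()))          # P({}) = 0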

Example 17-2: The Probability Distribution for a Single Trial.

Consider a random trial that can result in failure or success. Let 0 stand for failure, and let 1 stand for success. Then we can consider the outcome space to be S = {0, 1}. For any number p between 0 and 100%, define the function P as follows:

P({}) = 0%, P({0}) = 100% − p, P({1}) = p, and P({0, 1}) = 100%.

Then P is a probability distribution on S, as we can verify by checking that it satisfies the axioms:

  1. Because p is between 0 and 100%, so is 100% − p. The outcome space S has but four subsets: {}, {0}, {1}, and {0, 1}. The values assigned to them by P are 0%, 100% − p, p, and 100%, respectively. All these numbers are at least zero, so P satisfies Axiom 1.

  2. By definition, P(S) = 100%, so P satisfies Axiom 2.

  3. The empty set and any other set are disjoint, and it is easy to see that

    P({}∪A) = P({}) + P(A) for any subset A of S.

    The only other pair of disjoint events in S is {0} and {1}. We can calculate

    P({0}∪{1}) = P(S) = 100% = (100% − p) + p = P({0}) + P({1}).

    Thus P satisfies Axiom 3.

In later chapters this probability distribution will be the building block for more complex distributions involving sequences of trials.
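
Here is the distribution of Example 17-2 as a short Python sketch (the function name is ours; probabilities are written as fractions of 1, so 100% is 1):

    from fractions import Fraction

    def single_trial_prob(A, p):
        """The Example 17-2 distribution on S = {0, 1}: P({1}) = p and P({0}) = 1 - p."""
        weights = {0: 1 - p, 1: p}
        return sum(weights[o] for o in A)

    p = Fraction(1, 3)                     # a hypothetical chance of success
    print(single_trial_prob(set(), p))     # P({}) = 0
    print(single_trial_prob({0}, p))       # P({0}) = 1 - p = 2/3
    print(single_trial_prob({1}, p))       # P({1}) = p = 1/3
    print(single_trial_prob({0, 1}, p))    # P(S) = 1, i.e., 100%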

Consequences of the Axioms of Probability

Everything that is mathematically true of probability is a consequence of the Axioms of Probability, and of further definitions.

For example, if S is countable—that is, if its elements can be put into 1:1 correspondence with a subset of the integers—the sum of the probabilities of the elements of S must be 100%. This follows from Axioms 2 and 3': Axiom 3' tells us that because the elements of S partition S, the probability of S is the sum of the probabilities of the elements of S. Axiom 2 tells us that that sum must be 100%.

The Complement Rule

Another consequence of the axioms is the Complement Rule: The probability that an event occurs is always equal to 100% minus the probability that the event does not occur:

P(Aᶜ) = 100% − P(A).

The Complement Rule is extremely useful, because in many problems it is much easier to calculate the probability that A does not occur than to calculate the probability that A does occur. The Complement Rule can be derived from the axioms: the union of A and its complement Aᶜ is S (either A happens or it does not, and there is no other possibility), so

P(A∪Aᶜ) = P(S) = 100%,

by axiom 2. The event A and its complement are disjoint (if "A does not happen" happens, A does not happen; if A happens, "A does not happen" does not happen), so

P(A∪Aᶜ) = P(A) + P(Aᶜ)

by axiom 3. Putting these together, we get

P(A) + P(Aᶜ) = 100%.

Subtracting P(A) from both sides of this equation yields what we sought:

P(Aᶜ) = 100% − P(A).

Example 17-3: An Application of the Complement Rule.

Consider tossing a fair coin 10 times in such a manner that every sequence of 10 heads and/or tails is equally likely. What is the probability that the coin lands heads at least once?

This would be quite difficult to calculate directly, because there are very many ways in which the coin can land heads at least once. However, there is only one way the coin can fail to land heads at least once: All the tosses must yield tails. That makes it easy to calculate the probability that the coin lands heads at least once, using the Complement Rule.

Every sequence of heads and tails is equally likely, by assumption: The probability distribution is the uniform distribution on sequences of 10 heads and/or tails, so the probability of any particular sequence is 100%/(total number of sequences). By the Fundamental Rule of Counting, there are

2×2× … ×2 = 2¹⁰ = 1,024

sequences of 10 heads and tails.

One of those sequences is (tails, tails, … , tails), so the probability that the coin lands tails in all 10 tosses is

100%/2¹⁰ ≈ 0.0977%.

By the complement rule, the probability that the coin lands heads at least once is therefore

100% − 0.0977% ≈ 99.902%.
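
This computation is easy to check in Python, both by the Complement Rule and by enumerating all 2¹⁰ sequences directly (a sketch; enumeration is feasible only because the outcome space is small):

    from fractions import Fraction
    from itertools import product

    # Complement Rule: P(at least one head) = 100% - P(all tails).
    p_all_tails = Fraction(1, 2 ** 10)
    print(1 - p_all_tails)                        # 1023/1024, about 99.902%

    # Brute force: enumerate all 1,024 equally likely sequences.
    seqs = list(product("HT", repeat=10))
    n_with_head = sum(1 for s in seqs if "H" in s)
    print(Fraction(n_with_head, len(seqs)))       # also 1023/1024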

A special case of the Complement Rule is that the probability of the empty set is always zero (P({}) = 0%), because P(S) = 100% and Sᶜ = {}.

An event A whose probability is 100% is said to be certain or sure. S is certain.

The Probability of the Union of Two Events

The third Axiom of Probability tells us how to find the probability of a union of disjoint events in terms of their individual probabilities. The Axioms can be used together to find a formula for the probability of a union of two events that are not necessarily disjoint in terms of the probability of each of the events and the probability of their intersection.

The union of two events, A∪B, can be partitioned into three disjoint sets: ABᶜ (the outcomes in A but not in B), AᶜB (the outcomes in B but not in A), and AB (the outcomes in both A and B). Together, these three disjoint sets contain every element of A∪B:

A∪B = ABᶜ ∪ AᶜB ∪ AB.

That is, the three sets partition A∪B. The third axiom implies that the chance that either A or B occurs is

P(A∪B) = P(ABᶜ) + P(AᶜB) + P(AB).

On the other hand,

P(A) = P(ABᶜ ∪ AB) = P(ABᶜ) + P(AB),

because ABᶜ and AB are disjoint. Similarly,

P(B) = P(AᶜB ∪ AB) = P(AᶜB) + P(AB),

because AᶜB and AB are disjoint. Adding, we find that

P(A) + P(B) = P(ABᶜ) + P(AᶜB) + 2×P(AB).

This sum would equal P(A∪B), except that it counts P(AB) twice rather than once. Subtracting the extra copy of P(AB) shows that, in general,

P(A∪B) = P(A) + P(B) − P(AB).

This is a true statement, but it is not one of the axioms of probability. In the special case that AB = {}, this result is equivalent to the third axiom, because P({}) = 0%.
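
Under the uniform distribution on a finite outcome space, this identity reduces to counting. A quick Python check, for a pair of overlapping events of our own choosing:

    from fractions import Fraction

    S = set(range(1, 11))                 # uniform distribution on {1, ..., 10}
    P = lambda A: Fraction(len(A), len(S))

    A = {1, 2, 3, 4, 5}                   # hypothetical overlapping events
    B = {4, 5, 6, 7}

    # Inclusion-exclusion: P(A∪B) = P(A) + P(B) - P(AB).
    assert P(A | B) == P(A) + P(B) - P(A & B)
    print(P(A | B))                       # 7/10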

Bounds on Probabilities

It follows from the fact that P(A∪B) = P(A) + P(B) − P(AB) that

P(A∪B) ≤ P(A) + P(B),

because Axiom 1 guarantees that P(AB) ≥ 0. Moreover, taking a union cannot exclude any outcomes already present, so P(A∪B) ≥ P(A). And taking an intersection cannot include additional outcomes, so P(AB) ≤ P(A).

Thus

0 ≤ P(AB) ≤ P(A) ≤ P(A∪B) ≤ P(A) + P(B).

More generally, if {A₁, A₂, A₃, … } is a countable collection of events, then

0 ≤ P(A₁A₂A₃ …) ≤ P(Aₖ) ≤ P(A₁ ∪ A₂ ∪ A₃ ∪ …) ≤ P(A₁) + P(A₂) + P(A₃) + … , for k = 1, 2, 3, … .

Useful Consequences of the Axioms of Probability

For all events A and B, the axioms imply the following:

P(Aᶜ) = 100% − P(A) (the Complement Rule).
P({}) = 0%.
P(A∪B) = P(A) + P(B) − P(AB).
0 ≤ P(AB) ≤ P(A) ≤ P(A∪B) ≤ P(A) + P(B).

Probability is analogous to area or volume or mass. Consider the unit square, each of whose sides has length 1. Its total area is 1×1 = 1 = 100%. Let's call the square S, just like outcome space. Now consider regions inside the square S (subsets of S). The area of any such region is at least zero, the area of S is 100%, and the area of the union of two regions is the sum of their areas, if they do not overlap (i.e., if they are disjoint). These facts are direct analogues of the axioms of probability, and we shall often use this model to get intuition about probability.

It might help your intuition to consider the square S to be a dartboard. The experiment consists of throwing a dart at the board once. The event A occurs if the dart sticks in the set A. The event AB occurs if the dart sticks in both A and B on that one toss. Clearly, AB cannot occur unless A and B overlap—the dart cannot stick in two places at once. A∪B occurs if the dart sticks in either A or B (or both) on that one throw. A and B need not overlap for A∪B to occur.

This analogy is also useful for thinking about the connection between Set Theory and logical implication. If A is a subset of B, the occurrence of A implies the occurrence of B; we shall sometimes say that A implies B. In the dartboard model, the dart cannot stick in A without sticking in B as well, so if A occurs, B must occur also. If A implies B, AB = A, so P(AB) = P(A). If AB = {}, A implies Bᶜ and B implies Aᶜ: if the dart sticks in A, it did not stick in B, and vice versa. If A implies B, then if B does not occur, A cannot occur either: Bᶜ implies Aᶜ, so Bᶜ is a subset of Aᶜ.

The following exercises test your understanding of the axioms of probability and their consequences.


Exercise 17-1

Consider two events, A and B. Suppose P(A) = 79% and P(B) = 68%. What can you say about the chance that either A or B occurs?

Exercise 17-2

Consider dealing 5 cards from a well-shuffled deck. Let A be the event that the first card is an Ace, and let B be the event that the second card is an Ace. What is the chance that A∪B occurs?

Exercise 17-3

Consider two events, A and B. Suppose every outcome in S has probability greater than zero, and that P(A) = 55%, P(B) = 96%, and P(AB) = 55%. What is the relationship between A and B?

Exercise 17-4

Consider three events, A, B, and C. Suppose P(A) = 56%, P(B) = 71%, and P(C) = 86%. What can you say about P(ABC)?

Exercise 17-5

Consider three events, A, B, and C. Suppose P(A) = 29%, P(B) = 56%, and P(C) = 70%. What can you say about P(A∪B∪C)?

Conditioning

In probability, conditioning means incorporating new restrictions on the outcome of an experiment: updating probabilities to take into account new information. This section describes conditioning, and how conditional probability can be used to solve complicated problems.

Conditional Probability

The conditional probability of A given B, P(A | B), is the probability of the event A, updated on the basis of the knowledge that the event B occurred. Suppose that AB = {} (A and B are disjoint). Then if we learn that B occurred, we know A did not occur, so we should revise the probability of A to be zero (the conditional probability of A given B is zero). On the other hand, suppose that AB = B (B is a subset of A, so B implies A). Then if we learn that B occurred, we know A must have occurred as well, so we should revise the probability of A to be 100% (the conditional probability of A given B is 100%). For in-between cases, the conditional probability of A given B is defined to be

P(A | B) = P(AB)/P(B),

provided P(B) is not zero (division by zero is undefined). "P(A | B)" is pronounced "the (conditional) probability of A given B."

Why does this formula make sense? First of all, note that it does agree with the intuitive answers we found above: if AB = {}, then P(AB) = 0, so

P(A | B) = 0/P(B) = 0;

and if AB = B,

P(A | B) = P(B)/P(B) = 100%.

Similarly, if we learned that S occurred, this is not really new information (by definition, S always occurs, because it contains all possible outcomes), so we would like P(A | S) to equal P(A). That is how it works out: AS = A, so

P(A | S) = P(A)/P(S) = P(A)/100% = P(A).

Now suppose that A and B are not disjoint. Then if we learn that B occurred, we can restrict attention to just those outcomes that are in B, and disregard the rest of S, so we have a new outcome space that is just B. We need P(B) = 100% to consider B an outcome space; we can make this happen by dividing all probabilities by P(B). For A to have occurred in addition to B requires that AB occurred, so the conditional probability of A given B is P(AB)/P(B), just as we defined it above.
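
In computational terms, conditioning on B amounts to discarding the outcomes outside B and renormalizing by P(B) so that B plays the role of the outcome space. A Python sketch (the function names are ours), for the roll of a fair die:

    from fractions import Fraction

    weights = {o: Fraction(1, 6) for o in range(1, 7)}   # a fair die

    def prob(A):
        return sum(weights[o] for o in A)

    def cond_prob(A, B):
        """P(A | B) = P(AB)/P(B); undefined if P(B) = 0."""
        if prob(B) == 0:
            raise ZeroDivisionError("P(B) = 0: conditional probability undefined")
        return prob(A & B) / prob(B)

    A = {2, 4, 6}                  # the die shows an even number of spots
    B = {4, 5, 6}                  # the die shows more than 3 spots
    print(cond_prob(A, B))         # P(AB)/P(B) = (2/6)/(3/6) = 2/3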

Example 17-4: Conditional Probability in Card Shuffling.

We shall deal two cards from a well shuffled deck. What is the conditional probability that the second card is an Ace (event A), given that the first card is an Ace (event B)?

Solution. By definition, this is P(AB)/P(B). The (unconditional) chance that the first card is an Ace is 100%/13 ≈ 7.7%, because there are 13 possible faces for the first card, and all are equally likely (this is what we mean by a well-shuffled deck).

The chance that both cards are Aces can be computed as follows: From the four Aces, we need to pick two; there are 4C2 = 6 ways that can happen. The total number of ways of picking two cards from the deck is 52C2 = 52×51/2 = 1326, so the chance that the two cards are both Aces is (6/1326)×100% ≈ 0.45%. The conditional probability that the second card is an Ace given that the first card is an Ace is thus 0.45%/7.7% ≈ 5.9%. As we might expect, it is somewhat lower than the chance that the first card is an Ace, because we know one of the Aces is gone.

We could approach this more intuitively as well: Given that the first card is an Ace, the second card is an Ace too if it is one of the three remaining Aces among the 51 remaining cards. These possibilities are equally likely if the deck was shuffled well, so the chance is 3/51 × 100% = 5.9%.
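
Both answers can be confirmed by enumerating all ordered pairs of distinct cards, each equally likely when the deck is well shuffled (a brute-force sketch; the card encoding is ours):

    from fractions import Fraction
    from itertools import permutations

    ranks = "A23456789TJQK"
    suits = "SHDC"
    deck = [r + s for r in ranks for s in suits]       # the 52 cards

    pairs = list(permutations(deck, 2))                # 52×51 ordered pairs of cards
    b = [p for p in pairs if p[0][0] == "A"]           # B: first card is an Ace
    ab = [p for p in b if p[1][0] == "A"]              # AB: both cards are Aces

    print(Fraction(len(ab), len(b)))                   # P(A | B) = 3/51 = 1/17, about 5.9%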

Conditional probability behaves just like probability: It satisfies the axioms of probability and all their consequences. Thus, for example, P(Aᶜ | B) = 100% − P(A | B), and P(A∪C | B) = P(A | B) + P(C | B) − P(AC | B).

Independence

Two events are independent if learning that one occurred gives us no information about whether the other occurred. That is, A and B are independent if P(A | B) = P(A) and P(B | A) = P(B). A slightly more general way to write this is that A and B are independent if P(AB) = P(A)×P(B). (This covers the cases that P(A), P(B) or both are equal to zero, while the definition of independence in terms of conditional probability requires the probability in the denominator to be different from zero.) To reiterate: Two events are independent if and only if the probability that both events happen simultaneously is the product of their unconditional probabilities. If two events are not independent, they are dependent.
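
The product criterion is easy to test exactly in Python. This sketch (with events of our own choosing) checks independence for the uniform distribution on two rolls of a fair die:

    from fractions import Fraction
    from itertools import product

    S = set(product(range(1, 7), repeat=2))    # 36 equally likely (first, second) pairs
    P = lambda A: Fraction(len(A), len(S))

    def independent(A, B):
        """True if and only if P(AB) = P(A)×P(B)."""
        return P(A & B) == P(A) * P(B)

    A = {s for s in S if s[0] == 1}            # first roll shows 1 spot
    B = {s for s in S if s[1] == 2}            # second roll shows 2 spots
    C = {s for s in S if s[0] + s[1] == 4}     # the two rolls sum to 4 spots

    print(independent(A, B))                   # True: the two rolls are independent
    print(independent(A, C))                   # False: A and C are dependent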

Independence and Mutual Exclusivity Are Different! In fact, the only way two events can be both mutually exclusive and independent is if at least one of them has probability equal to zero. If A and B are mutually exclusive, learning that B happened tells us that A did not happen. This is clearly informative: The conditional probability of A given B is zero! This changes the (conditional) probability of A unless its (unconditional) probability was zero.

Independent events bear a special relationship to each other. Independence is a very precise point between being disjoint (so that the occurrence of one event implies that the other did not occur), and one event being a subset of the other (so that the occurrence of one event implies the occurrence of the other). Here is the contrast between independent events and mutually exclusive events: If A and B are mutually exclusive, then P(AB) = 0, and learning that one occurred tells us that the other did not. If A and B are independent, then P(AB) = P(A)×P(B), and learning that one occurred tells us nothing about whether the other occurred.

Figure 17-1 contains a Venn diagram that represents two events, A and B, as subsets of a rectangle S. The probabilities of the events are proportional to their areas. Initially, the probability of A is 30% and the probability of B is 20%. The figure also shows the probability of AB and of A∪B. In the interactive version of the figure, you can drag A and B to try to make them independent, by making the area of their intersection equal to the product of their areas, so that P(AB) = P(A)×P(B) = 30%×20% = 6%. It is hard to get just the right amount of overlap: Independence is a very special relationship between events.

Figure 17-1: Venn Diagram to Illustrate Independence.

If A and B are independent, so are A and Bᶜ, Aᶜ and B, and Aᶜ and Bᶜ.

What kinds of events are (generally assumed to be) independent? The outcomes of successive fair tosses of a fair coin, the outcomes of random draws from a box with replacement, etc. Draws without replacement are dependent, because what can happen on a given draw depends on what happens on previous draws. The next two examples illustrate the contrast between independent and dependent events.

Example 17-5: Drawing at Random from a Box of Tickets.

Suppose I have a box with four tickets in it, labeled 1, 2, 3, and 4. I stir the tickets and then draw one from the box, stir the remaining tickets again without returning the ticket I drew the first time, and draw another ticket. Consider the event A = {I get the ticket labeled 1 on the first draw} and the event B = {I get the ticket labeled 2 on the second draw}. Are A and B dependent or independent?

Solution: The chance that I get the 1 on the first draw is 25%. The chance that I get the 2 on the second draw is 25%. The chance that I get the 2 on the second draw given that I get the 1 on the first draw is 33%, which is much larger than the unconditional chance that I draw the 2 the second time. Thus A and B are dependent.

Now suppose that I replace the ticket I got on the first draw and stir the tickets again before drawing the second time. Then the chance that I get the 1 on the first draw is 25%, the chance that I get the 2 on the second draw is 25%, and the conditional chance that I get the 2 on the second draw given that I drew the 1 the first time is also 25%. A and B are thus independent if I draw with replacement.
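
Both schemes can be checked by listing the equally likely ordered outcomes and counting (a sketch; the helper name is ours):

    from fractions import Fraction
    from itertools import permutations, product

    tickets = [1, 2, 3, 4]

    def cond_chance(outcomes):
        """P(second draw is 2 | first draw is 1) among equally likely outcomes."""
        b = [d for d in outcomes if d[0] == 1]
        ab = [d for d in b if d[1] == 2]
        return Fraction(len(ab), len(b))

    without = list(permutations(tickets, 2))    # 12 outcomes: draws without replacement
    with_r = list(product(tickets, repeat=2))   # 16 outcomes: draws with replacement

    print(cond_chance(without))    # 1/3: larger than 25%, so the draws are dependent
    print(cond_chance(with_r))     # 1/4: equal to 25%, consistent with independence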

 

Example 17-6: Rolling a Fair Pair of Dice.

Two fair dice are rolled independently; one is blue, the other is red. What is the chance that the number of spots that show on the red die is less than the number of spots that show on the blue die?

Solution: The event that the number of spots that show on the red die is less than the number that show on the blue die can be broken up into mutually exclusive events, according to the number of spots that show on the blue die. The chance that the number of spots that show on the red die is less than the number that show on the blue die is the sum of the chances of those simpler events. If only one spot shows on the blue die, the number that shows on the red die cannot be smaller, so the probability is zero. If two spots show on the blue die, the number that shows on the red die is smaller if the red die shows exactly one spot. Because the numbers of spots that show on the blue and red dice are independent, the chance that the blue die shows two spots and the red die shows one spot is (1/6)(1/6) = 1/36. If three spots show on the blue die, the number that shows on the red die is smaller if the red die shows one or two spots. The chance that the blue die shows three spots and the red die shows one or two spots is (1/6)(2/6) = 2/36. If four spots show on the blue die, the number that shows on the red die is smaller if the red die shows one, two, or three spots; the chance that the blue die shows four spots and the red die shows one, two, or three spots is (1/6)(3/6) = 3/36.

Proceeding similarly for the cases that the blue die shows five or six spots gives the ultimate result:

P(red die shows fewer spots than the blue die) = 1/36 + 2/36 + 3/36 + 4/36 + 5/36 = 15/36.

Alternatively, one could just count the ways: There are 36 possibilities, which can be written in a square table as follows.

The 36 possible outcomes of rolling two dice, written as (red, blue):

                   Blue Die
             1    2    3    4    5    6
    Red  1  1,1  1,2  1,3  1,4  1,5  1,6
    Die  2  2,1  2,2  2,3  2,4  2,5  2,6
         3  3,1  3,2  3,3  3,4  3,5  3,6
         4  4,1  4,2  4,3  4,4  4,5  4,6
         5  5,1  5,2  5,3  5,4  5,5  5,6
         6  6,1  6,2  6,3  6,4  6,5  6,6

The outcomes above the diagonal comprise the event whose probability we seek. There are 36 outcomes in all, of which 6 are on the diagonal. Half of the remaining 36 − 6 = 30 are above the diagonal; half of 30 is 15. The 36 outcomes are equally likely, so the chance is 15/36. The outcomes (1,4), (2,4), and (3,4) comprise one of the mutually exclusive pieces used in the computation in Example 17-6: namely, the three ways the red die can show a smaller number of spots than the blue die when the blue die shows exactly 4 spots.
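
Counting the outcomes above the diagonal is also easy to automate; this Python sketch enumerates the 36 equally likely pairs and confirms the answer:

    from fractions import Fraction
    from itertools import product

    outcomes = list(product(range(1, 7), repeat=2))      # (red, blue) pairs
    favorable = [(r, b) for r, b in outcomes if r < b]   # red shows fewer spots than blue

    print(len(favorable))                                # 15 outcomes above the diagonal
    print(Fraction(len(favorable), len(outcomes)))       # 15/36, printed as 5/12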

The following exercises check your understanding of independence.

Exercise 17-6

A die has six sides; one has one spot, one has two spots, … , and one has six spots. Consider rolling a fair die twice, independently, in such a way that every side is equally likely to land on top in each roll.

What is the probability that the sum of the numbers of spots that show in the two rolls is even?

What is the conditional probability that the sum of the numbers of spots that show in the two rolls is even, given that the first die lands with 2 spots showing?

Are the event that the sum of the numbers of spots that show in the two rolls is even and the event that the first die lands with 2 spots showing independent or dependent?

What is the probability that the sum of the numbers of spots that show in the two rolls is 9?

What is the conditional probability that the sum of the numbers of spots that show in the two rolls is 9, given that the first die lands with 2 spots showing?

Are the event that the sum of the numbers of spots that show in the two rolls is 9 and the event that the first die lands with 2 spots showing independent or dependent?

Exercise 17-7

It has been said that if you put enough monkeys in front of typewriters, and wait long enough, one of them will type the complete works of Shakespeare.

Consider the sentence "the simians are jubilant." Neglecting the period at the end, but including the spaces, there are 24 characters in this sentence. Suppose we put 1,000 monkeys in front of special typewriters that have only lowercase letters and a spacebar—no numbers, punctuation marks, or special characters, so there are 27 keys in all. Every time a monkey types 24 characters, we change the paper in the typewriter. Assume that monkeys type independently of each other, and that they pick which character to type independently each time and with equal probability of striking each of the 27 keys. Each monkey types 24 characters per minute for 8 hours a day, 365.25 days a year, for 20 years. Assume that it does not take any time to change the paper in the typewriter. What is the chance that at least one of the monkeys types the sentence?

The Multiplication Rule

We can rearrange the definition of conditional probability to solve for the probability that both A and B occur (that AB occurs) in terms of the probability that B occurs and the conditional probability of A given B:

P(AB) = P(A | B)×P(B).

This is called the Multiplication Rule. The following two examples illustrate the Multiplication Rule.

Example 17-7: The Multiplication Rule in Card Shuffling.

A deck of cards is shuffled well, then two cards are drawn. What is the chance that both cards are aces?

Solution: Apply the Multiplication Rule.

P(card 1 is an Ace and card 2 is an Ace) = P(card 2 is an Ace | card 1 is an Ace)×P(card 1 is an Ace)

= 3/51 × 4/52 ≈ 0.45%.

You can see that the Multiplication Rule can save you a lot of time!

 

Example 17-8: Using the Multiplication Rule.

Suppose there is a 50% chance that you catch the 8:00am bus. If you catch the bus, you will be on time. If you miss the bus, there is a 70% chance that you will be late. What is the chance that you will be late?

Solution: Apply the Multiplication Rule.

P(late) = P(miss the bus and late) = P(late | miss the bus) × P(miss the bus)

= 70% × 50% = 35%.
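
As a sketch in Python (with the chances written as numbers between 0 and 1), the whole computation is one multiplication:

    p_miss = 0.5                    # chance of missing the 8:00am bus
    p_late_given_miss = 0.7         # chance of being late, given that the bus is missed

    # You can be late only by missing the bus, so the Multiplication Rule gives
    # P(late) = P(late | miss the bus) × P(miss the bus).
    p_late = p_late_given_miss * p_miss
    print(p_late)                   # 0.35, i.e., 35%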

The following exercises check your ability to apply the Multiplication Rule.


Exercise 17-8

Suppose you are taking a Statistics course with the following grading policy: to pass, you need to get a C or better on the homework, midterm, and final, or make a B or better on the final. Suppose you do all your assignments and exams by guessing randomly (say, based on rolls of a die); you guess independently on everything. The chance you will get a C or better on the homework by guessing is 25%; the chance you will get a C or better on the midterm by guessing is 16%; the chance you will get a C or better on the final by guessing is 11%; and the chance you will get a B or better on the final by guessing is 6%. What is the chance of getting a C on the final by guessing?

What is the chance of passing the course by guessing?

Exercise 17-9

A particular construction project has four stages. The first two stages can be undertaken concurrently, and their completion times are independent. However, the first stage must be complete before the third stage can start. The third stage starts as soon as the first stage finishes. The second stage is independent of the first and third stages. If the third stage starts late, it will end late. The second and third stages must be complete before the fourth stage can start; the fourth stage starts as soon as the second and third stages finish. For the project to finish on time, the fourth stage has to finish on time. If the fourth stage does not start on time, it cannot finish on time. The chance that the first stage finishes on time is 70%. The chance that the second stage finishes on time is 79%. The chance that the third stage finishes on time given that the first stage finishes on time is 91%. The chance that the fourth stage finishes on time given that the second and third stages finish on time is 79%. What is the chance that the project finishes on time?

Bayes' Rule

Bayes' Rule is a formula that expresses P(A | B) in terms of P(B | A), P(B | Aᶜ), P(A), and P(Aᶜ):

P(A | B) = P(B | A)×P(A)/( P(B | A)×P(A) + P(B | Aᶜ)×P(Aᶜ) ).

The numerator on the right is P(AB), computed using the Multiplication Rule. The denominator is just P(B), computed by partitioning B into the mutually exclusive sets AB and AᶜB, and finding the probability of each of those pieces using the Multiplication Rule.

Bayes' Rule is useful to find the conditional probability of A given B in terms of the conditional probability of B given A, which is the more natural quantity to measure in some problems, and the easier quantity to compute in some problems. For example, in screening for a disease, the natural way to calibrate a test is to see how well it does at detecting the disease when the disease is present, and to see how often it raises false alarms when the disease is not present. These are, respectively, the conditional probability of detecting the disease given that the disease is present, and the conditional probability of incorrectly raising an alarm given that the disease is not present. However, the interesting quantity for an individual is the conditional chance that he or she has the disease, given that the test raised an alarm. An example will help.

Example 17-9: Bayes' Rule in Disease Screening

Suppose that 10% of a given population has benign chronic flatulence. Suppose that there is a standard screening test for benign chronic flatulence that has a 90% chance of correctly detecting that one has the disease, and a 10% chance of a false positive (erroneously reporting that one has the disease when one does not). We pick a person at random from the population (so that everyone has the same chance of being picked) and test him/her. The test is positive. What is the chance that the person has the disease?

Solution: We shall combine several things we have learned. Let D be the event that the person has the disease, and let T be the event that the person tests positive for the disease. The problem statement told us that:

P(D) = 10% (10% of the population has the disease),
P(T | D) = 90% (the chance the test detects the disease, given that the disease is present), and
P(T | Dᶜ) = 10% (the chance of a false positive, given that the disease is absent).

The problem asks us to find P(D | T) = P(DT)/P(T). We shall find P(T) by partitioning T into two mutually exclusive pieces, DT and DᶜT, corresponding to testing positive and having the disease (DT) and testing positive falsely (DᶜT). Then P(T) is the sum of P(DT) and P(DᶜT). We will find those two probabilities using the Multiplication Rule. We need P(DT) for the numerator, and it will be one of the terms in the denominator as well. The probability of DT is, by the Multiplication Rule,

P(DT) = P(T | D) × P(D) = 90% × 10% = 9%.

The probability of DᶜT is, by the Multiplication Rule and the Complement Rule,

P(DᶜT) = P(T | Dᶜ) × P(Dᶜ) = P(T | Dᶜ) × (100% − P(D)) = 10% × 90% = 9%.

By the third axiom,

P(T) = P(DT) + P(DᶜT) = 9% + 9% = 18%,

because DT and DᶜT are mutually exclusive. Finally, plugging in the definition of P(D | T) gives:

P(D | T) = P(DT)/P(T) = 9%/18% = 50%.

Because only a small fraction of the population actually has benign chronic flatulence, the chance that a positive test result for someone selected at random from the population is a false positive is 50%, even though the test is 90% accurate. The computation we just made is equivalent to using Bayes' Rule:

P(D | T) = P(T | D)×P(D)/( P(T | D)×P(D) + P(T | Dᶜ)×P(Dᶜ) )

= 90%×10%/( 90%×10% + 10%×90%)

= 50%.
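
The same computation can be packaged as a general Bayes' Rule function; here is a minimal sketch (the argument names are ours; exact fractions avoid rounding):

    from fractions import Fraction

    def bayes(p_b_given_a, p_b_given_ac, p_a):
        """P(A | B) via Bayes' Rule, from P(B | A), P(B | Aᶜ), and P(A)."""
        p_ac = 1 - p_a                                  # Complement Rule
        numerator = p_b_given_a * p_a                   # P(AB), by the Multiplication Rule
        denominator = numerator + p_b_given_ac * p_ac   # P(B), by partitioning B into AB and AᶜB
        return numerator / denominator

    # Example 17-9: P(D) = 10%, P(T | D) = 90%, P(T | Dᶜ) = 10%.
    print(bayes(Fraction(9, 10), Fraction(1, 10), Fraction(1, 10)))   # 1/2, i.e., 50%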

The Base Rate Fallacy consists of ignoring P(A) or P(B) in computing P(B | A) from P(A | B) and P(A | Bᶜ). For instance, in the example above, the base rate for benign chronic flatulence is 10%. The test is 90% accurate (both for false positives and for false negatives). The base rate fallacy is to conclude that since the test is 90% accurate, it must be true that 90% of people who test positive in fact have the disease—ignoring the base rate of the disease in the population and the frequency of false positive test results. We just saw that that conclusion is wrong: if people are tested at random, of those who test positive, only 50% have the disease, on average.

The Prosecutor's Fallacy consists of confusing P(B | A) with P(A | B). For instance, P(A | B) might be the probability of some evidence given that the accused is guilty, while P(B | A) is the probability that the accused is guilty given the evidence. The second "conditional probability" generally does not make sense at all; even when it does, its numerical value need not be close to the value of P(A | B).

The following exercises check your ability to work with conditional probability, the Multiplication Rule, and Bayes' Rule.


Exercise 17-10

A hypothetical urine test for narcotics use has a 1% rate of false positives: on average, one in 100 people who do not use narcotics will test positive for narcotics use. Suppose that a large population of people is screened for narcotics use using this test. Is it true that the vast majority of people who test positive for narcotics use (according to this test) actually use narcotics?

Exercise 17-11

A box has two drawers. One drawer contains one gold coin and one silver coin. The other drawer contains 7 gold coins and 3 silver coins. A drawer is picked at random with chance 1/2 of picking each drawer, then a coin is picked at random from that drawer, with equal chance of picking each coin in the drawer. What is the conditional chance that the drawer picked contains at least one more gold coin, given that the coin picked from the drawer is gold?

Exercise 17-12

A fair die is rolled three times, independently. What is the probability that the sum of the numbers of spots that show on the first two rolls is less than the number of spots that show on the third roll?

Summary

The Axioms of Probability are mathematical rules that must be followed in assigning probabilities to events: The probability of an event cannot be negative, the probability that something happens must be 100%, and if two events cannot both occur, the probability that either occurs is the sum of the probabilities that each occurs. A function that assigns numbers to events and satisfies the axioms is called a probability distribution.

The axioms have numerous consequences, including the following: The probability of the empty set is zero. The probability that a given event does not occur is 100% minus the probability that the event occurs. The probability that either of two events occurs is the sum of the probabilities that each occurs, minus the probability that both occur. The probability that either of two events occurs is at least as large as the probability that each occurs, and no larger than the sum of the probabilities that each occurs. The probability that two events both occur is no larger than either of their individual probabilities.

Conditioning describes updating probabilities to incorporate new knowledge. For example, how should we update the probability of the event A if we learn that the event B occurs? The updated probability is the conditional probability of A given B, which is equal to the probability that A and B both occur, divided by the probability that B occurs, provided that the probability that B occurs is not zero. Conditional probability satisfies the axioms of probability.

Rearranging the definition of conditional probability yields the Multiplication Rule: The probability that A and B both occur is the conditional probability of A given B, times the probability that B occurs. Two events are independent if the occurrence of one is uninformative with respect to the occurrence of the other: if P(A | B) = P(A). A slightly more general definition is that A and B are independent if P(AB) = P(A)×P(B).

Bayes' Rule expresses P(A | B) in terms of P(B | A), P(B | Aᶜ), and P(A), which in some problems are easier to calculate than P(A | B). Bayes' Rule says that

P(A | B) = P(B | A)×P(A)/( P(B | A)×P(A) + P(B | Aᶜ)×P(Aᶜ) ).

The base rate fallacy consists of ignoring P(A) or P(B) in computing P(B | A) from P(A | B) and P(A | Bᶜ). The prosecutor's fallacy consists of confusing P(A | B) with P(B | A).
