Monday, August 15, 2011

Thinking in Probability

This blog post is based on a lecture I gave at Jacob Sir's classes yesterday. The intent of my lecture was to urge students to ask questions and read "stuff" beyond the textbook. To get the students' interest, I started with two examples that illustrate dependent and independent events, and also the need to stick to mathematical rigor rather than intuition.

The first example was the classic Monty Hall problem, which surprisingly none of my students had heard of. This was fortunate, because it meant everyone would have to think about it on their own rather than provide a prepared answer (thought through by someone else). So, here is the problem:

There are three doors, D1, D2 and D3; behind two of them there is a goat and behind the other is a car. The objective of the game is to win the car, so you have to guess which door to choose ... Does it make a difference whether you choose D1, D2 or D3? ... The consensus was NO, because Prob(win) = 0.33 for each door. To make the problem interesting, the game show host (who knows which door has the car and which doors have goats) opens one of the remaining doors that has a goat. He then asks whether, based on this information, you would like to switch your earlier choice. For example, say you chose D1 and the game show host opens D2 to show that it has a goat. Now you can either stick with D1 or switch to D3. The real question we are interested in is:
Does it make more sense to switch doors or to stay with the same door? Or does it not matter whether you choose D1 or D3?

Interesting but incorrect answers:
1. It does not matter whether you choose D1 or D3, because Prob(win) for each door is now 0.5.
2. Staying with the door makes more sense, since we have increased Prob(win) from 0.33 to 0.5.
3. We really don't have a plausible reason to switch, so stay with the same door, especially since the game show host may want to trick us.

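The correct answer is that switching doubles Prob(win), hence it makes more sense.
Consider the case where you don't switch:
Prob(win) = Prob(D1 = car) = 1/3 (you picked D1 before the host opened anything, and the host can always open a goat door, so his action tells you nothing new about D1)
Now consider the case where you switch:
Prob(win) = Prob(D1 = goat) = 2/3 (if D1 hides a goat, the host has revealed the other goat, so the remaining door must hide the car)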
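
The two cases can also be checked empirically. Below is a minimal simulation sketch, assuming the host picks uniformly at random when he has two goat doors to choose from; the function names `play` and `win_rate`, the number of trials and the seed are my own choices for illustration.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Fraction of rounds won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay:   {stay_rate:.3f}")
print(f"switch: {switch_rate:.3f}")
```

With enough trials the two rates should settle near 1/3 and 2/3 respectively.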

The other way to understand the problem is to enumerate the three equally likely arrangements (one per column), assuming you initially pick D1:
D1 : C G G
D2 : G C G
D3 : G G C
Swap: G C C => Prob(car) = 2/3
Stay: C G G => Prob(car) = 1/3
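
The same enumeration can be written as a short exact computation. This is only a sketch: it assumes the first pick is D1, and the variable names are mine.

```python
from fractions import Fraction

doors = ["D1", "D2", "D3"]
pick = "D1"  # assumption: the contestant's first choice

stay_wins = 0
switch_wins = 0
for car in doors:  # three equally likely arrangements
    # Doors the host may open: not the pick, not the car.
    host_options = [d for d in doors if d != pick and d != car]
    # Whichever allowed door the host opens, switching gives the same
    # result (win or lose), so looking at the first option is enough.
    opened = host_options[0]
    switched = next(d for d in doors if d not in (pick, opened))
    stay_wins += (pick == car)
    switch_wins += (switched == car)

p_stay = Fraction(stay_wins, len(doors))
p_switch = Fraction(switch_wins, len(doors))
print("stay:", p_stay, "swap:", p_switch)
```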

For people who chose the third incorrect answer, I asked them not to use information that is not provided in the problem, and to show as much restraint as possible about putting intuition over logic. Deductive thinking says go from Step 2 to Step 3 only if there is a valid theorem, axiom, or logical reason (not intuition).

Let's move to the second problem:
Say I have a fair coin. I toss the coin four times and get {Head, Head, Head, Head}. What will you bet on for the fifth toss?
Some students initially based their answer on the gambler's fallacy, stating that Tail is more likely since they had seen four consecutive heads. I then asked them to sit in a circle, discuss with each other, and answer my question again. Through this exercise, I wanted them to learn the idea of collaborating to get an answer. And voilà ... they came up with the correct answer ... It does not matter whether we bet on head or tail, since the tosses are INDEPENDENT and hence each outcome is equally likely => Prob(T5 | H1, H2, H3, H4) = Prob(T5) = 0.5
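
One way to convince yourself of this is to list all 32 equally likely outcomes of five fair tosses and look only at those that begin with four heads. The sketch below does exactly that; the variable names are mine.

```python
from fractions import Fraction
from itertools import product

# All 2**5 equally likely outcomes of five fair tosses.
sequences = list(product("HT", repeat=5))

# Keep only the outcomes that start with four heads.
four_heads = [s for s in sequences if s[:4] == ("H", "H", "H", "H")]
tails_fifth = [s for s in four_heads if s[4] == "T"]

# Prob(T5 | H1, H2, H3, H4) and the unconditional Prob(T5).
p_tail_given_hhhh = Fraction(len(tails_fifth), len(four_heads))
p_tail = Fraction(sum(s[4] == "T" for s in sequences), len(sequences))
print(p_tail_given_hhhh, p_tail)
```

Both values come out to 1/2, which is what independence means here.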

Finally, I gave them two different explanations of probability:
1. Deterministic view of probability (explained in my earlier blog post)

After explaining the deterministic view, I emphasized the importance of probability in solving real-life problems. For example, in a coin-tossing experiment, you are able to make some rational predictions even though you don't know the (hidden) parameters such as the initial angle of the coin, its weight, density, shape and velocity, air resistance, the characteristics of the surface where it lands, etc. ... Similarly, you can build simpler probabilistic models for complex scenarios like weather forecasting, stock prediction, traffic congestion, ...
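
To make this concrete, here is a toy sketch, not a physical model. It assumes a made-up deterministic rule (the coin starts heads-up and lands heads if it completes an even number of half-turns) and made-up ranges for two hidden parameters. The function `toss` and all the numbers are hypothetical and only for illustration.

```python
import math
import random

def toss(angular_velocity, flight_time):
    """Deterministic toy rule: the coin starts heads-up and lands heads
    if it completes an even number of half-turns in the air."""
    half_turns = math.floor(angular_velocity * flight_time / math.pi)
    return "H" if half_turns % 2 == 0 else "T"

rng = random.Random(0)
trials = 100_000
heads = 0
for _ in range(trials):
    # Hidden parameters, unknown to the observer (ranges are made up).
    omega = rng.uniform(100.0, 300.0)  # angular velocity in rad/s
    t = rng.uniform(0.3, 0.7)          # time in the air in seconds
    heads += toss(omega, t) == "H"

heads_frequency = heads / trials
print(f"fraction of heads: {heads_frequency:.3f}")
```

Each individual toss is fully determined by its parameters, yet an observer who cannot see them can still predict that roughly half of the tosses land heads.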


1 comment:

Niketan Pansare said...

Let's examine the Monty Hall problem using Bayes' formula:
P(A|B) = P(B|A) P(A) / ( sum_j P(B|A_j) P(A_j) )

Since the denominator is just the numerator summed over all possible j's, it is the same for every A, so let's rewrite the formula as
P(A|B) propto P(B|A) P(A)
where propto => proportional to (i.e. the denominator is just for normalization).

Let's use the following notation for Monty Hall:
C1 => car is behind door 1
D2 => game show host opened door 2
S3 => you selected door 3, before the game show host revealed any door

Let's say you pick door 2 and the game show host opens door 3. Note that this choice of doors is arbitrary, so there is no loss of generality.

Before the game show host reveals a door, the car is equally likely to be behind any of the doors:
P(C1) = P(C2) = P(C3) = 1/3
This does not change when you select an arbitrary door with no information from the game show host:
P(C1|S1) = P(C1|S2) = P(C1|S3) = P(C2|S1) = P(C2|S2) = ... = 1/3

P(D3 | C1, S2) => Car is behind door 1 and you picked door 2. Therefore, the game show host is forced to open door 3 => P(D3 | C1, S2) = 1

P(D3 | C2, S2) => Car is behind door 2 and you picked door 2. Assuming the game show host has no preference for door 1 or door 3, he opens either of the two remaining doors at random => P(D3 | C2, S2) = 1/2

P(D3 | C3, S2) => Car is behind door 3 and you picked door 2. Since the car is behind door 3, the game show host cannot open door 3 => P(D3 | C3, S2) = 0
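
These three likelihoods follow mechanically from the rules the host plays by. A small sketch, assuming the host chooses uniformly among the doors he is allowed to open (the helper name `host_likelihood` is mine):

```python
from fractions import Fraction

def host_likelihood(opened, car, pick, doors=(1, 2, 3)):
    """P(host opens `opened` | car behind `car`, you picked `pick`),
    assuming a uniform choice among the doors the host may open."""
    # The host may open neither your door nor the door hiding the car.
    allowed = [d for d in doors if d != pick and d != car]
    if opened not in allowed:
        return Fraction(0)
    return Fraction(1, len(allowed))

# You picked door 2 and the host opened door 3.
likelihoods = {car: host_likelihood(opened=3, car=car, pick=2)
               for car in (1, 2, 3)}
for car, p in likelihoods.items():
    print(f"P(D3 | C{car}, S2) = {p}")
```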

Using the above information, examine the probability of the car being behind each of the doors:
P(C1 | D3, S2) propto P(D3 | C1, S2) P(C1 | S2) = 2/6
P(C2 | D3, S2) propto P(D3 | C2, S2) P(C2 | S2) = 1/6
P(C3 | D3, S2) propto P(D3 | C3, S2) P(C3 | S2) = 0

This means that if you have selected door 2, it's better to switch your choice to door 1: given the structure of the game, the car is twice as likely to be behind door 1 (i.e. the door you didn't select) as behind door 2 (i.e. the door you did select).
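
The proportional values can be turned into proper probabilities by dividing by their sum, which is the denominator of Bayes' formula. A minimal sketch using the priors and likelihoods from this comment (the dictionary names are mine):

```python
from fractions import Fraction

# P(Ck | S2): picking a door tells you nothing about where the car is.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
# P(D3 | Ck, S2): how likely the host is to open door 3 in each case.
likelihood = {1: Fraction(1), 2: Fraction(1, 2), 3: Fraction(0)}

# Numerator of Bayes' formula for each hypothesis.
unnormalized = {k: likelihood[k] * prior[k] for k in prior}
# Denominator: the sum of the numerators, i.e. P(D3 | S2).
evidence = sum(unnormalized.values())
posterior = {k: unnormalized[k] / evidence for k in prior}

for k in sorted(posterior):
    print(f"P(C{k} | D3, S2) = {posterior[k]}")
```

The posterior comes out as 2/3 for door 1, 1/3 for door 2 and 0 for door 3, matching the 2:1 ratio above.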