Applied statistics/Tutorials
Latest revision as of 22:00, 25 October 2013
Rules of chance
The addition rule
For two mutually exclusive events, A and B,
the probability that either A or B will occur is equal to the probability that A will occur plus the probability that B will occur,
- P(A or B) = P(A) + P(B).
The multiplication rule
For two independent (unrelated) events, A and B,
the probability that A and B will both occur is equal to the probability that A will occur multiplied by the probability that B will occur,
- P(A and B) = P(A) x P(B)
Bayes' theorem
The probability that event A will occur, given that event B has occurred, is equal to the probability that B will occur, given that A has occurred, multiplied by the probability that A will occur and divided by the probability that B will occur,
- P(A/B) = P(B/A) x P(A)/P(B).
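The three rules can be checked by brute-force counting on a small sample space. The sketch below is illustrative only: the choice of two fair six-sided dice, the helper `prob`, and the particular events are assumptions made for the example and are not part of the tutorial.

```python
from fractions import Fraction

# Assumed sample space: all 36 equally likely rolls of two fair dice.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]


def prob(event):
    """Probability of an event, given as a true/false test on an outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


def first_is_1(o):
    return o[0] == 1


def first_is_2(o):
    return o[0] == 2


def second_is_6(o):
    return o[1] == 6


def total_at_most_3(o):
    return o[0] + o[1] <= 3


# Addition rule: "first die is 1" and "first die is 2" are mutually exclusive.
p_either = prob(lambda o: first_is_1(o) or first_is_2(o))
assert p_either == prob(first_is_1) + prob(first_is_2)

# Multiplication rule: the two dice are independent of each other.
p_both = prob(lambda o: first_is_1(o) and second_is_6(o))
assert p_both == prob(first_is_1) * prob(second_is_6)

# Bayes' theorem with A = "first die is 1" and B = "total is at most 3".
p_a = prob(first_is_1)
p_b = prob(total_at_most_3)
p_a_and_b = prob(lambda o: first_is_1(o) and total_at_most_3(o))
p_b_given_a = p_a_and_b / p_a   # P(B/A), found by direct counting
p_a_given_b = p_a_and_b / p_b   # P(A/B), found by direct counting
assert p_a_given_b == p_b_given_a * p_a / p_b

print(p_either, p_both, p_a_given_b)  # 1/3 1/36 2/3
```

Exact fractions are used rather than floating-point numbers so that the equalities hold exactly.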
Common fallacies
The double birthday fallacy
The fallacy
That it is very unlikely that any 2 people in a group of 23 have the same birthday.
The truth
That there is a better than 50 per cent probability that at least 2 people in any group of 23 or more will have the same birthday.
Proof
Step 1: the following chain of argument proves that the number of different pairs in a group of 23 people is 253
(a) the number of pairs that there would be if each of 23 people paired with one of 23 people is 23 x 23 = 529;
(b) deducting the 23 cases in which a person would be paired with himself leaves 529 - 23 = 506; and,
(c) deducting the duplicates that occur if a pair such as AB were counted as well as the pair BA leaves 506/2 = 253
Step 2: the following argument shows that the probability that the two people making up one particular pair do not have the same birthday is 99.726 per cent
(a) of the 365 days in a year there are 364 days that are not A's birthday; (b) there is a one in 365 chance that B's birthday falls on any one of those days; (c) therefore the probability that B's birthday falls on a day that is not A's birthday is 364/365 = 0.99726 or 99.726 per cent.
Step 3: the following argument shows that the probability that none of the 253 different pairs have the same birthday is approximately 49.95 per cent
(a) since the probability that one particular pair do not have the same birthday is 99.726 per cent, the probability that neither of two selected pairs have the same birthday is .99726 x .99726 or (0.99726)^2, and that for none of three selected pairs it is (0.99726)^3... and so on (treating the pairs as independent, which is a close approximation); (b) so the probability that none of the 253 possible pairs of step 1 have a birthday in common is (0.99726)^253 = 0.4995 or 49.95 per cent.
Step 4: since the probability that none of the 253 pairs has the same birthday is 0.4995, the probability that at least one of the pairs does have the same birthday must be 1 - 0.4995 = 0.5005 or 50.05 per cent. (An exact calculation, which does not treat the pairs as independent, gives 50.73 per cent.)
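The four steps of the proof can be reproduced numerically. The following is a minimal sketch, assuming a 365-day year with all birthdays equally likely; the variable names are illustrative. It also computes the exact probability, which multiplies the chances that each successive person misses every earlier birthday instead of treating the pairs as independent.

```python
from math import comb, prod

people = 23
days = 365

# Step 1: number of distinct pairs, (23 x 23 - 23) / 2.
pairs = comb(people, 2)

# Step 2: probability that one particular pair do NOT share a birthday.
p_pair_differs = (days - 1) / days

# Step 3: probability that no pair shares a birthday,
# treating the pairs as independent (an approximation).
p_no_match_approx = p_pair_differs ** pairs

# Step 4: probability that at least one pair shares a birthday.
p_match_approx = 1 - p_no_match_approx

# Exact version: person k must avoid the k birthdays already taken.
p_no_match_exact = prod((days - k) / days for k in range(people))
p_match_exact = 1 - p_no_match_exact

print(pairs)                     # 253
print(round(p_match_approx, 4))  # 0.5005
print(round(p_match_exact, 4))   # 0.5073
```

Both figures exceed 50 per cent, so the conclusion does not depend on the independence approximation.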
The false positive fallacy
The fallacy
That if a test for a disease that has a prevalence rate of 1 in 1000 has a false positive rate of 5%, there is a 95 per cent probability that a person who has been given a positive result actually has the disease.
The truth
The true probability is about 2 per cent.
Proof
Let A denote the event of having the disease, and B the event of having been tested positive (for the purpose of applying Bayes' theorem);
Then P(B/A), which is the probability of having been tested positive when having the disease, can be taken as equal to 1;
And P(A) is the probability of having the disease, which with a prevalence of 1 in 1000 must be equal to 1/1000;
And P(B) is the probability of being tested positive, which can be arrived at by 3 steps:
Step 1 is to observe that since the prevalence of the disease is 1 in 1000, 999 persons out of every 1000 are healthy.
Step 2 is to recall that for each healthy person the probability of being tested positive is 5% or 1 in 20.
Step 3 is to apply the multiplication rule and get the answer:
- P(B) = 999/1000 multiplied by 1/20 or, near enough, 1/20 (neglecting the small additional contribution from the 1 in 1000 who have the disease).
So, applying Bayes' theorem, the probability of having the disease, given that you have been tested positive, is given by
- P(A/B) = P(B/A) x P(A)/P(B), or:
- = 1 x (1/1000)/(1/20) - which is 0.02, or 2%.
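The calculation above can be repeated without the "near enough" shortcut. This is a sketch under the same assumptions as the proof (prevalence 1 in 1000, false positive rate 1 in 20, and P(B/A) taken as 1); the variable names are illustrative. It adds the true positives back into P(B) to show how little the approximation changes the answer.

```python
from fractions import Fraction

prevalence = Fraction(1, 1000)         # P(A): probability of having the disease
false_positive_rate = Fraction(1, 20)  # P(B/not A): healthy person tests positive
sensitivity = Fraction(1)              # P(B/A): taken as 1, as in the proof

# The proof's shortcut: count only the healthy people who test positive.
p_b_healthy_only = (1 - prevalence) * false_positive_rate

# Full P(B): true positives plus false positives.
p_b = sensitivity * prevalence + p_b_healthy_only

# Bayes' theorem: P(A/B) = P(B/A) x P(A) / P(B).
p_a_given_b = sensitivity * prevalence / p_b

print(p_b_healthy_only)             # 999/20000
print(p_b)                          # 1019/20000
print(p_a_given_b)                  # 20/1019
print(round(float(p_a_given_b), 4)) # 0.0196
```

The fuller figure of about 1.96 per cent rounds to the 2 per cent obtained in the proof.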
The prosecutor's fallacy
The fallacy (an example)
That the accused's DNA matching that of the sperm found on the victim, in a test which has a one in a thousand chance of giving a false positive result, means that there is only a one in a thousand chance of the accused's innocence.
The truth
In fact it means nothing of the sort. One in a thousand of the rest of the population would give the same result, so if the accused is one of half a million people who could have committed the crime, there would be about 500 people (in addition to the real rapist) giving the same result. So, in the absence of other evidence, the positive result establishes only about a one in 500 probability of the accused's guilt. (DNA evidence can, of course, provide valid proof of guilt when it is used to establish who, among a restricted group of suspects, committed the crime.)
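The arithmetic behind this argument can be sketched as follows. The assumptions are those of the example, made explicit: exactly one guilty person among the half a million, who is certain to match; every other person matching with probability one in a thousand; and no other evidence, so that every matching person is equally likely to be the culprit. The variable names are illustrative.

```python
from fractions import Fraction

population = 500_000                  # people who could have committed the crime
false_match_rate = Fraction(1, 1000)  # chance that an innocent person matches

# Expected number of innocent people who would also match.
expected_innocent_matches = (population - 1) * false_match_rate

# With no other evidence, the accused is just one of the matching people:
# the single guilty person plus the expected innocent matches.
p_guilty_given_match = 1 / (1 + expected_innocent_matches)

print(float(expected_innocent_matches))        # 499.999
print(round(float(p_guilty_given_match), 4))   # 0.002
```

A probability of about 0.002 is one in 500, far from the 999 in 1000 that the fallacy suggests.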