This test is used when the variable is categorical (for example, smoker/nonsmoker, political party, support/oppose an opinion, and so on) and you're interested in the proportion of individuals with a certain characteristic, for example, the proportion of smokers. In this case, two populations or groups are being compared (such as the proportion of female smokers versus male smokers). In order to conduct this test, two separate random samples need to be selected, one from each population. The null hypothesis is that the two population proportions are the same; in other words, that their difference is equal to 0. The notation for the null hypothesis is $H_o: p_1 - p_2 = 0$, where $p_1$ is the proportion from the first population, and $p_2$ is the proportion from the second population.
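The formula for the test statistic comparing two proportions is $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$. To calculate it, do the following: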
1. Calculate the sample proportions $\hat{p}_1$ and $\hat{p}_2$ for each sample. Let $n_1$ and $n_2$ represent the two sample sizes (they need not be equal).
2. Find the difference between the two sample proportions, $\hat{p}_1 - \hat{p}_2$.
3. Calculate the overall sample proportion, $\hat{p}$, which is the total number of individuals from both samples who have the characteristic of interest (for example, the total number of smokers in both samples), divided by the total number of individuals from both samples ($n_1 + n_2$).
4. Calculate the standard error: $\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$. Save your answer.
5. Divide your result from Step 2 by your result from Step 4.
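The five steps above can be sketched in a few lines of Python. This is only an illustration: the function name `two_proportion_z` and the counts in the usage line are made up for the example, and the code assumes the overall sample proportion is strictly between 0 and 1 (otherwise the standard error is 0 and the division fails).

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Return the test statistic for comparing two sample proportions.

    x1, x2: number of individuals with the characteristic in each sample
    n1, n2: the two sample sizes (they need not be equal)
    """
    # Step 1: the two sample proportions
    p1_hat = x1 / n1
    p2_hat = x2 / n2

    # Step 2: the difference between the sample proportions
    difference = p1_hat - p2_hat

    # Step 3: the overall sample proportion from both samples combined
    p_hat = (x1 + x2) / (n1 + n2)

    # Step 4: the standard error
    standard_error = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))

    # Step 5: divide the result of Step 2 by the result of Step 4
    return difference / standard_error


# Hypothetical counts, invented for illustration: 30 of 100 versus 20 of 100.
z = two_proportion_z(30, 100, 20, 100)
print(f"test statistic: {z:.2f}")
```

Passing counts rather than proportions keeps the rounding out of the intermediate steps; the result is rounded only when it is printed.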
To interpret the test statistic, look up your test statistic on the standard normal distribution (Table 8-1 in Chapter 8) and calculate the $p$-value (see Chapter 14 for more on $p$-values).
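If you'd rather not read the value off a table, the same lookup can be sketched with Python's standard library. The helper name `p_value_from_z`, its `alternative` argument, and the test statistic of 2.0 below are assumptions made for this illustration, not part of the procedure described above.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative="greater"):
    """Return the p-value for a test statistic on the standard normal curve."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":
        # Area to the right of z
        return 1 - standard_normal.cdf(z)
    if alternative == "less":
        # Area to the left of z
        return standard_normal.cdf(z)
    if alternative == "two-sided":
        # Area in both tails beyond |z|
        return 2 * (1 - standard_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Hypothetical test statistic, chosen only to show the call.
print(f"right-tail p-value: {p_value_from_z(2.0):.4f}")
print(f"two-sided p-value:  {p_value_from_z(2.0, 'two-sided'):.4f}")
```

Which tail you use depends on the alternative hypothesis: a "greater than" alternative uses the area to the right of the test statistic, a "less than" alternative uses the area to the left, and a "not equal to" alternative uses both tails.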
For example, consider those drug ads that pharmaceutical companies put in magazines. The front page of an ad shows a serene picture of the sun shining, flowers blooming, and people smiling, their lives changed by the drug. The company claims that its drugs can reduce allergy symptoms, help people sleep better, lower blood pressure, or fix whichever other ailment it's targeted to help. The claims may sound too good to be true, but when you turn the page to the back of the ad, you see all the fine print where the drug company justifies how it's able to make its claims. (This is typically where statistics are buried!) Somewhere in the tiny print, you'll likely find a table that shows adverse effects of the drug when compared to a control group (subjects who take a fake drug, for fair comparison to those who actually took the real drug; see Chapter 17 for more on this). For example, the makers of Adderall, a drug for attention deficit hyperactivity disorder (ADHD), reported that 26 of the 374 subjects (7%) who took the drug experienced vomiting as a side effect, compared to 8 of the 210 subjects (4%) who were on a placebo (fake drug). Note that patients didn't know which treatment they were given. In the sample, a higher percentage of people on the drug experienced vomiting, but is this difference large enough to say that the entire population on the drug would experience more vomiting? You can test it to see.
In this example, you have $H_o: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2 > 0$, where $p_1$ represents the proportion of subjects who vomited using Adderall, and $p_2$ represents the proportion of subjects who vomited using the placebo.
The next step is calculating the test statistic:
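1. First, $\hat{p}_1 = 26 \div 374 \approx 0.07$ and $\hat{p}_2 = 8 \div 210 \approx 0.04$. The sample sizes are $n_1 = 374$ and $n_2 = 210$, respectively.
2. Next, take the difference between these sample proportions to get $0.07 - 0.04 = 0.03$.
3. The overall sample proportion, $\hat{p}$, is $(26 + 8) \div (374 + 210) = 34 \div 584 \approx 0.058$.
4. The standard error is $\sqrt{0.058(1 - 0.058)\left(\frac{1}{374} + \frac{1}{210}\right)} \approx 0.02$. Whew!
5. Finally, take the difference from Step 2, 0.03, and divide it by the standard error from Step 4, 0.02, to get $0.03 \div 0.02 = 1.5$, which is the test statistic.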
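To check the arithmetic, here is a short Python sketch that reruns the calculation on the Adderall counts two ways: once rounding at each step as the worked example does, and once carrying full precision. The variable names are invented for this illustration. The two routes give slightly different answers (the unrounded test statistic comes out near 1.56 rather than 1.5), so the size of the gap is worth keeping in mind when a result falls close to a cutoff.

```python
import math

# Counts from the example: vomiting among drug and placebo subjects
x1, n1 = 26, 374   # Adderall group
x2, n2 = 8, 210    # placebo group

# Route 1: round at each step, as in the worked example
p1_rounded = round(x1 / n1, 2)                      # 0.07
p2_rounded = round(x2 / n2, 2)                      # 0.04
diff_rounded = round(p1_rounded - p2_rounded, 2)    # 0.03
p_overall_rounded = round((x1 + x2) / (n1 + n2), 3) # 0.058
se_rounded = round(
    math.sqrt(p_overall_rounded * (1 - p_overall_rounded) * (1 / n1 + 1 / n2)),
    2,
)                                                   # 0.02
z_rounded = diff_rounded / se_rounded

# Route 2: carry full precision and round only at the end
p_overall = (x1 + x2) / (n1 + n2)
se_exact = math.sqrt(p_overall * (1 - p_overall) * (1 / n1 + 1 / n2))
z_exact = (x1 / n1 - x2 / n2) / se_exact

print(f"test statistic, rounding at each step: {z_rounded:.2f}")
print(f"test statistic, full precision:        {z_exact:.2f}")
```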