The goal of many surveys and studies is to compare two populations, such as men versus women, low versus high income families, and Republicans versus Democrats. When the characteristic being compared is numerical (for example, height, weight, or income) the object of interest is the amount of difference in the means (averages) for the two populations. For example, you may want to compare the difference in average age of Republicans versus Democrats, or the difference in average incomes of men versus women. You estimate the difference between two population means by taking a sample from each population and using the difference of the two sample means, plus or minus a margin of error. The result is a confidence interval for the difference of two population means.
The formula for a CI for the difference between two population means (averages) is
$$(\bar{x} - \bar{y}) \pm Z\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $\bar{x}$, $s_1$, and $n_1$ are the mean, standard deviation and size of the first sample, and $\bar{y}$, $s_2$, and $n_2$ are the mean, standard deviation and size of the second sample. Z is the appropriate value from the standard normal distribution for your desired confidence level. (See Chapter 3 for formulas and calculations for means and standard deviations; see Chapter 10 (Table 10-1) for values of Z for certain confidence levels.)
To calculate a CI for the difference between two population means, do the following:
1. Determine the confidence level and find the appropriate Z-value. See Chapter 10 (Table 10-1).
2. Find the mean ($\bar{x}$), standard deviation ($s_1$) and sample size ($n_1$) of the first sample and the mean ($\bar{y}$), standard deviation ($s_2$) and sample size ($n_2$) of the second sample. See Chapter 3.
3. Find the difference, ($\bar{x} - \bar{y}$), between the sample means.
4. Square $s_1$ and divide it by $n_1$; square $s_2$ and divide it by $n_2$. Add the results together and take the square root.
5. Multiply your answer from Step 4 by Z. This is the margin of error.
6. Take ($\bar{x} - \bar{y}$) plus or minus the margin of error to obtain the CI. The lower end of the CI is ($\bar{x} - \bar{y}$) minus the margin of error, while the upper end of the CI is ($\bar{x} - \bar{y}$) plus the margin of error.
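The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `z_value` and `ci_diff_means` are hypothetical, the Z-value is computed with the standard library's `statistics.NormalDist` instead of being read from Table 10-1, and the summary statistics in the usage lines are made up purely to show the call.

```python
from math import sqrt
from statistics import NormalDist


def z_value(confidence_level):
    """Z such that the middle `confidence_level` of the standard normal lies within +/- Z."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)


def ci_diff_means(mean1, sd1, n1, mean2, sd2, n2, confidence_level=0.95):
    """Return (lower, upper) for the difference between two population means."""
    z = z_value(confidence_level)                  # Step 1: Z for the confidence level
    diff = mean1 - mean2                           # Step 3: difference of sample means
    std_error = sqrt(sd1**2 / n1 + sd2**2 / n2)    # Step 4: square, divide, add, square root
    margin = z * std_error                         # Step 5: margin of error
    return diff - margin, diff + margin            # Step 6: lower and upper ends


# Hypothetical summary statistics (Step 2), made up for illustration only.
lower, upper = ci_diff_means(50.0, 10.0, 200, 47.0, 12.0, 180)
print(f"95% CI for the difference: {lower:.2f} to {upper:.2f}")
```

Because the function works from summary statistics rather than raw data, it assumes the means and standard deviations have already been calculated for each sample.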
Suppose you want to estimate with 95% confidence the difference between the mean (average) length of the cobs of two varieties of sweet corn (allowing them to grow the same number of days under the same conditions). Call the two varieties Corn-e-stats and Stats-o-sweet.
1. Because you want a 95% confidence interval, your Z is 1.96.
2. Suppose your random sample of 100 cobs of the Corn-e-stats variety averages 8.5 inches, with a standard deviation of 2.3 inches, and your random sample of 110 cobs of Stats-o-sweet averages 7.5 inches, with a standard deviation of 2.8 inches. This means $\bar{x}$ = 8.5, $s_1$ = 2.3, and $n_1$ = 100; $\bar{y}$ = 7.5, $s_2$ = 2.8, and $n_2$ = 110.
3. The difference between the sample means, ($\bar{x} - \bar{y}$), from Step 3, is 8.5 − 7.5 = +1 inch. This means the average for Corn-e-stats minus the average for Stats-o-sweet is positive, making Corn-e-stats the longer of the two varieties, in terms of this sample. Is that difference enough to generalize to the entire population, though? That's what this confidence interval is going to help you decide.
4. Square $s_1$ (2.3) to get 5.29; divide by 100 to get 0.0529. Square $s_2$ (2.8) to get 7.84; divide by 110 to get 0.0713. The sum is 0.0529 + 0.0713 = 0.1242; the square root of this is 0.3524.
5. Multiply 1.96 times 0.3524 to get 0.69 inches, the margin of error.
6. Your 95% confidence interval for the difference between the average lengths for these two varieties of sweet corn is 1 inch, plus or minus 0.69 inches. (The lower end of the interval is 1 − 0.69 = 0.31 inches; the upper end is 1 + 0.69 = 1.69 inches.) That means you can say, with 95% confidence, that the Corn-e-stats variety is longer, on average, than the Stats-o-sweet variety, by somewhere between 0.31 and 1.69 inches. (Notice all the values in this interval are positive. That means you can be confident, based on your data, that Corn-e-stats is longer on average than Stats-o-sweet.)
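As a check on the arithmetic in the sweet corn example, the same calculation can be run step by step in Python. This is only a sketch; the variable names are chosen here for illustration, and the inputs are the sample values given in the example.

```python
from math import sqrt

z = 1.96                         # Z-value for 95% confidence, as used in the example
x_bar, s1, n1 = 8.5, 2.3, 100    # Corn-e-stats: mean, standard deviation, sample size
y_bar, s2, n2 = 7.5, 2.8, 110    # Stats-o-sweet: mean, standard deviation, sample size

diff = x_bar - y_bar             # difference between the sample means
term1 = s1**2 / n1               # 2.3 squared, divided by 100
term2 = s2**2 / n2               # 2.8 squared, divided by 110
std_error = sqrt(term1 + term2)  # square root of the sum
margin = z * std_error           # margin of error
lower, upper = diff - margin, diff + margin

print(f"difference      = {diff:.2f}")
print(f"terms           = {term1:.4f}, {term2:.4f}")
print(f"square root     = {std_error:.4f}")
print(f"margin of error = {margin:.2f}")
print(f"95% CI          = {lower:.2f} to {upper:.2f}")
```

Rounded to the same number of decimal places, these values agree with the ones worked out by hand: 0.0529 and 0.0713, a square root of 0.3524, a margin of error of 0.69, and an interval from 0.31 to 1.69 inches.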
HEADS UP: Notice that you could get a negative value for ($\bar{x} - \bar{y}$). That happens whenever the first sample mean is smaller than the second.
In the case where your sample size is small (under 30), see Chapter 15 for the slight modifications that you need to make to your calculations.
When a characteristic, such as opinion on an issue (support/don't support), of the two groups being compared is categorical, people want to report on the differences between the two population proportions; for example, the difference between the proportion of women who support a four-day work week and the proportion of men who support a four-day work week. You estimate the difference between two population proportions by taking a sample from each population and using the difference of the two sample proportions, plus or minus a margin of error. The result is called a confidence interval for the difference of two population proportions.
The formula for a confidence interval for the difference between two population proportions is
$$(\hat{p}_1 - \hat{p}_2) \pm Z\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
where $\hat{p}_1$ and $n_1$ are the sample proportion and sample size of the first sample, and $\hat{p}_2$ and $n_2$ are the sample proportion and sample size of the second sample. Z is the appropriate value from the standard normal distribution for your desired confidence level. (See Chapter 3 for sample proportions and Chapter 10 [Table 10-1] for Z-values.)
To calculate a CI for the difference between two population proportions, do the following:
1. Determine the confidence level and find the appropriate Z-value. See Chapter 10 (Table 10-1).
2. Find the sample proportion $\hat{p}_1$ for the first sample by taking the total number from the first sample that are in the category of interest and dividing by the sample size, $n_1$. Similarly, find $\hat{p}_2$ for the second sample.
3. Take the difference between the sample proportions, ($\hat{p}_1 - \hat{p}_2$).
4. Find $\hat{p}_1$ times (1 − $\hat{p}_1$) and divide that by $n_1$. Find $\hat{p}_2$ times (1 − $\hat{p}_2$) and divide that by $n_2$. Add these two results together and take the square root.
5. Multiply Z times the result from Step 4. This is the margin of error.
6. Take ($\hat{p}_1 - \hat{p}_2$) plus or minus the margin of error from Step 5 to obtain the CI. The lower end of the CI is ($\hat{p}_1 - \hat{p}_2$) minus the margin of error, and the upper end of the CI is ($\hat{p}_1 - \hat{p}_2$) plus the margin of error.
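The steps for proportions can be sketched the same way in Python. This is a minimal illustration under stated assumptions: the function name `ci_diff_proportions` is hypothetical, Z defaults to 1.96 (the 95% value), and the counts in the usage lines are made up for illustration and do not come from any survey.

```python
from math import sqrt


def ci_diff_proportions(count1, n1, count2, n2, z=1.96):
    """Return (lower, upper) for the difference between two population proportions."""
    p1 = count1 / n1                     # Step 2: sample proportion, first sample
    p2 = count2 / n2                     # Step 2: sample proportion, second sample
    diff = p1 - p2                       # Step 3: difference of sample proportions
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # Step 4
    margin = z * std_error               # Step 5: margin of error
    return diff - margin, diff + margin  # Step 6: lower and upper ends


# Hypothetical counts: 60 of 100 in the first sample and 50 of 100 in the
# second sample fall in the category of interest.
lower, upper = ci_diff_proportions(60, 100, 50, 100)
print(f"95% CI for the difference: {lower:.4f} to {upper:.4f}")
```

With these made-up counts the interval runs from about −0.04 to 0.24. Unlike the sweet corn interval, it contains both negative and positive values, so the sample difference alone does not settle which population proportion is larger.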