Statistics for Dummies

Authors: Deborah Jean Rumsey

Factoring In the Sample Size

The relationship between margin of error and sample size is simple: As the sample size increases, the margin of error decreases. This confirms what you hope is true: The more information you have, the more accurate your results are going to be. (That, of course, assumes that the information is good, credible information. See Chapter 2 for how statistics can go wrong.)

TECHNICAL STUFF 

Looking at the formula for margin of error for the sample mean, $\pm Z \frac{s}{\sqrt{n}}$, notice that the sample size n appears (under a square root) in the denominator of a fraction; this is the case for almost any margin of error formula. As n increases, the denominator of this fraction increases, which makes the overall fraction smaller. That makes the margin of error smaller and results in a narrower confidence interval.
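To see this effect numerically, here is a minimal Python sketch of the margin-of-error formula for a sample mean. The function name `margin_of_error` and the inputs (Z = 1.96 for 95% confidence, a sample standard deviation of 10) are hypothetical values chosen only for illustration.

```python
import math

def margin_of_error(z: float, s: float, n: int) -> float:
    """Margin of error for a sample mean: Z * s / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return z * s / math.sqrt(n)

# Hypothetical inputs: 95% confidence (Z = 1.96) and a sample standard deviation of 10.
for n in (25, 100, 400, 1600):
    print(n, round(margin_of_error(1.96, 10, n), 2))
```

Because n enters through its square root, each fourfold increase in the sample size only halves the margin of error (3.92, 1.96, 0.98, 0.49 for these inputs).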

HEADS UP 

When you need a high level of confidence, you have to increase the Z-value, and hence the margin of error, resulting in a wider confidence interval, which isn't good. But you can offset this wider confidence interval by increasing the sample size, which brings the margin of error back down and narrows the confidence interval. The larger sample size lets you keep the confidence level you want while ensuring that your confidence interval stays narrow (which is what you ultimately want). You can even work this out before you start a study: If you know the margin of error you want, you can set your sample size accordingly (see Chapter 9).
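One way to plan a sample size for a mean is to solve $Z \frac{s}{\sqrt{n}} \le E$ for n, giving $n \ge (Z s / E)^2$ rounded up. The sketch below assumes you have a planning guess for s (for example, from a pilot study), uses Python's `statistics.NormalDist` in place of a printed Z table, and plugs in hypothetical values (s = 10, desired margin of error E = 1). It is an illustration, not necessarily the method Chapter 9 uses.

```python
import math
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Two-sided standard normal critical value, about 1.96 for level 0.95."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def sample_size_for_mean(level: float, s: float, target_moe: float) -> int:
    """Smallest n with Z * s / sqrt(n) <= target_moe; s is a planning guess."""
    z = z_for_confidence(level)
    return math.ceil((z * s / target_moe) ** 2)

# Hypothetical planning values: s = 10, desired margin of error = 1.
for level in (0.90, 0.95, 0.99):
    z = z_for_confidence(level)
    print(f"{level:.0%}: Z = {z:.3f}, n needed = {sample_size_for_mean(level, 10, 1)}")
```

Raising the confidence level from 95% to 99% pushes the required sample from 385 to 664 for these inputs, which is the trade-off described above: more confidence at the same margin of error costs a bigger sample.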

Tip 

When your statistic is going to be a percentage (like the percentage of people who prefer to wear sandals during summer), a rough way to figure the margin of error is to take 1 divided by the square root of n (the sample size). Try different values of n to see how the margin of error is affected.

How large a sample does a poll need to produce a narrow confidence interval? Using the formula in the preceding paragraph, you can make some quick comparisons. A survey of 100 people will have a margin of error of about $1/\sqrt{100} = 1/10 = 0.10$, or plus or minus 10% (meaning the width of the confidence interval is 20%, which is pretty large). However, if you survey 1,000 people, your margin of error decreases dramatically, to plus or minus about 3%; the width now becomes only about 6%. A survey of 2,500 people results in a margin of error of plus or minus 2% (so the width is down to 4%). That's a remarkably small sample for that much accuracy, considering how large the population is (the U.S. population, for example, is over 280 million!).
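The quick comparisons above can be reproduced with a short Python sketch. The 1/√n rule of thumb is a conservative stand-in for the 95% margin of error of a proportion, $1.96\sqrt{\hat{p}(1-\hat{p})/n}$, which is largest at $\hat{p} = 0.5$, where it equals $0.98/\sqrt{n}$. Both function names here are hypothetical.

```python
import math

def rough_moe_proportion(n: int) -> float:
    """Rule-of-thumb margin of error for a sample percentage: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

def moe_proportion_95(p_hat: float, n: int) -> float:
    """Standard 95% margin of error for a sample proportion p_hat."""
    return 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)

for n in (100, 1000, 2500):
    moe = rough_moe_proportion(n)
    print(f"n={n}: rough ±{moe:.1%} (width {2 * moe:.1%}), "
          f"95% at p=0.5: ±{moe_proportion_95(0.5, n):.1%}")
```

For n = 1,000 the rule gives about 3.2%, which the text rounds to 3%; the exact 95% figure at p = 0.5 is always slightly smaller than the rough one.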

Keep in mind, however, that you don't want to go too high with your sample size, because at some point you reach diminishing returns. For example, moving from a sample size of 2,500 to 5,000 narrows the width of the confidence interval to about 2 × 1.4 = 2.8%, down from 4%. Each additional person you survey increases the cost of your survey, so adding another 2,500 people just to narrow the interval by a little more than 1% may not be worthwhile.
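A small Python sketch makes the diminishing returns concrete, using the same rough 1/√n rule (so the interval width is about 2/√n). The steps of 2,500 extra people are hypothetical, chosen to extend the example in the text.

```python
import math

def rough_width(n: int) -> float:
    """Approximate CI width for a percentage: twice the 1/sqrt(n) rule of thumb."""
    return 2 / math.sqrt(n)

sizes = [2500, 5000, 7500, 10000]
gains = []  # how much each extra block of 2,500 people narrows the interval
for smaller, larger in zip(sizes, sizes[1:]):
    gain = rough_width(smaller) - rough_width(larger)
    gains.append(gain)
    print(f"{smaller} -> {larger}: width {rough_width(smaller):.2%} -> "
          f"{rough_width(larger):.2%}, narrower by {gain:.2%}")
```

The first extra 2,500 people buy about 1.2 percentage points of narrowing; the next 2,500 buy only about half a point, and the gain keeps shrinking.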

HEADS UP 

Real accuracy depends on the quality of the data as well as on the sample size. A large sample that has a great deal of bias (see Chapter 2) may appear to have a narrow confidence interval, but the interval means nothing. That's like competing in an archery match and shooting your arrows consistently, only to find out that the whole time you've been shooting at the next person's target; that's how far off you are. In statistics, though, you can't measure bias; you can only try to minimize it.

REMEMBER 

The larger the sample size is, the smaller the margin of error will be, and the narrower the confidence interval will get, assuming that everything else stays the same and that the quality of the data is good.

 

Counting On Population Variability

One of the factors influencing variability in sample results is the fact that the population itself contains variability. If every value in the population were exactly the same, imagine how boring the world would be. (In fact, statisticians wouldn't exist if not for variability.) For example, in a population of houses in a large city like Columbus, Ohio, you see a great deal of variety not only in the types of houses but also in their sizes and prices. And the variability in prices of houses across Columbus, Ohio, should be greater than the variability in prices of houses in a single selected housing development in Columbus.

That means if you take a sample of houses from the entire city of Columbus and find the average price, the margin of error should be larger than if you take a sample from that single housing development in Columbus, even if you have the same confidence level and the same sample size each time. Why? Because the houses in the entire city have more variability in price, and your sample average would change more from sample to sample than it would if you took the sample only from that single housing development, where the prices tend to be very similar. That means you need to sample more houses if you're sampling from the entire city of Columbus in order to have the same amount of accuracy that you would get from that single housing development.
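A simulation can show how population variability carries over to sample means. This Python sketch assumes, purely for illustration, that prices are normally distributed with hypothetical parameters (city: mean $200,000, SD $80,000; development: mean $250,000, SD $15,000); these are not real Columbus figures, and real house prices are usually skewed. It draws many samples of the same size from each population and compares how much the sample averages vary.

```python
import random
import statistics

def spread_of_sample_means(pop_mean, pop_sd, n, reps, rng):
    """SD of `reps` sample means, each averaging n draws from a normal population."""
    means = [statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical price distributions in dollars (illustrative only).
city = spread_of_sample_means(200_000, 80_000, n=50, reps=2000, rng=rng)
development = spread_of_sample_means(250_000, 15_000, n=50, reps=2000, rng=rng)
print(f"sample-to-sample SD of the average price: city ${city:,.0f}, "
      f"development ${development:,.0f}")
```

With the same sample size of 50, the city-wide averages bounce around roughly five times as much as the development averages, in line with the ratio of the two population standard deviations.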

TECHNICAL STUFF 

Variability is measured by the standard deviation. The standard deviation of the population (σ) isn't typically known, so you estimate it with s, the standard deviation of the sample (see Chapter 4). Notice that s appears in the numerator of the standard error, $\frac{s}{\sqrt{n}}$, in the formula for margin of error for the sample mean: $\pm Z \frac{s}{\sqrt{n}}$. Therefore, as the standard deviation (the numerator) increases, the standard error (the entire fraction) also increases. This results in a larger margin of error and a wider confidence interval.

REMEMBER 

More variability in the original population increases the margin of error, making the confidence interval wider. This increase can be offset by increasing the sample size.
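How much larger the sample must be can be worked out from the margin-of-error formula: to keep $Z s/\sqrt{n}$ fixed when s grows, n has to grow with $s^2$. The Python sketch below reuses hypothetical price standard deviations ($80,000 city-wide, $15,000 in one development) and Z = 1.96; the helper names are illustrative.

```python
import math

Z_95 = 1.96  # standard normal value for 95% confidence

def margin_of_error(s: float, n: int, z: float = Z_95) -> float:
    """Margin of error for a sample mean: z * s / sqrt(n)."""
    return z * s / math.sqrt(n)

def n_for_margin(s: float, target: float, z: float = Z_95) -> int:
    """Smallest n whose margin of error z * s / sqrt(n) is at most target."""
    return math.ceil((z * s / target) ** 2)

# Hypothetical price standard deviations in dollars (illustrative, not real data).
s_city, s_development = 80_000, 15_000
n = 50

moe_city = margin_of_error(s_city, n)
moe_development = margin_of_error(s_development, n)
# Sample size the city-wide study needs to match the development's precision.
n_needed = n_for_margin(s_city, moe_development)
print(f"n = {n}: city ±${moe_city:,.0f}, development ±${moe_development:,.0f}")
print(f"city sample needed for ±${moe_development:,.0f}: {n_needed}")
```

The required city sample is (80,000 ÷ 15,000)² × 50 ≈ 1,422.2, which rounds up to 1,423 houses.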

 

Chapter 13: Commonly Used Confidence Intervals—Formulas and Examples

Whenever you want to determine the mean of the population but you can't find it exactly due to time or money constraints (which is usually the case), the next best thing is to take a sample of the population, find its mean, and use that to estimate the mean for the whole population. Then (see Chapters 11 and 12 for details) you must include some measure of how accurate you expect your sample results to be; after all, you know that those results would change at least a little if you took a different sample. So along with your sample mean, you must include a margin of error (how much you expect your sample result to change from sample to sample). Your sample mean plus or minus the margin of error forms a confidence interval for the population mean.

But figuring the confidence interval can be a little confusing, so in this chapter, I outline the formulas for the four most commonly used confidence intervals (CIs), explain the calculations, and walk you through some examples.

Calculating the Confidence Interval for the Population Mean

When the characteristic that's being measured (such as income, IQ, price, height, quantity, or weight) is numerical, most people want to report the mean (average) value for the population, because the average is a one-number summary that tells you where the center of the population is. You estimate the population mean by using a sample mean, plus or minus a margin of error. The result is called a confidence interval for the population mean.

The formula for a CI for a population mean is $\bar{x} \pm Z \frac{s}{\sqrt{n}}$, where $\bar{x}$ is the sample mean, s is the sample standard deviation, n is the sample size, and Z is the appropriate value from the standard normal distribution for your desired confidence level. (See Chapter 3 for formulas for $\bar{x}$ and s; see Chapter 10, Table 10-1, for values of Z for given confidence levels.)

To calculate a CI for the population mean (average), do the following:

  1. Determine the confidence level and find the appropriate Z-value.

    See Chapter 10 (Table 10-1).

  2. Find the sample mean ($\bar{x}$), the sample standard deviation (s), and the sample size (n).

    See Chapter 3.

  3. Multiply Z times s and divide that by the square root of n.

    This is the margin of error.

  4. Take $\bar{x}$ plus or minus the margin of error to obtain the CI.

    The lower end of the CI is $\bar{x}$ minus the margin of error, while the upper end of the CI is $\bar{x}$ plus the margin of error.
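The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the book's own procedure: the function name `mean_confidence_interval` is hypothetical, the Z-value comes from `statistics.NormalDist` instead of a printed table, and, like the formula in this section, it is meant for large samples (see the note on small samples at the end of the section).

```python
import math
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(data, level=0.95):
    """Large-sample CI for a population mean: x_bar +/- Z * s / sqrt(n)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Step 1: Z-value for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Step 2: sample mean, sample standard deviation (n - 1 divisor), sample size.
    x_bar, s = fmean(data), stdev(data)
    # Step 3: margin of error.
    moe = z * s / math.sqrt(n)
    # Step 4: lower and upper ends of the interval.
    return x_bar - moe, x_bar + moe
```

Because the function recomputes Z exactly (1.95996... rather than 1.96), its results can differ from hand calculations in the last decimal place.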

For example, suppose you work for the Department of Natural Resources, and you want to estimate, with 95% confidence, the mean (average) length of walleye fingerlings in a fish hatchery pond.

Because you want a 95% confidence interval, your Z is 1.96.

Suppose you take a random sample of 100 fingerlings, and you determine that the average length is 7.5 inches and the standard deviation (s) is 2.3 inches. (See Chapter 4 for calculating the mean and standard deviation.) This means $\bar{x} = 7.5$, s = 2.3, and n = 100.

Multiply 1.96 times 2.3, and divide by the square root of 100 (which is 10). The margin of error is, therefore, plus or minus 1.96 × (2.3 ÷ 10) = 1.96 × 0.23 = 0.45 inches.

Your 95% confidence interval for the mean length of walleye fingerlings in this fish hatchery pond is 7.5 inches plus or minus 0.45 inches. (The lower end of the interval is 7.5 − 0.45 = 7.05 inches; the upper end is 7.5 + 0.45 = 7.95 inches.) You can then say, with 95% confidence, that the average length of walleye fingerlings in this entire fish hatchery pond is between 7.05 and 7.95 inches, based on your sample.
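The walleye calculation can be checked with a few lines of Python working from the summary statistics ($\bar{x}$ = 7.5, s = 2.3, n = 100, Z = 1.96). The helper name `ci_from_summary` is hypothetical.

```python
import math

def ci_from_summary(x_bar: float, s: float, n: int, z: float):
    """Return (margin of error, lower end, upper end) for x_bar +/- z * s / sqrt(n)."""
    moe = z * s / math.sqrt(n)
    return moe, x_bar - moe, x_bar + moe

moe, lower, upper = ci_from_summary(x_bar=7.5, s=2.3, n=100, z=1.96)
print(f"margin of error: {moe:.2f} in; 95% CI: {lower:.2f} to {upper:.2f} in")
```

This prints a margin of error of 0.45 inches and an interval from 7.05 to 7.95 inches, matching the hand calculation; the unrounded margin is 0.4508.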

When your sample size is small (under 30), you need to modify the calculations slightly. This is discussed in Chapter 15.
