Statistics for Dummies

Authors: Deborah Jean Rumsey

Weighing the Evidence and Making Decisions: P-Values

To test whether the claim is true, you look at the test statistic taken from your sample and see whether it supports the claim. And how do you determine that? In most cases, by looking at where your test statistic ends up on the standard normal distribution (Z-distribution; see Chapter 9). The Z-distribution has a mean of 0 and a standard deviation of 1. If your test statistic is close to 0, or at least within that range where most of the results should fall, then you say yes, the claim (Ho) is probably true, and the sample results verify it. If your test statistic is out in the tails of the standard normal distribution, then you say no, the chance of my sample results ending up this far out on the distribution is too small; my sample results don't verify the claim (Ho).

But how far is "too far" from 0? As long as you have a large enough sample size, you know that your test statistic falls somewhere on a standard normal distribution, according to the central limit theorem (see Chapter 10). If the null hypothesis is true, most (about 95%) of the samples will result in test statistics that lie roughly within 2 standard errors of the claim. If Ha is the not-equal-to alternative, any test statistic outside this range will result in Ho being rejected (see Figure 14-1).

Figure 14-1: Test statistics and your decision.

TECHNICAL STUFF

Note that if the alternative hypothesis is the less-than alternative, you reject Ho only if the test statistic falls in the left tail of the distribution. Similarly, if Ha is the greater-than alternative, you reject Ho only if the test statistic falls in the right tail.
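
The rejection regions described above can be sketched in a few lines of Python. This is only an illustration, not the book's own procedure: the function name reject_h0, the labels "less", "greater", and "not-equal", and the 0.05 cutoff are assumptions made for the example, and the standard library's NormalDist stands in for a printed Z-table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def reject_h0(z_stat, alternative, alpha=0.05):
    """Return True if z_stat lands in the rejection region (hypothetical helper)."""
    if alternative == "less":         # reject only in the left tail
        return z_stat < Z.inv_cdf(alpha)
    if alternative == "greater":      # reject only in the right tail
        return z_stat > Z.inv_cdf(1 - alpha)
    if alternative == "not-equal":    # reject in either tail; alpha is split in half
        return abs(z_stat) > Z.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater', or 'not-equal'")


# Share of test statistics within 2 standard errors of the claim when Ho is true
within_two = Z.cdf(2) - Z.cdf(-2)
print(f"{within_two:.4f}")              # about 0.95

print(reject_h0(-1.8, "less"))          # True: -1.8 is below the cutoff near -1.645
print(reject_h0(-1.8, "not-equal"))     # False: 1.8 is inside the cutoff near 1.96
print(reject_h0(-1.8, "greater"))       # False: the statistic is in the wrong tail
```

Notice that the same test statistic (-1.8) leads to different decisions depending on which alternative you chose before looking at the data.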

P-value basics

You can be more specific about your conclusion by noting exactly how far out on the standard normal distribution the test statistic falls, so everyone knows where the result stands and what that means in terms of how strong the evidence is against the claim. You do this by looking up the test statistic on the standard normal distribution (Z-distribution) and finding the probability of being at that value or beyond it (in the same direction) by using Table 8-1 (see Chapter 8). The p-value measures how likely it was that you would have gotten sample results at least as extreme as yours if the null hypothesis were true. The farther out your test statistic is on the tails of the standard normal distribution, the smaller the p-value will be, and the more evidence you have against the null hypothesis being true.

REMEMBER 

All p-values are probabilities between 0 and 1.

To find the p-value for your test statistic (means/proportions, large samples):

  1. Look up the location of your test statistic on the standard normal distribution (see Table 8-1 in Chapter 8).

  2. Find the percentage chance of being at or beyond that value in the same direction:

    1. If Ha contains a less-than alternative, find the percentile from Table 8-1 in Chapter 8 that corresponds to your test statistic.

    2. If Ha contains a greater-than alternative, find the percentile from Table 8-1 in Chapter 8 that corresponds to your test statistic, and then take 100% minus that. (You want the percentage to the right of your test statistic in this case, and percentiles give you the percentage to the left. See Chapter 5.)

  3. If (and only if) Ha is the not-equal-to alternative, find the percentage beyond your test statistic in the tail where it falls (to the left of a negative test statistic, to the right of a positive one), and double it.

    This accounts for both the less-than and the greater-than possibilities.

  4. Change the percentage to a probability by dividing by 100 or moving the decimal point two places to the left.
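
If you'd rather let a computer do the table lookup, the steps above can be sketched as follows. The function name p_value and the alternative labels are made up for this example, and NormalDist().cdf plays the role of Table 8-1 (it returns the area to the left of a value as a probability, so the divide-by-100 step isn't needed).

```python
from statistics import NormalDist

Z = NormalDist()  # stands in for the printed Z-table


def p_value(z_stat, alternative):
    """p-value for a large-sample Z test statistic (hypothetical helper)."""
    left = Z.cdf(z_stat)     # percentile: area to the left of the test statistic
    right = 1 - left         # area to the right of the test statistic
    if alternative == "less":
        return left
    if alternative == "greater":
        return right
    if alternative == "not-equal":
        return 2 * min(left, right)   # double the tail the statistic falls in
    raise ValueError("alternative must be 'less', 'greater', or 'not-equal'")


print(round(p_value(-2.0, "less"), 4))        # 0.0228
print(round(p_value(-2.0, "not-equal"), 4))   # 0.0455
print(round(p_value(2.0, "greater"), 4))      # 0.0228
```

The not-equal-to p-value is exactly twice the one-tail value, which is what step 3 asks for.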

To interpret a p-value:

  • For small p-values (generally less than 0.05), reject Ho. Your data don't support Ho, and your evidence is beyond a reasonable doubt.

  • For large p-values (generally greater than 0.05), you can't reject Ho. You don't have enough evidence against it.

  • If your p-value is on or close to the borderline between accepting and rejecting, your results are marginal. (They could go either way.)

Generally, statisticians stay with Ho unless the evidence is beyond a reasonable doubt, just like in a courtroom. What probability reflects that cutoff point? It can be rather arbitrary (the term "small p-value" can mean something different to each person). For most statisticians, if the p-value is less than 0.05 given the data they collect, they'll reject Ho, and choose Ha. Some people may have stricter cutoffs, such as 0.01, requiring more evidence before rejecting Ho. Each reader makes his or her own decision. That's why researchers need to report p-values, rather than just their decisions, so that people can come to their own conclusions based on their own internal cutoff points.

For example, if your p-value is 0.026 when testing Ho: p = 0.25 versus Ha: p < 0.25 in the varicose veins example, a reader with a personal cutoff point of 0.05 would conclude that Ho is false, because the p-value (of 0.026) is less than 0.05. However, a reader with a personal cutoff of 0.01 would not have enough evidence (based on your sample) to reject Ho, because the p-value of 0.026 is greater than 0.01.

Caution: Interpretations will vary!

Some people do like to set a cutoff probability before doing a hypothesis test; this is called an alpha level (α). Typical values for α are 0.05 or 0.01. Here's how they interpret their results in that case:

  • If the p-value is greater than or equal to α, accept Ho.

  • If the p-value is less than α, reject Ho.

  • P-values on the borderline (very close to α) are treated as marginal results.
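
Here's a small sketch of the predetermined-cutoff rule, applied to the p-value of 0.026 quoted earlier for the varicose veins example. The function name decide is made up for illustration, and the "marginal" case is left out on purpose because the text doesn't say how close to α counts as borderline.

```python
def decide(p_value, alpha):
    """Apply the alpha-level rule: reject Ho only when the p-value is below alpha."""
    return "reject Ho" if p_value < alpha else "accept Ho"


p = 0.026  # p-value quoted in the text for the varicose veins example
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {decide(p, alpha)}")
# alpha = 0.05: reject Ho
# alpha = 0.01: accept Ho
```

Same data, same p-value, two different decisions; the only thing that changed was the cutoff.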

Other people don't set a predetermined cutoff; they just report the p-value and interpret their results by looking at the size of the p-value. Generally speaking,

  • If the p-value is less than 0.01 (very small), the results are considered highly statistically significant: reject Ho.

  • If the p-value is between 0.01 and 0.05 (but not close to 0.05), the results are considered statistically significant: reject Ho.

  • If the p-value is close to 0.05, the results are considered marginally significant: the decision could go either way.

  • If the p-value is greater than (but not close to) 0.05, the results are considered non-significant: accept Ho.

HEADS UP 

When you hear about a result that has been found to be statistically significant, ask for the p-value and make your own decision. Cutoff points and resulting decisions vary from researcher to researcher.

 

Knowing That You Could Be Wrong: Errors in Testing

After you make a decision to either reject Ho or accept Ho, the next step is living with the consequences, in terms of how people respond to your decision.

  • If you conclude that a claim isn't true but it actually is true, will that result in a lawsuit, a fine, unnecessary changes in the product, or consumer boycotts that shouldn't have happened?

  • If you conclude that a claim is true but it actually isn't, what happens then? Will products continue to be made in the same way as they are now? Will no new law be made, no new action taken, because you showed that nothing was wrong?

REMEMBER 

Every hypothesis test decision has impact; otherwise, why do the tests? So, a consequence can result from any decision: You could be wrong! The X-Files motto applies here: "The truth is out there." But the thing is, you don't know what the truth is; that's why you did the hypothesis test in the first place.

Making a false alarm: Type-1 errors

Suppose a company claims that its average package delivery time is 2 days, and a consumer group tests this hypothesis and concludes that the claim is false: They believe that the average delivery time is actually more than 2 days. This is a big deal. If the group can stand by its statistics, it has done well to inform the public about the false advertising issue. But what if the group is wrong? Even if the study is based on a good design, collects good data, and makes the right analysis, the group can still be wrong.

Why? Because its conclusions were based on a sample of packages, not on the entire population. And as Chapter 9 tells you, sample results vary from sample to sample. If your test statistic falls on the tail of the standard normal distribution, these results are unusual if the claim is true, because you expect them to be much closer to the middle of the standard normal distribution (Z-distribution). Just because the results from a sample are unusual, however, doesn't mean they're impossible. A p-value of 0.04 means that the chance of getting a test statistic at least as far out on the tail of the standard normal distribution as yours, even if the claim is true, is 4% (less than 5%). That's why you reject Ho in this case, because that chance is so small. But a chance is a chance!

Perhaps your sample, while collected randomly, just happens to be one of those atypical samples whose result ended up far out on the distribution. So Ho could be true, but your results lead you to a different conclusion. How often does that happen? When Ho really is true, five percent of the time (or whatever your given cutoff probability is for rejecting Ho).
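
You can watch this happen in a quick simulation. The sketch below assumes a world where the company's claim is true (average delivery time of exactly 2 days) and runs the consumer group's test over and over. The standard deviation of 0.5 days, the sample size of 25 packages, the seed, and the number of trials are all invented for illustration; only the 2-day claim and the 5% cutoff come from the text.

```python
import random
from statistics import NormalDist, fmean

MU_CLAIM = 2.0    # claimed average delivery time in days (from the text)
SIGMA = 0.5       # assumed population standard deviation (hypothetical)
N = 25            # assumed number of packages per sample (hypothetical)
ALPHA = 0.05      # cutoff probability for rejecting Ho
TRIALS = 10_000   # number of repeated studies to simulate

rng = random.Random(14)                      # fixed seed so the run is repeatable
z_cutoff = NormalDist().inv_cdf(1 - ALPHA)   # right-tail cutoff, about 1.645
std_error = SIGMA / N ** 0.5

false_alarms = 0
for _ in range(TRIALS):
    # Ho is TRUE here: every sample comes from a population whose mean is the claim
    sample = [rng.gauss(MU_CLAIM, SIGMA) for _ in range(N)]
    z_stat = (fmean(sample) - MU_CLAIM) / std_error
    if z_stat > z_cutoff:   # Ha: average delivery time is more than 2 days
        false_alarms += 1

false_alarm_rate = false_alarms / TRIALS
print(f"False alarm rate: {false_alarm_rate:.3f}")   # should land near ALPHA
```

Every rejection counted here is a mistake, because the claim was true in every simulated study. The rate at which those mistakes occur is set by the cutoff you chose, not by anything in the data.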

Rejecting Ho when you shouldn't is called a type-1 error. I don't really like this name, because it seems so nondescript. I prefer to call a type-1 error a false alarm. In the case of the packages, if the consumer group made a type-1 error when it rejected the company's claim, it created a false alarm. What's the result? A very angry delivery company, I guarantee that!

Missing a detection: Type-2 errors

On the other hand, suppose the company really wasn't delivering on its claim. Who's to say that the consumer group's sample will detect it? If the actual delivery time is 2.1 days instead of 2 days, the difference would be pretty hard to detect. If the actual delivery time is 3 days, a fairly small sample would show that something's up. The issue lies with those in-between values, like 2.5 days. If Ho is indeed false, you want to find out about it and reject Ho. Not rejecting Ho when you should have is called a type-2 error. I like to call it a missed detection.

Sample size is the key to being able to detect situations where Ho is false and to avoiding type-2 errors. The more information you have, the less variable your results will be (see Chapter 8) and the more ability you have to zoom in on detecting problems that exist with a claim.

This ability to detect when Ho is truly false is called the power of a test. Power is a pretty complicated issue, but what's important for you to know is that the larger the sample size, the more powerful a test is. A powerful test has a small chance for a type-2 error.
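
To get a feel for how sample size drives power, here is a rough sketch for the delivery-time example using a right-tailed Z-test. The population standard deviation of 1 day, the 0.05 cutoff, and the sample sizes are assumptions chosen for illustration; the true means of 2.1, 2.5, and 3 days are the ones mentioned earlier. The chance of a type-2 error is 1 minus the power.

```python
from statistics import NormalDist

Z = NormalDist()


def power(mu_true, n, mu_claim=2.0, sigma=1.0, alpha=0.05):
    """Chance that a right-tailed Z-test rejects Ho when the true mean is mu_true.

    sigma (population standard deviation) and alpha are assumed values.
    """
    std_error = sigma / n ** 0.5
    z_cutoff = Z.inv_cdf(1 - alpha)
    # When the true mean differs from the claim, the test statistic is
    # centered at `shift` instead of 0
    shift = (mu_true - mu_claim) / std_error
    return 1 - Z.cdf(z_cutoff - shift)


for mu_true in (2.1, 2.5, 3.0):
    row = ", ".join(f"n={n}: {power(mu_true, n):.2f}" for n in (10, 50, 200))
    print(f"true mean {mu_true} days -> {row}")
```

Under these assumptions, power rises with the sample size for every true mean, and for a fixed sample size it is lowest when the true mean is 2.1 days and highest when it is 3 days.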

HEADS UP 

Take any statistically significant results with a grain of salt, no matter how well the study was conducted. Whatever decision was made, that decision could be wrong. If the study is set up right, however (see Chapter 16 for surveys and Chapter 17 for experiments), that chance should be fairly small.

REMEMBER 

Statisticians recommend two preventative measures to minimize the chances of a type-1 or type-2 error:

  • Set a low cutoff probability for rejecting Ho (like 5 percent or 1 percent) to reduce the chance of false alarms (minimizing type-1 errors).

  • Select a large sample size to ensure that any differences or departures that really exist won't be missed (minimizing type-2 errors).

Drawing conclusions about their conclusions

Even if you never conduct a hypothesis test of your own, just knowing how they are supposed to be done can sharpen your critiquing skills. After the test is finished, the next step for researchers is to publish the results and offer press releases to the media indicating what they found. This is another place where you need to be watchful. While many researchers are good about stating their results carefully and pointing out the limitations of their data, others take a bit more liberty with their conclusions (whether they intend to do that or not is a separate issue).
