Statistics for Dummies

Authors: Deborah Jean Rumsey

Monitoring the process

After the control limits have been set, the next step is to monitor the process. In most cases, this involves taking samples of the products at various times, finding their average weights, and marking these averages on a control chart. When it looks like the process is going off track, either in terms of accuracy or consistency, the manufacturing process is stopped, the problem is identified, and repairs or adjustments are made.

If a process is in control, you should see about 68% of the sample means falling within 1 standard error, about 95% falling within 2 standard errors, and about 99.7% falling within 3 standard errors of the target value, according to the empirical rule (see Chapter 8). The overall sample mean should be the target value, and you should see about as many values below the target as above the target, with no particular patterns to those values.
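The percentages in the empirical rule can be checked with a few lines of Python. This is a minimal sketch that assumes the sample means follow a normal distribution; the function name `prob_within` is made up for this illustration.

```python
import math


def prob_within(k: float) -> float:
    """Chance that a normally distributed value lies within k standard
    deviations (here, standard errors) of its mean."""
    return math.erf(k / math.sqrt(2))


# Assumption: the sample means are normally distributed around the target.
for k in (1, 2, 3):
    print(f"within {k} standard error(s): {prob_within(k):.1%}")
```

The three results come out at roughly 68%, 95%, and 99.7%, which is where the rounded figures in the empirical rule come from.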

HEADS UP 

In actual production, when the tube-filler has to be stopped, the entire production line comes to a standstill. Valuable time and production capacity are lost while maintenance and other personnel locate and correct the problem. This puts added pressure on the statistical process to always provide accurate and reliable information. Before deciding to stop the process, the workers responsible for the quality of the product want to be absolutely sure that a problem really does exist; on the other hand, if a problem has developed on the production line, they don't want to let it go on too long before making the necessary corrections. A delicate balance must be found.

So the next question is, how do you determine when a process is "out of control"? And how do you do it without having too many false alarms, costing the company time and money? Like most issues involving statistics, no single, correct, end-all and be-all answer exists (as it does for math, in most cases). Some people say that's what they love about statistics, and some say that's what they hate about statistics.

HEADS UP 

If the control limits are set at plus or minus 2 standard errors, then 95% of the means should lie within these control limits for the process to be in control. But this means that you should expect 5% of the results to lie outside these limits just by chance, and that should be okay! Here's the tricky part. You don't want to stop the process the first time a mean falls outside of the control limits; that's expected to happen 5% of the time, just by random chance (see Chapter 10 for more on this). So, in order to avoid too many false alarms, you need more than one mean to fall outside the limits to make the decision to stop the process.
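To see why stopping on a single out-of-limits mean causes so many false alarms, here is a rough sketch of the arithmetic. It assumes that the samples are independent and that each mean has a 95% chance of landing inside the limits when the process is in control; the function name is hypothetical.

```python
def chance_of_false_alarm(n_samples: int, p_inside: float = 0.95) -> float:
    """Chance that at least one of n in-control sample means falls outside
    the control limits purely by chance (assumes independent samples)."""
    return 1 - p_inside ** n_samples


for n in (1, 5, 20):
    print(f"{n:>2} samples: {chance_of_false_alarm(n):.1%} chance of at least one false alarm")
```

Under these assumptions, a process that is perfectly in control has about a 64% chance of producing at least one out-of-limits mean somewhere in 20 samples, so a stop-at-the-first-one policy would halt the line often for no reason.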

You want to stop the process only if you're pretty sure something has gone awry, and one sample mean outside of the limits isn't anything out of the ordinary. Now, what if you saw 2, 3, 4, 5, or more results in a row fall outside your pre-set limits? Where's the cutoff point? Welcome to the wonderfully vague world of statistics! Here are four examples of rules that are often used to determine whether a process is out of control and should be stopped:

  • Five sample means in a row are all either above the target or below the target (as in Figure 19-5a). Suspected cause: systematic overfilling or underfilling due to problems in the process.

  • Six sample means in a row are either steadily increasing or steadily decreasing (as in Figure 19-5b). Suspected cause: the products coming off the line are slowly drifting farther and farther away from the intended average value, probably due to problems with one or more machines.

  • Fourteen sample means in a row alternate above and below the target value (as in Figure 19-5c). Suspected cause: two different operators, machines, or suppliers are feeding into one system but are not in agreement.

  • Fifteen sample means in a row all fall within 1 standard error of the target (as in Figure 19-5d). Suspected cause: the process is more consistent than the specifications call for. (If this overly consistent process costs time or money, it should be loosened up. If the overly consistent process does not add time or money, finding out why the process changed, and replicating this change in the future, may be worthwhile.)

Figure 19-5: Toothpaste filling processes that are out of control.
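The four rules above can be sketched as simple checks on the most recent sample means. This is one possible reading of the rules, not a standard library or the book's own procedure: the function names, the default run lengths as parameters, and the example target and standard error are all assumptions made for illustration.

```python
from typing import Sequence


def same_side_run(means: Sequence[float], target: float, run: int = 5) -> bool:
    """Rule 1: the last `run` means are all above, or all below, the target."""
    recent = means[-run:]
    if len(recent) < run:
        return False
    return all(m > target for m in recent) or all(m < target for m in recent)


def trend_run(means: Sequence[float], run: int = 6) -> bool:
    """Rule 2: the last `run` means are steadily increasing or steadily decreasing."""
    recent = means[-run:]
    if len(recent) < run:
        return False
    pairs = list(zip(recent, recent[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def alternating_run(means: Sequence[float], target: float, run: int = 14) -> bool:
    """Rule 3: the last `run` means alternate between above and below the target."""
    recent = means[-run:]
    if len(recent) < run or any(m == target for m in recent):
        return False
    above = [m > target for m in recent]
    return all(a != b for a, b in zip(above, above[1:]))


def hugging_run(means: Sequence[float], target: float, std_error: float,
                run: int = 15) -> bool:
    """Rule 4: the last `run` means all lie within 1 standard error of the target."""
    recent = means[-run:]
    if len(recent) < run:
        return False
    return all(abs(m - target) <= std_error for m in recent)


# Hypothetical data: target fill of 6.0 ounces, five means in a row above it.
print(same_side_run([6.1, 6.2, 6.05, 6.3, 6.15], target=6.0))  # True
```

Each check looks only at the tail of the list, so it can be rerun every time a new sample mean is added to the chart.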

TECHNICAL STUFF 

These rules are based on probability; you stop the process when the data you're getting would be very unlikely if the process were still in control. Note that, for a process in control, the chance of any one sample mean falling above the target is 50%, or 0.5, and the same goes for falling below the target. So, for the first rule listed in the preceding bullet list, the probability of getting 5 sample means in a row that are all above the target is (0.5) × (0.5) × (0.5) × (0.5) × (0.5) = 0.03125, or about 3%, and the same is true for 5 in a row below the target. This is under the typical cutoff probability of 5% (see Chapter 14). You conclude that the process is not in control, because a run like this is too unlikely to be due to chance alone.
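The arithmetic behind the first rule is easy to reproduce. The sketch below assumes independent samples and a 50% chance of landing on either side of the target; the function name is made up for this example.

```python
def prob_run_one_side(run_length: int, p_side: float = 0.5) -> float:
    """Chance that `run_length` independent sample means in a row all fall
    on one particular side of the target (for example, all above it)."""
    return p_side ** run_length


p_all_above = prob_run_one_side(5)   # 0.5 multiplied by itself 5 times
p_either_side = 2 * p_all_above      # all above OR all below

print(f"5 in a row above the target: {p_all_above:.5f}")
print(f"5 in a row on either side:   {p_either_side:.5f}")
```

Note that counting runs on either side of the target doubles the figure, so it matters whether a rule is stated for one specific side or for both.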

The next time you crack open a new tube of toothpaste, think about all of the statistics that went into ensuring it was filled with quality.

 

Part VIII: The Part of Tens
Chapter List
Chapter 20: Ten Criteria for a Good Survey
Chapter 21: Ten Common Statistical Mistakes

Where would a statistics book be without some statistics of its own? This part contains ten criteria for a good survey and ten common statistical mistakes.

This part gives you a quick, concise reference that you can use to help critique or design a survey and detect common statistical abuses.

 

Chapter 20: Ten Criteria for a Good Survey

Surveys are all around you: I guarantee that at some point in your life, you'll be asked to complete a survey. This means that you're also inundated with the results of those surveys, and before you consume the information, you need to evaluate whether a survey was properly designed and implemented. In other words, don't assume the survey is okay until you check it out (see Chapter 16 for the lowdown on surveys). The two important goals for a survey are to be accurate (that is, based on enough data so the results wouldn't change much if another sample were taken) and to have a minimum amount of bias (systematic overestimation or underestimation of the true result, like the bathroom scale that is always five pounds too high!). In this chapter, you find ten criteria that you can use to evaluate or plan a survey.

The Target Population Is Well Defined

The target population is the entire group of individuals that you're interested in studying. For example, suppose you want to know what the people in Great Britain think of reality TV. The target population in this case would be all the residents of Great Britain.

REMEMBER 

Note that sometimes, the target population needs a bit of refinement for clarity. For example, what age groups do you want to include in your target population? For the reality TV example, you probably don't want to include children under a certain age, say 12. So your target population is actually all residents of Great Britain aged 12 and over.

Many researchers don't do a good job of defining their target populations clearly. For example, if the American Egg Board wants to say "Eggs are good for you!" it needs to specify who the "you" is. Is the Egg Board prepared to say that eggs are good for people who have high cholesterol? (One of the studies the group cited was based only on young people who were healthy and eating low-fat diets. Is that who they mean by "you"?)

HEADS UP 

If the target population isn't well defined, the survey results are likely biased. This is because the sample that's actually studied may contain people outside the intended population, or the survey may exclude people who should have been included.

 

The Sample Matches the Target Population

When you're conducting a survey, you typically can't ask every single member of the target population to provide the information you're looking for. You usually don't have the time or money to do that. The best you can do is select a sample (a subset of individuals from the population) and get the information from them. Because this sample of individuals is your only link to the entire target population, you want that sample to be really good.

A good sample represents the target population. The sample doesn't systematically favor certain groups within the target population, and it doesn't systematically exclude certain people, either. This sounds easy enough, right? All you need to do is get a list of all the individuals in the target population (this is called a sampling frame) and select a sample of people from it. How difficult can that be?

Pretty difficult. Suppose your target population is all registered voters in the United States who are likely to vote in the next presidential election. Getting a list of these individuals isn't easy. You can look at voter registration lists, but you don't know which people are likely to vote in the next election. You could check out those who voted in the last election, but many of those folks moved or died, and you're not including those who turned 18 since the last election. Suddenly, the situation gets a bit complicated. Welcome to the world of surveys!

TIP

One potential solution to this problem is to obtain updated voter registration lists, take a sample of individuals from those lists, and ask them whether they plan to vote in the upcoming election. If someone says no, stop asking questions and don't count that person in your survey. For those who do plan to vote, ask who they plan to vote for, and include those answers in your survey results.

REMEMBER 

A good survey has an updated and comprehensive sampling frame that lists all the members of the target population, if possible. If such a list isn't possible, some mechanism is needed that gives everyone in the population an equal opportunity to be chosen to participate in the survey. For example, if a house-to-house survey of a city is needed, an updated map including all houses in that city should be used as the sampling frame.

 

The Sample Is Randomly Selected

An important feature of a good survey is that the sample is randomly selected from the target population. Randomly means that every member of the target population has an equal chance of being included in the sample. In other words, the process you use for selecting your sample can't be biased.

Suppose you have a herd of 1,000 steers, and you need to take a random sample of 50 of them to test for a disease. Taking the first 50 steers that come up to you in the field wouldn't fit the definition of a random sample. The steers that are able to come up to you may be less likely to have any kind of disease, or they may be the older, more friendly ones, who actually may be more susceptible to disease. Either way, bias is introduced in the survey. How do you take a random sample of steers? The animals are likely tagged with ID numbers, so you get a list of all the ID numbers, take a random sample of those, and locate those animals. Or, if the animals sleep in cages or stalls, number those and take a random sample of cage numbers. Sometimes being a statistician means being very inventive about how you take a truly random sample!
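Drawing a random sample of ID numbers is straightforward once you have the list. Here is a minimal sketch using Python's standard `random` module; it assumes the steers are tagged 1 through 1,000, which is made up for illustration, and the fixed seed is there only so the example can be repeated.

```python
import random

# Assumption: the 1,000 steers carry ID tags numbered 1 through 1,000.
herd_ids = list(range(1, 1001))

# A seeded generator makes the draw repeatable for this example;
# leave the seed out for a fresh random draw each time.
rng = random.Random(42)

# Sample 50 IDs without replacement, so each steer has an equal chance
# of being chosen and no steer is picked twice.
sample_ids = sorted(rng.sample(herd_ids, k=50))

print(len(sample_ids), "steers selected; first few tags:", sample_ids[:5])
```

The same idea works for cage or stall numbers: build the list of numbers, then sample from it.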

For surveys involving people, reputable polling organizations such as The Gallup Organization use a random digit dialing procedure and telephone the members of their sample. This excludes people without phones, of course, so this survey has a bit of bias. In this case, though, most people do have phones (over 95%, according to The Gallup Organization), so the bias against people who don't have phones is not a big problem.

REMEMBER 

A good survey contains a random sample of individuals from the target population. Be sure to find out how the sample was selected, if that process isn't described.

 

The Sample Size Is Large Enough

You've heard the saying, "Less is more"? With surveys, the saying is, "Less good information is better than more bad information, but more good information is better." (Not really catchy, is it?)

Here's the basic idea. If you have a large sample size, and the sample is representative of the target population (meaning randomly selected), you can count on that information to be pretty accurate. How accurate depends on the sample size, but the bigger the sample size, the more accurate the information will be.

TIP

A quick and dirty formula to calculate the accuracy of a survey is to divide 1 by the square root of the sample size. For example, a survey of 1,000 (randomly selected) people is accurate to within 1/√1,000, which is 0.032 or 3.2%. This percentage is called the margin of error.
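The quick rule of thumb, read as 1 divided by the square root of the sample size (which reproduces the 3.2% figure for 1,000 people), can be tried for other sample sizes. This is a rough sketch of that shortcut only, not a full margin-of-error calculation, and the function name is hypothetical.

```python
import math


def quick_margin_of_error(sample_size: int) -> float:
    """Rule-of-thumb margin of error: 1 / sqrt(n), as a proportion."""
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(sample_size)


for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: about {quick_margin_of_error(n):.1%}")
```

Because of the square root, cutting the margin of error in half takes roughly four times as many people.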

HEADS UP 

Beware of surveys that have a large sample size that is not randomly selected. Internet surveys are the biggest culprit here. A company can say that 50,000 people logged on to its Web site to answer a survey, which means the survey results have a lot of information. But that information is biased, because it doesn't represent the opinions of anyone except those who chose to participate in the survey; that is, they had access to the Internet, went to the Web site, and chose to complete the survey. In this case, less would have been more: The company should have sampled fewer people but done so randomly.
