Statistics for Dummies (23 page)

Authors: Deborah Jean Rumsey

Trying to win at slots

The old one-armed bandit (that is, the slot machine) is a powerful force. These machines have pans made of a special material that loudly projects the sound of coins hitting the pans. The machines blink and make beeping noises every time you win anything; they even beep when you put your money into them, just to add more of that winning sound to the casino atmosphere. Some say that the loosest slots (the ones with the best payouts) are by the entryway of the casino or at the end of the rows. Some say the tightest ones are by the blackjack tables because people don't want to be distracted too much by the beeping noise. Casino owners never tell their secrets, so no one can tell for sure, but one thing is certain: Slot machines can take your money very quickly. No skill is needed, and each pull of the handle takes only a few seconds to spend anywhere from five cents (in my case) to thousands of dollars (in the high-dollar slot areas of some casinos). All of that spending and pulling is done in pursuit of the big jackpot that may come on your very next spin.

One of the most common misconceptions about slot machines is that the longer you play them, the greater your chances of winning. This is something that the casinos are betting on. In fact, just the opposite is true, because of the law of averages. The law of averages says that in the long run, averages will be close to their expected value. In statistical terms, the expected value is a weighted average of the outcomes based on their probabilities.

With any casino game, the house has a slightly higher chance of winning any single round. This means that every time you play, you have a very high chance of losing a small amount and only a tiny chance of winning a large amount. The casino sets its odds and payouts so that, in the long term, everything averages out in its favor, even taking the big jackpots into account.

What this means for you is that you'll end up with a small negative expected value every time you play. And the more you play, the more money you have to expect to lose in the long term, because your overall expected value is the sum of the expected values for each time you play; and each one of those expected values is a negative number. Now you know why the casino gives you free beverages while you're gambling and why you won't see any clocks on the walls, or windows to look out of to watch the changing seasons while you sit on that slot-machine stool. The casinos are betting that you'll forget all about the law of averages as it sneaks up on you while you play.
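To make the arithmetic concrete, here is a minimal sketch of an expected value as a weighted average, and of how a negative expected value adds up over many plays. The payout table is hypothetical (invented for illustration, not taken from any real machine), and the function names are assumptions of this sketch.

```python
# Hypothetical payout table for one $1 spin: (net winnings in dollars, probability).
# These numbers are invented for illustration; real machines publish no such table.
OUTCOMES = [
    (-1.00, 0.900),   # lose the dollar
    (4.00, 0.099),    # small win
    (400.00, 0.001),  # jackpot
]

def expected_value(outcomes):
    """Weighted average of the outcomes, weighted by their probabilities."""
    return sum(value * prob for value, prob in outcomes)

def expected_total(outcomes, spins):
    """Expected values add, so n spins have n times the per-spin expected value."""
    return spins * expected_value(outcomes)

per_spin = expected_value(OUTCOMES)
print(f"Expected value per spin: {per_spin:+.3f} dollars")
for spins in (10, 100, 1000):
    print(f"{spins:>5} spins: {expected_total(OUTCOMES, spins):+.2f} dollars")
```

With these made-up numbers, each spin loses a bit more than 10 cents on average, even with the jackpot counted in, and the expected loss grows in step with the number of spins.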

Tip 

The only way to beat the slots is to quit while you're ahead, before the law of averages takes over. Remember, the casinos have probability on their side because they're in it for the long term. You can't fool with Mother Nature, and you can't fool with probability. And if you do win, take your money and run!

 

Part IV: Wading through the Results

Chapter List
Chapter 8: Measures of Relative Standing
Chapter 9: Caution—Sample Results Vary!
Chapter 10: Leaving Room for a Margin of Error

This part helps you understand the underpinnings of statistics — the information that helps you understand the deeper issues that are working behind the scenes whenever statistics are formulated. You find out how to measure variability from sample to sample, how to come up with a formula to measure accuracy of a statistic, and how to measure where an individual stands with respect to the rest of the population (called a measure of relative standing). All of these topics help build your confidence in interpreting and understanding statistics from the ground up.

 

Chapter 8: Measures of Relative Standing

The only way to really be able to interpret statistical results is to have something to compare them to, so that you can put the results into some type of perspective. For example, suppose a hypothetical physical therapy student named Rhodie takes a standardized test for physical therapy certification and gets a score of 235. What does 235 mean in this case? Nothing, if that's all you have to go on. You need to be able to put this score into some perspective by determining where it stands relative to the other scores on the test. Is a light bulb that lasts more than 1,200 hours a freak of nature or just a standard light bulb? You can't tell without knowing how long most light bulbs last. Suppose Bob's average exam score at the end of a math course was 78; is that a B or a C? It depends on how his average score compares to the other average scores in his class (and how nice his professor is!).

In this chapter, you discover how to find and interpret the relative standing of individual results; the goal here is to describe where an individual stands, relative to all of the other individuals in the population. In Chapters 9 and 10, I discuss how to find and interpret the relative standing of the results from a sample (for example, the sample mean or the sample proportion). The goal in that case is to determine where your sample mean or sample proportion stands, compared to the population of all possible values of the sample mean or sample proportion.

Straightening Out the Bell Curve

The first step in determining where a particular result stands is to get a listing or picture of all of the possible values that the variable can take on in the population and how often those values occur; this is called a distribution.

Many different types of distributions are possible. For example, grades for one class (call it Mr. Average's class) are distributed uniformly, with an equal number of scores in each grouping (see the top portion of Figure 8-1), while grades from another class (call it Mr. Mean's class) are distributed in a polarized way, with everyone getting either a very high or a very low score (as in the bottom portion of Figure 8-1). (Most distributions tend to fall somewhere in between.) Notice that, for any distribution, the total of all the percentages has to be 100%, because every value has to appear somewhere on the distribution.

Figure 8-1: Grade distributions from two classes.

A bell-shaped curve describes data from a variable that has an infinite (or very large) number of possible values, and these values are distributed among the population in such a way that when they're plotted in a histogram, the resulting figure has the shape of a bell. This basically means you have a big group of individuals near the middle of the distribution, with fewer and fewer individuals trailing off as you move farther and farther away from the middle in either direction. Many variables in the real world (for example, standardized test scores, lifetimes of products, heights, weights, and so on) have distributions that look like a bell-shaped curve. That makes the bell-shaped curve (sometimes called simply a bell curve) important enough to be singled out among all the other possible distributions.

Statisticians call a distribution that has the shape of a bell curve a normal distribution. You can see a picture of a normal distribution in Figure 8-2. In this example, the variable is the number of hours that a certain company (call it Lights Out) expects its light bulbs to last. (How would you like to be the person collecting the data to test that little fun fact?)

Figure 8-2: Distribution of the lifetimes of light bulbs from Lights Out.

Characterizing the normal distribution

Every bell-shaped curve (normal distribution) has certain properties. You can use these properties to help determine the relative standing of any particular result in the distribution. The following is a list of properties shared by every normal distribution. These properties are explained in more detail in the following sections.

  • The shape of the curve is symmetric.

  • It has a bump in the middle, with tails going off to the left and right.

  • The mean is directly in the middle of the distribution. The mean of the population is designated by the Greek letter μ.

  • The mean and the median are the same value, due to symmetry.

  • The standard deviation represents a typical (almost average) distance between the mean and all of the data. The standard deviation of the population is designated by the Greek letter σ.

  • About 95% of the values are within two standard deviations of the mean.

Describing the shape and center

A normal distribution is symmetric, meaning that if you fold it in half right down the middle, the two halves are mirror images of each other. Because its curve is symmetric, the mean (the balancing point) and the median (the point where half of the data lie on either side) are equal, and they both occur at the middle of the distribution. The lifetimes of the light bulbs shown in Figure 8-2 have a normal distribution with a mean (and median) of 1,000 hours. (See Chapter 5 for information on the mean and median; see Chapter 4 for more on symmetry.)

Measuring the variability

The shape and the mean aren't the only important characteristics to consider when looking at a distribution. The variability in the values is also extremely important, even though much of the media ignores this characteristic and typically reports only the mean. Referring to Figure 8-2, you can see that the bulk of the light bulbs from Lights Out has a range of lifetimes that vary from under 700 hours to over 1,300 hours, with a good many of the bulbs lasting between 900 and 1,100 hours. As a consumer, do you want that much variability in lifetimes when you buy a package of light bulbs? Maybe not. A competing company (call them Lights Up) is going to try to produce light bulbs with less variability; the lifetime of their light bulbs will still have a mean of 1,000 hours, but this company is able to produce bulbs with more consistent lifetimes, ranging from around 940 to 1,060 hours, with a good many of their light bulbs lasting between 980 and 1,020 hours (see Figure 8-3).

Figure 8-3: Distribution of the lifetimes of light bulbs from Lights Up.

Variability in a distribution is measured and marked off in terms of number of standard deviations. (See Chapter 3 for the formula for standard deviation.) On a normal distribution, the standard deviation has a special significance because it's the distance from the mean to a place on the distribution called the saddle point (known more formally as an inflection point). Each normal distribution has two saddle points; each is the same distance from the mean. To find a saddle point, start at the mean and move either right or left until the curvature changes from being an upside-down bowl (concave down) to a right-side-up bowl (concave up).

In Figures 8-2 and 8-3, the saddle points are marked with dots. The standard deviation of the light bulb lifetimes from Lights Out (refer to Figure 8-2) is 100 hours. The standard deviation of the more consistent light bulbs from Lights Up (see Figure 8-3) is 20 hours. (For more information on standard deviation, see Chapter 5.)
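The claim that the curvature changes exactly one standard deviation from the mean (at the points the text calls saddle points) can be checked numerically. The sketch below is only an illustration: it uses the standard formula for the normal curve and a finite-difference estimate of curvature, with the means and standard deviations given above for the two companies. The function names are assumptions of this sketch.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def curvature(x, mu, sigma):
    """Central-difference estimate of the second derivative of the curve at x."""
    h = sigma / 100  # small step relative to the spread of the curve
    return (normal_pdf(x + h, mu, sigma) - 2 * normal_pdf(x, mu, sigma)
            + normal_pdf(x - h, mu, sigma)) / h ** 2

def shape_at(x, mu, sigma):
    """Describe the bowl direction of the curve at x."""
    return "concave down" if curvature(x, mu, sigma) < 0 else "concave up"

# Look just inside and just outside one standard deviation above the mean.
for name, mu, sigma in (("Lights Out", 1000, 100), ("Lights Up", 1000, 20)):
    for offset in (0.9, 1.1):
        x = mu + offset * sigma
        print(f"{name}: at {x:.0f} hours the curve is {shape_at(x, mu, sigma)}")
```

For both companies the curve is an upside-down bowl just inside one standard deviation and a right-side-up bowl just outside it, which is why the marked points sit 100 hours from the mean in one figure and 20 hours from it in the other.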

HEADS UP 

Before examining any results, be sure to both examine the scale on the horizontal axis of any distribution and know what the standard deviation is. Depending on the scale used, a distribution can look more squeezed together or more spread out than it should. Figures 8-2 and 8-3, for example, look similar, but their scales are very different. A better way to compare the light bulb lifetimes of the two companies is to put them on the same scale, as shown in Figure 8-4. Now you can see how much more spread out the lifetimes are for the bulbs made by Lights Out compared to those made by Lights Up; the lifetimes of the bulbs made by Lights Up are much more concentrated around the mean.

Figure 8-4: Variability in light bulb lifetimes for Lights Out versus Lights Up.

Looking for most of the values: The empirical rule

As long as a distribution has a mound shape in the middle (and the normal distribution certainly fits that criterion), you can make some general statements about where most of the values will be, using distances of 1, 2, or 3 standard deviations from the mean to mark off certain milestones. The rule that allows you to do this is called the empirical rule.

The empirical rule says that if a distribution has a mound shape, then:

  • About 68% of the values lie within 1 standard deviation of the mean (or between the mean minus 1 times the standard deviation, and the mean plus 1 times the standard deviation). In statistical notation, this is represented as: μ ± σ.

  • About 95% of the values lie within 2 standard deviations of the mean (or between the mean minus 2 times the standard deviation, and the mean plus 2 times the standard deviation). The statistical notation for this is: μ ± 2σ.

  • About 99.7% (nearly all) of the values lie within 3 standard deviations of the mean (or between the mean minus 3 times the standard deviation and the mean plus 3 times the standard deviation). Statisticians use the following notation to represent this: μ ± 3σ.
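For a distribution that is exactly normal, the three percentages in the rule above can be computed rather than memorized. The following is a minimal sketch using the error function from Python's standard library; the function name coverage_within is an assumption of this sketch, and the result applies to a true normal distribution, not to every mound-shaped one.

```python
import math

def coverage_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.1%}")
```

The computed values come out near 68.3%, 95.4%, and 99.7%, which the rule rounds to 68%, 95%, and 99.7%.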

TECHNICAL STUFF 

In the formulas for the empirical rule, if you don't know the population mean and standard deviation, estimate (replace) the population standard deviation, σ, with the sample standard deviation, s. And you can also estimate (replace) the population mean, μ, with the sample mean, x̄. See Chapter 3 for details.
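As a rough illustration of that substitution, the sketch below estimates the mean and standard deviation from a sample and marks off the 2-standard-deviation range. The data are hypothetical lifetimes invented for this example, and the helper name empirical_range is an assumption of this sketch.

```python
import statistics

# Hypothetical sample of light bulb lifetimes in hours (invented for illustration).
lifetimes = [1012, 987, 1105, 921, 1003, 1048, 876, 995, 1130, 962]

x_bar = statistics.mean(lifetimes)  # sample mean, stands in for the population mean
s = statistics.stdev(lifetimes)     # sample standard deviation (n - 1 divisor)

def empirical_range(center, spread, k):
    """Return (low, high) for center plus or minus k times spread."""
    return (center - k * spread, center + k * spread)

low, high = empirical_range(x_bar, s, 2)
print(f"sample mean = {x_bar:.1f} hours, sample standard deviation = {s:.1f} hours")
print(f"about 95% of lifetimes are expected between {low:.0f} and {high:.0f} hours")
```

Note that statistics.stdev computes the sample standard deviation; statistics.pstdev would be the choice if the data were the entire population.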

Figure 8-5 illustrates the empirical rule. About 68% of the values lie within 1 standard deviation of the mean because the majority of the values on a normal distribution are mounded up in the middle, close to the mean (as Figure 8-5 shows). Remember, it has a bell shape. Moving out 1 more standard deviation on either side of the mean adds more values, but less than 30% more (for a total of 95% of the values), because now you're picking up less of the mound part and more of the tail part. Finally, going out 1 more standard deviation on either side of the mean gets you that last little bit of the tail areas, picking up another 4.7% of the values (nearly all of what remained) to go from 95% to 99.7% of the data. Most researchers stay with the 95% range for reporting their results, because going out 3 standard deviations on either side of the mean doesn't seem worthwhile, just to pick up that last 4.7% of the values.

Figure 8-5: The empirical rule (68%, 95%, and 99.7%).

HEADS UP 

I need to stress the word about in the preceding description of the empirical rule. These results are approximations only (but they're good approximations). Later in this chapter (see the "Converting to a Standard Score" section), you see how to give more precise information regarding what percent of the values in the distribution are between, above, or below certain values. However, the empirical rule is an important rule in statistics, and the concept of "going out two standard deviations gets you about 95% of the values" is one that you see mentioned often throughout this book.

With the light bulbs from Lights Out (refer to Figure 8-2), the standard deviation is 100 hours, and the mean is 1,000 hours. Using the empirical rule, you can discuss the relative standing of certain milestones in the data. For example, according to this model, about 68% of the light bulbs are expected to last between 900 and 1,100 hours (1,000 ± 100), about 95% of the light bulbs should last between 800 and 1,200 hours (1,000 ± 2 × 100), and about 99.7% of the light bulbs should last between 700 and 1,300 hours (1,000 ± 3 × 100).
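The same arithmetic can be laid out for both companies at once. This is a small sketch using only the means and standard deviations given in the text; the function name empirical_ranges is an assumption of this sketch.

```python
def empirical_ranges(mean, sd):
    """Return {k: (low, high)} for k = 1, 2, 3 standard deviations from the mean."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

# Approximate percentages from the empirical rule, keyed by number of standard deviations.
COVERAGE = {1: "68%", 2: "95%", 3: "99.7%"}

for company, sd in (("Lights Out", 100), ("Lights Up", 20)):
    for k, (low, high) in empirical_ranges(1000, sd).items():
        print(f"{company}: about {COVERAGE[k]} last between {low:,} and {high:,} hours")
```

For Lights Up, the 3-standard-deviation range works out to 940 to 1,060 hours, matching the range quoted earlier for that company's more consistent bulbs.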

Tip 

You can use the symmetry of the normal distribution in combination with the empirical rule to answer other questions about the light bulb lifetimes. For example, what percentage of light bulbs should last 1,000 hours or more? The answer is 50%, because the median is at 1,000, and half of the values are greater than the median. What percentage of light bulbs from Lights Out should last more than 1,200 hours (refer to Figure 8-2)? The answer is 2.5%. Why? Because 95% of the light bulbs have lifetimes that are between 800 and 1,200 hours, and given that the total percentage under the whole curve has to be 100%, the remaining two tail areas must add up to 5%. Light bulbs lasting more than 1,200 hours make up the right tail only, and because of symmetry, you cut that 5% exactly in half to get 2.5%. So a light bulb that lasts more than 1,200 hours is pretty much a freak of nature, because that happens only 2.5% of the time (at least with Lights Out). With Lights Up (refer to Figure 8-3), a light bulb lasting that long would be unheard of, because 1,200 is much more than 3 standard deviations above the mean for the bulbs produced by that company (refer to Figure 8-4).
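The symmetry argument reduces to two small calculations, sketched below with the figures from the text. The function names are assumptions of this sketch.

```python
def upper_tail_percent(percent_within):
    """Percent of values above the upper end of a symmetric central range."""
    return (100 - percent_within) / 2

def sds_from_mean(value, mean, sd):
    """How many standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

print(f"Lights Out bulbs lasting more than 1,200 hours: {upper_tail_percent(95)}%")
print(f"1,200 hours for Lights Out: {sds_from_mean(1200, 1000, 100)} standard deviations above the mean")
print(f"1,200 hours for Lights Up: {sds_from_mean(1200, 1000, 20)} standard deviations above the mean")
```

A lifetime of 1,200 hours sits 2 standard deviations above the mean for Lights Out but 10 standard deviations above it for Lights Up, far beyond the 3-standard-deviation range that covers about 99.7% of that company's bulbs.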

The moral of the story here is that if you like to gamble, buy your light bulbs from Lights Out, because you'll have a greater chance of getting either a very long-lasting bulb or a bulb that lasts for a very short time; in other words, the bulbs from Lights Out have more variability in their expected performance. If you're the conservative type, get your light bulbs from Lights Up; these bulbs are more consistent, with fewer surprises.

HEADS UP 

The empirical rule does not apply when a distribution doesn't have a mound shape in the middle. You can still approximate or determine where certain milestones are in the data by making a histogram and/or finding percentiles (see Chapters 4 and 5 for more on histograms and percentiles, respectively).
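As one possible way to find such milestones, the sketch below computes quartiles (the 25th, 50th, and 75th percentiles) for a polarized set of scores. The data are hypothetical, invented for illustration, and statistics.quantiles uses one of several common percentile conventions, so other tools may give slightly different cut points.

```python
import statistics

# Hypothetical polarized exam scores (invented for illustration): no mound in the middle.
scores = [12, 15, 18, 20, 22, 25, 81, 84, 88, 90, 93, 97]

# n=4 splits the data into four equal-sized groups and returns the three cut points.
q1, median, q3 = statistics.quantiles(scores, n=4)

print(f"25th percentile: {q1}")
print(f"50th percentile (median): {median}")
print(f"75th percentile: {q3}")
```

Here the percentiles still describe where a given score stands, even though measuring off standard deviations from the mean would say little about data clustered at the two extremes.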
