Statistics for Dummies

Authors: Deborah Jean Rumsey

Determining where you stand: Percentiles

Everyone wants to know how they compare to everyone else. In school, what you got on a test mattered less than how your test score compared to the scores of the other kids in the class. Exams such as the GRE and ACT often keep the total number of points the same each year, while student performances vary as each test changes from year to year. So, along with your score, you always get an accounting of what your score means relative to the others who took the same exam with you. In other words, you find out your relative standing in the group.

Understanding percentiles

The most common way to report relative standing is by using percentiles. A percentile is the percentage of individuals in the data set who are below you. If you're at the 90th percentile, for example, that means 90% of the people taking the exam with you scored lower than you did. And that also means that 10% scored higher than you did, because the total has to add up to 100%. (Everybody taking the test has to show up somewhere relative to your score, right?)

HEADS UP 

A percentile is not a score in and of itself. Suppose your score on the GRE was reported to be at the 80th percentile. This doesn't mean you answered 80% of the questions correctly. It means that 80% of the students' scores were lower than yours, and 20% of the students' scores were higher than yours.
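To see the difference between a score and a percentile, here is a minimal Python sketch. The function name `percent_below` and the ten class scores are made up for illustration; they don't come from any real exam.

```python
def percent_below(scores, your_score):
    """Return the percentage of scores that are strictly lower than your_score."""
    below = sum(1 for s in scores if s < your_score)
    return 100 * below / len(scores)

# Hypothetical class of 10 exam scores (invented for this example).
class_scores = [52, 60, 61, 67, 70, 74, 78, 81, 88, 95]

# A score of 88 beats 8 of the 10 scores, so it sits at the 80th percentile.
print(percent_below(class_scores, 88))
```

Notice that the score (88) and the percentile (80th) are two different numbers that answer two different questions.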

Calculating percentiles

To calculate the kth percentile (where k is any number between one and one hundred), do the following steps:

  1. Put all of the numbers in the data set in order from smallest to largest.

  2. Multiply k percent times the total number of numbers, n.

  3. Take that result and round it up to the nearest whole number.

  4. Count the numbers from left to right (from the smallest to the largest number) until you reach the value from Step 3.

For example, suppose you have 25 test scores, and when they're put in order from lowest to highest, they look like this: 43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99. Suppose further that you want to find the 90th percentile for these scores. Because the data are already ordered, the next step is to multiply 90% times the total number of scores, which gives 90% × 25 = 0.90 × 25 = 22.5. Rounding up to the nearest whole number, you get 23. This means that counting from left to right (from the smallest to the largest number in the data set), you go until you find the 23rd number in the data set. That number is 98, and it's the 90th percentile for this data set.
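The four steps above can be sketched in Python as follows. The function name `kth_percentile` is an assumption made for this example, and the sketch follows the counting method described in this section rather than the interpolation methods that some software packages use.

```python
import math

def kth_percentile(data, k):
    """Return the kth percentile using the four-step counting method."""
    ordered = sorted(data)                          # Step 1: order smallest to largest
    position = math.ceil(k * len(ordered) / 100)    # Steps 2 and 3: k% of n, rounded up
    position = max(position, 1)                     # guard so tiny k still lands on the first value
    return ordered[position - 1]                    # Step 4: count over to that position

scores = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

print(kth_percentile(scores, 90))  # the 23rd ordered score
```

With the 25 test scores from the example, 90% of 25 is 22.5, which rounds up to 23, and the 23rd ordered score is 98.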

Tip 

The 50th percentile is the point in the data where 50% of the data fall below that point and 50% fall above that point. You may recognize this under a different name — the median. Indeed, the median is a special percentile; it's the 50th percentile.

HEADS UP 

A high percentile doesn't always constitute a good thing. For example, if your city is at the 90th percentile in terms of crime rate compared to cities of the same size, that means that 90% of cities similar to yours have a crime rate that is lower than yours, which is not good for you.

Interpreting percentiles

The U.S. government often reports percentiles among its data summaries. For example, the U.S. Census Bureau reported that the median household income for 2001 was $42,228. The Bureau also reported various percentiles for household income, including the 10th, 20th, 50th, 80th, 90th, and 95th. Table 5-3 shows the values of each of these percentiles.

Table 5-3: U.S. Household Income for 2001

Percentile    2001 Household Income
10th                       $ 10,913
20th                       $ 17,970
50th                       $ 42,228
80th                       $ 83,500
90th                       $116,105
95th                       $150,499

Looking at these percentiles, you can see that the incomes in the bottom half are closer together than the incomes in the top half. The difference between the 50th percentile and the 20th percentile is about $24,000, whereas the spread between the 50th percentile and the 80th percentile is more like $41,000. And the difference between the 10th and 50th percentiles is only about $31,000, whereas the difference between the 90th and the 50th percentiles is a whopping $74,000.
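If you'd like to check those gaps yourself, here is a short Python sketch that uses the 2001 household income percentiles reported above. The names `income` and `gap_from_median` are assumptions made for this example.

```python
# 2001 household income percentiles (percentile -> dollars).
income = {10: 10913, 20: 17970, 50: 42228, 80: 83500, 90: 116105, 95: 150499}

def gap_from_median(percentile):
    """Return the dollar distance between a percentile and the median (50th)."""
    return abs(income[percentile] - income[50])

# Compare matching percentiles on either side of the median.
for low, high in [(20, 80), (10, 90)]:
    print(f"{low}th is ${gap_from_median(low):,} below the median; "
          f"{high}th is ${gap_from_median(high):,} above it")
```

In both comparisons the upper gap is much larger than the lower gap, which is the pattern you'd expect from data that trail off toward the high end.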

TECHNICAL STUFF 

By looking at these percentiles and how they're distributed among the data, you can tell that this data set, if shown with a histogram, would be skewed to the right. (A histogram is basically a bar chart that breaks the data into groups and shows the number in each group. See Chapter 4 for more on histograms.) That's because the higher incomes are more spread out and trail off more than low incomes, which are more clumped together. In this report, the mean wasn't shown because it would have been greatly influenced by those outliers (the households with very high incomes), which would have driven the mean upward, artificially inflating the overall description of household incomes in the United States.

Percentiles do occur in the media and in many public documents; they can yield some interesting information about the data, including how evenly or unevenly the data are distributed, how symmetric the data are, and some important milestones in the data, such as what the median is. Percentiles can also tell you where you (your test score, your income, and so on) stand in a data set. Sometimes, the value of the average isn't important, as long as you know how far above or below average you are. For more information on these other applications of percentiles, see Chapter 8.

REMEMBER 

No matter what type of data is being summarized or what type of statistics is being used, remember that summary statistics can't tell you everything about the data. But if these statistics are well chosen and they're not misleading, they can convey a great deal of information quickly. Errors of omission can happen, however, so be sure to be on the lookout for some of those lesser-known statistics that can fill in some important clues to the real story behind the data.

 

Part III: Determining the Odds

Chapter List
Chapter 6: What Are the Chances? Understanding Probability
Chapter 7: Gambling to Win

Get your dice ready to roll! In this part, you uncover some of the secrets of the gambling scene (and rule number one is to quit while you're ahead!). You also look at the basics of probability so that you know what you're up against when gambling or dealing with any type of chance or uncertainty. And you may be surprised to discover that probability and your intuition don't always mix!

 

Chapter 6: What Are the Chances? Understanding Probability

In this chapter, you discover how probability is used in everyday life and in the workplace and explore some of the rules of probability. You also see how probability and intuition don't always mix, find out ways to avoid some common probability misconceptions, and discover what probability has to do with statistics.

Taking a Chance with Probability

Have you ever said, "What are the chances of that happening?" You read, for example, about two tornados hitting the same tiny Kansas town within a 50-year span. On a plane, you run into a friend you haven't talked to in years. You have two flat tires in one day. Your underdog team wins the NCAA basketball championship during March Madness. Strange things happen, and sometimes these events leave you wondering, "What are the odds? Who would have ever predicted this? What's the chance of that ever happening again?" All of these questions have to do with probability.

But probability isn't just about examining the oddities of life (although that admittedly is a fun pastime for those who engage in it). Probability is really about dealing with the unknown in a systematic way, by scoping out the possibilities, figuring out the most likely scenarios, or having a backup plan in case those most likely scenarios don't happen.

Life is a sequence of unpredictable events, but probability can be used to help predict the likelihood of certain events occurring. Here are some of the more mundane ways that probability may cross your path on a daily basis:

  • The weather reporter predicts an 80% chance of rain today, so you decide to wear your raincoat to work.

  • You know based on experience that going slightly over the speed limit increases your chances of hitting more green lights in a row on your way to work (as long as you don't get a ticket doing it).

  • On your way to work, you wonder whether your assistant, Bob, is going to call in sick today, because it's Friday, and he takes about 75% of his sick days on Fridays. (You also ponder the chance that Bob will tell you he's found another job, an event that has a much lower chance of occurring, you suppose.)

  • You buy a lottery ticket on your lunch hour because "Someone's got to win, and it may as well be me!" (By the way, your chances of winning the jackpot are 1 in 89 million this time, so don't hold your breath.)

  • On TV, you hear about the latest health report that says that if you take a small power nap during the day, you'll reduce your chances of insomnia by 35%. (You fall asleep during the rest of the report.)

  • You end the day by watching your favorite baseball team win another game, and you dream about the chances of winning the World Series.

Probability is also used in virtually every workplace, from marketing companies to investing firms, from government agencies to manufacturing facilities, and from hospitals to restaurants. The following list includes just some of the many examples of how probability is used in the workplace:

  • A small company conducts a survey to find out whether customers like a product enough for the company to offer it on the Home Shopping Network. If the company is right, it can make piles of money; if it's wrong, the company can go broke.

  • A company that makes potato chips has to ensure that the bags are being filled to proper specifications: too few chips, and they'll get in trouble for misrepresenting their product; too many chips, and they'll lose profits. They sample bags of chips and based on those samples, figure out the probability that something is wrong with the machines.

  • Mr. I.M. Hopeful decides to explore the idea of running for governor, but before he goes to the trouble of raising the millions needed to run a campaign, he conducts a poll to determine his chances of winning an election.

  • A pharmaceutical company has a new drug for high blood pressure. Based on the clinical trials on volunteers, the company determines the probability that someone taking the drug will improve his or her condition and/or develop certain side effects.

  • A genetic engineer uses probabilities to predict genetic patterns and outcomes in a variety of areas, from designing new crops to identifying hereditary diseases early in a person's life.

  • A restaurant manager thinks about probability in terms of when and how many customers will come into his restaurant at a given time. He then tries to prepare accordingly.

  • A stockbroker uses probability in her decision-making every day. She constantly wonders whether a given stock will go up or down, whether she should buy or sell, and what she should tell her clients.

 

Gaining the Edge: Probability Basics

Probability is everywhere, yet it can be hard to understand at times, because it can seem counterintuitive. The first step in gaining the edge on probability is to understand some basic rules of probability and how these rules are applied. When statisticians talk about probability, they talk about the probability of an outcome, which is one particular result of a random process being studied. What's a random process, you ask? It's any process for which the outcome is not set in stone, but can vary in a random way. For example, if you roll a six-sided die one time, the outcome (the number on the side facing up) will be one of six possible numbers: 1, 2, 3, 4, 5, or 6.

Getting the rules down
Tip 

Consider the following basic rules of probability:

  • The probability of an outcome is the percentage of times that the outcome is expected to happen. This can often be calculated by taking the number of ways that the outcome can happen divided by the total number of possible outcomes. For example, the probability of the number 1 appearing when a single die is rolled is 1 out of 6, or 1/6 (or 16.7%).

  • Every probability is a number (a percentage) between 0% and 100%. (Note that statisticians often express percentages as proportions, that is, numbers between 0 and 1.) If an outcome has a probability of 0%, it can never happen, no matter what. If an outcome has a probability of 100%, it always happens, no matter what. Most probabilities are neither 0% nor 100%, but fall somewhere in between.

  • The sum of the probabilities of all possible outcomes is 1 (or 100%).

  • To get the probability of obtaining one of a set of outcomes, you add up the probabilities of each outcome individually. For example, the probability of rolling an odd number (1, 3, or 5) on a single die is the sum of the probabilities of rolling a 1, a 3, and a 5: 1/6 + 1/6 + 1/6 = 3/6 = 1/2, or 50%.

  • The complement of an event is all possible outcomes except those that make up the event. The probability of the complement of an event is 1 minus the probability of the event. For example, rolling a 1, 2, 3, 4, or 5 is the complement of rolling a 6 on a single die, so the probability of rolling a 1, 2, 3, 4, or 5 is 1 minus the probability of rolling a 6, or 1 – 1/6 = 5/6.

Tip 

When the complement of an event is complicated, it's often easier to find the probability of the event itself, and take 1 minus the calculated probability. Why take 1 minus this probability? Because the sum of the probabilities of all the outcomes is 1, so the probability of the complement of an event plus the probability of the event must be 1.
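The rules above can be tried out on a single fair die with a few lines of Python. This is only a sketch: the names `prob` and `probability` are assumptions made for this example, and it assumes all six faces are equally likely.

```python
from fractions import Fraction

# A fair six-sided die: each face gets probability 1/6 (assumed equally likely).
faces = [1, 2, 3, 4, 5, 6]
prob = {face: Fraction(1, len(faces)) for face in faces}

def probability(event):
    """Add up the probabilities of the individual outcomes in the event."""
    return sum((prob[face] for face in event), Fraction(0))

p_one = probability({1})            # one way out of six possible outcomes
p_all = probability(faces)          # all outcomes together
p_odd = probability({1, 3, 5})      # 1/6 + 1/6 + 1/6
p_not_six = 1 - probability({6})    # complement rule: 1 minus P(rolling a 6)

print(p_one, p_all, p_odd, p_not_six)
```

Using exact fractions rather than decimals keeps the results in the same form as the rules: 1/6, 1, 1/2, and 5/6.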

Rolling the dice

In the gambling game of craps, two dice are rolled, and the number 7 plays an important role. In craps, each outcome is composed of the two numbers on the dice (for example, the combination 6, 2 is one outcome). The numbers on the two dice are added together to get the sum. (See Table 6-1.) The sum of 7 is the sum that happens most often and, therefore, has the highest probability of occurring. The shooter (the person rolling the dice) rolls the dice, and whatever he or she gets is called the come-out roll (for example, a 6, 2 combination makes a sum of 8 for the come-out roll). If the come-out roll sums to 7, the shooter's turn is over, and everyone who placed a bet loses. If the come-out roll does not sum to 7, the shooter keeps rolling the dice until either a sum of 7 appears or the sum that showed on the come-out roll appears again (in this case, 8). Anyone around the table can bet that a sum of 7 will or won't come up before the sum of the come-out roll comes up again. And that's why everyone at the craps table gets so excited and cheers on the shooter. They're hoping that the shooter will bring them good luck and roll the combinations they're betting on.

You can use the rules of probability listed in the preceding section to look at the outcomes of the sum of two dice and assign probabilities to them. Do you know which sum(s) have the second highest probability of occurring?

When two dice are rolled, each die has six possible results; together, these six possible results on each die yield 36 (6 × 6) possible combinations of two numbers, or 36 possible pairs. Because in this example, an outcome is the sum of the numbers obtained from the two rolled dice, you have eleven different possible outcomes, which range from 2 (that is, 1 + 1) to 12 (that is, 6 + 6), and everything in between. Table 6-1 shows the 36 possible results of the dice rolls, as well as the 11 different sums.

Table 6-1: Outcomes for the Sum of Two Dice

Each pair of columns shows the result of a dice roll and its sum.

Result  Sum    Result  Sum    Result  Sum    Result  Sum    Result  Sum    Result  Sum
1, 1      2    2, 1      3    3, 1      4    4, 1      5    5, 1      6    6, 1      7
1, 2      3    2, 2      4    3, 2      5    4, 2      6    5, 2      7    6, 2      8
1, 3      4    2, 3      5    3, 3      6    4, 3      7    5, 3      8    6, 3      9
1, 4      5    2, 4      6    3, 4      7    4, 4      8    5, 4      9    6, 4     10
1, 5      6    2, 5      7    3, 5      8    4, 5      9    5, 5     10    6, 5     11
1, 6      7    2, 6      8    3, 6      9    4, 6     10    5, 6     11    6, 6     12

You can use the first rule of probability (see the "Getting the rules down" section) to calculate the probabilities for each of the 11 possible sums. A list of all outcomes and their probabilities is called a probability model. For example, a sum of 7 can happen in 6 different ways: (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), and (6, 1). With 36 possible combinations for the two dice, the probability that the sum is 7 is 6 ÷ 36, or 1/6. Similarly, you can figure out the probabilities of getting sums 2 through 12. The probability model for the sum of two dice is shown in Table 6-2. You can see that two sums share the second-highest probability, 5/36; these are the sums on either side of 7 (6 and 8). Note that the sum of all of the probabilities in Table 6-2 is equal to 1. Also note that the probabilities steadily increase as the sum of the dice goes from 2 to 3 to 4, 5, and 6, and peak when the sum of the dice is 7 (that's because the number of combinations that can result in a sum of 7 is higher than for any other sum). The probabilities steadily decrease again as the sum goes from 8 to 9, and so on, up to 12.

Table 6-2: Probability Model for the Sum of Two Dice

Sum of Dice    Probability
 2             1/36
 3             2/36
 4             3/36
 5             4/36
 6             5/36
 7             6/36
 8             5/36
 9             4/36
10             3/36
11             2/36
12             1/36
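The probability model for the sum of two dice can also be built by brute force. The Python sketch below lists all 36 pairs, counts how many ways each sum can happen, and divides by 36. The variable names are assumptions made for this example, and the dice are assumed to be fair.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) pairs.
pairs = list(product(range(1, 7), repeat=2))

# Count the number of ways each sum can occur.
ways = Counter(first + second for first, second in pairs)

# Probability model: each sum mapped to (ways to get it) / 36.
model = {total: Fraction(count, len(pairs)) for total, count in sorted(ways.items())}

for total in model:
    # Print the unreduced fraction so it matches the form used in the table.
    print(f"{total:>2}  {ways[total]}/{len(pairs)}")
```

Note that `Fraction` reduces 6/36 to 1/6 automatically, which is why the loop prints the raw counts over 36 instead.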

Payouts in any gambling game are based on probabilities. In craps, for example, you can make side bets on what the sum is going to be for any given roll. If you bet that on a given roll the shooter will come up with a sum of 2 and that actually happens, you'll win more than if you bet that on a given roll the sum of 8 will come up. Why? Because getting a sum of 2 on two dice is much less likely to happen than getting a sum of 8 on two dice, according to Table 6-2. That's why they call it gambling. (For more on probability and gambling, see Chapter 7.)

HEADS UP 

The probabilities for the sum of two dice were fairly straightforward to calculate. However, other probabilities can be more involved, for example, the probabilities for different poker hands such as a full house, straight flush, or two pairs. What's important to remember, though, is that the ranking of the hands in poker is directly related to the probability of getting that hand; the highest hand in poker is a royal flush (10, Jack, Queen, King, and Ace, all of the same suit). The royal flush is the highest hand because it's the one with the lowest probability of occurring.
