Statistics for Dummies (18 page)

Authors: Deborah Jean Rumsey

Tags: #Non-Fiction, #Reference

Crawling with a baby

How much ground does an 8-month-old crawling baby cover?
Figure 4-16 shows two histograms that represent the same data set (distances my baby crawled during a six-hour testing period). In each case, distances were rounded to the nearest foot. In the top portion of the figure, the measurements are broken into 5-foot increments, and the data seem to be distributed in a uniform way. In other words, the number of times he crawled each distance (0–5 feet, 5–10 feet, and 10–15 feet) was approximately the same. The data don't look very interesting. But in the bottom portion of the figure, I break the distances into smaller, one-foot increments, and the histogram looks different and a lot more interesting.

Figure 4-16: Baby's crawling distance.

In this histogram, you can see two distinct groupings of distances, indicating that my baby tended to either crawl a shorter distance (around 5 feet) or a longer distance (around 10 feet) to get where he wanted to go. This makes sense because at the time I collected the data, my baby's toy basket was about 5 feet from the starting point, and the newspaper pile (another favorite toy) was 10 feet away. The second histogram is a much better representation of the data: how far my baby crawled in the given setting.
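The effect of the grouping width can be sketched in a few lines of Python. The counts below are hypothetical, made up for illustration (they are not the data behind Figure 4-16), and the choice to put a distance that falls exactly on a boundary into the higher group is also an assumption.

```python
from collections import Counter

# Hypothetical crawl distances (feet, rounded) -> number of crawls.
# Illustrative only; these are NOT the author's measurements.
crawls_per_foot = {3: 2, 4: 4, 5: 4, 6: 1, 9: 1, 10: 4, 11: 2}


def bin_counts(counts_by_value, width):
    """Group values into bins [k*width, (k+1)*width) and total the counts."""
    binned = Counter()
    for value, count in counts_by_value.items():
        lower_edge = (value // width) * width  # boundary values go to the higher bin
        binned[lower_edge] += count
    return dict(sorted(binned.items()))


wide = bin_counts(crawls_per_foot, 5)    # 5-foot groupings
narrow = bin_counts(crawls_per_foot, 1)  # 1-foot groupings

# Print a rough text histogram for each grouping width.
for label, bins in (("5-foot bins", wide), ("1-foot bins", narrow)):
    print(label)
    for lower_edge, count in bins.items():
        print(f"  {lower_edge:>2} ft | {'#' * count}")
```

With these made-up counts, the three 5-foot bars come out the same height, while the 1-foot bars show two separate clusters, one near 5 feet and one near 10 feet.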

So how much ground did he cover in those six hours? Using the bottom portion of Figure 4-16, you can find the total crawling distance because the bars are in one-foot increments. Multiply the height of each bar by the distance it represents, and then sum the results. My baby's total crawling distance in this six-hour period was a whopping 398 feet, or 132.7 yards, more than the length of one football field!
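As a small sketch of that calculation, the Python below multiplies each bar's height by its distance and adds up the results. The bar heights are hypothetical (the actual heights in Figure 4-16 are not listed in the text), so the total here is not the 398 feet reported above; only the feet-to-yards conversion uses the book's figure.

```python
# Hypothetical one-foot-increment histogram:
# distance in feet -> bar height (number of crawls of that distance).
# Illustrative values only, not read from Figure 4-16.
bar_heights = {4: 3, 5: 6, 6: 2, 9: 2, 10: 5, 11: 1}


def total_distance(heights):
    """Sum of (distance x number of crawls) over all bars."""
    return sum(distance * count for distance, count in heights.items())


total_feet = total_distance(bar_heights)
print(f"Total for the hypothetical bars: {total_feet} feet")

# Converting the book's reported total from feet to yards (3 feet per yard).
reported_feet = 398
reported_yards = round(reported_feet / 3, 1)
print(f"{reported_feet} feet is about {reported_yards} yards")
```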

HEADS UP 

Note that I could have broken down the distances into even smaller increments, but that would only make the histogram look cluttered and busy and wouldn't have given you any additional information. A happy medium exists in terms of the number of groupings used and the range of values they represent. Each histogram is slightly different, but somewhere between 6 and 12 groupings is generally a good number of bars for a histogram. If the histogram has too few bars, the data don't show anything; if it has too many, the data are too disjointed and patterns get lost.

REMEMBER 

Be sure to take the scale into account on both the horizontal and vertical axes when examining the results presented in a histogram. The same data can be made to look different, depending on how they're grouped (for example, into few versus many groups) and depending on the scale of the vertical axis, which can make the bars appear taller or shorter than you'd otherwise expect.

Interpreting a histogram

You can use a histogram to tell you three main features of numerical data:

  • How the data are distributed (symmetric, skewed right, skewed left, bell-shaped, and so on)

  • The amount of variability in the data

  • Where the center of the data is (approximately)

Evaluating a histogram
Tip 

To assess the statistical quality of a histogram:

  • Examine the scale used for the vertical (frequency or relative frequency) axis and beware of results that appear exaggerated or played down through the use of inappropriate scales.

  • Check out the units on the vertical axis to see whether the histogram reports frequencies (numbers) or relative frequencies (percentages), and then take this into account when evaluating the information.

  • Look at the scale used for the groupings of the numerical variable (on the horizontal axis). If the range for each group is very small, the data may look overly volatile. If the ranges are very large, the data may appear to be smoother than they really are.

 

Chapter 5: Means, Medians, and More

A statistic is a number that summarizes some characteristic about a set of data. Of the hundreds of statistics that exist, a few of them are used so often that they commonly appear in the workplace and in other facets of everyday life. In this chapter, you find out which statistics are used most often, how these statistics are used, what they mean, and how they're misused.

Every data set has a story, and if used properly, statistics do a good job of telling that story. Statistics that are improperly used can tell a different story, or only part of the story, so knowing how to make good decisions about the information you're given is very important. In this chapter, you see some of the most common summary statistics. You find out more about what these summary statistics say and what they don't say about the data, which can be grouped as either numerical or categorical.

Summing Up Data with Statistics

Statistics are used to summarize some of the most basic information in a data set. Summarizing information has several different purposes. Picture your boss coming to you and asking, "What's our client base like these days and who's buying our products?" How would you like to answer that question: with a long, detailed, and complicated stream of numbers and statistics that is sure to make her eyes glaze over? Probably not. You want clean, clear, and concise numbers that sum up the client base for her, so that she can see how brilliant you are, and then send you off to collect even more data to see how she can include more people in the client base. (That's what you get for being efficient.) So, statistics are often used to provide people with information that is easy to understand and that answers their questions (if answering their questions is possible).

Summary statistics have other purposes as well. After all of the data have been collected from a survey or some other kind of study, the next step is for the researcher to try to make sense of the data. Typically, the first step researchers take is to run some basic statistics on the data to get a rough idea about what's happening in the data. Later in the process, researchers can do more analyses to formulate or test claims made about the population, estimate certain characteristics about the population, look for links between items they measured, and so on.

Another big part of research is reporting the results, not only to your peers, but to the media and to the general public. While a researcher's peers may be waiting and expecting to hear about all the complex analyses that were done on a data set, the general public is neither ready for nor interested in that. What does the public want? Basic information. So, statistics that make your point clearly and concisely are commonly used to relay information to the media and to the public.

HEADS UP 

Many times, statistics are used to give a quick and dirty summary of a situation that's actually pretty complicated. In such a situation, less is not more, and sometimes the real story behind the data can get lost in the shuffle. While you have to accept that getting sound bites of information is a fact of life these days, be sure the group putting out the data isn't watering it down at the same time. Think about which statistics are reported, what these statistics really mean, and what information is missing. This chapter focuses on these questions.

 

Summarizing Categorical Data

Categorical data capture qualities or characteristics about the individual, such as a person's eye color, gender, political party, or opinion on some issue (using categories such as agree, disagree, or no opinion). Categorical data tend to fall into groups or categories pretty naturally. "Political party," for example, typically has four groups: Democrat, Republican, Independent, and other. Categorical data often come from survey data, but they can also be collected in experiments. For example, in an experimental test of a new medical treatment, researchers may use three categories to assess the outcome of the experiment: Did the patient get better, worse, or stay the same while undergoing the treatment?

Categorical data are often summarized by reporting the percentage of individuals falling into each category. For example, pollsters may report the percentage of Republicans, Democrats, Independents, and others who took part in a survey. To calculate the percentage of individuals in a certain category, find the number of individuals in that category, divide by the total number of people in the study, and then multiply by 100%. For example, if a survey of 2,000 teenagers included 1,200 females and 800 males, the resulting percentages would be (1,200 ÷ 2,000) × 100% = 60% female and (800 ÷ 2,000) × 100% = 40% male.
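The same calculation can be written as a short Python sketch. The function name `category_percentages` is just an illustrative choice; the counts are the ones from the teenager survey example above.

```python
def category_percentages(counts):
    """Convert category counts to percentages of the overall total."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


# Counts from the survey example: 2,000 teenagers in all.
survey = {"female": 1200, "male": 800}
percentages = category_percentages(survey)

for category, pct in percentages.items():
    print(f"{category}: {pct:.0f}%")
# female: 60%
# male: 40%
```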

You can further break down categorical data by creating something called crosstabs. Crosstabs (also called two-way tables) are tables with rows and columns. They summarize the information from two categorical variables at once, such as gender and political party, so you can see (or easily calculate) the percentage of individuals in each combination of categories. For example, if you had data about the gender and political party of your respondents, you would be able to look at the percentage of Republican females, Republican males, Democratic females, Democratic males, and so on. In this example, the total number of possible combinations in your table would be 2 × 4 = 8, or the total number of gender categories times the total number of party affiliation categories.
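A crosstab like this can be sketched in plain Python. The ten respondents below are invented for illustration; only the category names (two genders, four party groups) come from the text.

```python
from collections import Counter
from itertools import product

genders = ["female", "male"]
parties = ["Democrat", "Republican", "Independent", "other"]

# Hypothetical survey responses as (gender, party) pairs; illustrative only.
respondents = [
    ("female", "Democrat"), ("female", "Republican"), ("male", "Democrat"),
    ("male", "Independent"), ("female", "Democrat"), ("male", "Republican"),
    ("female", "other"), ("male", "Republican"), ("female", "Independent"),
    ("male", "Democrat"),
]

cell_counts = Counter(respondents)
total = len(respondents)

# One cell per combination of categories: 2 genders x 4 parties = 8 cells.
# Combinations nobody chose still get a cell, with a percentage of 0.
crosstab = {
    cell: 100 * cell_counts[cell] / total
    for cell in product(genders, parties)
}

# Print the table with genders as rows and parties as columns.
print(f"{'':<8}" + "".join(f"{party:>13}" for party in parties))
for gender in genders:
    row = "".join(f"{crosstab[(gender, party)]:>12.1f}%" for party in parties)
    print(f"{gender:<8}" + row)
```

Each cell holds the percentage of all respondents who fall into that combination, so the eight cells together add up to 100%.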

The U.S. government calculates and summarizes loads of categorical data using crosstabs. The U.S. Census Bureau doesn't just count the population; it also collects and summarizes data from a subset of all Americans (those who fill out the long form) on various demographic characteristics, such as gender and age. Typical age and gender data, reported by the U.S. Census Bureau for a survey conducted in 2001, are shown in Table 5-1. (Normally, age would be considered a numerical variable, but the way the U.S. government reports it, age is broken down into categories, making it a categorical variable. See the following section for more on numerical data.)

Table 5-1: Population, Broken Down by Age and Gender (2001)

Age                         Total  % of Total      # Males  % of Males    # Females  % of Females
Under 5 years          19,369,341        6.80    9,905,282        7.08    9,464,059          6.53
5 to 9 years           20,184,052        7.09   10,336,616        7.39    9,847,436          6.79
10 to 14 years         20,881,442        7.33   10,696,244        7.65   10,185,198          7.03
15 to 19 years         20,267,154        7.12   10,423,173        7.46    9,843,981          6.79
20 to 24 years         19,681,213        6.91   10,061,983        7.20    9,619,230          6.63
25 to 29 years         18,926,104        6.65    9,592,895        6.86    9,333,209          6.44
30 to 34 years         20,681,202        7.26   10,420,677        7.45   10,260,525          7.08
35 to 39 years         22,243,146        7.81   11,104,822        7.94   11,138,324          7.68
40 to 44 years         22,775,521        8.00   11,298,089        8.08   11,477,432          7.92
45 to 49 years         20,768,983        7.29   10,224,864        7.31   10,544,119          7.27
50 to 54 years         18,419,209        6.47    9,011,221        6.45    9,407,988          6.49
55 to 59 years         14,190,116        4.98    6,865,439        4.91    7,324,677          5.05
60 to 64 years         11,118,462        3.90    5,288,527        3.78    5,829,935          4.02
65 to 69 years          9,532,702        3.35    4,409,658        3.15    5,123,044          3.53
70 to 74 years          8,780,521        3.08    3,887,793        2.78    4,892,728          3.37
75 to 79 years          7,424,947        2.61    3,057,402        2.19    4,367,545          3.01
80 to 84 years          5,149,013        1.81    1,929,315        1.38    3,219,698          2.22
85 to 89 years          2,887,943        1.01      926,654        0.66    1,961,289          1.35
90 to 94 years          1,175,545        0.41      303,927        0.22      871,618          0.60
95 to 99 years            291,844        0.10       58,667        0.04      233,177          0.16
100 years and over         48,427        0.02        9,860        0.01       38,567          0.03
Total, all ages       284,796,887         100  139,813,108         100  144,983,779           100

You can examine many different facets of the population by looking at and working with different numbers from Table 5-1. Looking at gender, notice that women slightly outnumber men, because the population in 2001 was 51% female (divide the total number of females by the total population size and multiply by 100%) and 49% male (divide the total number of males by the total population size and multiply by 100%). You can also look at age: The percentage of the entire population that was under age 5 was 6.8%; the largest group was the 40–44 year olds, who made up 8% of the population. Next, you can explore a possible relationship between gender and age by comparing various parts of the table. You can compare, for example, the percentage of females to males in the 80-and-over age group. Because these data are reported in five-year increments, you have to do a little math in order to get your answer, though. The percentage of all females who are aged 80 and over is 2.22% + 1.35% + 0.60% + 0.16% + 0.03% = 4.36%. The percentage of all males who are aged 80 and over is 1.38% + 0.66% + 0.22% + 0.04% + 0.01% = 2.31%. Because the total numbers of females and males are close, this shows that the 80-and-over age group contains almost twice as many women as men. These data seem to confirm the notion that women tend to live longer than men.
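The arithmetic in this paragraph can be checked directly from the counts in Table 5-1. The sketch below is one way to do that in Python; the variable names are illustrative, and the counts are copied from the table's five age groups of 80 years and over.

```python
# Counts copied from Table 5-1 for the age groups 80-84, 85-89, 90-94,
# 95-99, and 100 years and over.
females_80_plus = [3_219_698, 1_961_289, 871_618, 233_177, 38_567]
males_80_plus = [1_929_315, 926_654, 303_927, 58_667, 9_860]

# Column totals from the "Total, all ages" row.
total_females = 144_983_779
total_males = 139_813_108

# Share of each gender that is aged 80 and over.
pct_females_80_plus = 100 * sum(females_80_plus) / total_females
pct_males_80_plus = 100 * sum(males_80_plus) / total_males

# Number of women per man in the 80-and-over group.
women_per_man = sum(females_80_plus) / sum(males_80_plus)

# Share of the whole population that is female.
pct_female_overall = 100 * total_females / (total_females + total_males)

print(f"Females aged 80 and over: {pct_females_80_plus:.2f}% of all females")
print(f"Males aged 80 and over:   {pct_males_80_plus:.2f}% of all males")
print(f"Women per man, 80 and over: {women_per_man:.2f}")
print(f"Female share of population: {pct_female_overall:.0f}%")
```

Working from the raw counts gives the same 4.36% and 2.31% as adding the rounded percentages in the table, and a ratio of just under two women for every man aged 80 and over.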

HEADS UP 

If you're given the number of individuals in each group, you can always calculate your own percents. But if you're only given percentages without the total number in the group, you can never retrieve the original number of individuals in each group. For example, you could hear that 80% of the people surveyed prefer Cheesy cheese crackers over Crummy cheese crackers. But how many were surveyed? It could be only 10 people, for all you know, because 8 out of 10 is 80%, just as 800 out of 1,000 is 80%. These two fractions (8 out of 10 and 800 out of 1,000) have different meanings for statisticians, because in the first case, the statistic is based on very little data, and in the second case, it's based on a lot of data. (See Chapter 10 for more information on data accuracy and margin of error.)
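To make the difference between 8 out of 10 and 800 out of 1,000 concrete, here is a rough Python sketch. It assumes the common large-sample approximation for a 95% margin of error, 1.96 × √(p(1 − p)/n), which may not be exactly what Chapter 10 presents and is only a crude guide for a sample as small as 10.

```python
import math


def percent_and_margin(successes, n, z=1.96):
    """Return (sample percentage, approximate 95% margin of error in points).

    Uses the usual large-sample formula z * sqrt(p * (1 - p) / n).
    This is an assumption for illustration, and the approximation is
    poor for very small samples.
    """
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return 100 * p, 100 * margin


pct_small, margin_small = percent_and_margin(8, 10)
pct_large, margin_large = percent_and_margin(800, 1000)

print(f"8 of 10:      {pct_small:.0f}% give or take about {margin_small:.1f} points")
print(f"800 of 1,000: {pct_large:.0f}% give or take about {margin_large:.1f} points")
```

Both samples report the same 80%, but under this approximation the uncertainty around the smaller sample is about ten times as wide.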

TECHNICAL STUFF 

After you have the crosstabs that show the breakdown of two categorical variables, you can conduct statistical tests to determine whether a significant relationship or link between the two variables exists. (See Chapter 18 for more information on these statistical tests.)
