A Field Guide to Lies: Critical Thinking in the Information Age

The people who complete a study may well be different from those who stop before it’s over. Some of the people you contact simply won’t respond. This can create a bias when the types of people who respond to your survey are different from the ones who don’t, forming a special kind of sampling bias called non-response error.

Let’s say you work for Harvard University and you want to show that your graduates tend to earn large salaries just two years after graduation. You send out a questionnaire to everyone in the graduating class. Already you’re in trouble: People who have moved without telling Harvard where they went, who are in prison, or who are homeless won’t receive your survey. Then, among the ones who respond, those who have high incomes and good feelings about what Harvard did for them might be more likely to fill out the survey than those who are jobless and resentful. The people you don’t hear from contribute to non-response error, sometimes in systematic ways that distort the data.
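A minimal simulation can make the mechanism concrete. Everything in the sketch below is invented for illustration: the salary figures, and the single assumption that higher earners are more likely to send the questionnaire back.

```python
# Hypothetical sketch of non-response bias in the Harvard salary survey.
# All numbers are invented; the only assumption is that high earners are
# more likely to respond than low earners.
import random

random.seed(1)

# "True" salaries for an entire graduating class (made-up distribution).
salaries = [random.gauss(90_000, 30_000) for _ in range(5_000)]

def responds(salary):
    # Assumed response behavior: graduates doing well are more likely to reply.
    return random.random() < (0.7 if salary >= 70_000 else 0.2)

respondents = [s for s in salaries if responds(s)]

true_mean = sum(salaries) / len(salaries)
survey_mean = sum(respondents) / len(respondents)

print(f"True mean salary:   ${true_mean:,.0f}")
print(f"Survey mean salary: ${survey_mean:,.0f}")
# The survey mean comes out well above the true mean even though nobody lied;
# the people who didn't answer were simply different from those who did.
```

Nothing in the arithmetic is dishonest; the distortion comes entirely from who answers.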

If your goal in conducting the Harvard income-after-two-years survey is to show that a Harvard education yields a high salary, this survey may help you show that to most people. But the critical thinker will realize that the kinds of people who attend Harvard are not the same as the average person. They tend to come from higher-income families, and this is correlated with a student’s future earnings. Harvard students tend to be go-getters. They might have earned as high a salary if they had attended a college with a lesser reputation, or even no college at all. (Mark Zuckerberg, Matt Damon, and Bill Gates are financially successful people who dropped out of Harvard.)

If you simply can’t reach some segment of the population, such as military personnel stationed overseas, or the homeless and institutionalized, this sampling bias is called coverage error because some members of the population from which you want to sample cannot be reached and therefore have no chance of being selected.

If you’re trying to figure out what proportion of jelly beans in a jar are red, orange, and blue, you may not be able to get to the bottom of the jar. Biopsies of organs are often limited to where the surgeon can collect material, and this is not necessarily a representative sample of the organ. In psychological studies, experimental subjects are often college undergraduates, who are not representative of the general population. There is a great diversity of people in this country, with differing attitudes, opinions, politics, experiences, and lifestyles. Although it would be a mistake to say that all college students are similar, it would be equally mistaken to say that they represent the rest of the population accurately.
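The jelly-bean jar makes a tidy illustration. In the sketch below, the contents of the jar and the settling of the red beans toward the bottom are invented; the point is only that beans you cannot reach have zero chance of being sampled.

```python
# Hypothetical sketch of coverage error with the jelly-bean jar.
import random
from collections import Counter

random.seed(2)

# A "jar" listed from bottom to top: red beans have settled low, blue sit on top.
jar = ["red"] * 400 + ["orange"] * 300 + ["blue"] * 300   # true mix: 40/30/30

reachable = jar[-200:]                  # only the top 200 beans can be reached
sample = random.sample(reachable, 100)  # our "survey" of the jar

print("True composition:", Counter(jar))
print("Sample from top: ", Counter(sample))
# The sample contains no red beans at all: beans out of reach had no chance
# of being selected, which is exactly what coverage error means.
```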

Reporting Bias

People sometimes lie when asked their opinions. A Harvard graduate may overstate her income in order to appear more successful than she is, or may report what she thinks she should have made if it weren’t for extenuating circumstances. Of course, she may understate as well so that the Harvard Alumni Association won’t hit her up for a big donation. These biases may or may not cancel each other out. The average we end up with in a survey of Harvard graduates’ salaries is only the average of what they reported, not what they actually earn. The wealthy may not have a very good idea of their annual income because it is not all salary—it includes a great many other things that vary from year to year, such as income from investments, dividends, bonuses, royalties, etc.

Maybe you ask people if they’ve cheated on an exam or on their taxes. They may not believe that your survey is truly confidential and so may not want to report their behavior truthfully. (This is a problem with estimating how many illegal immigrants in the U.S. require health care or are crime victims; many are afraid to go to hospitals and police stations for fear of being reported to immigration authorities.)

Suppose you want to know what magazines people read. You could ask them. But they might want to make a good impression on you. Or they might want to think of themselves as more refined in their tastes than they actually are. You may find that a great many more people report reading the New Yorker or the Atlantic than sales indicate, and a great many fewer people report reading Us Weekly and the National Enquirer. People don’t always tell the truth in surveys. So here, you’re not actually measuring what they read, you’re measuring snobbery.

So you come up with a plan: You’ll go to people’s houses and see what magazines they actually have in their living rooms. But this too is biased: It doesn’t tell you what they actually read, it only tells you what they choose to keep after they’ve read it, or choose to display for impression management. Knowing what magazines people read is harder to measure than knowing what magazines people buy (or display). But it’s an important distinction, especially for advertisers.

What factors underlie whether an individual identifies as multiracial? If they were raised in a single racial community, they may be less inclined to think of themselves as mixed race. If they experienced discrimination, they may be more inclined. We might define multiraciality precisely, but it doesn’t mean that people will report it the way we want them to.

Lack of Standardization

Measurements must be standardized. There must be clear, replicable, and precise procedures for collecting data so that each person who collects it does it in the same way. Each person who is counting has to count in the same way. Take Gleason grading of tumors—it is only relatively standardized, meaning that you can get different Gleason scores, and hence cancer stage labels, from different pathologists. (In Gleason scoring, a sample of prostate tissue is examined under a microscope and assigned a score from 2 to 10 to indicate how likely it is that a tumor will spread.) Psychiatrists differ in their opinions about whether a certain patient has schizophrenia or not. Statisticians disagree about what constitutes a sufficient demonstration of psychic phenomena. Pathology, psychiatry, parapsychology, and other fields strive to create well-defined procedures that anyone can follow and obtain the same results, but in almost all measurements, there are ambiguities and room for differences of opinion. If you are asked to weigh yourself, do you do so with or without clothes on, with or without your wallet in your pocket? If you’re asked to take the temperature of a steak on the grill, do you measure it in one spot or in several and take the average?

Measurement Error

Participants may not understand a question the way the researcher thought they would; they may fill in the wrong bubble on a survey, or in a variety of unanticipated ways, they may not give the answer that they intended. Measurement error occurs in every measurement, in every scientific field. Physicists at CERN reported that they had measured neutrinos traveling faster than the speed of light, a finding that would have been among the most important of the last hundred years. They reported later that they had made an error in measurement.

Measurement error turns up whenever we quantify anything. The 2000 U.S. presidential election came down to measurement error (and to failures in recording voters’ intentions): Different teams of officials, counting the same ballots, came up with different numbers. Part of this was due to disagreements over how to count a dimpled chad, a hanging chad, etc.—problems of definition—but even when strict guidelines were put in place, differences in the count still showed up.

We’ve all experienced this: When counting pennies in our penny jar, we get different totals if we count twice. When standing on a bathroom scale three times in a row, we get different weights. When measuring the size of a room in your house, you may get slightly different lengths each time you measure. These are explainable occurrences: The springs in your scale are imperfect mechanical devices. You hold the tape measure differently each time you use it, it slips from its resting point just slightly, you read the sixteenths of an inch incorrectly, or the tape measure isn’t long enough to measure the whole room so you have to mark a spot on the floor and take the measurement in two or three pieces, adding to the possibility of error. The measurement tool itself could have variability (indeed, measurement devices have accuracy specifications attached to them, and the higher-priced the device, the more accurate it tends to be). Your bathroom scale may only be accurate to within half a pound, a postal scale within half an ounce (one thirty-second of a pound).
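To put rough numbers on this, here is a minimal sketch of repeated readings from an imperfect instrument. The true weight and the half-pound accuracy figure are assumptions chosen to match the bathroom-scale example, not data from a real scale.

```python
# Hypothetical sketch of ordinary measurement error: the same true quantity,
# measured three times with an imperfect instrument, gives three readings.
import random
import statistics

random.seed(3)

true_weight = 150.0   # pounds; the quantity we are trying to measure (assumed)
scale_error = 0.5     # assumed scale accuracy: readings land within +/- half a pound

readings = [round(true_weight + random.uniform(-scale_error, scale_error), 1)
            for _ in range(3)]

print("Three readings:", readings)
print("Mean of readings:", round(statistics.mean(readings), 2))
# Each reading differs slightly; averaging repeated readings tends to land
# closer to the true value than any single one does, which is one reason
# careful measurements are repeated.
```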

A 1960 U.S. Census study recorded sixty-two women aged fifteen to nineteen with twelve or more children, and a large number of fourteen-year-old widows. Common sense tells us that there can’t be many fifteen- to nineteen-year-olds with twelve children, and fourteen-year-old widows are very uncommon. Someone made an error here. Some census-takers might have filled in the wrong box on a form, accidentally or on purpose to avoid having to conduct time-consuming interviews. Or maybe an impatient (or impish) group of respondents to the survey made up outlandish stories and the census-takers didn’t notice.

In 2015 the New England Patriots were accused of tampering with their footballs, deflating them to make them easier to catch. They claimed measurement error as part of their defense. Inflation pressures for the footballs of both teams that day, the Pats and the Indianapolis Colts, were taken after halftime. The Pats’ balls were tested first, followed by the Colts’. The Colts’ balls would have been in a warm locker room or office longer, giving them more time to warm up and thus increase pressure. A federal district court accepted this, and other testimony, and ruled there was insufficient evidence of tampering.
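The physics behind that defense is just the gas law: at roughly constant volume, a football’s absolute pressure rises and falls with its absolute temperature. The sketch below uses assumed temperatures and a 12.5 psi starting pressure purely for illustration; these are not the actual game-day measurements.

```python
# Hypothetical sketch: how much a football's gauge pressure drops when it cools.
ATM_PSI = 14.7  # atmospheric pressure; a gauge reads pressure above this

def fahrenheit_to_kelvin(f):
    return (f - 32.0) * 5.0 / 9.0 + 273.15

def gauge_after_temperature_change(gauge_psi, start_f, end_f):
    """Gauge pressure after the ball equilibrates to a new temperature (constant volume)."""
    absolute = gauge_psi + ATM_PSI
    return absolute * fahrenheit_to_kelvin(end_f) / fahrenheit_to_kelvin(start_f) - ATM_PSI

# Assumed scenario: inflated to 12.5 psi in a ~72 F room, measured after time on a ~48 F field.
print(round(gauge_after_temperature_change(12.5, 72, 48), 2))  # about 11.3 psi
# A ball tested soon after coming in from the cold reads lower than one that
# has had time to warm back up, which is why the order of testing mattered.
```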

Measurement error also occurs when the instrument you’re using to measure—the scale, ruler, questionnaire, or test—doesn’t actually measure what you intended it to measure. Using a yardstick to measure the width of a human hair, or using a questionnaire about depression when what you’re really studying is motivation (they may be related but are not identical), can create this sort of error. Tallying which candidates people support financially is not the same as knowing how they’ll vote; many people give contributions to several candidates in the same race.

Much ink has been spilled over tests or surveys that purport to show one thing but show another. The IQ test is among the most misinterpreted tests around. It is used to assess people’s intelligence, as if intelligence were a single quantity, which it is not—it manifests itself in different forms, such as spatial intelligence, artistic intelligence, mathematical intelligence, and so forth. And IQ tests are known to be biased toward middle-class white people. What we usually want to know when we look at IQ test results is how suitable a person is for a particular school program or job. IQ tests can predict performance in these situations, but probably not because the person with a high IQ score is necessarily more intelligent, but because that person has a history of other advantages (economic, social) that show up in an IQ test.

If the statistic you encounter is based on a survey, try to find out what questions were asked and if these seem reasonable and unbiased to you. For any statistic, try to find out how the subject under study was measured, and if the people who collected the data were skilled in such measurements.

Definitions

How something is defined or categorized can make a big difference in the statistic you end up with. This problem arises in the natural sciences, such as in trying to grade cancer cells or describe rainfall, and in the social sciences, such as when asking people about their opinions or experiences.

Did it rain today in the greater St. Louis area? That depends on how you define rain. If only one drop fell to the ground in the 8,846 square miles that comprise “greater St. Louis” (according to the U.S. Office of Management and Budget), do we say it rained? How many drops have to fall over how large an area and over how long a period of time before we categorize the day as one with rainfall?

The U.S. government has two different ways of measuring inflation, based on two different definitions: the Consumer Price Index (CPI), from the Bureau of Labor Statistics, and the Personal Consumption Expenditures (PCE) price index, from the Bureau of Economic Analysis. The two can yield different numbers. If you’re comparing two years or two regions of the country, of course you need to ensure that you’re using the same index each time. If you simply want to make a case about how inflation rose or fell recently, the unscrupulous statistic user would pick whichever of the two made the most impact, rather than choosing the one that is most appropriate, based on an understanding of their differences.

Or what does it mean to be homeless? Is it someone who is sleeping on the sidewalk or in a car? They may have a home but be unable to go there, or choose not to. What about a woman living on a friend’s couch because she lost her apartment? Or a family who has sold their house and is staying in a hotel for a couple of weeks while they wait for their new house to be ready? A man happily and comfortably living as a squatter in an abandoned warehouse? If we compare homelessness across different cities and states, the various jurisdictions may use different definitions. Even if the definition becomes standardized across jurisdictions, a statistic you encounter may not have defined homelessness the way that you would. One of the barriers to solving “the homelessness problem” in our large cities is that we don’t have an agreed-upon definition of what it is or who meets the criteria.

Whenever we encounter a news story based on new research, we need to be alert to how the elements of that research have been defined. We need to judge whether they are acceptable and reasonable. This is particularly critical in topics that are highly politicized, such as abortion, marriage, war, climate change, the minimum wage, or housing policy.

And nothing is more politicized than, well, politics. A definition can be wrangled and twisted to anyone’s advantage in public-opinion polling by asking a question just-so. Imagine that you’ve been hired by a political candidate to collect information on his opponent, Alicia Florrick. Unless Florrick has somehow managed to appeal to everyone on every issue, voters are going to have gripes. So here’s what you do: Ask the question “Is there anything at all that you disagree with or disapprove of, in anything the candidate has said, even if you support her?” Now almost everyone will have some gripe, so you can report back to your boss that “81 percent of people disapprove of Florrick.” What you’ve done is collected data on one thing (even a single minor disagreement) and swept it into a pile of similar complaints, rebranding them as “disapproval.” It almost sounds fair.
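A toy calculation shows how far apart the two headlines can be. The 70 percent support rate and the 85 percent chance of having at least one minor gripe below are invented; the trick works with almost any numbers.

```python
# Hypothetical sketch of the "rebranding" trick: counting any minor gripe as disapproval.
import random

random.seed(4)

voters = []
for _ in range(1_000):
    supports = random.random() < 0.70   # assumed: 70% genuinely support Florrick
    has_gripe = random.random() < 0.85  # assumed: 85% can name at least one minor disagreement
    voters.append((supports, has_gripe))

support_rate = sum(s for s, _ in voters) / len(voters)
gripe_rate = sum(g for _, g in voters) / len(voters)

print(f"Genuine support:            {support_rate:.0%}")
print(f"Reported as 'disapproval':  {gripe_rate:.0%}")
# The same invented electorate yields two very different headlines, depending
# on whether a single minor disagreement gets counted as "disapproval."
```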
