The Secret Life of Pronouns
by James W. Pennebaker
211 The speed-dating project had a complicated history. Paul Eastwick, a faculty member at Texas A&M, visited our department in the spring of 2010 to describe some speed-dating research he had been conducting with his colleague Eli Finkel of Northwestern. Molly Ireland was fascinated by his talk and asked if he would be interested in applying the LSM (language style matching) methodology to the speed-dating transcripts. Within a few days, Molly’s analyses yielded the remarkable finding that LSM in speed-dating conversations was a powerful predictor of later dates. Molly then added the speed-dating analyses to Richard Slatcher’s IM project (see p. 212) and, in record time, submitted the paper to a top journal, where it was accepted and published. The resulting paper is Ireland, Slatcher, Eastwick, Scissors, Finkel, and Pennebaker (2011).
212–215 The IM project was initially published as Slatcher and Pennebaker (2006) and then, with the reanalyses of the data, as Ireland, Slatcher, et al. (2011).
216 John Gottman’s research on relationships has a number of practical applications for making good marriages. In addition to his books and articles, New York Times writer Tara Parker-Pope has written a balanced book on marriage and relationships that relies on some of the most recent research.
218–223 The analyses of Elizabeth Barrett and Robert Browning, Sylvia Plath and Ted Hughes, and Sigmund Freud and Carl Jung were part of a paper published by Molly Ireland and me in 2010.
CHAPTER 9: SEEING GROUPS, COMPANIES, AND COMMUNITIES THROUGH THEIR WORDS
228 Several studies have tracked language use and its relationship with successful marriages. Not surprisingly, couples’ use of pronouns, especially we-words, is a reliable predictor of marital quality. See the work of Seider and colleagues (2009) and of Rachel Simmons, Peter Gordon, and Dianne Chambless (2005).
229 The project linking pronoun use among couples and heart failure was conducted by Rohrbaugh and colleagues.
The Sexton and Helmreich project focused only on flight simulation studies. Later analyses by Bryan Sexton found links between low we-word use and human error in the cockpit voice recordings of planes that had crashed (personal communication, April 20, 2010). See also the work of Foushee and Helmreich.
232 One of the more interesting approaches to studying natural interactions was pioneered by Bill Ickes, a social psychologist at the University of Texas at Arlington. In a typical study, pairs of students are instructed to visit Ickes’s research lab to participate in a conversation. After both complete questionnaires and sign a consent form agreeing to be videotaped, the experimenter tries to begin filming and then “discovers” that his camera is broken. The experimenter leaves the lab, claiming he’s going to find a technician. The students remain in the lab and usually begin talking with one another.
What they don’t know is that a hidden camera is taping their interaction. Later, the students are told about the hidden camera and are asked to rate their interaction on a minute-by-minute basis. This allows Ickes to see how the two people were thinking about each other as their conversation unfolded. Bill has kindly allowed us to analyze some of his interactions. I strongly recommend his recent book, Strangers in a Strange Lab.
And while we are talking about real-world approaches to studying the behavior of people, I insist that you check out the work of Sandy Pentland and Roz Picard, who are at MIT’s Media Lab. Together and separately, the two have devised a striking number of methods that track how people see and emotionally react to their worlds as they go about daily life.
232–235 One way to think about the increase in we-words over time is that the longer people talk with others, the more their identities become fused. Bill Swann and his colleagues have been conducting a number of imaginative projects tracking identity fusion. For example, making people more aware of their own group increases the likelihood that they will endorse fighting and dying for it.
233 The national defense project was run by Andrew Scholand, Yla Tausczik, and me and funded by Sandia National Laboratories. The research tracking twenty professional therapists over three years was conducted by Susan Odom and Stephanie Rude. The findings are reported in Odom’s dissertation, which was completed in 2006.
234–235 Drops in suicide rates following terrorist attacks have been reported by Emad Salib and his colleagues. Additional findings about language and psychological changes following the 2004 train bombings in Madrid have been reported by Itziar Fernandez, Dario Paez, and me. The written essays tracking language changes among New Orleans residents after Hurricane Katrina were collected by Sandy Hartman.
238–239 A former graduate student of mine, Amy Gonzales, conducted a complex laboratory experiment in which groups of students had to work together either face-to-face or online. The details are reported in Gonzales, Hancock, and Pennebaker (2010). A second project, described earlier, was run with business school students by Ethan Burris and his colleagues. The two lab studies are consistent with some fascinating real-world projects conducted by Paul Taylor and his colleagues. For example, Taylor found higher LSM levels in the transcripts of successful hostage negotiations between police and hostage-takers in the UK than in the transcripts of unsuccessful ones.
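For readers who want to see the arithmetic, the LSM metric itself is easy to compute. Here is a minimal Python sketch based on the published LSM formula; the exact list of function-word categories varies a bit across papers, and the category percentages for the two speakers below are invented for illustration:

```python
# Language style matching (LSM): for each function-word category,
# similarity = 1 - |p1 - p2| / (p1 + p2 + 0.0001); LSM is the mean across
# categories. Scores run from 0 (no matching) to 1 (perfect matching).

CATEGORIES = ["pronouns", "articles", "prepositions", "auxiliary_verbs",
              "negations", "conjunctions", "quantifiers"]

def lsm(speaker1: dict, speaker2: dict) -> float:
    """Each dict maps a category to the percent of words in that category."""
    scores = []
    for cat in CATEGORIES:
        p1, p2 = speaker1[cat], speaker2[cat]
        scores.append(1 - abs(p1 - p2) / (p1 + p2 + 0.0001))
    return sum(scores) / len(scores)

# Hypothetical category percentages for two speakers in one conversation
alice = {"pronouns": 14.2, "articles": 6.1, "prepositions": 12.8,
         "auxiliary_verbs": 8.9, "negations": 1.6, "conjunctions": 5.9,
         "quantifiers": 2.1}
bob   = {"pronouns": 12.9, "articles": 7.0, "prepositions": 11.5,
         "auxiliary_verbs": 9.4, "negations": 1.2, "conjunctions": 6.3,
         "quantifiers": 2.5}

print(f"LSM = {lsm(alice, bob):.3f}")  # values near 1 mean high matching
```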
240–243 The Craigslist project is part of a larger study focusing on measures of community cohesiveness. The primary team members include Cindy Chung, Yla Tausczik, and me. We are indebted to Mark Hayward for his help in providing the relevant Gini statistics.
243–247 The word-catching research is based on an archive of tape recordings I collected between 1990 and 2010. It includes analyses of 1,162 files of people having natural conversations in the real world. Discriminant analyses (for you statistics fans out there) show that cross-validation classifications are accurate at 80 to 84 percent for anywhere from five to seven settings, where 16 to 20 percent is chance.
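For those same statistics fans, here is a rough sketch of what such an analysis looks like in modern software. It uses scikit-learn’s linear discriminant analysis rather than whatever package we actually used, and random placeholder numbers stand in for the real archive:

```python
# Sketch: classify conversation settings from word-category features and
# report cross-validated accuracy against chance (1/k for k equally sized
# groups). Because the placeholder data below are random, the accuracy
# printed here will hover near chance rather than 80-84 percent.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_files, n_features, n_settings = 1162, 30, 6    # e.g., six settings
X = rng.normal(size=(n_files, n_features))       # word-category percentages
y = rng.integers(0, n_settings, size=n_files)    # setting label per file

acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=10).mean()
print(f"cross-validated accuracy: {acc:.1%} (chance = {1/n_settings:.1%})")
```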
248 One of my favorite language maps tracks the usage of the words pop, soda, and Coke as generic names for soft drinks. Check out www.popvssoda.com.
248–253 One of the giants in the world of sociolinguistics is William Labov from the University of Pennsylvania. Labov has pioneered ways to track how word usage and accents change across regions and time. Some of his early work, for example, examined language differences within blocks and neighborhoods of large cities. Later, he began to focus on much broader trends across the United States.
Due in large part to Labov’s influence, the University of Pennsylvania has taken an important lead in advancing our knowledge of social communication and language use. It is home to the Linguistic Data Consortium, or LDC (www.ldc.upenn.edu), which maintains one of the largest text archives in the world. In addition, Mark Liberman, a particularly thoughtful linguist, has created Language Log, a highly influential blog (languagelog.ldc.upenn.edu).
249–251 The This I Believe project has been growing in multiple directions. Cindy Chung, Jason Rentfrow, and I have been developing detailed maps of language use across the United States based on both function words and content words.
251–252 A particularly hot approach to text analysis examines how people use emotion words in their blogs, tweets, or other communications. Although sentiment analysis focuses only on people’s use of positive and negative emotion words, it can provide a general overview of the happiness of cities, regions, or entire countries. For a discussion, see the work of Adam Kramer and of Jason Rentfrow, and also Alex Wright’s article in the New York Times. Also, check out a truly wonderful book by Eric Weiner, The Geography of Bliss, on one man’s attempt to understand why some countries are happier than others.
252 In deducing the linguistic fingerprints of the Texas high schools, discriminant analyses showed that we could correctly classify students by school at a 19 to 20 percent rate, where 11 percent was chance.
CHAPTER 10: WORD SLEUTHING
258–261 Matching blog entries to specific authors can be done in a number of ways. In the chapter, we try to match blogs written today with those written many years ago by the same authors. This is much harder than matching blogs written by the same authors at about the same time. In fact, think back to the example of the twenty bloggers. Imagine we have, say, ten blog entries on consecutive days from each of the twenty people. We pull out one of the ten entries for each person and put it into a separate stack. The goal is to match the twenty “orphan” entries with the twenty bloggers by reading the nine blog entries of known authorship. In this same-time matching task, our computer does a much better job of guessing which orphan entry goes with which blogger: the overall hit rate is closer to 58 percent (where 5 percent is chance).
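For the curious, here is one plausible way to run that matching game in code. It is a sketch of the general idea only (profile each blogger’s function-word rates, then assign each orphan entry to the nearest profile), not necessarily our exact procedure, and the data are simulated:

```python
# Sketch of one plausible matching scheme: profile each blogger by the mean
# of their word-category rates over the nine known entries, then assign each
# orphan entry to the blogger with the closest profile. Simulated data: each
# blogger has a stable "style" vector plus per-entry noise.
import numpy as np

rng = np.random.default_rng(1)
n_bloggers, n_known, n_features = 20, 9, 25

styles = rng.normal(size=(n_bloggers, n_features))
known = styles[:, None, :] + 0.8 * rng.normal(size=(n_bloggers, n_known, n_features))
orphans = styles + 0.8 * rng.normal(size=(n_bloggers, n_features))

profiles = known.mean(axis=1)                  # one profile per blogger
# Assign each orphan to the profile at the smallest Euclidean distance
dists = np.linalg.norm(orphans[:, None, :] - profiles[None, :, :], axis=2)
guesses = dists.argmin(axis=1)
print(f"hit rate: {(guesses == np.arange(n_bloggers)).mean():.0%} (chance 5%)")
```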
262–265 In addition to the work of Adair and of Mosteller and Wallace dealing with the Federalist Papers, be sure to see recent articles by Patrick Juola (2006) and by Jeff Collins and his colleagues (2004).
265 Pardon me for a minute while I have a little chat with the twenty people on Earth who really, really want to know the methods for analyzing the Federalist Papers. The cross-validation approach is based on discriminant analyses assuming equal group sizes. The original function-word assignment method, which assigned all unknown texts to Madison, correctly classified 92.4 percent of the original essays and 86.4 percent on cross-validation. The numbers for function words plus punctuation were 98.5 percent and 84.8 percent. Analyses based on the fourteen “tell” words used a binary procedure (was the word used or not within an essay) and yielded both classification and cross-validation accuracies of 98.5 percent. The one assignment error was for essay forty-one, which is attributed to Madison. The tell-word analyses estimated that Hamilton was the author of 49, 52 through 57, and 63, and that Madison was the author of 50, 51, and 62.
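To make the binary procedure concrete, here is a tiny sketch of the “was the word used or not” coding. The word list is illustrative only, not the actual fourteen tell words:

```python
# Sketch of binary "tell word" coding: each essay is scored 1/0 for whether
# it uses each tell word at all. The word list here is illustrative, not
# the actual fourteen tell words used in the analysis.
import re

TELL_WORDS = ["upon", "whilst", "kind", "consequently"]  # illustrative only

def binary_features(text: str) -> list[int]:
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return [1 if w in tokens else 0 for w in TELL_WORDS]

essay = "Upon reflection, the consequences of such a scheme are plain."
print(binary_features(essay))  # [1, 0, 0, 0]
```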
Whereas Hamilton claimed credit for all eleven of the disputed essays, he reported that three additional ones were jointly written by Madison and himself. Madison’s later recollection was that he (Madison) had written them with some supplemental comments by Hamilton. All linguistic analyses show that the jointly written papers were completely different from either Hamilton’s or Madison’s solo-authored essays. Given this, I tend to side with Hamilton’s account of the authorship issue rather than with Madison’s.
265–267 A recent project by Terry Pettijohn and Donald Sacco (2009) analyzed the lyrics of number one Billboard songs between 1955 and 2003. They discovered that during economic downturns, people preferred lyrics that were more complex, social, and future oriented.
268 There are several ways to determine whether collaborations result in average or synergistic language use. Consider how John Lennon and Paul McCartney used present-tense verbs in their lyrics. In their individually written songs, Lennon consistently used more than McCartney (15.8 percent versus 13.7 percent). According to the average-person hypothesis, their collaborations should have yielded songs ranging between 13.7 and 15.8 percent present-tense verbs. In fact, the eyeball-to-eyeball Lennon-McCartney collaborations resulted in songs with 17.6 percent present-tense verbs. On this dimension, then, Lennon fell between McCartney and Lennon-McCartney, making him the statistically average writer. We can calculate the percentage of time that each of Lennon, McCartney, and Lennon-McCartney was linguistically in the middle of the other two. For the Beatles, the statistically average author was Lennon 50.6 percent of the time, McCartney 36.1 percent, and Lennon-McCartney 13.3 percent. For the Federalist Papers, it was Hamilton 39.5 percent of the time, Madison 53.9 percent, and Hamilton-Madison 6.6 percent. In other words, when collaborating, Lennon-McCartney and Hamilton-Madison were far more linguistically extreme than either author on his own.
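The “statistically average author” tally is straightforward to compute: across a set of language dimensions, count how often each writer’s rate falls between the other two. A minimal sketch, in which all numbers except the present-tense rates are invented for illustration:

```python
# Sketch: across language dimensions, count how often each writer's rate
# falls in the middle of the other two (the "statistically average" writer).
# Only the present-tense rates come from the text; the rest are made up.
rates = {
    # dimension:        (Lennon, McCartney, Lennon-McCartney)
    "present tense":    (15.8, 13.7, 17.6),
    "I-words":          (6.2, 7.4, 5.1),
    "positive emotion": (4.0, 4.8, 5.3),
    "articles":         (5.5, 6.0, 5.8),
}
writers = ["Lennon", "McCartney", "Lennon-McCartney"]
middles = {w: 0 for w in writers}
for vals in rates.values():
    order = sorted(range(3), key=lambda i: vals[i])
    middles[writers[order[1]]] += 1      # middle of the sorted three

for w, count in middles.items():
    print(f"{w}: average writer on {count / len(rates):.0%} of dimensions")
```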
270 N-gram analyses have also been used to characterize authors. For example, Art Graesser and his colleagues have developed speech-act classifiers that assess the first three words of a sentence to determine what type of sentence is being uttered (e.g., “Are you here?” “Here you are!” “You are here.”). Their speech-act classifier can also be used to determine the relative status of two interactants.
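As a toy illustration of the idea (this is not Graesser’s actual classifier), a few opening-word rules are already enough to separate the three example sentences:

```python
# Toy illustration: use the opening word (plus end punctuation) to guess a
# sentence's speech act. Graesser's real classifier is far more elaborate.
QUESTION_STARTS = {"are", "is", "do", "does", "did", "can", "could", "will",
                   "would", "who", "what", "when", "where", "why", "how"}

def speech_act(sentence: str) -> str:
    words = sentence.lower().strip("?!. ").split()
    if not words:
        return "empty"
    if sentence.rstrip().endswith("?") or words[0] in QUESTION_STARTS:
        return "question"
    if sentence.rstrip().endswith("!"):
        return "exclamation"
    return "statement"

for s in ["Are you here?", "Here you are!", "You are here."]:
    print(f"{s!r} -> {speech_act(s)}")
```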
270–271 Another way to think about language use is to listen to how presidents create stories about themselves. Dan McAdams has spent much of his career analyzing the stories people tell to get a better sense of their personality. His most recent work is a fascinating analysis of George W. Bush.
272 Perhaps the best source for presidential documents is the American Presidency Project, directed by Gerhard Peters at the University of California at Santa Barbara. Peters and his collaborators are bringing together one of the largest archives of presidential documents, including speeches, interviews, press conferences, and much more. For more information, go to www.presidency.ucsb.edu/.
273 The figure is based on summing the standardized scores (z-scores) for personal pronouns and total emotion word use. To make all the numbers positive, a constant of 3.0 was added to the resulting scores.
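In code, the computation looks something like this; the percentages below are invented for illustration:

```python
# Sketch of the plotted value: z-score each measure across presidents, sum
# the two z-scores, and add 3.0 so all values are positive. Made-up numbers.
import statistics

pronoun_pct = [9.1, 10.4, 8.2, 11.0, 9.8]   # personal pronouns, by president
emotion_pct = [3.2, 4.1, 2.9, 4.5, 3.6]     # total emotion words, by president

def z_scores(xs):
    mean, sd = statistics.mean(xs), statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]

composite = [zp + ze + 3.0
             for zp, ze in zip(z_scores(pronoun_pct), z_scores(emotion_pct))]
print([round(c, 2) for c in composite])
```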
274 Although Franklin Roosevelt’s press conferences have been transcribed, they were also heavily edited. FDR had arrangements with members of the press so that large blocks of the conversation would be off the record. In terms of social-emotional language, his rates were the lowest of any modern president. However, because his language records are so heavily edited, they have not been included in the press conference corpus.
275 Bosch quote from the 2000 program notes of the PBS documentary series Reagan: www.pbs.org/wgbh/amex/reagan/filmmore/description.html.
275–277 The Obama missing-I case was originally reported on Mark Liberman’s blog, the Language Log, at http://languagelog.ldc.upenn.edu/nll/?p=1651. The I-word press conference data include thirty-five press conferences or meetings of Obama from his inauguration in January 2009 through May 2010. Note that Liberman reported comparable findings in his own analysis of Obama’s speeches.
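The underlying measure is nothing more than a rate: I-words as a percentage of all words spoken. A minimal sketch, using a simplified stand-in for the full I-word category:

```python
# Sketch: first-person singular (I-word) rate as a percentage of all words.
# The word list is a simplified stand-in for the full I-word category.
import re

I_WORDS = {"i", "me", "my", "mine", "i'm", "i've", "i'll", "i'd"}

def i_word_rate(text: str) -> float:
    tokens = re.findall(r"[a-z']+", text.lower())
    return 100 * sum(t in I_WORDS for t in tokens) / max(len(tokens), 1)

remarks = "I think we did what we said we'd do, and I'm proud of that."
print(f"{i_word_rate(remarks):.1f}% I-words")
```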