Fooled by Randomness
Authors: Nassim Nicholas Taleb
The hoax:
Sokal (1996).
The Selfish Gene:
Dawkins (1989, 1976).
Hegel:
In Popper (1994).
Exquisite cadavers:
Nadeau (1970).
The generator:
www.monash.edu.au.
Language and probability:
There is a very close connection between language and probability; it has been studied by thinkers and scientists via the sister methods of entropy and information theory: one can reduce the dimensionality of a message by eliminating redundancy, for instance; what is left is measured as information content (think of zipping a file) and is linked to the notion of “entropy,” which is the degree of disorder, the unpredictable that is left. Entropy is a very pervasive notion as it relates to aesthetics and thermodynamics. See Campbell (1982) for a literary presentation, and Cover and Thomas (1991) for a scientific one, particularly the discussion on the “entropy of English.” For a classic discussion of entropy and art, see Arnheim (1971), though the connection between entropy and probability was not yet clear at the time. See Georgescu-Roegen (1971) for a (perhaps) pioneering discussion of entropy in economics.
CHAPTER 5
The firehouse effect and the convergence of opinions:
There are plenty of discussions in the psychology literature of such convergence of opinions, particularly in the area of mate selection or what Keynes calls “the beauty contest,” as people tend to choose what other people choose, causing positive-feedback loops.
An interesting manifestation is the autokinetic effect. When people gaze at a stationary light in a room they see it moving after a while and can estimate the amount of movement, not knowing that it is an optical illusion. When isolated, the subjects give wildly varying speeds of movement; when tested in a group they converge to a common speed of movement: See Plotkin (1998). Sornette (2003) gives an interesting account of the feedback loops that result from herding, written with light but extremely intuitive mathematics.
Biology of imitation:
See Dugatkin (2001).
Evolution and small probabilities:
Evolution is principally a probabilistic concept. Can it be fooled by randomness? Can the least skilled survive? There is a prevalent strain of Darwinism, called naive Darwinism, that believes that any species or member of a species that dominates at any point has been selected by evolution because they have an advantage over others. This results from a common misunderstanding of local and global optima, mixed with an inability to get rid of the belief in the law of small numbers (overinference from small data sets). Just put two people in a random environment, say a gambling casino, for a weekend. One of them will fare better than the other. To a naive observer the one who fares better will have a survival advantage over the other. If he is taller or has some trait that distinguishes him from the other, such trait will be identified by the naive observer as the explanation of the difference in fitness. Some people do it with traders—make them compete in a formal competition. Consider also the naive evolutionary thinking positing the “optimality” of selection—the founder of sociobiology does not agree with such optimality when it comes to rare events: E. O. Wilson (2002) writes: “The human brain evidently evolved to commit itself emotionally only to a small piece of geography, a limited band of kinsmen, and two or three generations into the future. To look neither far ahead nor far afield is elemental in a Darwinian sense.
“We are innately inclined to ignore any distant possibility not yet requiring examination. It is, people say, just good common sense.
“Why do they think in this shortsighted way? The reason is simple: It is a hardwired part of our Paleolithic heritage. For hundreds of millennia, those who worked for short-term gain within a small circle of relatives and friends lived longer and left more offspring, even when their collective striving caused their chiefdoms and empires to crumble around them. The long view that might have saved their distant descendants required a vision and extended altruism instinctively difficult to marshal.”
See also Miller (2000): “Evolution has no foresight. It lacks the long-term vision of drug company management. A species can’t raise venture capital to pay its bills while its research team . . . Each species has to stay biologically profitable every generation, or else it goes extinct. Species always have cashflow problems that prohibit speculative investments in their future. More to the point, every gene underlying every potential innovation has to yield higher evolutionary payoffs than competing genes, or it will disappear before the innovation evolves any further. This makes it hard to explain innovations.”
CHAPTER 6
Fooled by negative skewness:
The first hint of an explanation for the popularity of negatively skewed payoffs comes from the early literature on behavior under uncertainty, with the “small number problem.” Tversky and Kahneman (1971) write: “We submit that people view a sample randomly drawn from a population as highly representative, that is, similar to a population in all essential characteristics.” The consequence is the inductive fallacy: overconfidence in the ability to infer general properties from observed facts, “undue confidence in early trends,” belief in the stability of observed patterns, and deriving conclusions with more confidence attached to them than can be warranted by the data. Worse, the agent finds causal explanations or perhaps distributional attributes that confirm his undue generalization. It is easy to see that the “small numbers” problem gets exacerbated with skewness, since most of the time the observed mean will be different from the true mean and most of the time the observed variance will be lower than the true one. Now consider that it is a fact that in life, unlike a laboratory or a casino, we do not observe the probability distribution from which random variables are drawn: We only see the realizations of these random processes. It would be nice if we could, but it remains that we do not measure probabilities as we would measure the temperature or the height of a person. This means that when we compute probabilities from past data we are making assumptions about the skewness of the generator of the random series: all data is conditional upon a generator. In short, with skewed packages, the camouflage of the properties comes into play and we tend to believe what we see. Taleb (2004).
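The camouflage can be made concrete with a small simulation. What follows is only a sketch under stated assumptions: the bet (win 1 unit with probability 0.99, lose 150 units with probability 0.01), the sample size of 30 observations, and every function name are hypothetical choices made for illustration and do not come from the text. It counts how often a small sample shows a mean above the true mean and a variance below the true variance.

```python
import random

# Hypothetical negatively skewed bet (illustrative numbers, not from the text):
# win 1 unit with probability 0.99, lose 150 units with probability 0.01.
GAIN, LOSS, P_LOSS = 1.0, -150.0, 0.01

# Exact moments of the generator, which the observer never gets to see.
TRUE_MEAN = (1 - P_LOSS) * GAIN + P_LOSS * LOSS
TRUE_VAR = (1 - P_LOSS) * GAIN**2 + P_LOSS * LOSS**2 - TRUE_MEAN**2


def draw_sample(rng, n_obs):
    """Draw n_obs independent payoffs from the skewed bet."""
    return [LOSS if rng.random() < P_LOSS else GAIN for _ in range(n_obs)]


def misleading_fractions(n_obs=30, n_trials=5000, seed=0):
    """Return the fraction of samples whose observed mean exceeds the true
    mean, and the fraction whose observed variance is below the true one."""
    rng = random.Random(seed)
    mean_above = 0
    var_below = 0
    for _ in range(n_trials):
        sample = draw_sample(rng, n_obs)
        mean = sum(sample) / n_obs
        var = sum((x - mean) ** 2 for x in sample) / n_obs
        if mean > TRUE_MEAN:
            mean_above += 1
        if var < TRUE_VAR:
            var_below += 1
    return mean_above / n_trials, var_below / n_trials


if __name__ == "__main__":
    frac_mean, frac_var = misleading_fractions()
    print(f"true mean {TRUE_MEAN:.2f}, true variance {TRUE_VAR:.1f}")
    print(f"samples with observed mean above the true mean: {frac_mean:.0%}")
    print(f"samples with observed variance below the true one: {frac_var:.0%}")
```

Under these assumptions, any sample that happens to contain no loss shows a steady gain with zero variance, and such samples are the majority. The observer who trusts the sample sees a safer and more profitable bet than the generator actually offers.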
Philosopher sometimes playing scientist:
Nozick (1993).
Hollywood economics:
De Vany (2003).
People are sensitive to sign rather than magnitude:
Hsee and Rottenstreich (2004).
Lucas critique:
Lucas (1978).
CHAPTER 7
Niederhoffer’s book:
Niederhoffer (1997).
Goodman’s riddle of induction:
One can take the issue of induction into a more difficult territory with the following riddle. Say the market went up every day for a month. For many people of inductive taste it could confirm the theory that it is going up every day. But consider: It may confirm the theory that it goes up every day then crashes; what we are witnessing is not an ascending market but one that ascends then crashes. When one observes a blue object it is possible to say that one is observing something blue until time t, beyond which it is green, so that such object is not blue but “grue.” Accordingly, by such logic, the fact that the market went up all this time may confirm that it will crash tomorrow! It confirms that we are observing a rising-crashing market. See Goodman (1954).
Writings by Soros:
Soros (1988).
Hayek:
See Hayek (1945) and the prophetic Hayek (1994), first published in 1944.
Popper’s personality:
Magee (1997), and Hacohen (2001). Also an entertaining account in Edmonds and Eidinow (2001).
CHAPTER 8
The millionaire next door:
Stanley (1996).
Equity premium puzzle:
There is an active academic discussion of the “equity premium” puzzle, taking the “premium” here to be the outperformance of stocks in relation to bonds and looking for possible explanations. Very little consideration was given to the possibility that the premium may have been an optical illusion owing to the survivorship bias—or that the process may include the occurrence of black swans. The discussion seems to have calmed a bit after the declines in the equity markets after the events of 2000–2002.
CHAPTER 9
Hot-hand effect:
Gilovich, Vallone and Tversky (1985).
Stock analysts fooled by themselves:
For a comparison between analysts and weather forecasters, see Tyszka and Zielonka (2002).
Differences between returns:
See Ambarish and Siegel (1996). The dull presenter was actually comparing “Sharpe ratios,” i.e., returns scaled by their standard deviations (both annualized), named after the financial economist William Sharpe, but the concept has long been used in statistics, where its inverse is called the “coefficient of variation.” (Sharpe introduced the concept in the context of the normative theory of asset pricing to compute the expected portfolio returns given some risk profile, not as a statistical device.) Not counting the survivorship bias, over a given twelve-month period, assuming (very generously) the Gaussian distribution, the “Sharpe ratio” differences for two uncorrelated managers would exceed 1.8 with close to 50% probability. The speaker was discussing “Sharpe ratio” differences of around 0.15! Even assuming a five-year observation window, something very rare with hedge fund managers, things do not get much better.
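The scale of the sampling noise can be checked with a rough simulation. This is a sketch under assumptions that are not in the text: two managers with identical true skill, independent Gaussian monthly returns with a hypothetical mean of 1% and volatility of 4%, and Sharpe ratios estimated from the sample mean and sample standard deviation and annualized by the square root of 12. The function names are likewise made up for the example. It estimates how often the two measured ratios differ by more than 0.15 through chance alone.

```python
import math
import random


def annualized_sharpe(monthly_returns):
    """Sample mean over sample standard deviation, scaled to annual terms."""
    n = len(monthly_returns)
    mean = sum(monthly_returns) / n
    var = sum((r - mean) ** 2 for r in monthly_returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(12)


def chance_of_gap(threshold=0.15, months=12, n_trials=5000, seed=0,
                  monthly_mean=0.01, monthly_vol=0.04):
    """Fraction of trials in which two equally skilled, uncorrelated managers
    show measured Sharpe ratios that differ by more than `threshold`."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_trials):
        # Both managers draw from the same hypothetical Gaussian generator.
        a = [rng.gauss(monthly_mean, monthly_vol) for _ in range(months)]
        b = [rng.gauss(monthly_mean, monthly_vol) for _ in range(months)]
        if abs(annualized_sharpe(a) - annualized_sharpe(b)) > threshold:
            exceed += 1
    return exceed / n_trials


if __name__ == "__main__":
    for months in (12, 60):
        frac = chance_of_gap(months=months)
        print(f"{months} months: gap above 0.15 in {frac:.0%} of trials")
```

With these inputs a gap of 0.15 appears in the large majority of trials even though the managers are identical by construction, and lengthening the window to sixty months changes the picture only modestly.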
Value of the seat:
Even then, by some attribution bias, traders tend to believe that their income is due to their skills, not the “seat,” or the “franchise” (i.e., the value of the order flow). The seat has a value, as the New York Stock Exchange specialist “book” is worth quite large sums: See Hilton (2003). See also Taleb (1997) for a discussion of the time and place advantage.
Data mining:
Sullivan, Timmermann and White (1999).
Dogs not barking:
I thank my correspondent Francesco Corielli from Bocconi for his remark on meta-analysis.
CHAPTER 10
Networks:
Arthur (1994). See Barabasi (2002), Watts (2003).
Nonlinear dynamics:
For an introduction to nonlinear dynamics in finance, see Brock and De Lima (1995), and Brock, Hsieh and LeBaron (1991). See also the recent, and certainly the most complete, Sornette (2003). Sornette goes beyond just characterizing the process as fat-tailed and saying that the probability distribution is different from the one we learned in Finance 101. He studies the transition points: Say a book’s sales become close to a critical point from which they will really take off. Their dynamics, conditional on past growth, become predictable.
The Tipping Point:
Gladwell (2000). In the article that preceded the book (Gladwell, 1996) he writes: “The reason this seems surprising is that human beings prefer to think in linear terms . . . . I can remember struggling with these same theoretical questions as a child, when I tried to pour ketchup on my dinner. Like all children encountering this problem for the first time, I assumed that the solution was linear: That steadily increasing hits on the base of the bottle would yield steadily increasing amounts of ketchup out the other end. Not so, my father said, and he recited a ditty that, for me, remains the most concise statement of the fundamental nonlinearity of everyday life: ‘Tomato ketchup in a bottle / None will come and then the lot’ll.’ ”
Pareto:
Before we had a generalized use of the bell curve, we took the ideas of Pareto with his distribution more seriously; its mark is the contribution of large deviations to the overall properties. Later elaborations led to the so-called Pareto-Levy or Levy-Stable distributions with (outside of special cases) some quite vicious properties (no known error rate). The reason economists never liked to use it is that it does not offer tractable properties: economists like to write papers in which they offer the illusion of solutions, particularly in the form of mathematical answers. A Pareto-Levy distribution does not provide them with such luxury. For economic discussions on the ideas of Pareto, see Zajdenweber (2000), Bouvier (1999). For a presentation of the mathematics of Pareto-Levy distributions, see Voit (2001), and Mandelbrot (1997). There is a recent rediscovery of power law dynamics. Intuitively a power law distribution has the following property: If the power exponent were 2, then there would be 4 times more people with an income higher than $1 million than people with an income higher than $2 million. The effect is that there is a very small probability of having an event of an extremely large deviation. More generally, given a deviation x, the incidence of a deviation of a multiple of x will be lower by that multiple raised to the power exponent. The higher the exponent the lower the probability of a large deviation.
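The income example can be reproduced in a few lines. This is a minimal sketch assuming a pure power-law tail, where the probability of exceeding x is proportional to x raised to minus the exponent. The function names and the $100,000 floor are hypothetical, and the floor cancels out of the ratio.

```python
def survival(x, x_min, exponent):
    """P(X > x) for a pure power-law (Pareto) tail starting at x_min."""
    if x < x_min:
        return 1.0
    return (x_min / x) ** exponent


def tail_ratio(multiple, exponent):
    """How many times rarer it is to exceed multiple * x than to exceed x."""
    return multiple ** exponent


if __name__ == "__main__":
    x_min = 100_000  # hypothetical floor for the tail; it cancels in the ratio
    for exponent in (1, 2, 3):
        above_1m = survival(1_000_000, x_min, exponent)
        above_2m = survival(2_000_000, x_min, exponent)
        print(f"exponent {exponent}: {above_1m / above_2m:.1f} times more "
              f"people above $1 million than above $2 million")
```

With an exponent of 2 the ratio is the 4 given in the note. Raising the exponent makes the ratio grow, which is the sense in which a higher exponent means a lower probability of a large deviation.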
Spitznagel’s remark:
In Gladwell (2002).
Don’t take “correlation” and those who use the word seriously:
The same “A.” of the lighter-throwing variety taught me a bit about the fallacy of the notion of correlation. “You do not seem to be correlated to anything” is the most common criticism I’ve received when carrying out my strategy of shooting for rare events. The following example might illustrate it. A nonlinear trading instrument, such as a put, will be positively correlated to the underlying security over many sample paths (say the put expires worthless in a bear market as the market did not drop enough), except of course upon becoming in the money and crossing the strike, in which case the correlation reverses with a vengeance. The reader should do himself a favor by not taking the notion of correlation seriously except in very narrow matters where linearity is justified.
CHAPTER 11
Probability “blindness”:
I borrow the expression from Piattelli-Palmarini (1994).
Discussion of “rationality”:
The concept is not so easy to handle. While the concept has been investigated in plenty of fields, it has been developed the most by economists as a normative theory of choice. Why did the economists develop such an interest in it? The basis of economic analysis is a concept of human nature and rationality embodied in the notion of homo economicus. The characteristics and behavior of such homo economicus are built into the postulates of consumer choice and include nonsatiation (more is always preferred to less) and transitivity (global consistency in choice). For instance, Arrow (1987) writes, “It is noteworthy that the everyday usage of the term ‘rationality’ does not correspond to the economist’s definition as transitivity and completeness, that is maximization of something. The common understanding is instead the complete exploitation of information, sound reasoning, and so forth.”