Here Is a Human Being
Author: Misha Angrist
My lesson in genome navigation was over. Dongliang installed a graphical user interface on my laptop so that I could peruse my genome from anywhere.
I was on my own. I went to my office and closed the door.
And immediately opened it again. Trying to get the server to run from my laptop was hopeless—it was molasses-slow and kept locking up. I would have to find a desktop with lots more memory and a giant hard drive.
I was only beginning to appreciate the idea that 80 billion bases was, undeniably, a metric shitload of data. Setting filters and plucking out what you were interested in was one thing, but even just moving the data around, waiting for the program to load, and making sure that the particular nucleotide you were looking at was where it was supposed to be, and not in the half a percent that was actually wrong, was painstaking. Hugh Rienhoff called it “hand-to-hand” combat.[56] I was about to earn my stripes, or at least get a good bayoneting.
Another problem was that genome interpretation software was still not designed for civilians. In part this was because the data weren’t meant to be parsed by civilians. They were for bioinformaticians and various other übernerds: people who were used to walking and talking in Linux and writing code on the fly. If a particular analysis function didn’t work, they could intuit a workaround. I most certainly could not. Over the next several days I could be found in Dongliang’s office pestering him about how to get the Sequence Variant Analyzer to do what I wanted it to do in a way that did not tax my nonexistent computational skills. He was too nice to tell me to get lost.
If you’re above a certain age, you’ll remember the early days of the Internet: few Web pages, dial-up access, frozen computer screens, and frequent rebooting. Good times. In the realm of genome interpretation circa 2009–2010, we were still in dial-up mode. SNPedia, for example, had been online for less than three years and had annotated fewer than 12,000 of the millions of validated human SNPs[57] described in dbSNP, the NIH’s central repository of SNPs and other small variants.[58] Despite this, SNPedia and its creators had already become a critical resource for SNP annotation: the personal genomics companies used it, the PGP used it, and hobbyists used it.[59]
Meanwhile George’s group had developed its own tool to make clinical sense of genomes, Trait-o-matic,[60] an open-source (natch!) program that linked out to SNPedia, Online Mendelian Inheritance in Man (OMIM), the Pharmacogenomics Knowledge Base,[61] and, pending commercial considerations,[62] the partially user-funded Human Gene Mutation Database.[63]
But within the purviews of those four databases (SNPs, genetic diseases, genetic markers that influenced drug response, and all human mutations, respectively), there was not much in the way of homegrown observations. SNPedia would rank your SNPs, but in a fairly subjective and sometimes arbitrary manner determined by the community. When I asked Mike Cariaso why SNPedia had ranked my elevated risk for male-pattern baldness as my most interesting trait when I was at higher risk for heart attack, diabetes, and God knows what other serious medical conditions, he wrote back with a shrug: “You’re free to change it to whatever you think is more appropriate.”[64]
The onus was, finally, on the user to prioritize his or her variants. 23andMe and Navigenics made choices about what to show customers, I surmised, based on what they viewed as clinically significant, interesting, marketable, and “actionable,” and on which markers happened to already be on their standard SNP chips. For both companies, the information to be returned amounted to SNPs predisposing to a few dozen traits, which made sense. The expression “drinking from a fire hose” was apt when it came to genomic information: if the direct-to-consumer companies had overwhelmed their customers with thousands of genotypes, each one perceived as a potential time bomb (or, perhaps just as bad, a time waster), that would not have been a wise business strategy. So before I got sucked into another vortex of databases, articles, and spreadsheets, I wanted to know how Trait-o-matic prioritized my variants.
George’s team, including geneticist Joe Thakuria, had outlined this in a paper they had just submitted. Because some 90 percent of disease-causing mutations occur in protein-coding regions, they chose to focus on those regions in particular, and especially on mutations already used in genetic testing. And because the genomes (or partial genomes) came from healthy people, that is, the PGP-10 and fifteen others, they didn’t expect to find many strong (“highly penetrant”) mutations. And they didn’t: just eleven in all. But both OMIM[65] and the database of genetic testing[66] listed just a small minority of even potentially harmful variants in the human genome. This made evolutionary sense: if clinically important genes were mutated all the time, then we wouldn’t be here to talk about them because we would have died in utero. The genes associated with serious diseases were telling us something: they were telling us that they mattered.[67]
To identify changes in those genes and others that might be of clinical interest, the Church crew first generated a list of all of the variants they found in the twenty-five genomes they studied (some were incomplete). They matched that list to the more than 1,500 variants they found in SNPedia and the Pharmacogenomics Knowledge Base. They then looked for those variants that resulted in an amino acid change: those would presumably have some effect on the protein and perhaps raise one’s risk for a single-gene disorder such as, say, ALS. SNPedia was also useful in finding genes that contributed to complex diseases, even if those genes did not cause disease outright. As I’ve mentioned, for example, I carried a change in a gene that raised my risk for rheumatoid arthritis fivefold above average, though it was hardly the only rheumatoid arthritis susceptibility gene we know about.[68] In the context of all of those RA susceptibility genes, my risk was probably lower than fivefold above average, though a rheumatologist assured me that no one really knew.
The Church lab also looked at changes that were known to disrupt splice sites. On their journey from DNA to protein, most genes are spliced: after the DNA is transcribed into RNA, the RNA is cut into pieces, and the protein-coding segments, the exons, are rejoined to each other and serve as the template for the protein, while the excised bits, the introns, are discarded. Thus the post-splicing version of an RNA molecule might be only a fraction of the length of the genomic DNA that gave rise to it. The cool thing is that the same gene can be spliced in multiple ways: different splice forms can be used in different tissues to produce similar but not identical proteins. Splicing is therefore a terrific source of protein diversity. Of course, one can imagine that if the sequence that tells the cell where to splice goes awry, the cell won’t get the right message and bad things could happen. Aberrant splicing has been shown to play a role in certain instances of many diseases, including retinitis pigmentosa, muscular dystrophy, breast cancer, lupus, dysautonomia, cystic fibrosis, and elevated cholesterol.[69]
Both Sequence Variant Analyzer and Trait-o-matic looked for these types of mutations, too.
In general, for a variant to raise a red flag, it had to be:
rare (if it was common, it would be less likely to cause severe disease or we’d all be sick or dead)
clearly associated with disease (false positives have been the bane of geneticists for decades)
likely to actually cause an observable phenotype (you can’t measure it if you don’t know it’s there)
shown to be clinically important in the literature (new occurrences were always harder to prove)
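The four criteria above work together like a chain of filters. Purely as an illustration, here is a minimal Python sketch of that logic. It assumes a hypothetical Variant record and an arbitrary 1 percent cutoff for “rare” (the text gives no number), and it uses “changes an amino acid or disrupts a splice site” as an assumed stand-in for “likely to cause a phenotype.” It is not Trait-o-matic’s or Sequence Variant Analyzer’s actual code, and it only produces a shortlist; the final call on clinical relevance was still made by hand.

```python
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical record for one variant; the field names are illustrative,
# not the schema of Trait-o-matic or any real database.
@dataclass
class Variant:
    gene: str
    change: str                 # protein-level change, e.g. "X1Y" (made up)
    allele_frequency: float     # population frequency, 0.0 to 1.0
    disease_association: bool   # listed as disease-associated in a database
    protein_altering: bool      # changes an amino acid or disrupts a splice site
    literature_support: bool    # clinical importance reported in the literature


# Assumed cutoff for "rare"; chosen only for illustration.
RARE_CUTOFF = 0.01


def raises_red_flag(v: Variant) -> bool:
    """Return True only if the variant meets all four criteria at once."""
    return (
        v.allele_frequency < RARE_CUTOFF   # rare
        and v.disease_association          # clearly associated with disease
        and v.protein_altering             # likely to cause a phenotype (proxy)
        and v.literature_support           # clinically important in the literature
    )


def flag_for_review(variants: Iterable[Variant]) -> List[Variant]:
    """Shortlist variants for the final, by-hand determination."""
    return [v for v in variants if raises_red_flag(v)]


if __name__ == "__main__":
    # Made-up placeholder variants, not real data.
    demo = [
        Variant("GENE_A", "X1Y", 0.002, True, True, True),
        Variant("GENE_B", "X2Y", 0.30, True, True, True),
        Variant("GENE_C", "X3Y", 0.001, True, False, True),
    ]
    for v in flag_for_review(demo):
        print(v.gene, v.change)
```

Run as a script, the demo prints only GENE_A X1Y: the second placeholder is too common and the third does not alter the protein. Requiring every condition at once mirrors the “it had to be” phrasing of the list; a real tool might weigh the criteria rather than treat them as all-or-nothing.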
Making a “final” determination of clinical relevance still had to be done by hand, at least for now. Of the variants discovered in the PGP-10, only one was deemed to be serious. It was found in Steve Pinker (PGP6), who carried a mutation in the MYL2 gene, which had been shown, in some cases, to cause hypertrophic cardiomyopathy (HCM), a thickening of the heart muscle that makes it harder for the heart to pump blood. HCM is among the leading causes of sudden death in young athletes during strenuous exercise.[70]
When I asked Steve about it by email, he said it was certainly a surprising discovery, but he seemed pretty sanguine. “I had some mildly anxious thoughts between the time that Joe Thakuria went over the pedigrees with me (which showed that the association was real enough to follow up, but too tenuous to get upset about) and an echocardiogram which revealed I am fine.”[71]
I hoped that I would be as serene. I would find out soon: I was finally ready to turn my attention to my own genome.
* As I fretted in chapter 7, I seem to carry most of these MS variants, though to my knowledge the myelin sheaths covering my nerves are just fine.
* As mentioned in chapter 2, much of that original genome probably belongs to a guy in Buffalo code-named RP11, who unwittingly became a big part of the first reference genome.
My wife, the lapsed Catholic, teaches Jewish preschool and Sunday school and by now probably knows more than I do about Tisha B’Av and kosher dietary laws; she plays a mean “Hinei Mah Tov” on the guitar. I, on the other hand, since the day I donned a powder-blue leisure suit in 1977 and read from the Torah on the occasion of my bar mitzvah, have spent most of my life, much to my parents’ chagrin, as a “twice-a-year Jew.”*
I had a lengthy flirtation with Israel in the 1980s, and even lived there for a year, but I was unwilling to make the full Zionist commitment; something about the prospect of Katyusha rockets falling from the sky and taking up arms just didn’t work for me. I am prone to bouts of self-hatred, but I’ve never denied my heritage. So it struck me as significant somehow that at the beginning of my foray into personal genomics, in 2006, I spent an intense hour with Rabbi Terry Bard, a pastoral counselor at Beth Israel Deaconess Medical Center (see chapter 2), talking about the PGP, George Church, informed consent, and Jewish notions of free will; and that at the end of my journey, in 2009–2010, when the time came to really get a handle on what, if anything, I should care about in my genome, I spent an intense hour (and had an ongoing correspondence) with George’s then–grad student Abraham Rosenbaum, an observant Jew from New York with close-cropped hair and wire-rimmed glasses who is warm and generous and talks very fast. I would frequently email him to ask for a data file or a link, along with a layman’s explanation of what it was I was asking for, and he would write back a detailed response that usually began with something like “No problem. If it were not for people like you then I would have nothing to work with.”[1]
Abraham explained that the PGP’s prior sequencing failures were not really the Polonator’s fault; the Polonator, of course, had not yet made much of a dent in the game. Instead, our initial data had been supplied by the core sequencing facility that served the Harvard Genetics Department, using Illumina machines. Abraham had taken anything that Trait-o-matic had flagged as suspicious and then gone back and resequenced those variants using good old-fashioned Sanger sequencing, which was still the gold standard for quality. “So,” I wondered aloud, “when can I see these data?” By now this question, asked frequently of George, had become both comical and rhetorical; my expectations were low. To my surprise, however, Abraham offered to let me look at the latest version of my own Trait-o-matic report. I was stunned. It didn’t seem possible. At long last, I would see actual PGP sequence data from my own genome![2]
I logged on to look at it and was greeted by a warning I knew all too well:
Before using Trait-o-matic, users should be aware of ways in which knowledge of their genome and phenotype could be used against them. For example, in principle, anyone with sufficient knowledge could take a user’s genome or open medical records and use them to:
infer paternity or other genealogical features;
claim statistical evidence that could affect employment or insurance;
claim relatedness to infamous criminals;
plant incriminating synthetic DNA at a crime scene;
reveal susceptibility to diseases currently lacking a cure.
Furthermore, any genetic information obtained about an individual may also have relevance to family members.[3]
By now I was not losing sleep over any of this, but I was still curious about the last bit: Would there be actual clinical relevance for my family?
Most of what the Trait-o-matic generated was stuff I knew about (increased risks for diabetes, lupus, bipolar disorder), had never heard of and couldn’t pronounce (molybdenum cofactor deficiency; alpha-methylacyl-CoA racemase deficiency), and/or couldn’t get too worked up about. One variant interpretation, for example, suggested I “may require more methadone during heroin withdrawal.” Duly noted. Eat your heart out, Keith Richards.
Among the most potentially interesting variants were those in OCA2, an oculocutaneous albinism gene. OCA2 is associated with albinism and with certain general pigmentation traits such as eye, skin, and hair color. I carried two variants in this gene: A481T and R305W. R305W makes one more likely to have brown or black eyes; my eyes are brown, so again: woo hoo! The other variant, A481T, seemed to be fairly common in Japanese people. But it wasn’t clear whether it actually caused albinism; if it did, it probably needed help from at least one other mutation.[4]
I was not albino, though I am blind as a bat without my glasses. So why did I care? My nephew Jesse, in addition to having Hirschsprung’s disease (see chapter 1), had albinism: he had light brown hair and fair skin, and he was myopic (he began wearing glasses as a baby). Was there a connection? I didn’t know, but I wondered. I emailed my brother and sister-in-law, who graciously offered to indulge my curiosity and said they would ask the eye doctor for Jesse’s report. They didn’t seem to share my interest in Jesse’s genomic underpinnings, and I couldn’t blame them. They had gone through three years of hell and weren’t about to start drawing still more blood from their son just to satisfy his uncle’s curiosity. And I knew it probably didn’t matter. Jesse was in good health overall and already had what Hugh Rienhoff would call a “sound management plan.”[5]
Jesse’s parents used sunscreen on him, and he got regular checkups at the eye doctor. His molecular defect had an invasive but effective solution. A link between Hirschsprung’s disease and OCA2, or even between Jesse’s particular case and OCA2, was not obvious. Yes, some of the same ancestral cells that go on to populate the gut also go on to populate the skin, and they share some of the same biochemical pathways,[6] so perhaps OCA2 had something to do with Jesse’s phenotype. But again, it was of purely academic interest: to me and perhaps to a few developmental biologists somewhere.