By John Bohannon
When it comes to storing information, hard drives don't hold a candle to DNA. Our genetic code packs billions of gigabytes into a single gram. A mere milligram of the molecule could encode the complete text of every book in the Library of Congress and have plenty of room to spare. All of this has been mostly theoretical—until now. In a new study, researchers stored an entire genetics textbook in less than a picogram of DNA—one trillionth of a gram—an advance that could revolutionize our ability to save data.
A few teams have tried to write data into the genomes of living cells. But the approach has a couple of disadvantages. First, cells die—not a good way to lose your term paper. They also replicate, introducing new mutations over time that can change the data.
To get around these problems, a team led by George Church, a synthetic biologist at Harvard Medical School in Boston, created a DNA information-archiving system that uses no cells at all. Instead, an inkjet printer embeds short fragments of chemically synthesized DNA onto the surface of a tiny glass chip. To encode a digital file, researchers divide it into tiny blocks of data and convert these data not into the 1s and 0s of typical digital storage media, but rather into DNA’s four-letter alphabet of As, Cs, Gs, and Ts. Each DNA fragment also contains a digital "barcode" that records its location in the original file. Reading the data requires a DNA sequencer and a computer to reassemble all of the fragments in order and convert them back into digital format. The computer also corrects for errors; each block of data is replicated thousands of times so that any chance glitch can be identified and fixed by comparing it to the other copies.
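The scheme described above (split the file into blocks, translate each block into bases, prefix it with an address barcode, then rebuild the file from many redundant reads) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Church team's actual encoding: the two-bits-per-base mapping, the barcode length, the block size, and all function names are hypothetical choices made for the example. It also handles only substituted bases in the payload; a damaged barcode, or an inserted or deleted base, would need extra handling.

```python
from collections import Counter, defaultdict

# Assumed mapping (not from the study): 2 bits per base, 00->A, 01->C, 10->G, 11->T.
BASES = "ACGT"
ADDRESS_BASES = 8   # hypothetical barcode length: 8 bases hold a 16-bit block index
BLOCK_BYTES = 12    # hypothetical payload size carried by each fragment


def bytes_to_bases(data: bytes) -> str:
    """Translate each byte into four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases: every four bases become one byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def encode(data: bytes) -> list[str]:
    """Split data into blocks and prefix each with a barcode giving its position."""
    fragments = []
    for index, start in enumerate(range(0, len(data), BLOCK_BYTES)):
        barcode = bytes_to_bases(index.to_bytes(ADDRESS_BASES // 4, "big"))
        payload = bytes_to_bases(data[start:start + BLOCK_BYTES])
        fragments.append(barcode + payload)
    return fragments


def decode(reads: list[str]) -> bytes:
    """Group reads by barcode, take a per-position majority vote, reassemble in order."""
    by_address = defaultdict(list)
    for read in reads:
        by_address[read[:ADDRESS_BASES]].append(read[ADDRESS_BASES:])

    blocks = []
    # "A" < "C" < "G" < "T" alphabetically, matching 0..3, so sorting the
    # fixed-length barcodes as strings restores the original block order.
    for barcode in sorted(by_address):
        copies = by_address[barcode]
        consensus = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*copies)
        )
        blocks.append(bases_to_bytes(consensus))
    return b"".join(blocks)
```

Calling `encode` on a byte string yields one DNA string per block; passing several copies of each fragment to `decode`, in any order, returns the original bytes as long as most copies agree at every payload position.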
To demonstrate its system in action, the team used the DNA chips to encode a genetics book co-authored by Church. It worked. After converting the book into DNA and translating it back into digital form, the team’s system had a raw error rate of only two errors per million bits, amounting to a few single-letter typos. That is on par with DVDs and far better than magnetic hard drives. And because of their tiny size, DNA chips are now the storage medium with the highest known information density, the researchers report online today in Science.
Don’t replace your flash drive with genetic material just yet, however. The cost of the DNA sequencer and other instruments "currently makes this impractical for general use," says Daniel Gibson, a synthetic biologist at the J. Craig Venter Institute in Rockville, Maryland, "but the field is moving fast and the technology will soon be cheaper, faster, and smaller." Gibson led the team that created the first completely synthetic genome, which included a "watermark" of extra data encoded into the DNA. The researchers used a three-letter coding system that is less efficient than the Church team's but has built-in safeguards to prevent living cells from translating the DNA into proteins. "If DNA is going to be used for this purpose, and outside a laboratory setting, then you would want to use DNA sequence that is least likely to be expressed in the environment," he says. Church disagrees. Unless someone deliberately "subverts" his DNA data-archiving system, he sees little danger.