Issue 80 / March - April 2011
Super Computer in a Cell
Halil I. Demir
Since the first electronic computer, ENIAC (Electronic Numerical Integrator and Computer), was announced in 1946, computers have changed a great deal. As computers have become more powerful and faster, their size has changed dramatically, shrinking from the size of a room (Fig. 1) to a pocket-sized device. Today’s computers use electrons to carry information. Many approaches have been taken to replace electrons in theoretical and practical applications, such as photons for photonic computers, heat for phononic computers, quantum mechanical phenomena for quantum computers, and nucleotides for DNA computers. Each of these approaches offers a different advantage over classical electronic computers, such as higher speed, better power efficiency, or lower cost. Starting from the first electronic computer, we will review the development of computers and one of the latest approaches to computing, DNA computers.
ENIAC cost around $500,000 and was capable of 5,000 simple operations per second. Today, a basic personal computer (PC) costs around $500 and has enough processing power to perform millions of operations per second. An average PC has enough computing power for everyday uses such as word processing, checking email, and computer games. However, some areas of scientific research require computers at the frontline of current processing capacity, called supercomputers. Twice a year, the TOP500 project, which started in 1993, ranks and publishes details of the 500 most powerful supercomputers in the world. The IBM Roadrunner, located at Los Alamos National Laboratory, was announced as the fastest supercomputer in the world as of May 2008.
In computing, “FLOPS” (floating-point operations per second) is a measure of a computer’s performance, similar to calculations per second. The IBM Roadrunner cost $133 million and had a peak performance of 1.7 petaflops, which is around 1.7 x 10^15 operations per second. The Roadrunner was delivered to its current location on 21 tractor-trailer trucks. Supercomputers are an essential component of research in areas such as computational biology, fluid dynamics, structural mechanics and cancer research, all of which require high computing power.
While computers get faster every year, their computing power is still far behind that of the human brain, and they consume hundreds of times more energy than a brain does. It is estimated that a computer will be able to simulate a human brain in seven years, yet we are decades away from a computer that can think like a human and make decisions. The brain is one of the most miraculous parts of the human body, full of mysteries. It works more efficiently than any machine developed in the last 50 years of computer history.
However, the human brain is not the only part of the body with incredible computing power. In 1994, Leonard M. Adleman, a professor at the University of Southern California, introduced the idea of using DNA (deoxyribonucleic acid) to solve computational problems. This idea led to a new field of science, called DNA computing, which combines two disciplines, biology and computer science, to build the fastest and smallest computers ever. DNA is known as the blueprint of life, with unique properties such as self-assembly, molecular recognition, minute size and high information density.
Many computationally challenging problems have known solution methods, but enormous amounts of resources (time and/or cost) are required to find the optimum solution. Some problems, such as optimization problems, can be solved by generating many possible solutions and then selecting the optimum one. Standard computing methods can generate or test only one possible solution at a time. Parallel computing methods, on the other hand, can carry out this process simultaneously for thousands of possible solutions.
Similarly, enzymes can work in parallel to replicate and repair DNA strands. They can even start work on the next strand before the first one has been replicated. A single enzyme can perform about 500 replication operations per second, which is on the order of 0.001 MIPS (million instructions per second). Because enormous numbers of strands are processed at once, the computations in DNA can reach 10^14 MIPS, while a modern computer runs at an average of 1000 MIPS. DNA computing is not only faster in processing, but also much more efficient in energy consumption. The energy consumption of a DNA operation (on one strand) is about 10^10 times lower than that of an operation on a modern computer.
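The figures in this paragraph can be put side by side with a little arithmetic. The sketch below is only a back-of-envelope check using the numbers quoted in the text; the variable names are ours, and treating one replication step as one "instruction" is an assumption made for the sake of comparison.

```python
# Back-of-envelope check of the figures quoted in the text.
OPS_PER_SECOND_PER_ENZYME = 500   # quoted rate for a single enzyme
OPS_PER_MIPS = 1_000_000          # one MIPS is a million instructions per second

# Assumption: one replication step counts as one instruction.
enzyme_mips = OPS_PER_SECOND_PER_ENZYME / OPS_PER_MIPS  # 0.0005, on the order of 0.001

dna_total_mips = 1e14             # quoted aggregate figure for DNA computing
pc_mips = 1_000                   # quoted figure for a modern computer

# How many enzymes would have to work simultaneously to reach the aggregate figure
parallel_enzymes = dna_total_mips / enzyme_mips
speedup_over_pc = dna_total_mips / pc_mips

print(f"one enzyme: {enzyme_mips} MIPS")
print(f"enzymes working in parallel: {parallel_enzymes:.1e}")
print(f"advantage over a 1000-MIPS computer: {speedup_over_pc:.0e}")
```

The point of the calculation is that a single enzyme is far slower than any electronic computer; the advantage comes entirely from the enormous number of molecules working at the same time.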
The Traveling Salesman Problem (TSP) is one of the most studied problems in computational mathematics. Here is an example of the problem: a traveling salesman needs to visit 20 cities once each, with predefined starting and ending locations, and certain rules. The number of possible routes grows at least exponentially with the number of cities, so checking every route for a problem with only hundreds of cities would take a modern computer far longer than thousands of years. In our example, with the start and end fixed, the 18 remaining cities can be ordered in 18 factorial (about 6.4 x 10^15) ways; it would take about 2 whole years for a computer with 100 MIPS of processing power to generate the possible paths and find the correct answer. However, all possible paths can be generated in a very short time by using DNA computing. A simplified version of the Traveling Salesman Problem presented by Adleman involves the following scenario:
A salesman wants to visit the cities (Figure 3) Monroe, Gainesville, and Conyers, starting from Athens and arriving at Atlanta last. Each city should be visited only once. The cities are not fully connected. While some cities are connected to one another in only one direction, others are connected in both directions. Our objective is to find a route that visits every city exactly once. The solution to this problem is the route Athens -> Gainesville -> Monroe -> Conyers -> Atlanta.
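To see why the brute-force approach breaks down so quickly, the two-year estimate given earlier for the 20-city example can be checked in a few lines. This is a rough sketch: it assumes, generously, that generating and testing one path costs a single instruction.

```python
import math

paths = math.factorial(18)          # orderings of the 18 cities between start and end
ops_per_second = 100 * 1_000_000    # a 100-MIPS computer
seconds_per_year = 365 * 24 * 3600

# Assumption: one instruction is enough to generate and check one path.
years = paths / ops_per_second / seconds_per_year

print(f"{paths:,} paths take about {years:.1f} years")
```

Adding just one more city multiplies the running time by 19, which is why generating the candidate paths in parallel is so attractive.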
When we convert the problem to a molecular language, each city is coded as a single-stranded DNA molecule with 8 nucleotides. We can think of nucleotides as the counterpart of bits in computer programming; while a bit takes the value 0 or 1, a nucleotide takes one of four values. Nucleotides exist as four bases: adenine (A), thymine (T), guanine (G) and cytosine (C). All cities are coded with eight nucleotides as follows:
Athens ATGC CATG
Gainesville TCAG GTCA
Monroe GACT TGAC
Conyers CGTA ACGT
Atlanta AGCT TAGC
Connections between two cities are coded with the last 4 nucleotides of the departure city followed by the first 4 nucleotides of the arrival city. For example, the connection between Athens (ATGCCATG) and Monroe (GACTTGAC) is coded as CATGGACT. The connection codes and their complementary codes (the Watson-Crick complements, in which every C is replaced by a G, every G by a C, every A by a T, and every T by an A) are given below:
Connection               Code         Complementary Code
Athens – Gainesville     CATG TCAG    GTAC AGTC
Athens – Monroe          CATG GACT    GTAC CTGA
Gainesville – Atlanta    GTCA AGCT    CAGT TCGA
Gainesville – Monroe     GTCA GACT    CAGT CTGA
Monroe – Conyers         TGAC CGTA    ACTG GCAT
Conyers – Atlanta        ACGT AGCT    TGCA TCGA
Conyers – Monroe         ACGT GACT    TGCA CTGA
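The coding rule is mechanical, so the table above can be reproduced in a few lines of Python. The sketch below is only an illustration; the names CITIES, ROADS, connection_code and complement are our own, and the complement is taken base by base as in the table, ignoring the orientation of real DNA strands.

```python
# City codes from the text (8 nucleotides each).
CITIES = {
    "Athens": "ATGCCATG",
    "Gainesville": "TCAGGTCA",
    "Monroe": "GACTTGAC",
    "Conyers": "CGTAACGT",
    "Atlanta": "AGCTTAGC",
}

# Directed connections (departure, arrival) from the table.
ROADS = [
    ("Athens", "Gainesville"),
    ("Athens", "Monroe"),
    ("Gainesville", "Atlanta"),
    ("Gainesville", "Monroe"),
    ("Monroe", "Conyers"),
    ("Conyers", "Atlanta"),
    ("Conyers", "Monroe"),
]

# Watson-Crick pairing: A<->T and C<->G.
PAIRING = str.maketrans("ACGT", "TGCA")


def connection_code(departure, arrival):
    """Last 4 nucleotides of the departure city + first 4 of the arrival city."""
    return CITIES[departure][4:] + CITIES[arrival][:4]


def complement(strand):
    """Base-by-base complement (simplified: strand direction is ignored)."""
    return strand.translate(PAIRING)


for departure, arrival in ROADS:
    code = connection_code(departure, arrival)
    print(f"{departure} - {arrival}: {code} {complement(code)}")
```

Running the loop prints one line per connection, which can be compared against the table row by row.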
The mixture for performing reactions will include DNA strands for the 5 cities and the complementary strands for the 7 connections between cities in our example. If an Athens molecule (ATGC CATG) encounters the complement strand of an Athens–Monroe connection (GTAC CTGA) in the mixture, hydrogen bonds will form between the second half of the Athens strand (CATG) and the first half of the complement (GTAC) (Figure 4). The other half of the complement (CTGA) remains free to bind the first half of a Monroe strand (GACT), joining the two cities. Other strands will continue forming bonds for valid connections between cities, building a complete travel path with the help of DNA ligase. There need to be enough copies of each DNA strand to generate all possible travel paths.
ATGC CATG GACT TGAC
     |||| ||||
     GTAC CTGA
Polymerase Chain Reaction (PCR) is a method by which a few strands of DNA can be copied into millions in a very short amount of time. Here it will be used to make many duplicates of the DNA strands that start with Athens and end with Atlanta. The result of PCR will be the amplification of the travel paths running from Athens to Atlanta, which makes them easy to separate. The same property makes PCR a very important method in forensic science, where it is used to multiply the small amounts of DNA found in blood or hair samples until there is enough to carry out an analysis and reveal a person’s identity.
Electrophoresis follows the PCR process to sort the resulting paths according to their sizes. Since every city is coded with 8 nucleotides, the correct path should include exactly 40 nucleotides representing a full path for 5 cities. The gel electrophoresis process uses an electric field to separate DNA strands by size as they travel through a gel matrix. The speed of DNA molecules differs by their size, which results in the sorting of the molecules by size. After this step, DNA strands starting with the code of Athens, ending with the code of Atlanta and with a size of 40 nucleotides are separated from the mixture.
The last step will be reading the code and removing the DNA strands that do not contain all the cities. Adleman used a common method known as affinity purification for this separation process. In the end, the mixture contains only the DNA strands with the correct travel path, starting from Athens, arriving at Atlanta, and traveling through every city once. All this laboratory work looks complex for such a simple problem, but as a new concept for computing, it is revolutionary. In its ability to perform parallel computations, DNA computing shows great promise over traditional computing approaches.
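The three laboratory steps amount to a generate-and-filter procedure, which can be imitated in software. The sketch below is only a serial analogy of what the test tube does in parallel: the helper names (all_walks, strand) are hypothetical, and the cap of six cities per chain is an arbitrary assumption to keep the list finite.

```python
# City codes and directed connections from the example.
CITIES = {
    "Athens": "ATGCCATG",
    "Gainesville": "TCAGGTCA",
    "Monroe": "GACTTGAC",
    "Conyers": "CGTAACGT",
    "Atlanta": "AGCTTAGC",
}
ROADS = [
    ("Athens", "Gainesville"), ("Athens", "Monroe"),
    ("Gainesville", "Atlanta"), ("Gainesville", "Monroe"),
    ("Monroe", "Conyers"), ("Conyers", "Atlanta"), ("Conyers", "Monroe"),
]


def all_walks(max_cities):
    """Every chain of cities that follows valid connections (the 'mixture')."""
    frontier = [[city] for city in CITIES]
    walks = list(frontier)
    for _ in range(max_cities - 1):
        frontier = [walk + [arrival]
                    for walk in frontier
                    for departure, arrival in ROADS
                    if walk[-1] == departure]
        walks.extend(frontier)
    return walks


def strand(walk):
    """DNA strand representing a chain of cities."""
    return "".join(CITIES[city] for city in walk)


mixture = all_walks(max_cities=6)

# Step 1 (PCR): keep chains that start at Athens and end at Atlanta.
after_pcr = [w for w in mixture if w[0] == "Athens" and w[-1] == "Atlanta"]

# Step 2 (gel electrophoresis): keep strands exactly 40 nucleotides long.
after_gel = [w for w in after_pcr if len(strand(w)) == 40]

# Step 3 (affinity purification): keep chains that contain every city.
solution = [w for w in after_gel if all(city in w for city in CITIES)]

print(" -> ".join(solution[0]))
```

With the seven connections of this example, four chains run from Athens to Atlanta, but only Athens -> Gainesville -> Monroe -> Conyers -> Atlanta has the right length and contains every city.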
Data density is another unique advantage of DNA. Billions of DNA strands can be stored in a regular laboratory tube. A DNA strand is composed of the bases A, T, C and G spaced evenly, 0.35 nanometers apart from each other. The data density of DNA is around 10^6 GB (gigabytes) per square inch, which is about 100,000 times larger than the data density of today’s storage technologies (7 GB per square inch). Moreover, DNA is a durable and strong molecule; the information stored within it can be kept for thousands of years in the right conditions. In 2008, 80% of the woolly mammoth genome, several thousand years old, was identified from tufts of frozen woolly mammoth hair.
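One way to arrive at a density of this magnitude is sketched below. The calculation rests on assumptions of our own rather than on a stated derivation: bases are imagined to lie on a flat square grid 0.35 nanometers apart, each base stores 2 bits (one of four values), and a gigabyte is taken as 10^9 bytes.

```python
import math

NM_PER_INCH = 2.54e7            # 1 inch = 2.54 cm = 2.54 x 10^7 nm
BASE_SPACING_NM = 0.35          # spacing between bases, from the text
BITS_PER_BASE = math.log2(4)    # four possible bases -> 2 bits

# Assumption: bases packed on a flat square grid with the same spacing both ways.
bases_per_inch = NM_PER_INCH / BASE_SPACING_NM
bases_per_square_inch = bases_per_inch ** 2

gigabytes_per_square_inch = bases_per_square_inch * BITS_PER_BASE / 8 / 1e9
ratio_to_conventional = gigabytes_per_square_inch / 7   # 7 GB per square inch

print(f"about {gigabytes_per_square_inch:.1e} GB per square inch")
print(f"about {ratio_to_conventional:.0e} times conventional storage")
```

Under these assumptions the result comes out slightly above 10^6 GB per square inch, and the ratio to 7 GB per square inch is on the order of 100,000, in line with the figures quoted above.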
DNA is also created with remarkable mechanisms such as built-in error correction. The double-stranded nature of DNA provides a double check on pairing. Error-repairing enzymes are always ready to search for anomalies during the DNA replication process. This results in a rate of about one error per billion bases replicated. DNA is located and protected at the center of each cell with a perfect balance. The miraculous architecture of DNA has waited for thousands of years to be understood by humans and to be used for the benefit of the world. Further studies on DNA might open new opportunities to help researchers solve technologically challenging problems.
Acknowledgment: This article was produced at MERGEOUS, an online article and project development service for authors and publishers dedicated to the advancement of technologies in the merging realms of science and religion.
Halil I. Demir is a postdoctoral scholar in the area of Informatics, and lives in Iowa.
 ENIAC, Image Credit: Wikimedia, http://upload.wikimedia.org/wikipedia/commons/4/4e/Eniac.jpg
 IBM Roadrunner, Image Credit: Wikimedia,
 Leonard M. Adleman (1994-11-11). “Molecular Computation of Solutions to Combinatorial Problems.” Science, 266 (5187): 1021–1024.
 Miller, W (et al). 2008. "Sequencing the nuclear genome of the extinct woolly mammoth", November, Nature.
 Mergeous, http://www.mergeous.com