Volumes of books and hundreds of articles have been published about the structure and functions of DNA since the day two scientists at the Cavendish Laboratory in Cambridge, who would later win the Nobel Prize, described its double helix structure. One common element that shines through all these publications is their emphasis on the numerous specific functions of DNA, and on the fascinating harmony of these functions in a living organism. In this article, we will take a look at a few small droplets from the vast ocean of information about the multi-layered functions of DNA, which are orchestrated in an awe-inspiring manner.
The cell is the structural, functional, and biological unit of all organisms. All the information needed for the numerous processes in a cell, including repair and division, is contained in DNA (deoxyribonucleic acid). DNA is a huge single molecule with intriguing features. How can a single molecule have such a dominant role in preserving the information essential for the continuation of life? What are the mechanisms and levels of organization behind its function? What does DNA mean for a single cell or for a human being? It is impossible to answer these great questions in a single article. However, understanding the ways DNA exerts its role, its impact on multiple levels ranging from a single cell to an organism, and the coordination between these levels can open up new frontiers in our minds and in our perception of life.
“Double helix” architecture of DNA
DNA has an elegant structure that forms the basis for all of its functions. DNA is a repeating structure of nucleotides. Each nucleotide consists of a phosphate group, a five-carbon sugar (deoxyribose), and a nitrogen-containing base attached to the sugar, arranged in that order from the outside of the molecule to the inside (see Figure 1a for a schematic view of DNA). There are four types of nucleotides in DNA, differing only in their bases. We can consider the base as the identity of a nucleotide. The four nucleotides are denoted by the letters A (adenine), T (thymine), G (guanine), and C (cytosine). Thousands of nucleotides, joined by covalent sugar-phosphate bonds, come together to form long strings. The sugar-phosphate backbone can be imagined as the steelwork of a skyscraper. The nice thing about nucleotides is that they match each other specifically in the double helix: A forms a base pair only with T, and G forms a base pair only with C. These pairs are held together by hydrogen bonds. This feature is the key that makes DNA a two-stranded ladder. Two strings of nucleotides form a double helix through the selective interactions of As with Ts and Gs with Cs (see Figure 1b for the 3-D structure of DNA). In the DNA structure, the hydrophobic bases tend to stay inside the double helix, while the hydrophilic sugar-phosphates stay outside, interacting with the water in the nucleus. This feature also helps DNA form a two-stranded ladder. The sugar-phosphate backbone is longer than the stack of bases it carries. To compensate for the difference in length, the backbone wraps around the bases inside, as a road wraps around a mountain to climb to the top. This simple difference is the main reason DNA forms a helix.
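The pairing rule described above is simple enough to write down as a lookup table. The following is a minimal sketch in Python, not a bioinformatics tool; the names `PAIRS` and `complement` are chosen here for illustration, and the example sequence is an arbitrary short string of letters.

```python
# Pairing rule from the text: A pairs only with T, and G pairs only with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return, position by position, the base that sits opposite each base."""
    return "".join(PAIRS[base] for base in strand.upper())

template = "ATGGCCCTG"          # illustrative sequence (an assumption)
partner = complement(template)  # the strand that pairs with the template

print(template)
print(partner)
# Pairing the partner again gives back the template,
# so each strand fully determines the other.
print(complement(partner) == template)
```

Because each strand determines the other, either one can serve as the reference from which its partner is rebuilt.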
The double-stranded nature of DNA, with its specific base pairing, is one of its key features as genetic material. DNA is replicated using one strand as a template. The replication machinery reads one strand of DNA and builds the second strand by putting As against Ts and Gs against Cs. If one strand is damaged, it can be repaired using the second strand as a reference. This system is like photocopying DNA from itself instead of building it from scratch every time. That is why the specific base pairing of nucleotides in the double helix makes it possible to replicate DNA through generations while protecting its integrity and information content.
The code of DNA, an alphabet with four letters
DNA contains the information to produce nano-sized cellular machines called proteins. We mentioned that there are four types of nucleotides. Nucleotides are like letters, and each group of three letters codes for one amino acid of a protein. An example makes this clearer: the nucleotide sequence “ATG-GCC-CTG-TGG-ATG” corresponds to the first five amino acids of a protein called insulin (a hormone that regulates the blood glucose level and is important in diabetes), and the resulting amino acid sequence is methionine-alanine-leucine-tryptophan-methionine. The code is so sensitive that even a single mistake in the DNA sequence can cause serious diseases in humans, such as sickle-cell disease or cystic fibrosis. With all these nucleotides, DNA can be thought of as a book containing the amino acid sequence information for thousands of proteins (humans have about 20,500 protein-coding genes). The amount of information contained in DNA is incredible: a typical human cell contains about 2 meters of DNA, tightly packed by proteins in the nucleus. If we tried to write the information from DNA into books, the result would contain over one billion words and 1,500,000 pages.
DNA-protein interdependency and the cell as a micro-factory
DNA can be thought of as an instruction manual that stores information for proteins and RNAs.
Proteins, as molecular machines, perform particular tasks such as energy production and the synthesis of DNA and RNA (see Figure 2 for the structure of proteins). Certain proteins read the information on DNA and make a transient copy of certain regions of it. These copies are called messenger RNAs (mRNAs), and they are transported from the nucleus to the cytoplasm (see Figure 3 for a representation of mRNA production from DNA by proteins). In the cytoplasm, the information on mRNAs is read by protein-RNA complexes called ribosomes. Ribosomes produce new proteins by processing the data from mRNAs. This flow of information from DNA to proteins is called the central dogma of molecular biology (Figure 4). The data encoded in DNA can be read, translated, and turned into a product only by proteins. We can conclude that DNA is essential for a protein to be produced, and proteins are essential for DNA regions to be read into proteins. So, there is an interdependency between proteins and DNA. Proteins without DNA have no future and no ability to regenerate, and DNA without proteins is like the instruction and manufacturing manual of a computer with neither the user nor the computer itself. We can imagine the cell as a sophisticated factory, and proteins as the machines of the factory. DNA includes the instructions for the factory to be rebuilt and for itself to be rewritten for every new factory. It has instructions on how to build every machine in the factory. It also has codes for when and how many of these machines should be produced (we will discuss these codes further in the next section). On the other hand, the timing and control of all this production also depend on the machines in the factory.
Some of these machines read and decode the instruction manual, some produce new machines by reading the decoded copies of the manual, some act as sensors for signals, some transmit signals to other machines, some produce signals by measuring the levels of materials in the factory, some communicate with other factories, and so on. As we can see, DNA and proteins are meaningful for life only when they are together in the context of the cell. This is a perfect example of the principle that the whole is greater than the sum of its parts, because each element of the cell system has limited potential until it comes together with the others to blossom into life.
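The flow from DNA to mRNA to protein described in this section can be traced on the insulin fragment quoted earlier. The sketch below is a deliberately simplified Python illustration, not how the cell's machinery is implemented: `CODON_TABLE` holds only the four codons this fragment needs out of the standard genetic code, and `transcribe` assumes the quoted sequence is the coding strand, so the mRNA is the same sequence with U in place of T. The function names are chosen here for illustration.

```python
# Excerpt of the standard genetic code: only the codons used in this example.
CODON_TABLE = {"AUG": "Met", "GCC": "Ala", "CUG": "Leu", "UGG": "Trp"}

def transcribe(coding_dna: str) -> str:
    """Simplified transcription: copy the coding strand, writing U instead of T."""
    return coding_dna.upper().replace("-", "").replace("T", "U")

def translate(mrna: str) -> list[str]:
    """Read the mRNA three letters at a time and look up each codon."""
    usable = len(mrna) - len(mrna) % 3  # ignore an incomplete trailing codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE[codon] for codon in codons]

mrna = transcribe("ATG-GCC-CTG-TGG-ATG")
protein = translate(mrna)

print(mrna)               # AUGGCCCUGUGGAUG
print("-".join(protein))  # Met-Ala-Leu-Trp-Met
```

Changing a single letter of the input changes the codon that is looked up, which is one way to see why a single mistake in the sequence can alter the resulting protein.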
The famous term “Gene”
We can think of genes as functional units of DNA. A gene has the information content for at least one protein. Humans have about 20,500 genes that are read by protein machineries to produce proteins. Special proteins read the information on genes and make a transient copy of these regions of DNA. The process of copying a gene into an mRNA is called transcription.
Genes don’t only store information; they have an intrinsic architecture that coordinates transcription through three main components: the promoter, the coding region, and the terminator. The promoter is the region of the gene that signals the start of transcription. Protein machineries bind to the promoter and activate transcription. The coding region holds the information for the amino acid sequence of the protein. The terminator region gives the stop signal for transcription. DNA also contains functional regions located between genes, such as enhancers, which serve as platforms where regulatory proteins bind to tune transcription.
The coding region of a gene has multiple reading blocks for amino acid sequences, and these reading blocks are called exons. For some genes, different combinations of exons can be put together to give rise to different proteins. This mechanism allows one gene to produce multiple proteins, increasing the efficiency of the genetic material. A similar mechanism is used by the immune system to produce antibodies (proteins that recognize foreign antigens). Different gene segments are brought together by a mechanism of DNA rearrangement called V(D)J recombination, and their different combinations form many different antibodies. For example, the part of the antibody called the heavy chain is produced from a DNA region containing 65 variable (V) genes, 27 diversity (D) genes, and 6 joining (J) genes (5, 6). This gives 65 V genes x 27 D genes x 6 J genes = 10,530 possible heavy chains. A similar rearrangement mechanism builds the variable region of the light chain, and together these result in millions of different antibodies against foreign antigens. This single example from the immune system shows us that DNA not only has a well-organized coding system, but also has ingenious and creative mechanisms to maximize its potential.
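The heavy-chain count above is a plain product of the number of choices at each step. As a small check of that arithmetic, here is a Python sketch; the dictionary name `HEAVY_CHAIN_SEGMENTS` is chosen here for illustration, and the segment counts are the ones quoted in the text.

```python
import math

# Segment counts for the heavy chain, as quoted in the text.
HEAVY_CHAIN_SEGMENTS = {"V": 65, "D": 27, "J": 6}

# One V, one D, and one J segment are combined, so the choices multiply.
heavy_chain_combinations = math.prod(HEAVY_CHAIN_SEGMENTS.values())

print(heavy_chain_combinations)  # 10530
```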
Gene expression is orchestrated during development and formation of organs
The human body, which consists of more than 10^13 (ten trillion) cells, is generated from a single cell called the zygote (see Figure 5). This tells us that all the information and instructions needed to build and coordinate the systems of the human body are encoded in a single cell. Different tissues and organs, including muscles, nerve cells, connective tissue, and eyes, are the fruits of one single cell. They all contain the same genetic information. Then what makes them different?
Promoters, enhancers, and repressors located in and near genes are important for the spatial and temporal control of gene expression in the different cell types of the body. Each cell type in our organs expresses a different subset of genes; this is what gives a cell its identity. For example, myosin is expressed in muscles, and rhodopsin is expressed in the retina of the eye. Myosin functions in contraction, and rhodopsin functions in vision. What determines that rhodopsin is expressed in the eye but not in a muscle? This is determined during development by programmed interactions between specific proteins called transcription factors and specific regions of DNA, including promoters and enhancers. During development, certain regulatory proteins in a specific cell type bind to the DNA regions of only some genes (for example, in the future retinal cells of the eye, the rhodopsin gene would be activated but not the myosin gene), and this predetermination orchestrates the differential expression of genes to give rise to hundreds of different types of cells.
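The idea that every cell carries the same genes but switches on a different subset can be illustrated with a toy model. The Python sketch below is a strong simplification under stated assumptions: the factor names `retinal_factor` and `muscle_factor` are hypothetical placeholders rather than real transcription factors, and a gene is treated as simply "on" when all the factors its control regions require are present in the cell.

```python
# Every cell type has the same genes; each gene lists the regulatory
# factors its promoter and enhancers need. Factor names are hypothetical.
GENE_REQUIRES = {
    "rhodopsin": {"retinal_factor"},
    "myosin": {"muscle_factor"},
}

# Which (hypothetical) factors are present in each cell type.
CELL_FACTORS = {
    "retinal cell": {"retinal_factor"},
    "muscle cell": {"muscle_factor"},
}

def expressed_genes(cell_type: str) -> set[str]:
    """Return the genes whose required factors are all present in this cell type."""
    present = CELL_FACTORS[cell_type]
    return {gene for gene, needed in GENE_REQUIRES.items() if needed <= present}

for cell_type in CELL_FACTORS:
    print(cell_type, "expresses", sorted(expressed_genes(cell_type)))
```

The same gene list produces different outputs for different cell types, which mirrors how identical genetic information can yield different cells.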
DNA functions in several different layers, and each section of this article focuses on one of them. As a molecule, DNA has a double helix structure and is replicated through generations to preserve genetic information. It stores this information using a four-letter alphabet for the expression of proteins. In the second layer, DNA has an informational unit called the gene, and thousands of genes are encoded in DNA to hold the information for proteins. Each gene is controlled individually by means of promoters and enhancers. In the third layer, all the processes of the cell micro-factory as a whole are performed through the interactions of DNA and proteins with each other and among themselves. Proteins read the DNA code and work as cellular nano-machines. In another layer, the temporal and spatial expression of genes on DNA is orchestrated, and different subsets of genes give rise to different cell types and organs. Organs communicate with each other to function properly and to maintain the balance and homeostasis of the body. The information stored in DNA not only coordinates the highly sophisticated processes of a single cell; it simultaneously lays out the whole body system of a human being, which is trillions of times larger than a single cell.
DNA functions in all these different layers and maintains a great harmony in the coordination between them. After grasping this complexity, organization, and communication, from a single molecule to proteins, to a single cell, to tissues and organs, and to a human being, all through the utilization of DNA, should we not ask ourselves, “Can these elements come into existence by random forces and collisions?”
1. Calladine, C. R. et al. 2004. Understanding DNA: The Molecule and How It Works, Academic Press
3. Li A, Rue M, Zhou J, et al. 2004. “Utilization of Ig heavy chain variable, diversity, and joining gene segments in children with B-lineage acute lymphoblastic leukemia: implications for the mechanisms of VDJ recombination and for pathogenesis.” Blood 103 June (12): 4602–9.