Like most other experimental scientists, I hardly spend any time in the library. On one of those rare occasions, while trying to find an article in an archive, I was truly amazed when I saw the mobile book shelving system there for the first time. In this system, a large number of books are stored in a way that saves a lot of space. With the push of a button, you can open up a particular section and search for the book you are interested in. If for some reason your book is not there, you can close that section and open up new shelves, again with the push of a button. In libraries, books are organized according to specific rules, such as their subject, their title, and the name of the author. Without this organization, it would be immensely difficult to find one book among thousands of others. I have to confess that it still took me a while to find the book I was looking for, despite all this organization and the advanced shelving system.
Spending so much time in the library looking for a single book made me marvel even more at the question I pursue in my own research: how is our genome organized, and how does it function? To make myself clear, let me first explain what the genome is. I bet you will be amazed by the impressive organization of the genome and its flawless function, too.
The genome can be thought of as a library. Each book in the “genome library” is what we call a “gene”. Genes differ in size and in the information they contain, just like the books in a library. Like the different sections of a library, our genes are distributed among different chromosomes. We have 23 pairs of chromosomes. In each pair, one chromosome carries the information from our father and the other the information from our mother. Therefore, unlike libraries, where the number of copies of a book varies, our genome has two copies of almost every gene (the exception being most genes on the X and Y chromosomes in males, which are present in a single copy).
Every cell in our body carries its own library: the genome. In terms of physical volume, our genome is the smallest of libraries, yet for its size it holds the largest amount of information. In our body, which contains roughly 100 trillion cells, we carry 100 trillion of these libraries. Here comes the amazing part: each of these libraries contains 3 billion letters of information. If this information were printed, it would fill 1000 books of 200 pages each. The information in our genome is written in a four-letter alphabet: Adenine (A), Guanine (G), Cytosine (C), and Thymine (T). These four letters (A, G, C, T), called nucleotides, are the building blocks of deoxyribonucleic acid (DNA) in every organism on earth. The complete set of these letters in an organism constitutes its genome. We humans have about 3 billion of these letters in our genome, which is enclosed in the nucleus of every cell in our body. Stretched end to end, the DNA in a single cell is about 2 meters long. This 2-meter stretch of DNA is highly compacted and packaged into the nucleus, which is only 2 micrometers in diameter: an amazing 1,000,000-fold compaction!
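The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch that uses the approximate numbers quoted above; the constant and function names are chosen for illustration and do not come from any library.

```python
# Approximate figures quoted in the text.
GENOME_LETTERS = 3_000_000_000   # letters (A, G, C, T) in the genome
BOOKS = 1_000                    # printed volumes
PAGES_PER_BOOK = 200             # pages in each volume
DNA_LENGTH_UM = 2_000_000        # 2 meters, expressed in micrometers
NUCLEUS_DIAMETER_UM = 2          # nucleus diameter in micrometers


def letters_per_page(letters: int, books: int, pages_per_book: int) -> float:
    """Letters each page must hold if the genome is spread evenly over all pages."""
    return letters / (books * pages_per_book)


def compaction_factor(dna_length_um: int, nucleus_diameter_um: int) -> float:
    """Ratio of the stretched-out DNA length to the nucleus diameter."""
    return dna_length_um / nucleus_diameter_um


if __name__ == "__main__":
    per_page = letters_per_page(GENOME_LETTERS, BOOKS, PAGES_PER_BOOK)
    fold = compaction_factor(DNA_LENGTH_UM, NUCLEUS_DIAMETER_UM)
    print(f"Letters per printed page: {per_page:,.0f}")   # 15,000
    print(f"Compaction (length / diameter): {fold:,.0f}-fold")  # 1,000,000
```

With these inputs, each printed page would have to hold 15,000 letters, and the ratio of DNA length to nucleus diameter comes out at one million. Note that this is a simple ratio of lengths, which is how the compaction figure in the text is obtained.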
How is our genome, which is 2 meters long, compacted so much that it fits in a nucleus only 2 micrometers across? In the nucleus, DNA is wrapped around groups of 8 proteins called histones. This combination of DNA and histone proteins forms a special structure known as “beads on a string”. Each bead is called a “nucleosome” (Figure 1).
Multiple nucleosomes are then coiled together and stacked on top of each other. This organization packs the DNA into a thicker fiber called “chromatin”. The chromatin fiber condenses further by forming tight loops. The structure we call a “chromosome” is actually the most compact form of the chromatin fiber, and it becomes visible under the microscope only during cell division. This remarkable chromatin organization allows 2 meters of DNA to fit into the nucleus of each cell, an object so small that 10,000 nuclei can fit on the tip of a needle!
This genome organization is even more impressive when we think about how the genetic information is used by the roughly 100 trillion cells in our body. These cells are not all alike. Most of them are specialized to carry out specific functions, and we have more than 200 different cell types. Some cells, like the B and T cells of our immune system, are dedicated to fighting infectious agents, whereas others, like neurons, transmit signals between our brain and muscles. Since these cells have different structures and carry out different functions, they require different sets of instructions, which are encoded by the genes in the genome. There are roughly 20,000 genes in our genome. Importantly, each specialized cell in our body uses only a subset of these genes, not all of them, at any given time. In terms of the library analogy, roughly two thirds of the books (i.e., genes) are needed for each cell to function. The remaining one third of the genes are not necessary for that particular cell type. For example, the MYOD1 gene encodes a protein required for muscle cell differentiation. This gene is therefore absolutely required in muscle cells. However, the same gene is not required in the B or T cells of our immune system. While the MYOD1 gene has to be stored in an easily accessible location (open chromatin) in muscle cells, immune cells do not need this gene, and so they keep it in the depository section of the “genome library” (closed chromatin), which is not used very often.
In line with this, each cell has to organize its genome in a special way so that the genes needed for its function are easily accessible. This packaging and organization allows each cell to reach the required genes easily and quickly, so that they can be transcribed and, ultimately, turned into proteins. Genes that are not going to be used, on the other hand, are stored in relatively inaccessible regions of the genome library. The genome is therefore not packaged uniformly along its length. Certain regions are “open” and easily accessible for transcription (called euchromatin), while other regions are kept “closed” in a condensed, tightly packed structure (called heterochromatin). Since each cell type requires a different set of genes, the genome is also organized differently in different cell types. Genome organization in a muscle cell is remarkably different from genome organization in, let’s say, a skin cell.
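The library analogy can be made concrete with a toy model in Python. This is purely an illustrative sketch: only MYOD1 and its open or closed state in muscle and immune cells come from the text, `IMMUNE_GENE_X` is a made-up placeholder rather than a real gene, and real chromatin accessibility is far more graded than a two-state label.

```python
from enum import Enum


class Chromatin(Enum):
    OPEN = "euchromatin"        # easy-access shelves
    CLOSED = "heterochromatin"  # depository section


# Toy accessibility map: the same genes, shelved differently in each cell type.
# IMMUNE_GENE_X is a hypothetical placeholder, not a real gene name.
GENOME_LIBRARY = {
    "muscle cell": {
        "MYOD1": Chromatin.OPEN,
        "IMMUNE_GENE_X": Chromatin.CLOSED,
    },
    "B cell": {
        "MYOD1": Chromatin.CLOSED,
        "IMMUNE_GENE_X": Chromatin.OPEN,
    },
}


def accessible_genes(cell_type: str) -> list[str]:
    """Return the genes stored in open chromatin for the given cell type."""
    shelves = GENOME_LIBRARY[cell_type]
    return sorted(gene for gene, state in shelves.items() if state is Chromatin.OPEN)


if __name__ == "__main__":
    for cell_type in GENOME_LIBRARY:
        print(cell_type, "->", accessible_genes(cell_type))
```

Every cell type holds the same set of books; what differs is which of them sit on the open shelves.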
After all these explanations, I hear you asking, “How does each cell in our body know how to organize its genome? How does, let’s say, a muscle cell decide to become a muscle cell and not a blood cell?” These are exactly the questions that many scientists are asking nowadays. Since the completion of the Human Genome Project (HGP) [1, 2], which determined the entire sequence of our DNA, scientists have been trying to understand how this amazing alphabet is used in each and every cell in our body. Francis Collins, one of the great scientists of our time and the current director of the National Institutes of Health (NIH, USA), is especially noted for his landmark discoveries of disease-associated genes, as well as for his leadership of the Human Genome Project. In his recent book, he calls the information coded in our DNA the “Language of God” [3]. Recent technological advances allow scientists to study the structure and function of this language more closely and to gather better clues about the organization of the genomic library.
Whether or not we believe that this genomic information is “the Language of God”, we are closer than ever to understanding the codes of this amazing language. New technological advances give us better insight into how this information is organized and used. The more we learn about it, the more we are amazed not only by its flawless packaging but also by its differential use in each cell. At any time and in any tissue, trillions of cells are using different sections of the genome library to decode the instructions they need from “the Language of God” and to continue their journey in our bodies, without any conscious decision-making on our part.
Ahmet Mir Fazil holds a Ph.D. in molecular biology. He is a research scientist in Boston.
1. International Human Genome Sequencing Consortium (2001). “Initial sequencing and analysis of the human genome.” Nature 409 (6822): 860–921.
2. Venter, JC, et al. (2001). “The sequence of the human genome.” Science 291
3. Collins, FS (2006). The Language of God: A Scientist Presents Evidence for Belief. Free Press.