Recurring DNA in Genome Structure

Hamza Aydin

May 1, 2013

A genome is a data book or registry that records the past and future of living organisms. It dynamically and simultaneously stores hereditary and biological information at three hierarchical levels, each corresponding to a different time scale.

The first is the preservation, in stable DNA sequences, of characteristic long-term data that describe the development of an organism.

The second is the storage of medium-term epigenetic data that is carried over a few generations at the cellular level. Epigenetic information is stored not in the nucleotide sequence itself but in chemical modifications of it (such as the methylation of cytosines in repeated runs of CG dinucleotides).

The third is the storage, in the form of nucleoprotein (DNA-protein) complexes, of data generated by dynamic interactions among proteins, RNA and DNA as the cell adapts to events and changes during its life cycle.
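The second level described above can be pictured as a layer of marks kept alongside a sequence that itself never changes. The following is only a minimal sketch of that idea, not a model of real epigenetic machinery: the class name MarkedSequence, its methods, and the short example sequence are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MarkedSequence:
    """A DNA string plus a separate set of methylation marks (illustrative only)."""
    sequence: str
    methylated: set = field(default_factory=set)

    def cpg_sites(self):
        # Positions where a C is immediately followed by a G.
        s = self.sequence.upper()
        return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]

    def methylate(self, position):
        # A mark may only be placed on the C of a CG dinucleotide.
        if position not in self.cpg_sites():
            raise ValueError(f"position {position} is not a CG site")
        self.methylated.add(position)

    def methylation_level(self):
        # Fraction of CG sites that currently carry a mark.
        sites = self.cpg_sites()
        return len(self.methylated) / len(sites) if sites else 0.0


# Hypothetical example sequence with CG sites at positions 2, 4 and 9.
region = MarkedSequence("ATCGCGTTACGGA")
region.methylate(2)
region.methylate(9)
print(region.cpg_sites())
print(region.sequence, round(region.methylation_level(), 2))
```

The point of the sketch is that the marks live in their own set: adding or removing them changes the methylation level, while the letters of the sequence stay exactly as they were.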

The capacity of DNA to generate and store data at three hierarchical levels and time scales shows that the genome plays many roles in cellular activities and heredity. The genome is formatted for generating and storing data by means of DNA sequences with various features. The genomic system is built largely around repeating DNA sequences. Satellite DNA sequences function as markers because they repeat many times at various frequencies. The genome contains genomic folders similar to those of computer systems. These genomic folders, also known as the epigenetic index of the genome, are responsible for the remodeling of chromatin and the coordinated control of genomic functions. Repeating DNA sequences play a critical role in the replication of the genome (making a copy of the DNA), in the distribution of the copied DNA to daughter cells, and in the construction of the support systems that organize chromatin.

Genomic functions can be better understood by comparison with the memory sticks and hard drives used in electronic information systems. What distinguishes a genome, as a basic data storage medium, from a hard disk is that it is replicated as its nature requires and that the replicas are passed on to daughter cells. The following examples illustrate that a genome gains function only when it interacts with various data-processing modules in the cell.

A copy of the genome is produced by the cellular DNA replication system.
Each genome copy is correctly delivered to a daughter cell only when the chromosome segregation system (the centrosomes and microtubules) is working.
The central transcription system is responsible for copying data from DNA to RNA. Different gene expression patterns are produced by regulating the timing and level of transcription with the help of transcription factors and a network of cell signaling.

Highly intricate genomic system structures are built by combining protein-encoding sequences, signals distributed at various sites, and repeating DNA sequences. The formatting of a genome resembles the formatting of computer programs. Computer software uses various repeated command sequences to allocate addresses to files independently of the data they contain, and different computer systems use different signals and structures to manage programs. In a similar fashion, different living species use repeating DNA sequences and chromosomal structures to organize the encoded information and to format their genomes.

The diverse and variable repeating DNA sequences are the building blocks from which different genomic system structures are constructed. Genomes of different organisms have a characteristic system architecture, just like computers with different operating systems and hardware. For instance, animal cells are created as a good model for taking up foreign DNA and incorporating it into their genomes. The transfer of genetic data from parents to offspring within a species is referred to as “vertical gene transfer,” whereas transfers between different species, genera and classes are called “horizontal gene transfer.” Mobile DNA sequences such as transposons are very effective agents of horizontal gene transfer.

Cellular differentiation and morphogenesis (the formation of tissues and organs from cells) are not programmed completely in the primary structure of the DNA sequence. The components of modular programs are encoded in a flexible way that allows them to be continuously renewed and recombined when needed.

This use of genomic structure is the reason different organisms can be created from a single genome. Metamorphosis, in which organisms as different as a caterpillar and a butterfly develop from one genome, is a good example of this feature.

Two organisms that share the same genomic protein and RNA codes can still be considered two different species. Differences in genomic structure and in sequence repetition are distinctive criteria for identifying species, since these features can lead to a mismatch of reproductive cells and to different expression patterns of coding sequences, and may give rise to ecological diversity as well. That is why repeated DNA sequences are very important in studying kinship. Today, microsatellite DNA, a class of repeated DNA sequences, is used in forensic science to establish biological relationships among individuals. Plant species differ in the repeated sequences at the centromeres of their chromosomes, and these variations are used to identify species. The main determinants of genomic system structure are the diversity, frequency, and genomic location of repeated DNA sequences. For example, tandemly repeated sequences at centromeres, telomere repeats, and repeated sequences dispersed throughout the genome that take part in cellular functions such as transcription, chromatin packing and localization within the nucleus are the main elements of the genome system structure. The genome is a single integrated system that is controlled at close range and at a distance through communication networks that use repeated sequences.
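The idea behind microsatellite comparison can be sketched in a few lines of code. What follows is an illustrative assumption, not the procedure used in forensic laboratories: the function name find_microsatellites, its thresholds, and the two sample sequences are hypothetical. The sketch simply scans a DNA string for a short unit (2 to 6 letters) that is repeated back to back at least four times and reports where the run starts, what the unit is, and how many times it repeats.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for each tandem repeat run in seq.

    Illustrative sketch: the shortest unit that satisfies the repeat
    threshold is reported, and runs are found left to right without overlap.
    """
    seq = seq.upper()
    # Group 1 is the candidate unit; \1{n,} demands n further copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Two hypothetical samples that differ only in the number of GATA repeats.
sample_a = "TTGC" + "GATA" * 6 + "CCGT" + "CA" * 8 + "GGT"
sample_b = "TTGC" + "GATA" * 9 + "CCGT" + "CA" * 8 + "GGT"

print(find_microsatellites(sample_a))
print(find_microsatellites(sample_b))
```

In this toy example both samples carry the same CA run, but the GATA run is six copies long in one and nine in the other. Differences of this kind in repeat number at the same location are what make such sequences useful as markers for comparing individuals.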

While explaining the Qur’anic concept of the Manifest Record (36:12), Bediuzzaman Said Nursi, the great renovator of Islamic thought in Turkey in the twentieth century, wrote that the Manifest Record expresses one aspect of Divine knowledge that is related “more to the past and future than to the present. It is a book of Divine Destiny that contains the origins, roots, and seeds of things, rather than their flourishing forms in their visible existence” (30th Word, Second Aim).

Inspired by this view, a seed can be considered a tiny, adorned form of the Divine creative command: it serves as the programs and indexes, and as the determinant of those programs and indexes, in the organization of an entire tree. Since the Manifest Record, as a title of Divine knowledge and command, looks to the past and the future rather than the present, the genome of a grain or a seed acts like a library and an archive in which the future and past of an organism are written.

Sequences encoding different information in DNA

Different types of information, corresponding to various DNA sequences, exist in the genome. These DNA sequences, which were long considered junk because they do not code for proteins, have in fact been found to be responsible for an amazing array of functions in genomic structure. Some of these sequences include:

Group-determining sequences that enable the coordinated or successive expression of genes,
Sequences that act as markers for the initiation and termination of transcription from DNA to RNA,
Signal sequences responsible for the conversion of primary, immature RNA sequences into smaller functional RNA molecules,
Transcription control sequences that determine how often genes are expressed,
Sequences that identify and mark the initiation regions for the condensation and remodeling of chromatin,
Sequences that form binding regions which affect the positioning of the genome in the nucleus or nucleolus,
Sequences that target the regions where covalent modification of DNA with functional groups such as methyl (methylation) takes place,
Sequences that identify and control the regions where DNA replication is initiated,
Sequences that form the structures which enable replication to be completed at the chromosome ends,
Centromere sequences and other sequences at the segregation points that enable the equal distribution of copied DNA molecules to daughter cells,
Sequences that provide guidance during the repair of DNA errors and damage,
Start-point sequences used for the repackaging of genomes.

Recurring sequences exist in the genomes of many organisms and show great structural diversity. Recurring elements function as initiators or terminators of heterochromatin regions. Furthermore, they form an important scaffold and provide binding sites for the folding of DNA. It is as if they do the job of an architectural mold, shaping the genome so that it can be packed into a very limited space. The proportion of repeating sequences in a genome (60-90%) is much greater than that of sequences encoding proteins and RNA (10-40%). For example, the chromosomes of the human genome are made up of protein-DNA packages such as heterochromatin and euchromatin. Heterochromatin regions are usually not transcribed, whereas DNA in euchromatin regions is transcribed.

Protein-encoding sequences make up approximately 1.2% of all human DNA. Around 43% of the euchromatin regions are composed of recurring and mobile DNA elements, and 18% of the heterochromatin regions are likewise made of satellite (densely repeating) sequences and mobile DNA elements. Almost 50% of human genomic DNA is therefore composed of these repeating DNA sequences. In bacteria, however, they make up only around 5-10% of the genome. These sequences have long been described as parasitic, junk DNA, and many researchers and scientists still regard mobile DNA elements and repeating sequences as genomic parasites. Advances over the last ten years have shown that this is not true and have instead revealed the vital importance of repeating sequences in genomic functions.

Repeating DNA sequences affect the structure of chromatin (the dense package of DNA and protein) in two ways. First, dispersed copies of repeating DNA sequences contain binding regions for proteins that organize DNA. Second, the presence of regions with tandemly repeated sequences triggers the formation of heterochromatin. Heterochromatin (which appears darker because it is densely packed chromatin) inhibits transcription and recombination, delays replication, and generally blocks the reading of information in DNA sequences that contain genetic coding. Heterochromatin regions are distributed throughout the chromosome.

In fruit flies, when the organization of the chromosomes places the protein-encoding loci required for eye pigmentation near the heterochromatin blocks at the centromeres, the formation of the phenotypic character is inhibited (the phenomenon of position effect). The “phenomenon of position effect” is convincing evidence that the genome is a major system integrated in part by its repeating DNA sequences. When the amount of heterochromatin is increased, as in XYY male fruit flies, the inhibition of eye pigment expression decreases. In XO males, where the amount of heterochromatin is lower, the inhibition becomes more severe. Changes in the levels of the proteins that bind to heterochromatin-specific DNA regions have the opposite effects. A decrease in these proteins reduces or suppresses the “phenomenon of position effect,” while surplus synthesis of these proteins enhances it.

Repeating DNA sequences play an important role in the transfer of the genome to daughter cells. For instance, they function in the formation of centromeres as the chromosomal binding regions for microtubules, in the replication of the linear ends of chromosomes, and in chromosome pairing during gamete formation. The distribution of repeating sequences plays a major role in the configuration of genomic functions. Each genome has a system structure that is shaped to a large extent by the amount of repeating DNA sequences it contains.

Going back to Nursi’s explanation of the Manifest Record, we can draw a parallel between the book of the universe and the book of revelation. The first shows us that certain sequences in the genome are repeated for significance and necessity, just as many verses are repeated frequently in the Qur’an, with nuances that refer to different meanings, benefits, and purposes and open a wider space for interpretation.

A genome is not only a book that contains protein and RNA codes; it also has a complex system structure with many functions for cellular vitality. The most needed sequences are those that are repeated most frequently. They are not pieces of junk DNA, as was once assumed, but Divinely constructed jewels.
