13. Sept. 2023
A research team from CEITEC Masaryk University and the Institute of Biophysics of the Czech Academy of Sciences, led by Miloslava Fojtova and Jiri Fajkus, has created a unique new database, TeloBase, which provides information about telomere sequences and their evolutionary changes. Telomeres have a crucial function in the cell, protecting the genetic information stored inside the chromosomes. TeloBase allows the construction of evolutionary trees, and candidate telomere sequences can be further validated in the laboratory. The database eliminates the need for scientists to manually sift through extensive sequencing data and laboriously pick out short repetitive motifs, as has been the case until now. The results of this study were recently published in the prestigious international journal Nucleic Acids Research.
Telomeres, structures at the ends of linear eukaryotic chromosomes, distinguish natural chromosome ends from DNA breaks and protect genetic information stored inside the chromosomes. The replication machinery in the cell is unable to complete replication of the very ends of chromosomes, which are thus shortened during cell division. Therefore, it is the telomeres that are shortened, not, for example, the DNA regions coding for essential proteins. Telomerase, an enzyme complex consisting of RNA and proteins, can compensate for the shortening of telomeres. At the DNA level, telomeres are formed by short repetitive sequences; TTAGGG is a sequence of human telomeres, and TTTAGGG repeats delimit the ends of chromosomes in most plants. However, especially in plants, many species with different telomere sequences have been identified.
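To make the idea of "short repetitive sequences" concrete, here is a minimal Python sketch that counts how many back-to-back copies of a given motif occur in a DNA string, checking both the motif and its reverse complement (since a telomere can be read from either strand). This is only an illustration and not how TeloBase works: the function names and the toy sequence are hypothetical, and real telomere detection in sequencing data has to cope with sequencing errors and variant repeats that this sketch ignores.

```python
import re

# Watson-Crick base pairing used to build the reverse complement.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Largest number of consecutive motif copies found by a left-to-right,
    non-overlapping scan of the sequence (0 if the motif never occurs)."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


def telomere_repeat_count(sequence: str, motif: str) -> int:
    """Longest tandem run of the motif on either strand (hypothetical helper)."""
    seq = sequence.upper()
    fwd = motif.upper()
    rev = reverse_complement(fwd)
    return max(longest_tandem_run(seq, fwd), longest_tandem_run(seq, rev))


# Toy sequence (invented for illustration): five human-type repeats plus filler.
toy = "TTAGGG" * 5 + "ACGTACGT"
human_count = telomere_repeat_count(toy, "TTAGGG")
plant_count = telomere_repeat_count(toy, "TTTAGGG")
```

On the toy sequence, the human-type motif TTAGGG is found as a run of five copies, while the plant-type motif TTTAGGG is not found at all, which shows how a single extra nucleotide distinguishes the two repeat types.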
Telomeres are important mainly for their protective function. Problems with telomere length homeostasis or telomerase function are frequently correlated with severe pathological conditions, including cancer. So, why study non-human telomeres? "Detailed information about changes in telomere structure during evolution is essential for understanding how things actually work at the ends of chromosomes and how we might intervene in this machinery – so that we can correct the problems and not make them worse," explains Miloslava Fojtova.
TeloBase contains the telomere sequences of more than 9,000 species. These include motifs that have already been experimentally confirmed, as well as those that have been predicted based on, for example, sequencing data. "In this sense, the database is a valuable resource for further research. Another important feature is that TeloBase can be maintained by a community of interested scientists, who can add newly discovered or proposed telomere sequences based on their results. The sequences will then be reviewed by other experts and implemented in the database if approved. This interactivity across the community is key to keeping the database up-to-date," says Martin Lyčka. A common problem with many databases is that, because of rapid scientific progress, they become outdated within a few years unless they are regularly updated.
Nowadays, we are used to finding complete information about anything with a few simple clicks. This was not possible with telomere sequences. Until recently, however, there seemed to be no reason to create such a database, because it was assumed that only a few sequence motifs form telomeres, e.g., TTAGGG in mammals and TTTAGGG in plants. The situation in yeasts was more complicated, but that was regarded as an exception. It now seems, however, that only vertebrates have a uniform telomere repeat; otherwise, the spectrum of telomere sequences across species is extremely diverse.
Therefore, existing resources containing telomere motifs have proven to be completely inadequate. There are 61 publications included in the Telomerase Database, the most recent being more than 10 years old. The Plant rDNA Database includes 81 papers that mention telomere sequences. Nevertheless, telomere sequences are a secondary object of interest in these databases; the enzyme telomerase and rDNA are of primary interest.
The idea of creating a comprehensive database of telomere sequences arose by chance during a conversation between Martin Lycka and Vratislav Peska over coffee. After this meeting, Martin Lycka came up with the concept of TeloBase and programmed the database. The significant contribution of PhD student Michal Zavodnik to the literature search should also be highlighted. It was an extensive search: Michal and Martin went through thousands of papers published over nearly 40 years of telomere research, and the detailed reading of these articles alone represented several months of hard work. The involvement of Martin Demko from the Bioinformatics core facility of CEITEC MUNI was also important for ensuring the implementation and operation of the database.
The research was supported by the GAČR ExPro project (20-01331X, GAČR), a joint project of CEITEC MU and IBP CAS. The Bioinformatics CF is funded by the e-INFRA CZ (ID:90254, Ministry of Education, Youth and Sports) and ELIXIR-CZ (ID:90255, part of the international infrastructure project ELIXIR) projects.