As a bioinformatician, Hoff focuses on improving and automating genome annotation, i.e. she uses computers to decipher which parts of a DNA sequence are responsible for certain biological functions. Her work centres on the automated prediction of protein-coding genes in genomes. Hoff aims to use the DFG grant to realise her vision of a foundation-model-based revolution in genome annotation. “My focus lies on the development of an extensive annotation system for millions of eukaryotic genomes, i.e. we plan to develop a software solution that not only analyses DNA sequences, but is also able to process millions of different organisms in a joint, standardised system. The research of my group will not only determine the positions of the genes in the genomes, but also their biological function and the position and function of repetitive elements. That would be a breakthrough as we currently lack a consistent and reliable method for gaining useful findings from this flood of data.”
New methods for analysing millions of genomes
Existing software solutions reach their limits quickly when faced with the more than 1.5 million eukaryotic species – organisms such as fungi, plants, animals and humans – which are being sequenced as part of the international large-scale research project Earth BioGenome Project (EBP). “We not only have access to a large amount of data, the data is also very complex,” says Hoff. “Traditional methods are unable to keep pace especially with species that are currently underrepresented in the genome database. That is why we are counting on foundation models, which basically work on the same basis as popular chatbots, just for genomic data instead of natural language, as well as other machine-learning methods.”
New approaches from deep learning, such as the ‘Tiberius’ tool that was developed at the University of Greifswald, could considerably improve the accuracy of gene prediction. Nevertheless, a systematic annotation is still missing for a large proportion of the previously generated genomic data. “Without scalable new methods, a huge amount of potential remains unused,” stresses the bioinformatician. “Improved prediction of genes will have immediate effects on the development of new medication, plant resistance against disease, or the protection of endangered species. We don’t plan to merely generate data, we aim to gain useful insights from it – for science, but also for society.”
Heisenberg Project to strengthen genomic research in Greifswald
Hoff has gained international recognition as an expert in the analysis of eukaryotic genome data. Her most significant developments include the software pipelines BRAKER and Galba. BRAKER has already been cited more than 4,400 times according to Google Scholar, and downloaded more than 39,000 times as a software tool on GitHub. But this project goes further: it’s not just about finding individual genes, but about the systematic identification and analysis of all gene structures and functions of millions of species.
Although the Heisenberg-funded project can be realised anywhere in Germany, Hoff has deliberately chosen the University of Greifswald and the Institute of Microbiology. She explains, “The close collaboration with experimental microbiologists, in particular in the field of unicellular organisms, is of central importance for the validation and application of the developed methods.” Hoff sees Greifswald as an ideal environment for her research: “The interdisciplinary research building at the Center for Functional Genomics of Microbes and the close ties to the Helmholtz Institute for One Health, which is currently being established, provide perfect conditions for our work. The University’s modern computing infrastructure with specialist graphics processing units (GPUs) enables the compute-intensive deep-learning analyses that we require. That provides me with the opportunity to contribute long term to the future of genome research and, at the same time, train the next generation of bioinformaticians.”
Further information
The DFG's Heisenberg Programme supports outstanding academics by providing them with the opportunity to establish an independent research group and enhance their academic career with the long-term goal of attaining a professorship. The grant spans a period of five years and provides extensive financial and institutional support to innovative research projects.
Contact at the University of Greifswald
PD Dr. Katharina Hoff
Institute of Mathematics and Computer Science
Walther-Rathenau-Straße 47, 17489 Greifswald
Tel.: +49 3834 420 4624
katharina.hoff@uni-greifswald.de
https://www.linkedin.com/in/pd-dr-katharina-hoff-53283596/
https://bsky.app/profile/katharinahoff.bsky.social
https://fosstodon.org/@KatharinaHoff
