Project Detail
Vertebrate genomes contain tens of thousands of genes, yet they give rise to only a limited number of stable cell types. How cell types are encoded in the genome remains poorly understood. Gene expression is controlled by regulatory elements such as enhancers, which are far less conserved across evolution than genes themselves and can lie far from the genes they regulate. My main goal is to better understand how genomic sequence underlies cell identity in the pallium across vertebrate species. I hypothesize that gene regulatory logic can be learned directly from genomic sequence and used to predict cell types. Recent advances in single-cell sequencing have made it possible to generate large epigenomic datasets, while novel machine learning approaches such as DNA language models are providing unprecedented insight into the regulatory logic of the genome. In addition, high-quality reference genomes are becoming available for many non-model species, driven by growing awareness of the need to preserve biodiversity. I will perform single-cell multiome sequencing, supplemented with low-cost single-cell ATAC-seq on the HyDrop platform developed in the host lab, to profile regulatory elements across cell types in the pallium of multiple species, including mammals, birds, lizards and fish. My experience generating and annotating single-cell data from the brain will support the analysis and cross-species alignment of these datasets. Next, I will use these data to study the relationship between each species' genome and its cell types by training species-aware DNA language models, and I will use these models to identify both conserved and species-specific regulatory logic. Beyond their evolutionary interest, these models will also support ongoing efforts to design synthetic enhancers that target highly specific cell types, which have potential therapeutic applications.