Research projects

The physical world around us is manifestly “multi-scale”, in the sense that many details at a finer scale don’t matter. For example, to understand the motion of waves you don’t have to know that water is made of many molecules. It seems quite obvious that machine learning and deep networks exploit this fact, but how they do so has remained elusive. In this project, we attack this challenge by investigating how microscopic molecular structure is recognized and represented by deep networks, and how this translates into predictions of the coarser kinetics of the molecular conformations. To this end, we study small peptides (such as the one shown on the left) and combine molecular dynamics simulations with methods from statistical physics and new developments in computer science. From the perspective of computer science, we try to understand more generically how deep learning methods coarse-grain (natural) data into scales of abstraction: given a prediction task, how does a deep network split the many bits of information in an input, step by step, into relevant and irrelevant pieces, and what is the structure of the statistical variability among representations that solve the same task (but, for example, emerge from different random initializations)? What mechanisms lead to the emergence of this structure, and which implicit assumptions are encoded in it? Can this be connected to processes of natural pattern formation?


In this project we investigate how information processing is realized in matter and how different physical systems can be used to implement neuromorphic devices. For brain-inspired computational approaches, systems with strong non-linearities are particularly suited. Besides exhibiting non-linearities, organic as well as spintronics-based systems provide several advantages. For example, spintronics-based systems are very energy efficient and complex, whereas organic systems can be easily produced on flexible substrates and are biocompatible. This project is structured in three subprojects entitled “stochastic logic implemented in spintronics”, “spintronics-based reservoir computing” and “neuromorphic systems via organic polymers”.

Discovery of polygenic adaptation patterns in Chironomus riparius. This non-biting midge species is known to adapt quickly to changing environmental and living conditions. In this experiment, a C. riparius population will be put under selection pressure by using only the early-emerging midges to continue the line. In the subsequent sequencing analysis of these adapted individuals, we will not only demonstrate genetic adaptation but also test the hypothesis of polygenic adaptation. The plan is to uncover the pattern of those genes that have changed together under selection pressure. First of all, it is of great importance to establish a solid method of variant calling, which mostly makes use of machine learning. With the help of unsupervised machine learning (closely related to data mining), the pattern of polygenic adaptation is to be found by means of pattern recognition and clustering.
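As a rough illustration of the clustering step mentioned above, the following minimal sketch groups loci by how strongly their allele frequency changed under selection. It is a toy under stated assumptions, not the project's method: the locus names and frequency changes are invented, each locus is reduced to a single number, and a plain one-dimensional k-means (Lloyd's algorithm) stands in for whatever unsupervised method is eventually chosen.

```python
def kmeans_1d(values, k=2, iterations=50):
    """Lloyd's algorithm on scalars (k >= 2).

    Centroids start evenly spaced between the smallest and largest value,
    which makes the result deterministic.
    """
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members)
                                 if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: change in allele frequency per locus between the
# starting population and the selected line (values are made up).
delta_freq = {
    "locus_A": 0.02, "locus_B": -0.01, "locus_C": 0.31,
    "locus_D": 0.28, "locus_E": 0.00, "locus_F": 0.35,
}

labels, centroids = kmeans_1d(list(delta_freq.values()), k=2)

if __name__ == "__main__":
    for (locus, change), label in zip(delta_freq.items(), labels):
        print(f"{locus}: change = {change:+.2f}, cluster = {label}")
```

In a real analysis each locus would be described by many features (for example frequency trajectories across replicates and generations), and the choice of clustering method and number of clusters would itself be part of the research question.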

We plan to develop artificial neural network-based tools for modelling
topological defects such as magnetic skyrmions or defects in nematic liquid
crystals on multiple scales. The goal is to bridge between microscopic
representations, where defects emerge as nonlinear excitations, and
defect-particle representations, where defects are treated as explicit objects.
The coarse-grained representation at the defect-particle level will ideally be
a dynamic neural network. Our results will provide a deeper understanding of how
information is gained or lost in the learning, coarsening and refining
processes, thereby making it possible to reveal and predict processes on the
microscopic level based on macroscopically observed data.

Cognitive biological systems provide valuable information for the design of Artificial Intelligence. In particular, social insects show swarm intelligence and make collective decisions based on information acquisition and exchange, networking and communication. We will study the evolution and ligand binding of Odorant Receptor (OR) genes, which ants use to communicate chemically within their societies. OR genes have diversified in ants, and the functions of the more than 400 genes are largely unknown. We will use machine learning to predict the biological function of OR genes using sequence, structure, ligand interactions and functional features from ORs in Drosophila and other insects.

Autism is a neurodevelopmental disease with a complex genetic background and strong variability between autistic patients, ranging from intellectual disability to high intelligence. A comparative analysis of the cognitive performance of affected individuals shows that autism can be considered a disorder of neural information processing and aberrant connectivity. Genes encoding auxiliary α2δ subunits of calcium channels, which are known to trigger synaptogenesis, were consistently revealed among autism-risk genes. In this project, we will analyse and model effective connectivity in spiking neuronal networks using a novel approach based on the information relativity principle. In cultured neuronal networks, we will investigate the spontaneous and input-driven formation of connectivity and evaluate the impact of up- and downregulation of α2δ subunits on neuron-to-neuron information transfer and ensemble activity. These outcomes will be further implemented in entropy-based AI algorithms to model networks built of spiking egotistic neurons with memory function. A deeper understanding of how information is relayed, integrated and stored in rather small and simple networks can potentially pave the way for scaling up artificial networks to build neuromorphic computing systems of higher complexity.

Understanding the neuronal code is one of the biggest challenges in basic life sciences. To provide benchmark physiological data for large-scale neuronal networks, simultaneous recordings from many neurons at the highest temporal resolution in a defined cortical column of an awake animal performing a defined behavioural task are required. Exactly this type of interdisciplinary experiment has been performed in a collaborative effort of the Luhmann and Stüttgen labs.
The central goal of this project is to assess the quality and performance of existing and emerging machine learning (ML) and artificial intelligence (AI) approaches with respect to their ability to describe, explain and predict neuronal behaviour on the basis of these data. More common ML and AI approaches (hidden Markov models, shallow and reinforcement learning, machine learning) will be compared to the very recently developed Scalable Probabilistic Approximation approaches (Gerber et al., Sci. Adv. 2020) and to entropy-driven approaches. These comparisons aim at identifying the simplest possible (but not simpler than necessary) models that provide the most adequate descriptions of the lab data. Identifying such models will enhance our understanding of emergence in neuronal activity and provide guidance for further experiments.

Dark matter, the invisible substance making up over 80% of the matter in the universe, is one of the most fundamental mysteries of modern physics. Physicists expect dark matter to interact only very weakly with normal matter, thus yielding feeble and model-dependent signatures to search for in experimental data. The sheer volume of possible models and data makes the hunt for dark matter exceedingly difficult.
We argue that the strength of machine learning in pattern recognition could prove useful for detecting dark matter: a dark-matter signal could be considered as a pattern in the data, rather than a signal emerging from a well-defined physical model. We therefore aim to investigate unsupervised learning methods and methods for pattern detection and mining to address the problem. We propose to explore these opportunities, which could drastically increase the probability of detecting new particles and dark matter, potentially solving one of the great puzzles of modern physics.

We propose to devise a novel deep learning-based method for the detection of natural selection in sets of modern and ancient genomes. Natural selection is the change in allele frequencies with time due to differential effects on an organism’s survival and reproduction, and the evolutionary process most responsible for adaptation. In addition to the classification of genomic regions as selected or not, we aim to estimate relevant parameters like selection strength and timing.
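To make the notions of allele frequency change and selection strength concrete, here is a minimal sketch of a textbook deterministic model of selection on a single haploid locus. It is an illustration under stated assumptions, not the proposed detection method: the selection coefficient `s`, the starting frequency and the function names are hypothetical, and genetic drift, diploidy and demography are all ignored.

```python
def next_frequency(p, s):
    """One generation of deterministic haploid selection.

    The favoured allele has relative fitness 1 + s, the alternative
    allele has fitness 1, and p is the favoured allele's frequency.
    """
    mean_fitness = 1.0 + s * p
    return p * (1.0 + s) / mean_fitness


def trajectory(p0, s, generations):
    """Allele frequency in each generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    # Compare a neutral allele with weakly and more strongly selected ones.
    for s in (0.0, 0.01, 0.05):
        final = trajectory(0.1, s, 200)[-1]
        print(f"s = {s:.2f}: frequency after 200 generations = {final:.3f}")
```

In this model the odds p / (1 - p) grow by a factor of 1 + s per generation, so stronger selection leaves a steeper frequency trajectory. Estimating selection strength and timing from real genomes is much harder because drift, demography and linkage produce similar-looking signals.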

One of the main unanswered questions in genome biology is the “C-value enigma”. It describes the fact that some biological species accumulate tremendous amounts of repetitive DNA, leading to very large genome sizes (C-values), while other species, intriguingly even closely related ones, feature a streamlined and repeat-poor genomic architecture. It is unclear why this is so, and how this influences gene function and organismal adaptation. The precise identification and annotation of repeats is the prerequisite for understanding their origin, evolution and potential function in eukaryotic genomes. However, repetitive DNA regions are often formed by intermingled related and different sequence types, forming a bewildering mixture that obscures the inference of the genetic mechanisms at work. We will therefore develop a strategy and novel algorithms for AI-based repeat prediction in newly sequenced genomes. We will then apply the novel tools to annotate complex repeat patterns in the genomes of two closely related insects, which differ substantially in the amount and distribution of repetitive DNA. In this model system, we intend to infer candidate mechanisms of change in the repeat repertoire and, ultimately, the genetic basis of species-specific repeat change and the C-value enigma.


Recent years have seen a tremendous increase in the volume of data generated in the life sciences, propelled especially by the rapid progress of high-throughput technologies such as next-generation sequencing (NGS). Consequently, the number of genome-sequencing projects and the amount of sequencing data are increasing dramatically while sequencing costs continue to decrease. This project explores the design of new intelligent systems for analyzing large-scale NGS data based on emerging AI technologies that can take advantage of naturally arising genomic structures.