We develop novel methods to infer direct interactions between expression levels from transcriptome and proteome data. The difficulty is that the space of potential interactions is high-dimensional, the underlying molecular mechanisms are highly non-linear, and data are scarce. Including prior information about network sparsity is therefore crucial.
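To illustrate how a sparsity prior can single out a few candidate interactions, the following is a minimal sketch and not the methods developed here. It assumes a purely linear model with an L1 (lasso) penalty, fitted by cyclic coordinate descent, and uses synthetic data in which a hypothetical target gene depends on regulators 0 and 3 only. The function names, the penalty value, and the data are all illustrative assumptions; real expression data are non-linear and far noisier.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return exactly 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200):
    """Minimise (1/2n)*||y - Xw||^2 + penalty*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that ignores j's own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Synthetic toy data (an assumption for illustration): 5 candidate regulators,
# of which only regulators 0 and 3 influence the target gene.
random.seed(0)
n_samples, n_regulators = 100, 5
X = [[random.gauss(0.0, 1.0) for _ in range(n_regulators)] for _ in range(n_samples)]
y = [2.0 * row[0] - 1.5 * row[3] + random.gauss(0.0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, penalty=0.1)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("regulators with non-zero weight:", selected)
```

Repeating this regression with each gene in turn as the target would give one sparse neighbourhood per gene. The penalty controls how many interactions survive and would in practice have to be chosen from the data, for example by cross-validation.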
DNA motif search
Predicting the binding affinities of transcription factors to DNA sequences is a fundamental challenge in molecular biology. The current benchmark is set by DeepBind, a deep learning approach based on convolutional neural networks. We can show, however, that motif prediction improves when model complexity is reduced and Bayesian sampling is included to improve generalisability.
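For readers unfamiliar with motif models, the sketch below shows one of the simplest, a log-odds position weight matrix (PWM) slid along a sequence, which resembles what a single convolutional filter does on one-hot encoded DNA. It is neither DeepBind nor our model, and it involves no Bayesian sampling. The aligned binding sites, the test sequence, the pseudocount, and the uniform background are hypothetical values chosen for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from aligned, equal-length binding sites."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1.0
        total = sum(counts.values())
        # log2 ratio of the observed base frequency to a uniform background
        pwm.append({base: math.log2(counts[base] / total / background) for base in BASES})
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; return (best_score, best_offset)."""
    width = len(pwm)
    best_score, best_offset = float("-inf"), -1
    for offset in range(len(sequence) - width + 1):
        score = sum(pwm[i][sequence[offset + i]] for i in range(width))
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


# Hypothetical aligned binding sites and test sequence (toy data).
toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
toy_pwm = build_pwm(toy_sites)
toy_sequence = "GGCATGACTCATTCG"

best_score, best_offset = scan(toy_sequence, toy_pwm)
print("best window:", toy_sequence[best_offset:best_offset + len(toy_pwm)],
      "at offset", best_offset)
```

A model of this kind has only four parameters per motif position, which makes it a convenient low-complexity baseline against which larger models can be compared.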
Predicting protein levels from sequence data
Most of our effort is currently devoted to predicting translational efficiency from RNA sequence data. Despite the high combinatorial complexity of gene sequences, highly accurate predictions are possible by exploiting the prior knowledge that local physical processes (translational initiation and peptide elongation) determine the amount of protein produced per mRNA per unit time.
Predicting secreted proteins from sequence data
One of the major biotechnological challenges is to engineer cells into highly efficient producers of recombinant proteins. Often the limitation lies not only in producing sufficiently high levels of recombinant protein but also in exporting it from the cell. Besides classical secretion via signal peptides, there also exist non-classical protein export pathways whose mechanisms are not yet fully understood. We currently analyse the secretome of mammalian cells to predict protein export.