Author Post: On Why We Build Models

16 July 2015

Author Post: On Why We Build Models

The work from the final portion of my Ph.D. thesis is online today in Cell Systems, a brand new journal from Cell Press. Though nascent, the journal has already published exciting studies on the geospatial distribution of bacteria in cities, using CRISPR-Cas9 to rapidly engineer yeast metabolic pathways, and programming synthetic circuits in gut microbiota ¹.

In it, we use differential equation modeling to understand how AXL (and very likely the other TAM receptor tyrosine kinases) senses phosphatidylserine (PtdSer)-presenting debris, a long-understood core function of the family. This was by far the most challenging undertaking of my Ph.D.–from utilizing the computational techniques to carefully designing the experimental measurements at each stage. The experience has taught me an enormous amount about the purpose and power of systems biology².

Shou et. al. very recently described it best:

When scientists want to explain some aspect of nature, they tend to make observations of the natural world or collect experimental data, and then extract regularities or patterns from these observations and data, possibly using some form of statistical analysis. Characterizing these regularities or patterns can help scientists to generate new hypotheses, but statistical correlations on their own do not constitute understanding. Rather, it is when a mechanistic explanation of the regularities or patterns is developed from underlying principles, while relying on as few assumptions as possible, that a theory is born. A scientific theory thus provides a unifying framework that can explain a large class of empirical data. A scientific theory is also capable of making predictions that can be tested experimentally. Moreover, a theory can be refined in the light of new experimental data, and then be used to make new predictions, which can also be tested: over time this cycle of prediction, testing and refinement should result in a more robust and quantitative theory. Thus, the union of empirical and quantitative theoretical work should be a hallmark of any scientific discipline.

In a sense, kinetic rate equation models are fundamentally different from most data-driven approaches. These models of molecular systems make very few assumptions about underlying processes, meaning that we can not only learn from models that reproduce a behavior but often also from the ones that “break” and can’t fit the data. Relying only on experimental results doesn’t shield you from assumptions; in biology, experimental designs often rely on the underlying assumption that any one component of an organism has a unimodal relationship to the phenotype we observe. This is in part because the most simple (and often only feasible) experiments in one’s empirical toolbox are knockdown and/or overexpression, along with qualitative biochemical analyses. Biological systems are complex and nonlinear in their behavior though, and this initial view can quickly break down. For certain scales, a kinetic model can, in essence, be used as a scientific theory. Developed from underlying principles of rate kinetics and explaining the data we observe with as few assumptions as possible, it provides a unified framework for communication and further testing of our current understanding.

In the case of TAM receptors, manipulation by the addition or removal of ligand, receptor, or PtdSer has produced many observations. Some seemingly conflict with the previous mental model of particular factors being simply “activating” or “repressive.” The new theory/model that we propose, while being more complex, is maximally simple for the phenomena we wish to explain. With this new model, we can see how the previous mental model could be misleading, as complex, nonlinear relationships exist with respect to the timescale of one’s assay and factors such as PtdSer. That isn’t to say it is correct–it will always be wrong. In the near term, while we model only one receptor, AXL, the other TAM receptors MerTK and Tyro3 have critical roles both in normal physiology and cancer, and our understanding of how these receptors are similar or different is just beginning to be assembled. As TAM-targeted therapies are developed and evaluated in vivo, these models will help us understand how they work and develop even better therapies.

We need better methods at every step of this type of modeling, from construction and parameterization to understanding predictions³. Biology is complex, and mechanistic models such as these quickly become intractable on larger scales. Our study introduces the additional complexity of spatial scale, which makes each step of the process, as well as the corresponding experimental techniques, considerably more challenging. In the long term, I believe spatial organization of signaling will prove to be a critical component of understanding many cellular processes. We are going to need systems techniques to understand them.

I am incredibly excited by this journal’s creation. Systems biology has lacked a true “home” even as it has matured into an established field, and I am enthusiastic this will be it. ↩
Ironically, I had to be convinced that such a project would be challenging enough to be interesting. After all, we learned about ODE modeling in undergraduate classes, it’s been applied for decades, and I would only be considering two proteins! Surely in the era of big data, a few rate parameters could be thrown together in a weekend and output some simulations of receptor activation. And rather than simulate a phenomena, why not just measure it experimentally? ↩
Of course there are huge efforts to develop better tools for these purposes, but these remain difficult problems. Notable promising directions are rule-based models (such as BioNetGen) and brave attempts to try and accelerate Markov Chain Monte Carlo (e.g. DREAM). Truly rigorous modeling still remains a challenge even for the computationally adept however. It would be wonderful to have a rule-based framework that handled rigorous parameterization, spatial modeling through finite differences, compile-time detailed balance, and automatic differentiation (since all of these systems are stiff of course), but such a tool would be a considerable computational undertaking. ↩