I'm Aaron Meyer: a bioengineer, cyclist, and nerd.


Author Post: On Why We Build Models

The work from the final portion of my Ph.D. thesis is online today in Cell Systems, a brand new journal from Cell Press. Though nascent, the journal has already published exciting studies on the geospatial distribution of bacteria in cities, using CRISPR-Cas9 to rapidly engineer yeast metabolic pathways, and programming synthetic circuits in gut microbiota 1.

In it, we use differential equation modeling to understand how AXL (and very likely the other TAM receptor tyrosine kinases) senses phosphatidylserine (PtdSer)-presenting debris, a long-understood core function of the family. This was by far the most challenging undertaking of my Ph.D., from deploying the computational techniques to carefully designing the experimental measurements at each stage. The experience has taught me an enormous amount about the purpose and power of systems biology 2.

Shou et al. very recently described it best:

When scientists want to explain some aspect of nature, they tend to make observations of the natural world or collect experimental data, and then extract regularities or patterns from these observations and data, possibly using some form of statistical analysis. Characterizing these regularities or patterns can help scientists to generate new hypotheses, but statistical correlations on their own do not constitute understanding. Rather, it is when a mechanistic explanation of the regularities or patterns is developed from underlying principles, while relying on as few assumptions as possible, that a theory is born. A scientific theory thus provides a unifying framework that can explain a large class of empirical data. A scientific theory is also capable of making predictions that can be tested experimentally. Moreover, a theory can be refined in the light of new experimental data, and then be used to make new predictions, which can also be tested: over time this cycle of prediction, testing and refinement should result in a more robust and quantitative theory. Thus, the union of empirical and quantitative theoretical work should be a hallmark of any scientific discipline.

In a sense, kinetic rate equation models are fundamentally different from most data-driven approaches. These models of molecular systems make very few assumptions about the underlying processes, meaning that we can learn not only from models that reproduce a behavior but often also from the ones that “break” and can’t fit the data. Relying only on experimental results doesn’t shield you from assumptions; in biology, experimental designs often rest on the underlying assumption that any one component of an organism has a unimodal relationship to the phenotype we observe. This is in part because the simplest (and often only feasible) experiments in one’s empirical toolbox are knockdown and/or overexpression, along with qualitative biochemical analyses. Biological systems are complex and nonlinear in their behavior, though, and this initial view can quickly break down. At certain scales, a kinetic model can, in essence, serve as a scientific theory. Developed from underlying principles of rate kinetics and explaining the data we observe with as few assumptions as possible, it provides a unified framework for communication and for further testing of our current understanding.
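As a minimal sketch of what such a kinetic rate-equation model looks like, consider reversible ligand-receptor binding under mass-action kinetics. The rate constants here are hypothetical and purely illustrative, not values from the paper:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical rate constants, for illustration only (not fit to any data)
kon, koff = 1.0, 0.1   # association / dissociation rates
L = 0.5                # free ligand concentration, held constant

def dydt(t, y):
    R, C = y                  # free receptor, ligand-receptor complex
    bind = kon * L * R        # mass-action association
    unbind = koff * C         # first-order dissociation
    return [-bind + unbind, bind - unbind]

# Integrate from an all-free-receptor initial condition
sol = solve_ivp(dydt, (0.0, 100.0), [1.0, 0.0], rtol=1e-8, atol=1e-10)
R_end, C_end = sol.y[0, -1], sol.y[1, -1]
```

Fitting the rate constants to time-course measurements, and noting which observed behaviors no model of this family can reproduce, is the sense in which even a model that “breaks” is informative.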

In the case of TAM receptors, manipulation by the addition or removal of ligand, receptor, or PtdSer has produced many observations. Some seemingly conflict with the previous mental model of particular factors being simply “activating” or “repressive.” The new theory/model that we propose, while being more complex, is maximally simple for the phenomena we wish to explain. With this new model, we can see how the previous mental model could be misleading, as complex, nonlinear relationships exist with respect to the timescale of one’s assay and factors such as PtdSer. That isn’t to say it is correct–it will always be wrong. In the near term, while we model only one receptor, AXL, the other TAM receptors MerTK and Tyro3 have critical roles both in normal physiology and cancer, and our understanding of how these receptors are similar or different is just beginning to be assembled. As TAM-targeted therapies are developed and evaluated in vivo, these models will help us understand how they work and develop even better therapies.

We need better methods at every step of this type of modeling, from construction and parameterization to understanding predictions 3. Biology is complex, and mechanistic models such as these quickly become intractable on larger scales. Our study introduces the additional complexity of spatial scale, which makes each step of the process, as well as the corresponding experimental techniques, considerably more challenging. In the long term, I believe spatial organization of signaling will prove to be a critical component of understanding many cellular processes. We are going to need systems techniques to understand them.

  1. I am incredibly excited by this journal’s creation. Systems biology has lacked a true “home” even as it has matured into an established field, and I am enthusiastic this will be it. 

  2. Ironically, I had to be convinced that such a project would be challenging enough to be interesting. After all, we learned about ODE modeling in undergraduate classes, it’s been applied for decades, and I would only be considering two proteins! Surely in the era of big data, a few rate parameters could be thrown together in a weekend and output some simulations of receptor activation. And rather than simulate a phenomenon, why not just measure it experimentally? 

  3. Of course there are huge efforts to develop better tools for these purposes, but these remain difficult problems. Notable promising directions are rule-based models (such as BioNetGen) and brave attempts to accelerate Markov chain Monte Carlo sampling (e.g. DREAM). Truly rigorous modeling still remains a challenge even for the computationally adept, however. It would be wonderful to have a rule-based framework that handled rigorous parameterization, spatial modeling through finite differences, compile-time detailed balance, and automatic differentiation (since all of these systems are, of course, stiff), but such a tool would be a considerable computational undertaking. 

The Bad Luck of Improper Data Interpretation

An article and accompanying news summary are out in Science this week with a bold claim: that two-thirds of all cancers are due to baseline mutagenesis intrinsic to cell division, and not environmental factors or genetics. This is based on the observed correlation between the number of stem cell divisions and the incidence of cancer across various tissues. Certainly, such a conclusion would have immense consequences—it would emphasize treatment strategies over those of prevention, and refocus efforts away from understanding environmental toxins. Sadly, this conclusion is based on a frightening variety of errors in interpretation and basic math.

Most specifically, the two-thirds figure comes from the correlation coefficient between stem cell divisions and cancer incidence. While it is the case that the former explains 65% of the variation in cancer incidence between tissues, this does not translate into a percentage of cancer cases. The data are plotted on log-log axes, so distance along the plot is not linear in incidence. As the cancers with clear environmental factors are among the most common, the fraction of cases explained is surely far lower than 65%.
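A toy calculation, with entirely synthetic numbers, shows how a high log-log correlation across tissues can coexist with cell division accounting for only a small fraction of total cases:

```python
import numpy as np

# Synthetic tissues: incidence tracks stem-cell divisions exactly, except one
# common cancer given a hypothetical 100-fold environmental excess.
log_div = np.array([5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0])
log_inc = log_div - 10.0          # perfect division-driven trend (log10 scale)
log_inc[-1] += 2.0                # environmental excess in the most common cancer

r2 = np.corrcoef(log_div, log_inc)[0, 1] ** 2   # variance explained on the log-log plot

cases = 10 ** log_inc
baseline = 10 ** (log_div - 10.0)               # cases expected from divisions alone
frac_division_driven = baseline.sum() / cases.sum()
```

In this toy example, r² stays above 0.9, yet divisions account for only about a tenth of the cases: variance explained across tissues and fraction of cases are simply different quantities.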

Second, while this correlation might explain variation between tissues, it says nothing about the source of the mutagenesis. Any factor with similar effects across all tissues would shift this plot along the y-axis but have no effect on the correlation. Notably, because the data are plotted on log axes, even many-fold changes in the incidence of cancers driven by tissue-specific toxins would still have no effect on the conclusions of this study.
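The y-axis point can be checked directly: multiplying every tissue's incidence by a constant only adds a constant in log space, which leaves the correlation coefficient untouched. A sketch with synthetic numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic tissues: log stem-cell divisions and noisy log cancer incidence
log_div = rng.uniform(5.0, 12.0, size=30)
log_inc = 0.5 * log_div + rng.normal(0.0, 1.0, size=30)

r_before = np.corrcoef(log_div, log_inc)[0, 1]
# A ubiquitous mutagen raising every tissue's incidence 100-fold only shifts log values
r_after = np.corrcoef(log_div, log_inc + np.log10(100.0))[0, 1]
```

The correlation is identical before and after the 100-fold increase, so the plot is blind to any uniform environmental contribution.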

Additional failures of interpretation in this study suggest little understanding of the data analysis involved. For example, k-means clustering of a single variable lends little insight, and with outliers at either end it is sure to form two groups split near the center of the range. This provides no evidence that there are two “classes” of cancer.
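This is easy to demonstrate. Running k-means with k = 2 (plain Lloyd iterations, sketched here rather than a library call) on a single structureless variable with an outlier at each end still dutifully returns two "clusters" split near the middle:

```python
import numpy as np

rng = np.random.default_rng(1)
# One variable with no real cluster structure, plus an outlier at each end
x = np.concatenate([rng.uniform(0.0, 1.0, 100), [-3.0, 4.0]])

# Plain Lloyd's algorithm for k = 2 in one dimension
centers = np.array([x.min(), x.max()])
for _ in range(100):
    labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    updated = np.array([x[labels == k].mean() for k in range(2)])
    if np.allclose(updated, centers):
        break
    centers = updated

split = centers.mean()   # boundary between the two "clusters"
```

Two groups appear regardless of whether the data contain any structure, so the split is an artifact of the method, not evidence for two classes.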

This article seems to be the product of lax peer review and pressure to over-interpret data to boost public interest. Both of these provide short-term gain to those involved but in the long run corrupt the scientific literature and erode public trust in science. Don’t do it!

Early Independence

Science doesn’t simply happen when money is spent; it requires immense effort, creative ideas, and dedicated time from well-trained scientists. Many factors can threaten these other requirements, such as funding instability and the aging of scientists. The average age at which an investigator receives their first R01, the mainstay grant of biomedical research, is now well into the mid-40s. This drives many of the most talented individuals out of biomedical research, and it curtails the benefit we, as investors in biomedical research, derive from those who remain by limiting their ability to perform independent science during some of their most creative years.

To begin to address this one problem, the NIH has begun an experiment, the Early Independence Award, funding young investigators immediately after their Ph.D. so that they may undertake their own research independently. I’m excited to be officially joining this experiment, and hope you’ll see exciting work from the Meyer lab at MIT soon.