
I'm Aaron Meyer: a bioengineer, cyclist, and nerd.


Bridging the Gap: A Review of "Mesoscale Modules as Fundamental Units of Tissue Function"

In their recent perspective, Chen et al. observe that the leap from single-cell phenotypes to whole-organ function is too vast to be bridged by purely data-driven methods. They propose the existence of “mesoscale modules”—intermediate, functional units of tissue organization that serve as the building blocks for higher-order physiological behaviors. Drawing an analogy to the “network motifs” (like feed-forward loops) that revolutionized our understanding of gene regulatory networks (GRNs), the authors suggest that tissues are composed of recurring architectural patterns.

The authors propose several example modules differentiated by their function and structure:

  • Sheets: 2D arrays (like epithelia) that form selective barriers and facilitate mechanosensing.
  • Streets: Conduits formed by ECM or cells that expedite long-range transport.
  • Proliferative Conveyors: Structures like intestinal crypts that harness cell division to produce a controlled output of differentiated cells.
  • Repeaters: Arrays of sensory units (e.g., villi or hepatic lobules) that enhance signal detection and range.
  • Search Parties, Spatial A/D Converters, and Flocks: Modules governing collective migration, decision-making boundaries, and coordinated movement.

First, this is an outstanding, beautiful perspective piece. I love that it attempts to bring together ideas about how to integratively model intercellular systems. This perspective, along with recent work such as Johnson et al. (2025), shows that systems biologists have much to contribute to this problem. Where Chen et al. define the structural units (the “nouns” of tissue architecture), Johnson et al. provide the “grammar”—a human-interpretable language for encoding the behavioral rules that drive these agents. Together, these works suggest a future where we can describe a tissue by its modular architecture and simulate it using a standardized grammar of cell behaviors.

It also reflects that conceptual advancements will be necessary to understand multicellular networks. Purely data-driven (i.e., data-hungry) methods won’t go as far at larger scales. We can obtain millions of cells from individuals to build atlases, but we can’t obtain millions of healthy lungs to run perturbations. We need the kind of mechanistic abstraction these papers propose to simulate “virtual tissues” and bridge the gap between static ‘omics data and dynamic function.

After considering the perspective, I think there are some limitations to the mesoscale modules concept. However, I bring these up as opportunities to refine the idea—it seems like the clearest path to me for building models that capture the emergent behaviors of tissues.

Multicellular Architectures Have a More Limited Mapping to Function Compared to GRN Counterparts

In the study of GRNs, a specific network motif (like a coherent feed-forward loop) provides a constrained range of behaviors, but that range is tunable via parameter values. Furthermore, we have excellent theory for exploring behavior across parameter ranges.
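To make the contrast concrete, here is a minimal sketch (my own illustration, not from the paper) of a coherent type-1 feed-forward loop. The topology alone pins down the qualitative behavior (a sign-sensitive delay in turning on the output), and sweeping a single parameter tunes how long that delay lasts:

```python
# Minimal sketch (illustrative, not from Chen et al.): a coherent type-1
# feed-forward loop X -> Y, with X AND Y -> Z. One topology, a family of
# behaviors tuned by its parameters.
import numpy as np
from scipy.integrate import solve_ivp

def hill(s, K, n=2.0):
    """Activating Hill function."""
    return s**n / (K**n + s**n)

def c1_ffl(t, state, K_xy, K_xz, K_yz, beta=1.0, alpha=1.0):
    X = 1.0 if t > 1.0 else 0.0            # step input switched on at t = 1
    Y, Z = state
    dY = beta * hill(X, K_xy) - alpha * Y
    dZ = beta * hill(X, K_xz) * hill(Y, K_yz) - alpha * Z   # AND gate on Z
    return [dY, dZ]

t_eval = np.linspace(0.0, 10.0, 400)
for K_yz in (0.1, 0.5, 1.0):                # sweep one parameter
    sol = solve_ivp(c1_ffl, (0.0, 10.0), [0.0, 0.0],
                    args=(0.5, 0.5, K_yz), t_eval=t_eval, max_step=0.05)
    delay = t_eval[np.argmax(sol.y[1] > 0.25)]   # time for Z to reach 0.25
    print(f"K_yz = {K_yz:.1f}: Z crosses 0.25 at t = {delay:.2f}")
```

The point is that the motif constrains the behavior to a recognizable family, and well-developed theory tells us how that behavior shifts as the parameters move.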

In contrast, multicellular architectures, such as Sheets, appear to work only for certain functions, and those functions are not necessarily revealed by the architecture itself. A sheet might be a barrier in the gut or a filtration system in the kidney glomerulus. The architecture alone doesn’t fully define the input-output relationship in the way a GRN motif often does.

Perhaps unbiased ways of looking for cellular features associated with an architecture motif could reveal new biology? For instance, cells in sheets will be enriched for cadherins and tight junction proteins compared to neighboring cells. If we could systematically link these molecular features to the Sheet module, this might help to define the unique niche of the module.
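As a sketch of what such an unbiased association test might look like (the expression matrix, module labels, and gene list below are random placeholders standing in for a real, spatially annotated dataset):

```python
# Sketch (illustrative only): are cells assigned to a "Sheet" module enriched
# for junctional genes relative to neighboring cells? Data here are synthetic
# stand-ins for a real dataset with module annotations.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
n_cells = 500
junction_genes = ["CDH1", "TJP1", "OCLN", "CLDN4"]   # cadherins / tight junctions
expr = rng.lognormal(mean=0.0, sigma=1.0, size=(n_cells, len(junction_genes)))
in_sheet = rng.random(n_cells) < 0.3                 # stand-in module labels

for j, gene in enumerate(junction_genes):
    _, p = mannwhitneyu(expr[in_sheet, j], expr[~in_sheet, j],
                        alternative="greater")
    print(f"{gene}: Sheet vs. neighbors, one-sided p = {p:.2e}")
```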

It Is Revealing That Multicellular Architectures Are Mostly Restricted to the Same Features Copied Across Cells

A striking feature of the modules proposed by Chen et al. is that they often involve the same features copied across many cells. For example, all the cells in a Sheet are sheet-like; all units in a Repeater are identical sensory structures.

Perhaps one exception is the Proliferative Conveyor, where cells exist along a differentiation cascade—stem cells at the base, transit-amplifying cells in the middle, and differentiated cells at the exit. This implies a spatial coordination of different cell states.
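As a toy illustration (again my own, not from the perspective), a conveyor can be written as three well-mixed compartments, with division and transfer rates setting a steady output of differentiated cells:

```python
# Sketch: a proliferative conveyor as three compartments, stem (S),
# transit-amplifying (T), and differentiated (D). All rates are arbitrary
# illustrative values.
import numpy as np
from scipy.integrate import solve_ivp

params = dict(k_div=1.0, k_st=0.5, k_td=1.5, k_loss=0.3, S_max=100.0)

def conveyor(t, y, k_div, k_st, k_td, k_loss, S_max):
    S, T, D = y
    dS = k_div * S * (1.0 - S / S_max) - k_st * S   # niche-limited stem division
    dT = k_st * S + k_div * T - k_td * T            # amplification in transit
    dD = k_td * T - k_loss * D                      # mature cells shed at k_loss
    return [dS, dT, dD]

sol = solve_ivp(conveyor, (0.0, 60.0), [10.0, 0.0, 0.0],
                args=tuple(params.values()), t_eval=np.linspace(0.0, 60.0, 300))
S, T, D = sol.y[:, -1]
print(f"steady state: {S:.0f} stem, {T:.0f} transit-amplifying, {D:.0f} differentiated")
print(f"output of mature cells per unit time: {params['k_loss'] * D:.0f}")
```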

However, there are almost certainly many motifs in which cells of distinct types coordinate that are missing from this initial list. For instance, with stem cells and germinal centers, you have “feeder-eater” or niche-support motifs: one cell type provides signals and support that regulate the abundance or survival of another (e.g., the distal tip cell in the C. elegans gonad conveyor). How could we define the complex cytokine circuits that exist between immune cells? We need better ways of discovering this coordination across cell types, rather than just identifying geometric arrangements of similar cells. We are certain to be only scratching the surface of module complexity.
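A minimal sketch of what such a niche-support motif could look like (parameters and structure are invented for illustration): a fixed feeder population secretes a factor that suppresses death in a dependent population, so the number of feeders sets both whether the dependent population persists and how large it gets.

```python
# Sketch (illustrative, not from the paper): a "niche-support" motif in which
# a feeder population F secretes a factor that limits the death rate of a
# dependent population E.
import numpy as np
from scipy.integrate import solve_ivp

def niche_support(t, y, k_sig=1.0, k_deg=0.5, k_grow=0.4, d_max=0.6, K=1.0, E_max=50.0):
    F, sig, E = y
    dF = 0.0                                    # feeder pool held constant here
    dsig = k_sig * F - k_deg * sig              # secreted survival factor
    dE = k_grow * E * (1.0 - E / E_max) - d_max * E / (1.0 + sig / K)  # signal suppresses death
    return [dF, dsig, dE]

for n_feeders in (0.2, 1.0, 5.0):
    sol = solve_ivp(niche_support, (0.0, 80.0), [n_feeders, 0.0, 1.0],
                    t_eval=np.linspace(0.0, 80.0, 400))
    print(f"{n_feeders} feeder cells -> dependent population at t=80: {sol.y[2, -1]:.1f}")
```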

GRN Motifs Provide the Necessary Information to Run Useful Simulations—This Is Not Yet True for Multicellular Motifs

If you identify a GRN motif, you often have the topology required to write down differential equations that describe the evolution and response of the system. This is not (yet) the case for multicellular motifs.

So you observe that cells are organized in a sheet. What would you even model? The parameters—permeability, stiffness, transport rates—are not inherent to the Sheet definition. Maybe one needs a more focused question at this level of abstraction. Cells participate in many different functions, and so they are not as simple to abstract as a gene.
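To illustrate how little the label alone buys you, even the most trivial “Sheet as passive barrier” calculation needs numbers that have to come from somewhere else (the values below are arbitrary placeholders):

```python
# Sketch: the simplest barrier model of a Sheet still needs parameters the
# module label does not supply. All values are arbitrary placeholders.
permeability = 1e-6            # cm/s, must be measured for this epithelium
area = 2.0                     # cm^2 of sheet
c_apical, c_basal = 10.0, 1.0  # solute concentration (umol/cm^3) on each side

# Passive flux across the sheet: J = P * A * (C_apical - C_basal)
flux = permeability * area * (c_apical - c_basal)
print(f"solute flux across the sheet: {flux:.2e} umol/s")
```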

However, if the motif itself doesn’t tell you something specific about the behavior of the system, are we actually learning something more than what would come from an atlasing effort? Maybe refinements of modules can provide the specificity required for simulation. For example, Streets are described as facilitating transport, but “flow-based streets” (like blood vessels) are modeled very differently from “neuronal streets” (axon bundles) or “fibrous streets” (ECM tracks).

Given this, maybe there is a hierarchy of multicellular architectures, like GO terms, where there are high-level abstractions (e.g., “Transport Module”) and low-level, simulation-ready abstractions (e.g., “Vascular capillary network”).
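A sketch of what that hierarchy could look like (the terms and parent-child structure below are my own placeholders, in the spirit of GO, not a proposal from the paper):

```python
# Sketch of a module ontology: broad functional classes refined down to
# simulation-ready types. Terms are hypothetical placeholders.
module_ontology = {
    "Transport Module":  {"parent": None},
    "Street":            {"parent": "Transport Module"},
    "Flow-based street": {"parent": "Street", "example": "vascular capillary network"},
    "Neuronal street":   {"parent": "Street", "example": "axon bundle"},
    "Fibrous street":    {"parent": "Street", "example": "ECM track"},
}

def lineage(term):
    """Walk from a specific, simulation-ready term up to its high-level abstraction."""
    path = [term]
    while module_ontology[term]["parent"] is not None:
        term = module_ontology[term]["parent"]
        path.append(term)
    return " -> ".join(reversed(path))

print(lineage("Flow-based street"))
# Transport Module -> Street -> Flow-based street
```

The useful property would be that a simulation binds only to the most specific, simulation-ready term, while analyses can still aggregate at the abstract levels.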

Conclusion

I feel like it would be helpful to think about how one defines architectures beyond manual curation. This was done successfully with GRNs, where groups have been able to enumerate all possible motifs (e.g., 3-node networks) and categorize them based on function and frequency. To truly democratize virtual cell laboratories—as Johnson et al. aim to do with their grammar—we need a rigorous way to enumerate and define these mesoscale modules so they can become the reliable subroutines of tissue simulation.
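For comparison, the GRN-side enumeration is small enough to write out directly; a sketch using networkx (directed edges, no self-loops, the standard convention) recovers the 13 connected three-node topologies:

```python
# Sketch of the GRN-style enumeration: generate every directed 3-node topology
# (no self-loops), keep the weakly connected ones, and collapse them into
# isomorphism classes. This recovers the classic 13 three-node motifs.
from itertools import combinations, product
import networkx as nx

nodes = [0, 1, 2]
possible_edges = [(i, j) for i, j in product(nodes, nodes) if i != j]  # 6 directed edges

classes = []
for r in range(1, len(possible_edges) + 1):
    for edges in combinations(possible_edges, r):
        g = nx.DiGraph()
        g.add_nodes_from(nodes)
        g.add_edges_from(edges)
        if not nx.is_weakly_connected(g):
            continue
        if not any(nx.is_isomorphic(g, h) for h in classes):
            classes.append(g)

print(f"{len(classes)} distinct connected 3-node topologies")   # prints 13
```

An analogous enumeration for multicellular architectures would first require a comparable formalism for what counts as a node and an edge at the tissue scale.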

A more formal plea for ambitious AI benchmarks in cancer research

A close friend inspired me to turn my recent blog post into a response to an RFI from the National Cancer Institute. As these responses never become public, and I am interested in others’ thoughts on some of these ideas, here is my response.


In response to the Request for Information, I offer the following input on the development of priority artificial intelligence benchmarks for cancer research.

What are AI-relevant use cases or tasks in cancer research and care that could be advanced through the availability of high-quality benchmarks?

We must aim far beyond tasks that are already largely solvable, such as image segmentation or treatment extraction from electronic health records. The true need is for benchmarks that define currently unachievable scientific goals. Here are some high-priority AI-relevant use cases in cancer research and care that would greatly benefit from novel benchmarks:

  • Predicting multi-cellular tissue response to localized perturbation: This goes beyond single-cell or bulk analysis. For example: “If you perturb gene X in cell population A within a specific tumor microenvironment, what specific and quantifiable changes would you expect to see in the gene expression, signaling pathways, and phenotypic behavior of neighboring cell populations B and C, and how would this collectively impact tumor growth/metastasis in vivo?” This requires understanding complex intercellular communication and emergent properties of tissues.
  • Predicting cancer incidence from molecular measurements of patient state: Patients present with many widespread molecular changes upon diagnosis, including reprogramming of their immune system. This task could aim to predict which patients in high-risk groups will be diagnosed with cancer, and of which type and when, given readily accessible molecular measurements, such as transcriptomics of their peripheral blood. Significant advancement here would aid the development of both predictive diagnostics and potentially preventative therapies.
  • Identifying optimal multi-modal therapeutic interventions for an individual patient's complex tumor state: This is not just predicting response to a single drug, but identifying the combination and sequence of therapies (e.g., specific chemotherapies, immunotherapies, targeted therapies, radiation, surgery) that will lead to a defined positive outcome (e.g., complete remission, prolonged progression-free survival, minimal side effects) based on their comprehensive molecular, cellular, and clinical profile. This moves beyond broad patient cohorts to truly individualized predictions of therapeutic efficacy and toxicity. This could be evaluated by providing molecular information about the tumor, alongside the sequence of therapies, and predicting the masked long-term survival.
  • Predicting long-term systemic impact of cancer and its treatment on patient physiology and quality of life: This benchmark would focus on integrated, whole-organism modeling. For example: “Given a patient's initial cancer diagnosis and treatment plan, predict the trajectory of specific organ function (e.g., cardiac, renal, neurological), immune system state, and patient-reported quality of life metrics over 5 years, accounting for potential late effects of treatment and disease progression.” This requires integrating diverse data types and understanding inter-organ dependencies.
  • Forecasting cancer evolution and emergence of resistance mechanisms under specific treatment regimens: Instead of merely detecting existing resistance, the benchmark would be: "Given a patient's tumor molecular profile at diagnosis and a proposed treatment regimen, predict the specific genetic and phenotypic alterations the tumor will acquire, and estimate the timeframe for this resistance to emerge in vivo." This requires dynamic, predictive models of evolutionary trajectories.

These are use cases where benchmarks are not merely scarce; they are non-existent because they require a level of biological understanding and predictive power we do not currently possess. Focusing on such challenges will ensure that AI development is aimed at generating novel biological capabilities, not just automating what we can already do.

What are the desired characteristics of benchmarks for these use cases, including but not limited to considerations of quality, utility, and availability?

The most critical characteristic of a benchmark should be its ability to define a quantifiable, testable, and currently unsolved scientific problem. Its utility will not be in comparing a dozen similar models on an existing dataset, but in providing a clear "North Star" for the entire field, compelling us to create models with entirely new predictive powers. This framework necessitates the adoption of masked or sequestered testing sets as a standard practice. By keeping evaluation data hidden, we can ensure objective, unbiased assessment of model performance, a crucial guardrail against the self-deception and publication bias that can otherwise hinder true progress. In many cases, the benchmark will define a task for which the necessary training data has not yet been collected, thereby spurring new experimental work as an integral part of the solution.
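As a minimal sketch of what a sequestered testing set looks like operationally (all names and numbers here are hypothetical): submitters send only predictions, and the held-out labels are touched solely by the organizers' scoring code.

```python
# Sketch of a sequestered-benchmark harness; everything here is hypothetical.
import numpy as np

def score_submission(predictions: np.ndarray, sequestered_labels: np.ndarray) -> float:
    """Run only by the benchmark organizers; submitters never see the labels."""
    return float(np.mean((predictions - sequestered_labels) ** 2))  # e.g., mean squared error

# Toy demonstration with made-up numbers: organizers hold y_hidden in a secure
# enclave; a team submits predictions for the masked test inputs only.
y_hidden = np.array([12.0, 3.5, 8.0])      # sequestered ground truth
submission = np.array([10.0, 4.0, 9.0])    # a team's predictions
print(f"benchmark score (MSE): {score_submission(submission, y_hidden):.2f}")
```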

Along these lines, some specific desired characteristics for benchmarks include:

  • Defining a currently unachievable task: The benchmark must target problems that are not trivially solved by existing statistical methods and require significant advancements in AI and systems biology.
  • Clear, quantifiable metrics for success: Analogous to AlphaFold's protein structure prediction, there must be objective, numerical ways to assess model performance. From the examples above, this could involve:
    • Quantifiable changes in gene expression and protein levels in specific cell types within a tissue.
    • Measurable reduction in tumor volume, number of metastases, or time to recurrence in in vivo models.
    • Specific and measurable improvements in organ function or patient-reported outcomes.
    • Accuracy in predicting specific resistance mutations or pathways.
  • Requiring novel, out-of-sample data for validation: This is crucial to combat publication bias. The validation dataset must be kept separate and unknown to the model developers during training and development. This promotes true generalizability.
  • Masked testing sets: A portion of the evaluation data should be held secret, accessible only for submitting models and objectively assessing performance. This ensures unbiased evaluation.
  • Biological and clinical relevance: The benchmarks should address questions that, if solved, would genuinely advance our understanding of cancer biology or significantly improve patient care, rather than automating existing clinical or scientific tasks.
  • Multi-modal and multi-scale data integration: The problems often span genomic, proteomic, imaging, clinical, and physiological data, requiring models to integrate information across different biological scales (molecular to organismal).
  • Testable against measurements: The benchmark should be designed such that its proposed solution can eventually be verified through experimental or clinical measurements, even if those measurements are currently challenging to obtain. This encourages the collection of new experimental data and mechanistic understanding.
  • Promoting open exchange of ideas: The benchmark framework should encourage competition and collaboration, fostering an environment where different techniques can be rigorously compared and insights shared. For example, the AI community regularly publishes both approaches that advance performance and those that do not.
  • Incentivizing major scientific advancement: The challenges should be significant enough to warrant substantial research effort and potentially large prizes, as seen in the protein structure prediction field.

What datasets currently exist that could contribute to or be adapted for benchmarking? Please include information about their size, annotation, availability, as well as AI use cases they could support.

While numerous datasets currently exist, they are insufficient for creating benchmarks precisely because they are already available. These existing resources are useful for training. However, determining whether a model is effective requires out-of-sample validation data that has never been seen by the developers. Furthermore, the very process of tackling a benchmark should involve the generation of new experimental knowledge. The paradigm should be less about fitting models to existing data and more about using models to generate bold, testable hypotheses.

What are the biggest barriers to creating and/or using benchmarks in cancer research and care?

The greatest barrier to creating and using meaningful benchmarks is a fundamental lack of consensus on the long-term goals for AI in biology. Without a shared understanding of what we are trying to achieve, efforts will remain scattered and focused on incremental advances. This is compounded by the immense difficulty and expense of generating the novel experimental data required for true, out-of-sample validation, which incentivizes a culture of self-validation and overly optimistic reporting. Ultimately, the field is hampered by a focus on automating existing data analysis rather than pursuing genuinely new scientific capabilities. Establishing ambitious, common challenges through benchmarks would be an effective way to overcome this inertia, fostering an open exchange of ideas, and creating a framework where funding can be directed toward efforts that demonstrably push the boundaries of science.

Please provide any additional information you would like to share on this topic.

I hope that my core message has come across: benchmarks should focus on tasks that we know are not currently possible without major scientific advancement. This means moving beyond incremental improvements on existing tasks. To achieve this, the NCI could consider:

  • Convening a multi-disciplinary working group: Bring together leading cancer biologists, clinicians, systems biologists, and data scientists to collectively define these "grand challenge" benchmarks, much as CASP was established for protein structure prediction.
  • Funding dedicated benchmark development consortia: Support groups specifically tasked with curating existing data, generating new experimental data for masked testing sets, and developing the infrastructure for benchmark competitions.
  • Establishing "AI X-Prize" style challenges: Offer large prizes for models that meet predefined performance thresholds on these highly ambitious, currently unsolved problems in cancer. This would incentivize innovation and attract talent.
  • Funding novel modeling approaches: Once a benchmark is established, the NCI should support competitive applications that propose different modeling or experimental solutions to the challenge. If the benchmarks define challenges of great significance, then proposals that make even incremental progress are of value.
  • Prioritizing funding for model validation: Encourage and fund independent validation efforts, especially those involving prospective data collection, to ensure rigor.
  • Fostering data sharing infrastructure: Invest in secure, federated learning platforms or data enclaves that allow AI models to be trained and tested on real-world cancer data while maintaining patient privacy.
  • Encouraging mechanistic interpretability in benchmarks: While performance is key, future benchmarks could also incorporate metrics or requirements for models to provide biologically plausible and interpretable insights, not just black-box predictions.

Ultimately, the goal should be to shift the focus from simple automation to using AI to achieve new biological and clinical understanding and capabilities that were previously out of reach. This strategic shift is crucial for AI to truly revolutionize cancer research and care.

I worry that the NCI is going to be convinced that the excitement around AI is a reason for big data collection efforts, again. It is not at all clear to me that we need more data, or what that data would be if so. We are drowning in data. We need to set ambitious goals that can be measured through hard benchmarks that we think are not currently solvable and then provide modeling resources and incentives to understand why they cannot currently be solved. Only then should we go collect more data.
