category
bioRxiv
date
Mar 20, 2026
slug
status
Published
summary
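Proposes SCALE, a model that tackles three bottlenecks in virtual cell perturbation prediction: 1) an efficient training and inference framework built on BioNeMo, with a 12.51× pretraining speedup over the prior SOTA pipeline; 2) a set-aware flow architecture for conditional transport that combines LLaMA-based encoding with endpoint supervision for more stable training; 3) evaluation with biologically meaningful metrics, where PDCorr and DE Overlap improve by 12.02% and 10.66%, respectively, over STATE.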
tags
Single-cell sequencing
type
Post
📄 Original Title
SCALE: Scalable Conditional Atlas-Level Endpoint transport for virtual cell perturbation prediction
🔗 Original Link
💡 AI Key Takeaways
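Proposes SCALE, a model that tackles three bottlenecks in virtual cell perturbation prediction: 1) an efficient training and inference framework built on BioNeMo, with a 12.51× pretraining speedup over the prior SOTA pipeline; 2) a set-aware flow architecture for conditional transport that combines LLaMA-based encoding with endpoint supervision for more stable training; 3) evaluation with biologically meaningful metrics, where PDCorr and DE Overlap improve by 12.02% and 10.66%, respectively, over STATE.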
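The summary reports gains in PDCorr and DE Overlap but does not define either metric. The sketch below shows one common way to compute delta-based metrics of this kind. It assumes two definitions:

- **PDCorr** is the Pearson correlation, across genes, between the predicted and the observed perturbation delta. The delta is the perturbed mean profile minus the control mean profile.
- **DE Overlap** is the fraction of the top-k genes, ranked by absolute observed delta, that also appear in the top-k of the predicted delta.

These definitions are assumptions for illustration only, and so are the function names and the use of pseudobulk means instead of the paper's cell-level protocol. None of this is the SCALE evaluation code.

```python
import math
from statistics import fmean

def mean_profile(cells):
    """Gene-wise mean over a list of cells (each cell is a list of expression values)."""
    return [fmean(column) for column in zip(*cells)]

def delta(cells, control_cells):
    """Perturbation effect: mean profile of `cells` minus the mean control profile."""
    return [p - c for p, c in zip(mean_profile(cells), mean_profile(control_cells))]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant delta")
    return cov / (sx * sy)

def pd_corr(pred_cells, true_cells, control_cells):
    """Assumed PDCorr: correlation between predicted and observed deltas across genes."""
    return pearson(delta(pred_cells, control_cells), delta(true_cells, control_cells))

def top_k_genes(d, k):
    """Indices of the k genes with the largest absolute delta (ties keep gene order)."""
    return set(sorted(range(len(d)), key=lambda g: abs(d[g]), reverse=True)[:k])

def de_overlap(pred_cells, true_cells, control_cells, k):
    """Assumed DE Overlap: share of the observed top-k genes recovered in the predicted top-k."""
    pred_top = top_k_genes(delta(pred_cells, control_cells), k)
    true_top = top_k_genes(delta(true_cells, control_cells), k)
    return len(pred_top & true_top) / k

# Toy data: two cells x four genes (values are made up for illustration).
control = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
observed = [[2.0, 2.0, 1.0, 4.0], [2.0, 2.0, 1.0, 4.0]]
predicted = [[1.6, 2.0, 1.5, 4.3], [1.6, 2.0, 1.5, 4.3]]
print(pd_corr(predicted, observed, control))
print(de_overlap(predicted, observed, control, k=3))
```

In practice, DE genes would usually come from a statistical test with multiple-testing correction, not from a raw top-k on mean differences. The top-k rule here only keeps the sketch dependency-free.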
📝 Original English Abstract
Virtual cell models aim to enable in silico experimentation by predicting how cells respond to genetic, chemical, or cytokine perturbations from single-cell measurements. In practice, however, large-scale perturbation prediction remains constrained by three coupled bottlenecks: inefficient training and inference pipelines, unstable modeling in high-dimensional sparse expression space, and evaluation protocols that overemphasize reconstruction-like accuracy while underestimating biological fidelity. In this work we present a specialized large-scale foundation model SCALE for virtual cell perturbation prediction that addresses the above limitations jointly. First, we build a BioNeMo-based training and inference framework that substantially improves data throughput, distributed scalability, and deployment efficiency, yielding 12.51× speedup on pretrain and 1.29× on inference over the prior SOTA pipeline under matched system settings. Second, we formulate perturbation prediction as conditional transport and implement it with a set-aware flow architecture that couples LLaMA-based cellular encoding with endpoint-oriented supervision. This design yields more stable training and stronger recovery of perturbation effects. Third, we evaluate the model on Tahoe-100M using a rigorous cell-level protocol centered on biologically meaningful metrics rather than reconstruction alone. On this benchmark, our model improves PDCorr by 12.02% and DE Overlap by 10.66% over STATE. Together, these results suggest that advancing virtual cells requires not only better generative objectives, but also the co-design of scalable infrastructure, stable transport modeling, and biologically faithful evaluation.
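The abstract frames perturbation prediction as conditional transport trained with endpoint-oriented supervision, without giving the objective. The sketch below shows one common way to write such an objective. It combines straight-line flow matching from control cells toward perturbed cells with a loss on the one-step endpoint estimate x̂₁ = x_t + (1 − t)·v.

Several pieces are hypothetical and are not the SCALE architecture:

- the straight-line path;
- the arbitrary pairing of control and perturbed cells;
- the mean-pooled set context;
- the weight `lam`;
- `toy_velocity`, a fixed stand-in for a learned network (not a LLaMA encoder).

```python
from statistics import fmean

def interpolate(x0, x1, t):
    """Point at time t on the straight path from a control cell x0 (t=0) to a perturbed cell x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def set_context(cells):
    """Permutation-invariant summary of a set of cells: the gene-wise mean."""
    return [fmean(column) for column in zip(*cells)]

def toy_velocity(x_t, context, cond, t):
    """Hypothetical stand-in for a learned velocity network v(x_t, set context, condition, t).
    It ignores t, pulls each cell toward the set mean, and shifts it by the condition vector."""
    return [c + 0.1 * (m - x) for x, m, c in zip(x_t, context, cond)]

def sq_err(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def transport_loss(control_set, perturbed_set, cond, t, lam=1.0):
    """Return (flow-matching loss, endpoint loss, total) averaged over paired cells at time t.
    In training, t would be sampled uniformly from [0, 1] for each batch."""
    xts = [interpolate(x0, x1, t) for x0, x1 in zip(control_set, perturbed_set)]
    context = set_context(xts)  # every cell's velocity sees a summary of the whole set
    fm = end = 0.0
    for x0, x1, xt in zip(control_set, perturbed_set, xts):
        v = toy_velocity(xt, context, cond, t)
        u = [b - a for a, b in zip(x0, x1)]  # target velocity of the straight path
        x1_hat = [x + (1 - t) * vi for x, vi in zip(xt, v)]  # one-step endpoint estimate
        fm += sq_err(v, u)
        end += sq_err(x1_hat, x1)
    n = len(control_set)
    fm, end = fm / n, end / n
    return fm, end, fm + lam * end

# Toy set of two cells x three genes (values are made up for illustration).
control = [[1.0, 0.0, 2.0], [0.5, 0.2, 1.8]]
perturbed = [[2.0, 0.0, 0.5], [1.7, 0.1, 0.9]]
cond = [0.3, 0.0, -0.4]  # hypothetical perturbation embedding
fm_loss, end_loss, total = transport_loss(control, perturbed, cond, t=0.25)
print(fm_loss, end_loss, total)
```

On a straight path, x̂₁ − x₁ = (1 − t)(v − u). The endpoint term therefore reweights the flow-matching error by (1 − t)², which puts more weight on early times t near 0. The actual model may use a different path, pairing, or endpoint loss.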
- Author: NotionNext
- Link: https://tangly1024.com/article/32948bd6-1f96-813f-9e91-f921a352111b
- License: This article is licensed under CC BY-NC-SA 4.0; please credit the source when reposting.
