category
bioRxiv
date
Mar 20, 2026
slug
status
Published
summary
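Proposes the SCALE model to address three bottlenecks in virtual cell perturbation prediction: (1) an efficient BioNeMo-based training and inference framework that speeds up pretraining by 12.51×; (2) a set-aware flow architecture built on conditional transport, combining LLaMA-based encoding with endpoint supervision to improve stability; (3) evaluation with biologically meaningful metrics, improving PDCorr by 12.02% and DE Overlap by 10.66% over STATE.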
tags
Single-cell sequencing
type
Post

📄 Original Title

SCALE: Scalable Conditional Atlas-Level Endpoint transport for virtual cell perturbation prediction

🔗 Original Link

💡 AI Key Takeaways

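Proposes the SCALE model to address three bottlenecks in virtual cell perturbation prediction: (1) an efficient BioNeMo-based training and inference framework that speeds up pretraining by 12.51×; (2) a set-aware flow architecture built on conditional transport, combining LLaMA-based encoding with endpoint supervision to improve stability; (3) evaluation with biologically meaningful metrics, improving PDCorr by 12.02% and DE Overlap by 10.66% over STATE.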

📝 Original English Abstract

Virtual cell models aim to enable in silico experimentation by predicting how cells respond to genetic, chemical, or cytokine perturbations from single-cell measurements. In practice, however, large-scale perturbation prediction remains constrained by three coupled bottlenecks: inefficient training and inference pipelines, unstable modeling in high-dimensional sparse expression space, and evaluation protocols that overemphasize reconstruction-like accuracy while underestimating biological fidelity. In this work we present a specialized large-scale foundation model SCALE for virtual cell perturbation prediction that addresses the above limitations jointly. First, we build a BioNeMo-based training and inference framework that substantially improves data throughput, distributed scalability, and deployment efficiency, yielding 12.51× speedup on pretrain and 1.29× on inference over the prior SOTA pipeline under matched system settings. Second, we formulate perturbation prediction as conditional transport and implement it with a set-aware flow architecture that couples LLaMA-based cellular encoding with endpoint-oriented supervision. This design yields more stable training and stronger recovery of perturbation effects. Third, we evaluate the model on Tahoe-100M using a rigorous cell-level protocol centered on biologically meaningful metrics rather than reconstruction alone. On this benchmark, our model improves PDCorr by 12.02% and DE Overlap by 10.66% over STATE. Together, these results suggest that advancing virtual cells requires not only better generative objectives, but also the co-design of scalable infrastructure, stable transport modeling, and biologically faithful evaluation.
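
To give a feel for the two metrics named in the abstract, here is a minimal sketch in plain Python. The abstract does not define PDCorr or DE Overlap, so the definitions below are assumptions made for illustration: the first is taken to be a Pearson correlation between predicted and observed perturbation deltas (mean perturbed expression minus mean control expression), and the second a top-k overlap of the genes with the largest absolute delta. The function names `delta_correlation` and `top_k_overlap` and the toy data are hypothetical, and a real differential-expression analysis would usually select genes with a statistical test instead of a raw ranking.

```python
from math import sqrt


def mean_profile(cells):
    """Per-gene mean expression over a list of cells (each cell is a list of floats)."""
    n = len(cells)
    return [sum(col) / n for col in zip(*cells)]


def pearson(x, y):
    """Plain Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def delta_correlation(control, predicted, observed):
    """Assumed PDCorr-style score: correlate predicted and observed deltas vs. control."""
    ctrl = mean_profile(control)
    pred_delta = [p - c for p, c in zip(mean_profile(predicted), ctrl)]
    obs_delta = [o - c for o, c in zip(mean_profile(observed), ctrl)]
    return pearson(pred_delta, obs_delta)


def top_k_overlap(control, predicted, observed, k):
    """Assumed DE-overlap-style score: shared fraction of the top-k genes by |delta|."""
    ctrl = mean_profile(control)

    def top_genes(cells):
        delta = [abs(m - c) for m, c in zip(mean_profile(cells), ctrl)]
        ranked = sorted(range(len(delta)), key=lambda g: delta[g], reverse=True)
        return set(ranked[:k])

    return len(top_genes(predicted) & top_genes(observed)) / k


# Toy data: 2 cells x 4 genes per condition (hypothetical values).
control = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
observed = [[3.0, 2.0, 1.0, 4.0], [3.0, 2.0, 1.0, 4.0]]
predicted = [[2.0, 2.0, 2.0, 4.0], [2.0, 2.0, 2.0, 4.0]]

print(delta_correlation(control, predicted, observed))
print(top_k_overlap(control, predicted, observed, k=2))
```

Both scores compare the predicted change relative to control, not the raw expression profile, which is the sense in which such metrics differ from reconstruction-style accuracy: a model that simply copies the control cells can reconstruct expression well while recovering none of the perturbation effect.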