category
bioRxiv
date
Feb 24, 2026
slug
status
Published
summary
Key innovations: 1) a family of plant genomic foundation models spanning 100M to 1B parameters; 2) pretraining data covering 43 phylogenetically diverse plant genomes; 3) state-of-the-art performance on tasks including regulatory element annotation, gene expression inference, and variant effect prediction; 4) establishment of a large-model pretraining paradigm for plant genomics.
tags
Gene Editing
type
Post
📄 Original Title
BOTANIC-0: a series of foundation models for plant genomic data
🔗 Original Link
💡 AI Key Takeaways
Key innovations: 1) a family of plant genomic foundation models spanning 100M to 1B parameters; 2) pretraining data covering 43 phylogenetically diverse plant genomes; 3) state-of-the-art performance on tasks including regulatory element annotation, gene expression inference, and variant effect prediction; 4) establishment of a large-model pretraining paradigm for plant genomics.
📝 Original English Abstract
Genomic language models (gLMs) have emerged as a powerful paradigm for learning regulatory biology directly from DNA sequence. Here, we introduce Botanic0, a family of plant genomic foundation models spanning 100M to 1B parameters and pretrained on 43 phylogenetically diverse plant genomes. The Botanic0-S, Botanic0-M, and Botanic0-L models form the first generation of a long-term research initiative, dedicated to advancing crop improvement research, genotype-to-phenotype modeling, and sequence-based genome editing. The architecture, pre-training pipeline and pre-training dataset of Botanic0 follow the seminal work of [1]. Across a broad suite of genomic and genetic prediction tasks, including regulatory element annotation, gene expression inference, and variant effect prediction, Botanic0 models achieve performance competitive with state-of-the-art foundation models, both in zero-shot settings and after fine-tuning. Scaling analyses reveal consistent improvements in predictive power with increased model capacity, highlighting the benefits of large-model pretraining for plant genomics. This work establishes our ability to train foundation models at scale, and lays the foundation for the next generations of models to come. To support reproducible research and community benchmarking, we release all Botanic0 models at https://huggingface.co/living-models/models.
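The abstract mentions zero-shot variant effect prediction, where a masked genomic language model scores a variant without any task-specific training: the variant position is masked, the model's nucleotide distribution at that position is read out, and the score is the log-likelihood ratio of the alternate versus reference allele. The sketch below illustrates this scoring scheme with a hypothetical stub in place of a real gLM forward pass; `toy_model`, `variant_effect_score`, and the use of `N` as a mask token are illustrative assumptions, not the Botanic0 API.

```python
import math

def toy_model(sequence: str, masked_pos: int) -> dict[str, float]:
    """Stub for a gLM forward pass: return a probability for each
    nucleotide at the masked position. A real model would condition
    on the full masked sequence context; here we just use smoothed
    nucleotide frequencies so the sketch runs without a checkpoint."""
    counts = {nt: sequence.count(nt) + 1 for nt in "ACGT"}  # add-one smoothing
    total = sum(counts.values())
    return {nt: c / total for nt, c in counts.items()}

def variant_effect_score(sequence: str, pos: int, ref: str, alt: str) -> float:
    """Zero-shot score = log p(alt) - log p(ref) at the masked position.
    Negative scores suggest the variant is less likely than the reference
    allele under the model, i.e. a putatively deleterious change."""
    assert sequence[pos] == ref, "reference allele must match the sequence"
    masked = sequence[:pos] + "N" + sequence[pos + 1:]  # mask the variant site
    probs = toy_model(masked, pos)
    return math.log(probs[alt]) - math.log(probs[ref])

score = variant_effect_score("ACGTACGTAC", pos=4, ref="A", alt="C")
```

With a real pretrained model substituted for the stub, the same log-ratio can be ranked across candidate variants to prioritize likely functional mutations, which is how zero-shot variant effect benchmarks are typically scored.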
- Author: NotionNext
- Link: https://tangly1024.com/article/31148bd6-1f96-81a3-a3fb-c3bc21f92a04
- Notice: This article is licensed under CC BY-NC-SA 4.0; please credit the source when republishing.
Related Articles
