category
bioRxiv
date
Feb 24, 2026
slug
status
Published
summary
Key innovations: 1) a family of plant genomic foundation models spanning 100M to 1B parameters; 2) pretraining data covering 43 phylogenetically diverse plant genomes; 3) performance competitive with state-of-the-art models on tasks such as regulatory element annotation, gene expression inference, and variant effect prediction; 4) an established large-model pretraining paradigm for plant genomics.
tags
Gene Editing
type
Post

📄 Original Title

BOTANIC-0: a series of foundation models for plant genomic data

🔗 Original Link

💡 AI Core Interpretation

Key innovations: 1) a family of plant genomic foundation models spanning 100M to 1B parameters; 2) pretraining data covering 43 phylogenetically diverse plant genomes; 3) performance competitive with state-of-the-art models on tasks such as regulatory element annotation, gene expression inference, and variant effect prediction; 4) an established large-model pretraining paradigm for plant genomics.

📝 Original English Abstract

Genomic language models (gLMs) have emerged as a powerful paradigm for learning regulatory biology directly from DNA sequence. Here, we introduce Botanic0, a family of plant genomic foundation models spanning 100M to 1B parameters and pretrained on 43 phylogenetically diverse plant genomes. The Botanic0-S, Botanic0-M, and Botanic0-L models form the first generation of a long-term research initiative, dedicated to advancing crop improvement research, genotype-to-phenotype modeling, and sequence-based genome editing. The architecture, pre-training pipeline and pre-training dataset of Botanic0 follow the seminal work of [1]. Across a broad suite of genomic and genetic prediction tasks, including regulatory element annotation, gene expression inference, and variant effect prediction, Botanic0 models achieve performance competitive with state-of-the-art foundation models, both in zero-shot settings and after fine-tuning. Scaling analyses reveal consistent improvements in predictive power with increased model capacity, highlighting the benefits of large-model pretraining for plant genomics. This work establishes our ability to train foundation models at scale, and lays the foundation for the next generations of models to come. To support reproducible research and community benchmarking, we release all Botanic0 models at https://huggingface.co/living-models/models.
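The abstract highlights zero-shot variant effect prediction as one of the benchmark tasks. A common way gLMs are used for this (a minimal sketch of the general technique, not necessarily Botanic0's exact recipe; the probabilities below are made up for illustration) is to mask the variant position and score the variant as the log-likelihood ratio between the alternate and reference alleles:

```python
# Sketch: zero-shot variant effect scoring with a genomic language model.
# Assumption: the model yields a probability distribution over nucleotides
# at the (masked) variant position; these values are illustrative only.
import math

def variant_effect_score(probs: dict, ref: str, alt: str) -> float:
    """Log-likelihood ratio log P(alt) - log P(ref) at one position.

    `probs` maps each nucleotide ("A", "C", "G", "T") to the model's
    predicted probability at the masked variant position. A strongly
    negative score suggests the alternate allele is disfavored by the
    model, which is often interpreted as a deleterious change.
    """
    return math.log(probs[alt]) - math.log(probs[ref])

# Toy example with hypothetical model probabilities (no real model call):
probs = {"A": 0.70, "C": 0.10, "G": 0.15, "T": 0.05}
score = variant_effect_score(probs, ref="A", alt="T")
print(score < 0)  # alt allele much less likely than ref → negative score
```

In practice the probabilities would come from a forward pass of one of the released checkpoints; scores are then compared across many variants rather than interpreted in isolation.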