category
NAR
date
Feb 25, 2026
slug
status
Published
summary
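Introduces SynCodonLM, a codon language model that uses synonymous-codon-constrained masking to separate codon-level from protein-level semantics. The model clusters codons by nucleotide properties, outperforms existing models on six of seven benchmarks sensitive to DNA-level features, and supports sequence design and the study of biological processes in synthetic biology.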
tags
Synthetic biology
Proteomics
Protein evolution
type
Post

📄 Original Title

Advancing codon language modeling with synonymous codon constrained masking

🔗 Original Link

💡 AI Key Takeaways

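Introduces SynCodonLM, a codon language model that uses synonymous-codon-constrained masking to separate codon-level from protein-level semantics. The model clusters codons by nucleotide properties, outperforms existing models on six of seven benchmarks sensitive to DNA-level features, and supports sequence design and the study of biological processes in synthetic biology.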

📝 Original English Abstract

Codon language models offer a promising framework for modeling protein-coding DNA sequences, yet current approaches often conflate codon usage with amino acid semantics, limiting their ability to capture DNA-level biology. We introduce SynCodonLM, a codon language model that enforces a biologically grounded constraint: masked codons are only predicted from synonymous options, guided by the known protein sequence. This design disentangles codon-level from protein-level semantics, enabling the model to learn nucleotide-specific patterns. The constraint is implemented by masking non-synonymous codons from the prediction space prior to softmax. Unlike existing models, which cluster codons by amino acid identity, SynCodonLM clusters by nucleotide properties, revealing structure aligned with DNA-level biology. Furthermore, SynCodonLM outperforms existing models on six of seven benchmarks sensitive to DNA-level features, including messenger RNA and protein expression. Our approach advances domain-specific representation learning and opens avenues for sequence design in synthetic biology, as well as deeper insights into diverse bioprocesses.
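The abstract states that the constraint is applied by masking non-synonymous codons out of the prediction space before the softmax. The sketch below illustrates that idea in plain Python and is not the authors' implementation. It assumes the standard genetic code (NCBI translation table 1), a 64-entry logit vector ordered as in `CODONS`, and a hypothetical helper name, `synonymous_softmax`.

```python
import math
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
# Assumption: the abstract does not say which codon ordering the model uses.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))


def synonymous_softmax(logits, amino_acid):
    """Softmax over 64 codon logits, restricted to codons encoding `amino_acid`.

    Hypothetical helper: non-synonymous codons get a logit of -inf, so they
    receive exactly zero probability after normalisation.
    """
    if len(logits) != len(CODONS):
        raise ValueError("expected one logit per codon (64 values)")
    if amino_acid not in CODON_TO_AA.values():
        raise ValueError(f"no codon encodes {amino_acid!r}")

    masked = [
        logit if CODON_TO_AA[codon] == amino_acid else -math.inf
        for codon, logit in zip(CODONS, logits)
    ]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return {codon: e / total for codon, e in zip(CODONS, exps)}


if __name__ == "__main__":
    # With uniform logits, the probability mass is split evenly among the
    # six leucine codons and every other codon is excluded.
    probs = synonymous_softmax([0.0] * 64, "L")
    for codon, p in probs.items():
        if p > 0:
            print(codon, round(p, 4))
```

Because the mask is applied before normalisation, the model can only distribute probability among codons that encode the known amino acid. The training signal therefore reflects codon choice and not amino acid identity.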