category
NAR
date
Feb 25, 2026
slug
status
Published
summary
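Proposes SynCodonLM, a codon language model that uses synonymous-codon-constrained masking to separate codon-level from protein-level semantics. Its learned representations cluster codons by nucleotide properties rather than amino acid identity. It outperforms existing models on six of seven benchmarks sensitive to DNA-level features, supporting sequence design in synthetic biology and the study of biological processes.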
tags
Synthetic biology
Proteomics
Protein evolution
type
Post
📄 Original Title
Advancing codon language modeling with synonymous codon constrained masking
🔗 Original Link
💡 AI Key Takeaways
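Proposes SynCodonLM, a codon language model that uses synonymous-codon-constrained masking to separate codon-level from protein-level semantics. Its learned representations cluster codons by nucleotide properties rather than amino acid identity. It outperforms existing models on six of seven benchmarks sensitive to DNA-level features, supporting sequence design in synthetic biology and the study of biological processes.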
📝 Original English Abstract
<span class="paragraphSection"><div class="boxTitle">Abstract</div>Codon language models offer a promising framework for modeling protein-coding DNA sequences, yet current approaches often conflate codon usage with amino acid semantics, limiting their ability to capture DNA-level biology. We introduce SynCodonLM, a codon language model that enforces a biologically grounded constraint: masked codons are only predicted from synonymous options, guided by the known protein sequence. This design disentangles codon-level from protein-level semantics, enabling the model to learn nucleotide-specific patterns. The constraint is implemented by masking non-synonymous codons from the prediction space prior to softmax. Unlike existing models, which cluster codons by amino acid identity, SynCodonLM clusters by nucleotide properties, revealing structure aligned with DNA-level biology. Furthermore, SynCodonLM outperforms existing models on six of seven benchmarks sensitive to DNA-level features, including messenger RNA and protein expression. Our approach advances domain-specific representation learning and opens avenues for sequence design in synthetic biology, as well as deeper insights into diverse bioprocesses.</span>
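The constraint described in the abstract (masking non-synonymous codons from the prediction space before softmax) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SynCodonLM's actual implementation: it assumes a 64-codon output vocabulary with stop codons included, the standard genetic code (NCBI translation table 1), and hypothetical helper names `synonymous_masked_softmax` and `masked_codon_loss`. The real model's tokenizer, special tokens, and stop-codon handling may differ.

```python
import math

# Assumption: standard genetic code (NCBI translation table 1), codons in TCAG order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))


def synonymous_masked_softmax(logits, amino_acid):
    """Hypothetical helper: softmax over 64 codon logits, restricted to
    codons that encode `amino_acid`.

    Non-synonymous codons get a logit of -inf before the softmax, so their
    probability is exactly 0 and all mass is spread over synonymous codons.
    """
    if len(logits) != len(CODONS):
        raise ValueError("expected one logit per codon (64)")
    allowed = [GENETIC_CODE[c] == amino_acid for c in CODONS]
    if not any(allowed):
        raise ValueError(f"unknown amino acid symbol: {amino_acid!r}")
    masked = [x if ok else -math.inf for x, ok in zip(logits, allowed)]
    peak = max(masked)  # finite, since at least one codon is allowed
    exps = [math.exp(x - peak) for x in masked]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def masked_codon_loss(logits, true_codon):
    """Hypothetical helper: cross-entropy at one masked position, where the
    known amino acid of `true_codon` limits the candidate codons."""
    probs = synonymous_masked_softmax(logits, GENETIC_CODE[true_codon])
    return -math.log(probs[CODONS.index(true_codon)])


if __name__ == "__main__":
    # Leucine has six codons: with uniform logits each gets 1/6, all others 0.
    probs = synonymous_masked_softmax([0.0] * 64, "L")
    print({c: round(p, 3) for c, p in zip(CODONS, probs) if p > 0})
    # Loss for CTG under uniform logits is ln(6), about 1.792.
    print(round(masked_codon_loss([0.0] * 64, "CTG"), 3))
```

At a methionine position, ATG is the only candidate codon, so it receives probability 1 regardless of the logits. Because the amino acid is fixed in this sketch, the loss can only reward choosing among synonymous codons, which is the mechanism the abstract credits with pushing the model toward codon-level rather than protein-level signal.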
- Author: NotionNext
- Link: https://tangly1024.com/article/31448bd6-1f96-8198-b685-f04559534415
- Notice: This article is licensed under CC BY-NC-SA 4.0. Please credit the source when reposting.
