category
bioRxiv
date
Mar 20, 2026
slug
status
Published
summary
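Proposes CLIPepPI, a dual-encoder model that uses contrastive learning to embed domains and peptides into a shared space directly from sequence; fine-tunes with parameter-efficient LoRA adapters; injects structural information by marking interface residues; achieves competitive performance on three independent benchmarks and enables proteome-wide NES scanning and variant-effect prediction.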
tags
Proteomics
type
Post
📄 Original Title
CliPepPI: Scalable prediction of domain-peptide specificity using contrastive learning
🔗 Original Link
💡 AI Key Takeaways
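Proposes CLIPepPI, a dual-encoder model that uses contrastive learning to embed domains and peptides into a shared space directly from sequence; fine-tunes with parameter-efficient LoRA adapters; injects structural information by marking interface residues; achieves competitive performance on three independent benchmarks and enables proteome-wide NES scanning and variant-effect prediction.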
📝 Original English Abstract
Domain-peptide interactions mediate a significant fraction of cellular protein networks, yet accurately predicting their specificity remains challenging. Peptide motifs typically have short, fuzzy sequence profiles, and their interactions are often weak and transient, limiting the size, coverage, and quality of experimentally validated domain-peptide datasets. Since true non-binders are rarely known, constructing negative examples often introduces bias. While structure-based prediction methods can achieve high accuracy, they are computationally demanding and difficult to scale to the proteome level. We introduce CLIPepPI, a dual-encoder model that leverages contrastive learning to embed domains and peptides into a shared space directly from sequence. Both encoders are initialized from a protein language model (ESM-C) and fine-tuned using lightweight LoRA adapters, enabling parameter-efficient training on positive pairs alone. To overcome data scarcity, we augment ~3K protein-peptide complexes from PPI3D with ~150K domain-peptide pairs derived from protein-protein interfaces. CLIPepPI further injects structural information by marking interface residues in the domain sequence, thus guiding the encoders toward binding regions and linking sequence-level learning with structural context. Competitive performance is achieved across three independent benchmarks: domain-peptide complexes from PPI3D, large-scale phage-library data from ProP-PD, and a curated dataset of nuclear export signal (NES) sequences. We demonstrate scalability and generalization through two applications: (i) proteome-wide NES scanning, and (ii) variant-effect prediction, where score changes in domain-peptide interactions between wild-type and mutant sequences discriminate pathogenic from benign variants. Together, CLIPepPI offers a scalable, structure-informed model for predicting domain-peptide specificity and generating meaningful embeddings suited for large-scale proteomic analyses. CLIPepPI is available at: https://bio3d.cs.huji.ac.il/webserver/clipeppi/.
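To make the idea of training "on positive pairs alone" concrete, here is a minimal sketch of a symmetric in-batch contrastive loss of the kind used by CLIP-style dual encoders. The abstract does not state the exact objective, so the loss form, the temperature value, the toy two-dimensional embeddings, and every function name below are illustrative assumptions, not the authors' implementation. The last helper sketches how a wild-type versus mutant score change could be computed, again assuming that cosine similarity is the interaction score.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def similarity_matrix(domains, peptides, temperature):
    """Temperature-scaled cosine similarity of every domain against every peptide."""
    return [[cosine(d, p) / temperature for p in peptides] for d in domains]

def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(domains, peptides, temperature=0.1):
    """Assumed CLIP-style loss: pair i is the positive, all other in-batch
    pairings act as negatives, averaged over both matching directions."""
    logits = similarity_matrix(domains, peptides, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))

def score_change(domain, wild_type_peptide, mutant_peptide):
    """Hypothetical variant-effect score: mutant similarity minus wild-type similarity."""
    return cosine(domain, mutant_peptide) - cosine(domain, wild_type_peptide)

# Toy embeddings standing in for encoder outputs (hypothetical values).
domains = [[1.0, 0.0], [0.0, 1.0]]
matched_peptides = [[0.9, 0.1], [0.1, 0.9]]
swapped_peptides = [matched_peptides[1], matched_peptides[0]]

loss_matched = symmetric_contrastive_loss(domains, matched_peptides)
loss_swapped = symmetric_contrastive_loss(domains, swapped_peptides)
delta = score_change([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

In this toy batch the correctly paired embeddings give a much lower loss than the swapped pairing, which is the signal that pulls true domain-peptide pairs together in the shared space without requiring curated non-binders. A negative `delta` indicates that the mutant peptide scores lower against the domain than the wild type does.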
- Author: NotionNext
- Link: https://tangly1024.com/article/32948bd6-1f96-81f6-b23e-f49a5c810331
- Notice: This article is licensed under CC BY-NC-SA 4.0. Please credit the source when reposting.
