category
bioRxiv
date
Mar 20, 2026
slug
status
Published
summary
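1. Built on a generative adversarial network (GAN) architecture and trained on multi-source public datasets (4.6 million single cells and 5,900 cancer samples); 2. Introduces a pathway neural layer that extracts the activity of predefined MSigDB pathways or of new pathways learned from single-cell data; 3. Delivers four functions from a single training run: patient stratification, disease marker analysis, pseudo-data generation for small samples, and cross-dataset feature vectorization; 4. Specifically optimized for analysis in small-sample settings.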
tags
Single-cell sequencing
Sequencing technology
type
Post

📄 Original Title

RNAGAN: Train One and Get Four, Multipurpose Human RNA-Seq Analysis Tool with Enhanced Interpretability and Small Data Size Capability

🔗 Original Link

💡 AI Key Takeaways

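1. Built on a generative adversarial network (GAN) architecture and trained on multi-source public datasets (4.6 million single cells and 5,900 cancer samples); 2. Introduces a pathway neural layer that extracts the activity of predefined MSigDB pathways or of new pathways learned from single-cell data; 3. Delivers four functions from a single training run: patient stratification, disease marker analysis, pseudo-data generation for small samples, and cross-dataset feature vectorization; 4. Specifically optimized for analysis in small-sample settings.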

📝 Original English Abstract

The advent of artificial intelligence (AI) has brought revolutionary tools for biomedical transcriptomic (RNA-level) research. However, there are persistent constraints including limited interpretations with biomedical concepts such as functional pathways, small sample sizes and substantial time and computing power requirements for AI training. To overcome these limitations, we developed RNAGAN (https://github.com/ZhaozhengHou-HKU/RNAGAN-1.0.git), an AI tool with a generative adversarial network (GAN) structure with the objective of enhancing transcriptomic analysis. The network was established based on public human datasets comprising 4.6 million single cells from multiple organs and 5,900 sequenced samples of various cancer types with normal references. A specialized pathway neural layer was embedded to extract activities of predefined pathways from the Human Molecular Signatures Database (MSigDB), or newly learned pathways from single-cell data. The structure of RNAGAN (generator and discriminator) enables four applications after one shared training procedure: 1. single-cell and bulk-level patient stratification or differential diagnosis; 2. analysis of the gene and pathway markers in a selected disease; 3. pseudo data generation when sample size is limited for downstream analysis; 4. vectorization with gene and pathway-level features learned from multiple data sets. RNAGAN contributes to the efficient utilization of limited data for transcriptomic studies.
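
The abstract does not say how the pathway neural layer is built. One common way to make a layer pathway-aware is to mask a dense layer so that each pathway unit receives input only from its member genes. The sketch below illustrates that general idea with NumPy and should not be read as RNAGAN's actual implementation. The function names (`build_pathway_mask`, `pathway_layer`), the toy gene sets, and the `tanh` activation are all assumptions made for illustration; real gene sets would come from MSigDB.

```python
import numpy as np

def build_pathway_mask(genes, pathways):
    """Return a (n_genes, n_pathways) 0/1 matrix marking gene membership."""
    gene_index = {g: i for i, g in enumerate(genes)}
    mask = np.zeros((len(genes), len(pathways)), dtype=np.float32)
    for j, members in enumerate(pathways.values()):
        for g in members:
            if g in gene_index:  # skip genes absent from the expression matrix
                mask[gene_index[g], j] = 1.0
    return mask

def pathway_layer(expr, weights, mask, bias):
    """Masked dense layer: each pathway unit sees only its member genes."""
    return np.tanh(expr @ (weights * mask) + bias)

# Toy, made-up gene sets for illustration only (not taken from MSigDB).
genes = ["TP53", "MDM2", "CDKN1A", "HK2", "PKM"]
pathways = {
    "TOY_P53_PATHWAY": ["TP53", "MDM2", "CDKN1A"],
    "TOY_GLYCOLYSIS": ["HK2", "PKM", "NOT_MEASURED"],
}

rng = np.random.default_rng(0)
mask = build_pathway_mask(genes, pathways)
weights = rng.normal(size=mask.shape).astype(np.float32)  # would be learned in practice
bias = np.zeros(len(pathways), dtype=np.float32)

expr = rng.random((4, len(genes))).astype(np.float32)  # 4 samples x 5 genes
activity = pathway_layer(expr, weights, mask, bias)     # 4 samples x 2 pathways
```

Because the weights outside each gene set are zeroed, every output column can be read as an activity score for one named pathway, which is the kind of interpretability the abstract attributes to the pathway layer.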