CVCDAP stands for Cancer Virtual Cohort Discovery Analysis Platform. It supports rapid, cohort-level re-discovery of integrated multi-omics data for your research questions. This website is free and open to all users.


Guan X, Cai M, Du Y, Yang E, Ji J, & Wu J (2020) CVCDAP: an integrated platform for molecular and clinical analysis of cancer virtual cohorts. Nucleic Acids Res. 48(W1):W463-W471. [Full text]


Xiaoqing Guan downloaded and processed the datasets, implemented the analysis scripts, and wrote the documents.

Meng Cai and Yang Du developed the web server and the "Upload Your Project" function.

The project was designed and supervised by Dr. Jianmin Wu.

Release 2019v4

  • Data Sources & Preprocessing

  • Somatic simple mutations and transcriptomic data (open access) were downloaded from PanCanAtlas (version 20190101). The TCGA PanCancer Atlas MC3 set, which consists of uniform re-calling results produced by the Multi-Center Mutation Calling in Multiple Cancers project to remove batch effects, was imported to enable robust cross-tumor-type genomic analyses. Batch-corrected mRNA expression levels (FPKM) were imported for unbiased gene expression analysis. We corrected gene symbols and imputed missing values by disease type, then merged the data and converted FPKM to TPM. The log2(TPM+0.5) expression signal of each gene in each individual sample was saved in CVCDAP.

    Copy number data (thresholded GISTIC2 focal-level score) were downloaded from the Genomic Data Commons (GDC) Data Portal.

    Proteomics data (open access) were downloaded from the CPTAC data portal. Relative protein abundances generated by the CPTAC Common Data Analysis Pipeline were included for preprocessing. After removing proteins with less than 50% completeness within an individual study, we imputed missing values within each study, merged the studies and retained only the proteins they had in common, and then applied quantile normalization.

    Clinical data (survival time, tumor site, age, race, and grade) (open access) were downloaded from both PanCanAtlas and CPTAC for corresponding samples with molecular data, and only primary tumors were integrated.
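
The expression and proteomics transformations described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the scripts used to build CVCDAP: the function names, the dictionary and list layouts, and the use of None for missing values are all assumptions made for the example. The imputation steps are not shown, and ties in the quantile normalization are broken by position, which is a simplification.

```python
import math


def fpkm_to_log2_tpm(fpkm, pseudocount=0.5):
    """Convert one sample's FPKM values (gene -> FPKM) to log2(TPM + pseudocount)."""
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("sample has no expression signal")
    # TPM rescales FPKM so that the values of every sample sum to one million.
    return {gene: math.log2(value / total * 1e6 + pseudocount)
            for gene, value in fpkm.items()}


def filter_by_completeness(abundance, min_fraction=0.5):
    """Keep proteins measured (not None) in at least min_fraction of samples."""
    return {protein: values for protein, values in abundance.items()
            if sum(v is not None for v in values) / len(values) >= min_fraction}


def quantile_normalize(columns):
    """Quantile-normalize complete sample columns of equal length.

    Hypothetical helper: each column is one sample, with no missing values.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n)]
    normalized = []
    for col in columns:
        # Indices of this column ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized
```

After quantile normalization every sample shares the same set of values, so differences between samples reflect only the rank of each protein within its sample.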

  • Data Stats

  • Download table for abbreviations of ProjectID

  • Data Policy

  • CVCDAP integrates Open Access data downloaded from the NCI Genomic Data Commons (GDC) Data Portal and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), and adheres to all data access and usage policies of the original studies. CVCDAP users should strictly adhere to the NIH Genomic Data Sharing (GDS) Policy when using the mutation, RNA-seq, and clinical data integrated in CVCDAP, and to the CPTAC Data Use Agreement when using the proteomic data integrated in CVCDAP.


The following R packages and software are used in CVCDAP analyses.

Alboukadel Kassambara and Marcin Kosinski (2019). survminer: Drawing Survival Curves using 'ggplot2'.
R package version 0.4.4.
Kevin Blighe (2019). EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling.
R package version 1.0.1.
Raivo Kolde (2019). pheatmap: Pretty Heatmaps.
R package version 1.0.12.
Mayakonda, A., et al., Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res, 2018. 28(11): p. 1747-1756.
R package version 1.8.0.
H. Wickham (2016). ggplot2: Elegant Graphics for Data Analysis.
R package version 3.1.1.
Rosenthal, R., et al., deconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biology, 2016. 17(1): p. 31.
R package version 1.8.0.
Jesse H. Krijthe (2015). Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation.
R package version 0.15.
Ritchie, M.E., et al., limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res, 2015. 43(7): p. e47.
R package version 3.38.3.
Therneau T (2015). A Package for Survival Analysis in S.
R package version 2.43.3.
Vincent Q. Vu (2011). ggbiplot: A ggplot2 based biplot.
R package version 0.55.
Subramanian, A., et al., Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A, 2005. 102(43): p. 15545-50.
Software version 3.0.