CVCDAP stands for Cancer Virtual Cohort Discovery Analysis Platform. It supports rapid, cohort-level re-discovery in integrated multi-omics data to address your research questions. This website is free and open to all users.
Xiaoqing Guan downloaded and processed the datasets, implemented the analysis scripts, and wrote the documents.
The project was designed and supervised by Dr. Jianmin Wu.
Somatic simple mutations and transcriptomic data (open access) were downloaded from PanCanAtlas (version 20190101). The TCGA PanCancer Atlas MC3 set, which consists of uniform re-calling results produced by the Multi-Center Mutation Calling in Multiple Cancers project to remove batch effects, was imported to enable robust cross-tumor-type genomic analyses. Batch-corrected mRNA expression levels (FPKM) were imported for unbiased gene expression analysis. We corrected gene symbols and imputed missing values by disease type, then merged the data and converted FPKM to TPM. The log2(TPM+0.5) expression signal of each gene in each sample was saved in CVCDAP.
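As an illustration of the last two steps, the sketch below converts the FPKM values of one sample to TPM and applies the log2(TPM+0.5) transform. It assumes the standard per-sample rescaling TPM = FPKM / sum(FPKM) × 10^6. The function name and the toy gene values are hypothetical, and this is not the actual CVCDAP script.

```python
import math


def fpkm_to_log2_tpm(fpkm, pseudocount=0.5):
    """Convert one sample's FPKM values to log2(TPM + pseudocount).

    fpkm: dict mapping gene symbol -> FPKM value for a single sample.
    Assumes the standard rescaling TPM_i = FPKM_i / sum(FPKM) * 1e6.
    """
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("sample has no expression signal")
    return {
        gene: math.log2(value / total * 1e6 + pseudocount)
        for gene, value in fpkm.items()
    }


# Toy sample (hypothetical values, for illustration only).
sample = {"TP53": 10.0, "EGFR": 30.0, "GAPDH": 60.0}
log2_tpm = fpkm_to_log2_tpm(sample)
```

Because TPM values within a sample always sum to one million, expression signals become comparable across samples regardless of each sample's total FPKM.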
Copy number data (thresholded GISTIC2 focal-level score) were downloaded from the Genomic Data Commons (GDC) Data Portal.
Proteomics data (open access) were downloaded from the CPTAC data portal. Relative protein abundances generated by the CPTAC Common Data Analysis Pipeline were included for preprocessing. After removing proteins with less than 50% completeness in an individual study, we imputed missing values within each study, merged the studies while retaining only the proteins shared among them, and then applied quantile normalization.
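The preprocessing order described above can be sketched as follows. This is a minimal illustration, not the CVCDAP implementation: the imputation method is not stated here, so per-protein median imputation is used purely as an assumption, and all function names and toy values are hypothetical.

```python
from statistics import mean, median


def filter_and_impute(study, min_completeness=0.5):
    """study: dict protein -> list of values per sample (None = missing).

    Drops proteins observed in fewer than min_completeness of the samples,
    then fills the remaining gaps with the per-protein median
    (median imputation is an assumption made for this sketch).
    """
    kept = {}
    for protein, values in study.items():
        observed = [v for v in values if v is not None]
        if not observed or len(observed) / len(values) < min_completeness:
            continue
        fill = median(observed)
        kept[protein] = [fill if v is None else v for v in values]
    return kept


def merge_overlap(studies):
    """Concatenate samples across studies, keeping only shared proteins."""
    shared = set.intersection(*(set(s) for s in studies))
    return {p: [v for s in studies for v in s[p]] for p in sorted(shared)}


def quantile_normalize(matrix):
    """Give every sample (column) the same value distribution.

    Ties are broken by protein order, which is a simplification.
    """
    proteins = list(matrix)
    n_samples = len(matrix[proteins[0]])
    columns = [[matrix[p][j] for p in proteins] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value over all samples.
    reference = [mean(col[k] for col in sorted_cols) for k in range(len(proteins))]
    result = {p: [0.0] * n_samples for p in proteins}
    for j, col in enumerate(columns):
        order = sorted(range(len(col)), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[proteins[i]][j] = reference[rank]
    return result


# Two toy studies (hypothetical values).
study_a = {"P1": [1.0, None, 3.0], "P2": [None, None, 2.0], "P3": [5.0, 4.0, 6.0]}
study_b = {"P1": [2.0, 4.0], "P3": [1.0, 3.0], "P4": [7.0, 8.0]}

cleaned_a = filter_and_impute(study_a)
cleaned_b = filter_and_impute(study_b)
merged = merge_overlap([cleaned_a, cleaned_b])
normalized = quantile_normalize(merged)
```

In this toy run, P2 is dropped from the first study because only one of its three values is observed, and P4 is dropped at the merge step because it is absent from the first study.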
Clinical data (survival time, tumor site, age, race, and grade) (open access) were downloaded from both PanCanAtlas and CPTAC for corresponding samples with molecular data, and only primary tumors were integrated.
CVCDAP integrates the Open Access Data downloaded from the NCI Genomic Data Commons (GDC) Data Portal and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), and adheres to all aspects of the data access and usage policies of the original studies. CVCDAP users should strictly adhere to the NIH Genomic Data Sharing (GDS) Policy when using the mutation, RNA-seq, and clinical data integrated in CVCDAP, and to the CPTAC Data Use Agreement when using the proteomic data integrated in CVCDAP.
The following R packages and software are used in CVCDAP analyses.
Alboukadel Kassambara and Marcin Kosinski (2019). survminer: Drawing Survival Curves using 'ggplot2'.
R package version 0.4.4. https://CRAN.R-project.org/package=survminer
Kevin Blighe (2019). EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling.
R package version 1.0.1. https://github.com/kevinblighe/EnhancedVolcano
Raivo Kolde (2019). pheatmap: Pretty Heatmaps.
R package version 1.0.12. https://CRAN.R-project.org/package=pheatmap
Mayakonda, A., et al., Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res, 2018. 28(11): p. 1747-1756.
R package version 1.8.0. https://bioconductor.org/packages/release/bioc/html/maftools.html
H. Wickham (2016). ggplot2: Elegant Graphics for Data
R package version 3.1.1. https://cran.r-project.org/web/packages/ggplot2
Rosenthal, R., et al., deconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biology, 2016. 17(1): p. 31.
R package version 1.8.0. https://CRAN.R-project.org/package=deconstructSigs
Jesse H. Krijthe (2015). Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut
R package version 0.15. https://github.com/jkrijthe/Rtsne
Ritchie, M.E., et al., limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res, 2015. 43(7): p. e47.
R package version 3.38.3. https://bioconductor.org/packages/release/bioc/html/limma.html
Therneau T (2015). A Package for Survival Analysis in
R package version 2.43.3. https://CRAN.R-project.org/package=survival
Vincent Q. Vu (2011). ggbiplot: A ggplot2 based
R package version 0.55. http://github.com/vqv/ggbiplot
Subramanian, A., et al., Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A, 2005. 102(43):
Software version 3.0. https://data.broadinstitute.org/gsea-msigdb/gsea/software/desktop/3.0/