Current Release (September 26th, 2020)

Protein-Protein Interactions

  PPI Stats

Organism                    Binary interactions*   Complexes**
Homo sapiens                439,714                15,252
Saccharomyces cerevisiae    128,319                6,302
Caenorhabditis elegans      22,305                 105
Drosophila melanogaster     57,578                 810
Mus musculus                57,669                 1,304
Rattus norvegicus           5,796                  307
Arabidopsis thaliana        56,282                 431

* Binary interactions: the total number of self-interactions and binary interactions in which every participating protein has a UniProt accession number.
** Complexes: the total number of complexes in which every participating protein has a UniProt accession number.
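The counting rule in the two footnotes can be sketched in Python. This is a minimal illustration, not PINA's code: each record is assumed to be a tuple of participant identifiers, `accession_of` is a hypothetical lookup from identifier to UniProt accession (or `None`), and a record with one or two distinct participants is assumed to be a self- or binary interaction, while a record with three or more is assumed to be a complex.

```python
from typing import Dict, Iterable, Optional, Tuple


def count_records(
    interactions: Iterable[Tuple[str, ...]],
    accession_of: Dict[str, Optional[str]],
) -> Tuple[int, int]:
    """Return (binary_interactions, complexes) under the assumptions above."""
    binary = complexes = 0
    for record in interactions:
        if not record:
            continue  # ignore empty records
        # Count a record only if every participant maps to a UniProt accession.
        if not all(accession_of.get(p) for p in record):
            continue
        if len(set(record)) <= 2:
            binary += 1  # self-interaction (1 distinct protein) or binary (2)
        else:
            complexes += 1  # three or more distinct proteins
    return binary, complexes


# Hypothetical example: GENE_X has no accession, so its record is skipped.
accessions = {"TP53": "P04637", "MDM2": "Q00987", "EP300": "Q09472", "GENE_X": None}
records = [
    ("TP53", "MDM2"),
    ("TP53", "TP53"),
    ("TP53", "GENE_X"),
    ("TP53", "MDM2", "EP300"),
]
print(count_records(records, accessions))  # (2, 1)
```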

  Data Sources

Source database   Version
IntAct            4.2.15
BioGRID           3.5.185
MINT              May 21, 2020
DIP               20170205
HPRD              Release 9

We identified identical interaction records across the source databases and merged them into a single non-redundant dataset.
Because some of the original records carry only limited annotation, we also used BioMart and UniProt to annotate every protein with the same high-quality information.
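A minimal sketch of the merging step, assuming that every record has already been mapped to UniProt accessions and that two records describe the same interaction when they name the same unordered pair of proteins. PINA's actual matching criteria are not described here; the function name `merge_interactions` and the `(source_db, accession_a, accession_b)` record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def merge_interactions(
    records: Iterable[Tuple[str, str, str]],
) -> Dict[Tuple[str, str], Set[str]]:
    """Collapse (source_db, accession_a, accession_b) records into one entry
    per unordered protein pair, remembering every database that reported it."""
    merged: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for source, a, b in records:
        # A-B and B-A denote the same interaction, so store the pair sorted.
        key = (a, b) if a <= b else (b, a)
        merged[key].add(source)
    return dict(merged)


merged = merge_interactions([
    ("IntAct", "P04637", "Q00987"),
    ("BioGRID", "Q00987", "P04637"),  # same pair, reversed order
    ("MINT", "P04637", "P04637"),     # self-interaction
])
print(len(merged))  # 2
```

Keeping the set of source databases per pair lets the non-redundant dataset still record which databases support each interaction.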

Cancer Data

  Data Sources & Preprocessing

  • Tumor type-specific cancer driver genes were obtained from a recent TCGA Pan-Cancer analysis of 9,423 tumor exomes.
  • Targets of therapeutic compounds were downloaded from the Genomics of Drug Sensitivity in Cancer (GDSC).
  • Cancer transcriptome profiles were downloaded from the Genomic Data Commons (GDC) portal of TCGA (version 20190101). The batch-corrected, upper-quartile-normalized RSEM measurements were log2-transformed for mRNA expression analysis.
  • Cancer proteome data were downloaded from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data portal (version 20200511). The relative protein abundances generated by the Common Data Analysis Pipeline (CDAP) were quantile-normalized using the normalizeQuantiles function in the R package limma (v3.36.1).
  • Both mRNA and protein expression datasets were further filtered by removing genes with zero or NA values in more than 80% of samples.
  • Clinical data (survival time, tumor site, age, ethnicity, and grade) were downloaded from both GDC and CPTAC for corresponding samples with molecular data.
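The log2 transformation, quantile normalization and sparsity filter described above can be sketched with NumPy on a genes-by-samples matrix. This is an illustrative reimplementation under stated assumptions, not the pipeline PINA ran: the pseudocount of 1 is an assumption (the text does not state one), `quantile_normalize` is a simplified stand-in for limma's normalizeQuantiles (it assumes no missing values and does not average ties, whereas limma handles both), and all function names are hypothetical.

```python
import numpy as np


def log2_expression(values: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """log2-transform non-negative expression values.
    ASSUMPTION: a pseudocount of 1 keeps zeros finite."""
    return np.log2(values + pseudocount)


def quantile_normalize(matrix: np.ndarray) -> np.ndarray:
    """Quantile-normalize the columns (samples) of a genes x samples matrix.
    Simplified: assumes no NaNs and does not average tied values."""
    order = np.argsort(matrix, axis=0)                     # row indices sorted per column
    mean_of_sorted = np.sort(matrix, axis=0).mean(axis=1)  # reference distribution
    result = np.empty_like(matrix, dtype=float)
    for j in range(matrix.shape[1]):
        # The k-th smallest value in column j receives the k-th reference value.
        result[order[:, j], j] = mean_of_sorted
    return result


def filter_sparse_genes(matrix: np.ndarray, max_missing_fraction: float = 0.8) -> np.ndarray:
    """Boolean mask of genes (rows) to keep: a gene is dropped when its value
    is zero or NaN in more than max_missing_fraction of samples."""
    missing = np.isnan(matrix) | (matrix == 0)
    return missing.mean(axis=1) <= max_missing_fraction


expr = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 1.0]])
print(quantile_normalize(expr))  # every column now holds the values 1.0, 2.5, 3.5
```

After quantile normalization every sample shares the same value distribution, which is what makes abundances comparable across samples before filtering.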

  Data Policy

We adhere to all data access and usage policies of the original studies. PINA users must strictly follow the NIH Genomic Data Sharing (GDS) Policy when using the RNA-seq and clinical data integrated into PINA, and the CPTAC Data Use Agreement when using the proteomic data integrated into PINA.

  Data Stats

mRNA expression datasets:
Dataset name  Cancer name                                                       No. of patients*  No. of genes
TCGA-ACC      Adrenocortical carcinoma                                          79                18,136
TCGA-BLCA     Bladder Urothelial Carcinoma                                      408               18,558
TCGA-BRCA     Breast invasive carcinoma                                         1,095             18,563
TCGA-CESC     Cervical squamous cell carcinoma and endocervical adenocarcinoma  304               18,491
TCGA-CHOL     Cholangiocarcinoma                                                36                18,377
TCGA-COAD     Colon adenocarcinoma                                              451               18,039
TCGA-DLBC     Lymphoid Neoplasm Diffuse Large B-cell Lymphoma                   48                18,158
TCGA-ESCA     Esophageal carcinoma                                              184               18,963
TCGA-GBM      Glioblastoma multiforme                                           154               18,473
TCGA-HNSC     Head and Neck squamous cell carcinoma                             520               18,671
TCGA-KICH     Kidney Chromophobe                                                66                18,152
TCGA-KIRC     Kidney renal clear cell carcinoma                                 533               18,588
TCGA-KIRP     Kidney renal papillary cell carcinoma                             290               18,331
TCGA-LAML     Acute Myeloid Leukemia                                            173               16,731
TCGA-LIHC     Liver hepatocellular carcinoma                                    371               18,419
TCGA-LUAD     Lung adenocarcinoma                                               515               18,620
TCGA-LGG      Brain Lower Grade Glioma                                          516               18,586
TCGA-LUSC     Lung squamous cell carcinoma                                      501               18,777
TCGA-MESO     Mesothelioma                                                      87                18,488
TCGA-OV       Ovarian serous cystadenocarcinoma                                 304               18,950
TCGA-PAAD     Pancreatic adenocarcinoma                                         178               18,709
TCGA-PCPG     Pheochromocytoma and Paraganglioma                                179               18,318
TCGA-PRAD     Prostate adenocarcinoma                                           497               18,710
TCGA-READ     Rectum adenocarcinoma                                             160               18,040
TCGA-SARC     Sarcoma                                                           259               18,582
TCGA-SKCM     Skin Cutaneous Melanoma                                           103               18,422
TCGA-STAD     Stomach adenocarcinoma                                            415               18,972
TCGA-TGCT     Testicular Germ Cell Tumors                                       150               19,270
TCGA-THCA     Thyroid carcinoma                                                 505               18,307
TCGA-THYM     Thymoma                                                           120               18,561
TCGA-UCEC     Uterine Corpus Endometrial Carcinoma                              532               17,629
TCGA-UCS      Uterine Carcinosarcoma                                            57                18,918
TCGA-UVM      Uveal Melanoma                                                    80                17,679

Protein expression datasets:
Dataset name  Cancer name                        No. of patients*  No. of proteins
CPTAC-CCRCC   Clear Cell Renal Cell Carcinoma    110               9,445
CPTAC-COAD    Colon Adenocarcinoma               97                7,057
CPTAC-EC      Endometrial Carcinoma              100               10,418
CPTAC-GC      Gastric Cancer                     80                8,732
CPTAC-HCC     Hepatocellular Carcinoma           159               9,682
CPTAC-LUAD    Lung Adenocarcinoma                111               10,546
TCGA-BRCA     Breast Invasive Carcinoma          105               9,747
TCGA-OV       Ovarian Serous Cystadenocarcinoma  174               7,703

* Only primary tumors were included.
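For the TCGA datasets, one way to restrict an analysis to primary tumors is to read the two-digit sample-type code at the start of the fourth field of each TCGA barcode, where `01` denotes a primary solid tumor and `03` a primary blood-derived cancer from peripheral blood (as in TCGA-LAML). The sketch below assumes this barcode-based rule; it is not necessarily how PINA selected its samples, it does not apply to CPTAC sample identifiers, and the names `PRIMARY_TUMOR_CODES` and `is_primary_tumor` are hypothetical.

```python
# ASSUMPTION: these TCGA sample-type codes are treated as "primary tumor".
PRIMARY_TUMOR_CODES = {"01", "03"}


def is_primary_tumor(barcode: str) -> bool:
    """Return True if a TCGA sample/aliquot barcode carries a primary-tumor
    sample type, e.g. 'TCGA-02-0001-01C-01D-0182-01' -> sample type '01'."""
    parts = barcode.split("-")
    if len(parts) < 4 or len(parts[3]) < 2:
        return False  # participant-level barcode: no sample type present
    return parts[3][:2] in PRIMARY_TUMOR_CODES


samples = ["TCGA-02-0001-01C", "TCGA-02-0001-11A", "TCGA-AB-2803-03A"]
print([s for s in samples if is_primary_tumor(s)])
```

Code `11` (solid tissue normal) is excluded, so matched normal samples are dropped along with metastatic and recurrent samples.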