Use this help guide to understand the data available.
From the homepage the user may retrieve the information from the database in several ways:
The user may give a single gene identifier or a list of gene identifiers, to be chosen among five possibilities:
The user may choose to retrieve the list of driver genes from the publications of cancer screenings or healthy tissue screenings by clicking on the corresponding numbers on the home page.
In addition, a list of driver genes can be derived by cancer type or healthy tissue by clicking on the corresponding numbers on the home page.
The results page contains 13 sections for each gene:
This section includes the general information about the queried gene: symbol, description and links to external databases, such as Entrez, COSMIC, OMIM, RefSeq, Ensembl.
The Details button opens a new page containing the driver support, the list of cancer and non-cancer screenings in which the gene has been reported as a driver, and the damaging alterations in TCGA samples and cancer cell lines.
Gene Duplication is defined as in Rambaldi D et al. (2008): it is measured by aligning the corresponding protein sequences directly to the human genome, using the BLAST-like Alignment Tool (BLAT). We define as duplicates all additional genomic matches covering at least 60% of the query length. Singletons are all those genes that do not have any additional hit covering at least 60% of the query length.
The Gene Duplication page describes all the duplicated loci related to the queried gene.
The origin of a gene is defined as the deepest taxonomic branch of the tree of life where an ortholog can be detected. Orthology relationships are retrieved from eggNOG 5.0 (Huerta-Cepas et al., 2018).
Seven branches of the tree of life are defined:
The Details button opens a new page, which describes all the orthology relationships of the gene of interest in detail.
This section provides information on the protein-protein interactions of the protein encoded by the gene of interest, as well as its participation in protein complexes. The Details button opens a new page, which describes all the network properties and complexes in detail.
The network properties are derived from five databases of protein-protein interaction networks:
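The definition of gene origin given above can be illustrated with a minimal sketch. The branch labels and the function name `gene_origin` are hypothetical placeholders, not the names used by the database; the only assumption carried over from the text is that the origin is the deepest (oldest) branch in which an ortholog is detected.

```python
def gene_origin(branches_with_ortholog, branch_order):
    """Return the deepest branch in which an ortholog was detected.

    branch_order lists the branches from the deepest (oldest) to the most
    recent. Returns None when no ortholog is detected in any branch.
    """
    for branch in branch_order:
        if branch in branches_with_ortholog:
            return branch
    return None


# Placeholder labels (assumption): the real tree defines seven branches.
EXAMPLE_ORDER = ["deep", "middle", "recent"]

# Orthologs detected in two branches; the deeper one is reported.
example_origin = gene_origin({"recent", "middle"}, EXAMPLE_ORDER)
print(example_origin)
```

Scanning the ordered list from the deepest branch onwards means the first match is the origin, so no explicit comparison of branch depths is needed.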
Dataset | Version | Nodes | Interactions | Publications |
---|---|---|---|---|
BioGRID | 3.5.185 | 16,638 | 358,323 | 28,962 |
IntAct | 4.2.14 | 14,827 | 110,899 | 7,434 |
DIP | February 5th 2018 | 2,863 | 4,439 | 1,909 |
HPRD | 9 | 9,407 | 36,724 | 18,692 |
Bioplex | 3.0 | 14,277 | 163,336 | 1 |
Total | | 17,883 | 542,397 | 41,246 |
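The totals in the table are smaller than the sums of the individual columns, which suggests that proteins and interactions shared between databases are counted once. A minimal sketch of such a merge is shown below; the function name and the protein identifiers are illustrative assumptions, and the actual integration procedure may differ.

```python
def merge_networks(sources):
    """Merge undirected interaction lists from several databases.

    sources maps a database name to an iterable of (protein_a, protein_b).
    Each interaction is stored as an unordered pair, so A-B and B-A
    reported by different databases collapse into one entry.
    """
    interactions = set()
    for pairs in sources.values():
        for a, b in pairs:
            if a != b:  # ignore self-interactions in this sketch
                interactions.add(frozenset((a, b)))
    nodes = set()
    for pair in interactions:
        nodes.update(pair)
    return nodes, interactions


example_sources = {
    "db_one": [("P1", "P2"), ("P2", "P3")],
    "db_two": [("P2", "P1"), ("P3", "P4")],  # P2-P1 duplicates P1-P2
}
example_nodes, example_interactions = merge_networks(example_sources)
print(len(example_nodes), len(example_interactions))
```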
The complex interactions are derived from three databases:
The number of miRNAs regulating the gene is reported.
The Details button opens a new page, which shows a graphical representation of all the miRNAs regulating the queried gene.
The number of functional pathways in which the gene is involved is shown.
The details button opens a new page, which describes gene functions along with pathway ids and terms.
The number of Normal Tissues where the gene is expressed is shown.
The details button opens a new page, which shows further information on the expression of the gene across normal tissues.
The number of Cancer Cell Lines where the gene is expressed is shown.
The details button opens a new page, which shows further information on the expression of the gene across cancer cell lines.
The number of Normal Tissues where the protein is expressed is shown.
The details button opens a new page, which shows further information on the expression of the protein across normal tissues.
In this section, the number of human cell lines in which the gene has been found essential is reported. The Details button opens a new page, which describes the essentiality information for the queried gene.
Essentiality is derived from two databases comprising nine datasets:
Dataset | CRISPR Cas9 / RNAi | Version | Genes | Cell lines |
---|---|---|---|---|
Achilles | CRISPR Cas9 | DepMap 20Q2 | 18,070 | 769 |
GeCKO | CRISPR Cas9 | DepMap 19Q1 | 18,377 | 43 |
Sanger Project SCORE | CRISPR Cas9 | DepMap August 2020 | 17,752 | 317 |
Achilles | RNAi | DepMap 2.20.2 | 15,951 | 501 |
DRIVE | RNAi | DepMap DEMETER2 Data v6 | 7,531 | 397 |
PICKLES Wang | CRISPR Cas9 | September 2020 | 17,955 | 18 |
PICKLES shRNA | RNAi | September 2020 | 13,172 | 100 |
PICKLES Tzelepis | CRISPR Cas9 | September 2020 | 17,818 | 5 |
PICKLES TKOV1 | CRISPR Cas9 | September 2020 | 17,050 | 9 |
Total | | | 19,013 | 1,122 |
In this section, the frequency of germline loss-of-function variants, damaging SNVs/indels and structural variants compared to the rest of human genes is shown.
The Details button opens a new page, which gives the definition of the LOEUF score, its value for the queried gene and the median across all human genes. It also reports the number of damaging SNVs/indels and structural variants, normalized by gene length and split by alteration type, together with the median across all human genes.
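As a rough illustration of the length normalization described above, the sketch below converts variant counts to a per-kilobase rate and compares one gene with the median across genes. The per-kilobase unit, the gene names and the counts are assumptions made for the example only.

```python
from statistics import median


def variants_per_kb(n_variants, gene_length_bp):
    """Number of variants per kilobase of gene length."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return n_variants / (gene_length_bp / 1000)


# Hypothetical (variant count, gene length in bp) per gene.
example_counts = {
    "GENE_A": (12, 4000),
    "GENE_B": (3, 1500),
    "GENE_C": (40, 20000),
}
example_rates = {
    gene: variants_per_kb(n, length)
    for gene, (n, length) in example_counts.items()
}
example_median = median(example_rates.values())

# A gene above the median carries more damaging variants than a typical gene.
print(example_rates["GENE_A"], example_median)
```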
This section provides information on the drugs targeting the gene and on the involvement of the gene as a biomarker of response or resistance in cancer cell lines and clinical trials.
The details button opens a new page that lists all the drugs and the associations between drugs, gene and response in the cell lines and clinical settings.
This page provides support on the cancer driver type (canonical/candidate), the screenings in which the gene has been reported as a driver and damaging alterations in both TCGA samples and cancer cell lines.
The Canonical/candidate support tab provides information on the cancer driver type of the queried gene (canonical or candidate cancer driver). For candidates, it also reports the predicted driver mode of action (putative oncogene, tumour suppressor or unclassified mode of action) based on the prevalence of gain-of-function or loss-of-function alterations in TCGA samples.
The Cancer/healthy drivers screens tab provides information on the screenings where the queried gene was identified as a driver (number of publications, methods, cancer types/healthy tissues, primary/organ sites). Clicking on the screening leads to a detailed table:
The first column, Type of Screening, describes the type of screening in which the gene is reported. This can be one of the following:
The second, third and fourth columns describe the Organ system, Primary site/Organ site and Cancer type/Healthy tissue where the gene is reported as a driver.
The fifth column, Method, describes the method used in the original screening to determine that the gene is a driver. This can be one of the following:
The sixth column, Driver type, indicates whether the driver is found within a coding or a non-coding sequence.
The last column, Screening, provides a link to the screening where the gene was reported as a driver.
The Damaging alterations tab contains links to damaging alterations in TCGA samples and cancer cell lines. Clicking on the Damaging alterations links leads to a detailed table:
The first, second and third columns list the Organ system, Primary site and Cancer type where the gene is found damaged, respectively.
The fourth column, TCGA samples (n)/Cell lines (n), reports the number of samples or cell lines where the gene is found damaged.
The final column, Damaging alterations (n), provides the number and type of damaging alterations that were found in cancer samples or cell lines for the queried gene.
Gene Duplication is defined as in Rambaldi D et al. (2008): it is measured by aligning the corresponding protein sequences (RefSeq v99; O'Leary et al., 2016) directly to the human genome (hg38), using the BLAST-like Alignment Tool (BLAT). We define as duplicates all additional genomic matches covering at least 60% of the query length. Singletons are all those genes that do not have any additional hit covering at least 60% of the query length.
Three types of Hit are defined, depending on the genomic location of the duplicated locus:
The default cutoff to display genomic hits is 60% of the query length, but the user can choose a different cutoff, from 10% to 100% of the query length, from the dropdown box above the table.
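The coverage cutoff described above can be sketched as a simple filter. The `BlatHit` record, its field names and the example loci are hypothetical; real BLAT output has a different layout, and the coverage is assumed here to be already expressed as a fraction of the query length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlatHit:
    gene: str        # query gene symbol
    locus: str       # genomic location of the match (illustrative label)
    coverage: float  # fraction of the query length covered, 0.0 to 1.0
    is_self: bool    # True for the match to the gene's own locus


def duplicated_loci(hits, cutoff=0.60):
    """Additional genomic matches covering at least `cutoff` of the query."""
    return [h for h in hits if not h.is_self and h.coverage >= cutoff]


def is_singleton(hits, cutoff=0.60):
    """A gene is a singleton when no additional hit passes the cutoff."""
    return not duplicated_loci(hits, cutoff)


example_hits = [
    BlatHit("GENE_A", "chr1:1000-5000", 1.00, True),
    BlatHit("GENE_A", "chr7:200-3900", 0.72, False),
    BlatHit("GENE_A", "chrX:50-900", 0.35, False),
]
print([h.locus for h in duplicated_loci(example_hits)])        # default 60%
print([h.locus for h in duplicated_loci(example_hits, 0.10)])  # lowest cutoff
```

Lowering the cutoff only changes which hits are displayed; whether a gene counts as duplicated follows the 60% definition.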
The orthology relationships are derived from eggNOG 5.0 (Huerta-Cepas et al., 2018).
The Tree of Life provides a visualization of the origin and the orthologs of the gene of interest. The origin of the gene is shown in red.
The tables describe all the species and the corresponding orthologs. If the node branches further and the lower nodes contain orthologous genes, the species from the lower nodes are also shown.
Three parameters describe the position of a protein in the protein-protein interaction network:
At the top, the first-level network for the protein encoded by the gene of interest (which is in the centre of the image) is displayed. At the bottom, the interaction partners of the protein of interest can be filtered by different properties (for more help on those properties, please check the respective section of the help page). Proteins are colour coded according to their driver status. The network is constructed and displayed using the R shiny and igraph packages.
The table lists the protein-protein interaction network degree, betweenness and clustering coefficient for interactors of the protein of interest. In addition, the PubMed IDs of the original publication(s) supporting the interaction are listed.
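For readers unfamiliar with the three network properties, the following sketch computes them on a toy undirected network using only the Python standard library. It is not the pipeline used by the database (which relies on R and igraph); the betweenness shown is unnormalised, which is an assumption, and the protein names are placeholders.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Adjacency sets for an undirected network, ignoring self-loops."""
    graph = {}
    for a, b in edges:
        if a != b:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def degree(graph, node):
    """Number of direct interaction partners."""
    return len(graph[node])


def clustering(graph, node):
    """Fraction of pairs of neighbours that interact with each other."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def betweenness(graph):
    """Unnormalised shortest-path betweenness (Brandes' algorithm)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        stack = []
        preds = {n: [] for n in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every unordered pair is visited from both ends in an undirected graph.
    return {n: s / 2 for n, s in score.items()}


example_graph = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
example_betweenness = betweenness(example_graph)
print(degree(example_graph, "C"), clustering(example_graph, "C"))
print(example_betweenness["C"])
```

In the toy network, protein C has the highest degree and is the only protein lying on shortest paths between other proteins, while its neighbours A and B form a fully connected triangle with it.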
The table lists the complexes which the protein of interest is a part of.
The network of miRNA-target interactions is composed of cancer genes and the miRNAs targeting them. The network displays only interactions that are supported by experimental validation. The miRNA data are derived from miRecords v.4.0 (Xiao F et al., 2009) and miRTarBase v.8.0 (Huang et al., 2020).
The interaction network of the gene of interest with miRNAs is shown. The gene is in the middle of the network and colour coded according to its driver status.
The table includes all miRNAs and target genes visualized in the network. Each row provides information on the target gene: involvement in cancer, evolutionary origin, and duplicability. The PubMed IDs column contains links to the publications supporting the interaction, while the last column describes the methods employed to experimentally validate the interaction.
This table provides information on the essentiality of the gene for cell survival. It lists whether the gene has been found essential or not in the respective human cell line and tissue in nine screens obtained from the DepMap and PICKLES databases.
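The number of cell lines reported in the summary can be thought of as a count of distinct cell lines across screens, since the same cell line may appear in more than one dataset. The sketch below shows this under that assumption; the record layout and cell line names are hypothetical.

```python
def essential_cell_lines(records):
    """Distinct cell lines in which the gene was found essential.

    records is an iterable of (dataset, cell_line, is_essential) tuples.
    A cell line screened in several datasets is counted once.
    """
    return {cell_line for _, cell_line, essential in records if essential}


example_records = [
    ("Achilles CRISPR", "LINE_1", True),
    ("Achilles RNAi", "LINE_1", True),   # same cell line, second screen
    ("DRIVE", "LINE_2", False),
    ("Sanger Project SCORE", "LINE_3", True),
]
example_essential = essential_cell_lines(example_records)
print(len(example_essential))
```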
The user may give a single gene identifier or a list of gene identifiers, to be chosen among four possibilities:
This page displays functional information from KEGG v94.1 and Reactome v72 for the gene of interest.
KEGG is a three-level hierarchical database of biological pathways. This table lists the lowest-level pathways to which the gene belongs ('Description'); links to the corresponding pathway maps ('ID'); and higher-level functional information ('Level 1', 'Level 2').
Reactome is a multi-level hierarchical database of biological pathways. This table lists the pathways at level two or greater to which the gene belongs ('Description'); the levels of these pathways ('Level'); links to the relevant section of the Pathway Browser ('ID'); and the corresponding level one pathways ('Level 1').
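The relationship between a pathway, its level and its level-one pathway can be sketched with a child-to-parent map. The pathway identifiers are invented, and the sketch assumes a single parent per pathway, which is a simplification.

```python
def pathway_level(pathway, parent):
    """Level of a pathway, where top-level pathways are at level 1."""
    level = 1
    while pathway in parent:
        pathway = parent[pathway]
        level += 1
    return level


def level_one_pathway(pathway, parent):
    """Walk up the hierarchy until a pathway without a parent is reached."""
    while pathway in parent:
        pathway = parent[pathway]
    return pathway


# Hypothetical hierarchy: P3 is a child of P2, which is a child of P1.
example_parent = {"P3": "P2", "P2": "P1"}

print(pathway_level("P3", example_parent))
print(level_one_pathway("P3", example_parent))
```

Under this layout, a gene annotated to P3 would be listed with level 3 and with P1 as its level-one pathway.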
Expression levels are derived from two sources:
The results are plotted separately for each experiment.
The Human Protein Atlas v19.3 provides protein expression data from immunohistochemistry assays in normal tissue samples. Expression is reported as Not detected, Low, Medium, or High in 45 tissue types.
The results are shown in a column chart.
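The four expression categories can be turned into a count of tissues where the protein is expressed. The sketch below assumes that any level other than Not detected counts as expressed; the tissue names and levels are illustrative.

```python
LEVELS = ("Not detected", "Low", "Medium", "High")


def expressed_tissues(levels_by_tissue):
    """Tissues where the protein level is Low, Medium or High."""
    unknown = set(levels_by_tissue.values()) - set(LEVELS)
    if unknown:
        raise ValueError(f"unexpected expression levels: {sorted(unknown)}")
    return sorted(
        tissue
        for tissue, level in levels_by_tissue.items()
        if level != "Not detected"
    )


example_levels = {"liver": "High", "skin": "Not detected", "kidney": "Low"}
example_expressed = expressed_tissues(example_levels)
print(len(example_expressed), example_expressed)
```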
Expression levels in cancer cell lines are derived from three sources:
The results are shown separately for each experiment.
Germline variation data falls into three categories:
We report the drugs that target the gene and the associations between drugs, gene and response in cell lines and clinical settings.
We report cancer drivers interacting with the tumour immune microenvironment with literature, experimental or in silico support.
This section allows users to download information from the NCG database. There are four files available to download:
The first downloadable file is a list of cancer drivers, their annotation and supporting literature. There is one row per gene-screen pair. The columns contain:
The second downloadable file is a list of healthy drivers, their annotation and supporting literature. There is one row per gene-screen pair. The columns contain:
The third downloadable file is a list of cancer drivers and their systems-level properties. There is one row per gene. The columns contain:
The fourth downloadable file is a list of healthy drivers and their systems-level properties. There is one row per gene. The columns contain: