Statistics iRefIndex 15.0

Interactions available from major taxonomies (corrected)

The taxon of each protein interactor has been corrected to match the taxon given in its protein sequence record, regardless of the taxon listed in the interaction record. See PMID 18823568 for details.

NCBI taxonomy identifier  Scientific name                                              Number of interactions
9606                      Homo sapiens                                                 593178
559292                    Saccharomyces cerevisiae S288C                               125372
7227                      Drosophila melanogaster                                      69867
10090                     Mus musculus                                                 61287
3702                      Arabidopsis thaliana                                         49690
6239                      Caenorhabditis elegans                                       16728
83333                     Escherichia coli K-12                                        16605
192222                    Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819  11930
10116                     Rattus norvegicus                                            11390
284812                    Schizosaccharomyces pombe 972h-                              9780
381518                    Influenza A virus (A/Wilson-Smith/1933(H1N1))                4973
632                       Yersinia pestis                                              3947
243276                    Treponema pallidum subsp. pallidum str. Nichols              3642
1111708                   Synechocystis sp. PCC 6803 substr. Kazusa                    3270
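
The taxon correction described above can be illustrated with a minimal sketch. The function name and the example identifiers are assumptions chosen for illustration; this is not the iRefIndex build code. It only shows the stated rule: the taxon from the protein sequence record is reported, whatever the interaction record says.

```python
def correct_taxon(interaction_taxon, sequence_taxon):
    """Return (taxon to report, whether the two records disagreed).

    Hypothetical helper: the sequence-record taxon always wins,
    as described in the paragraph above.
    """
    return sequence_taxon, interaction_taxon != sequence_taxon


# Hypothetical case: the interaction record lists the species-level taxon
# 4932 (Saccharomyces cerevisiae) while the sequence record lists the
# strain-level taxon 559292 (Saccharomyces cerevisiae S288C).
print(correct_taxon(4932, 559292))
```

A disagreement of this kind is what mapping score feature T records for an interactor.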

Summary of mapping interaction records to RIGs (redundant interaction groups)

Source: Interaction data source. Total records: Total number of interaction records found in the source. Protein-related interactions: Total number of interactions involving only protein interactors. PPI assigned to RIGID: Number of interactions in which all protein interactors were assigned to a ROG, also shown as a percentage of column 3. Unique RIGIDs: Number of unique protein interactions and complexes (RIGIDs) found in the data source, also shown as a percentage of column 4. For a description of the term RIGs, see README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format and the original paper, PMID 18823568.

Source Total records Protein-related interactions PPI assigned to RIGID % Unique RIGIDs %
BHF_UCL 1931 1919 1919 100.00 1254 65.35
BIND 157736 91309 68064 74.54 49520 72.76
BIND_TRANSLATION 192923 84138 80606 95.80 60145 74.62
BIOGRID 1476473 664281 661220 99.54 477717 72.25
CORUM 3633 3633 3630 99.92 3372 92.89
DIP 81731 80134 79879 99.68 77459 96.97
HPIDB 2905 2783 2777 99.78 1526 54.95
HPRD 83022 83022 82983 99.95 40542 48.86
I2D_IMEX 892 891 891 100.00 434 48.71
INNATEDB 25927 25513 24971 97.88 18569 74.36
INTACT 515301 466222 466089 99.97 289572 62.13
INTCOMPLEX 1988 1746 1746 100.00 1712 98.05
MATRIXDB 26277 26256 26256 100.00 14629 55.72
MBINFO 542 522 522 100.00 331 63.41
MOLCON 377 375 375 100.00 212 56.53
MPACT 16504 16504 16373 99.21 13398 81.83
MPIDB 1505 1504 1425 94.75 893 62.67
MPPI 1814 1758 1578 89.76 776 49.18
QUICKGO 65026 53870 51762 96.09 26679 51.54
REACTOME 141996 141996 141844 99.89 130128 91.74
UNIPROTPP 10426 10341 10341 100.00 5856 56.63
VIRUSHOST 50146 50146 50127 99.96 45497 90.76
(All) 2859075 1808863 1775378 98.15 967265 54.48
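
The two percentage columns can be recomputed from the counts in each row. The sketch below assumes the reading given in the column descriptions: the first percentage is column 4 over column 3, and the second is unique RIGIDs over column 4. The function name is hypothetical.

```python
def rig_percentages(protein_related, assigned, unique_rigids):
    """Recompute the two '%' columns of the RIG summary table.

    protein_related: column 3, assigned: column 4, unique_rigids: column 6.
    """
    pct_assigned = round(100 * assigned / protein_related, 2)
    pct_unique = round(100 * unique_rigids / assigned, 2)
    return pct_assigned, pct_unique


# DIP row from the table above: 80134, 79879 and 77459.
print(rig_percentages(80134, 79879, 77459))  # (99.68, 96.97), as in the DIP row
```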

Assignment of protein interactors to ROGs (redundant object groups)

Source: Interaction data source (see methods). Protein interactors: Total number of interactors found in all interaction records. Assigned: Number of proteins assigned unambiguously to a ROG. Assignments listed in columns 5 and 6 are not included here. %: Column 3 expressed as a percentage of column 2. Arbitrary: Total number of ROG assignments that were ambiguous and were resolved with an arbitrary method (see ROG scores with 'L'). Matching sequence: Total number of assignments made where a sequence in the interaction record matched a known sequence. Unassigned: Total number of protein interactors that could not be assigned to a ROG. Unique proteins: Total number of unique proteins (ROGs). For a description of the term ROGs, see README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format and the original paper, PMID 18823568.

Source Protein interactors Assigned % Arbitrary Matching sequence New or obsolete sequence Unassigned Unique proteins
BHF_UCL 5030 5030 100.00 0 0 0 0 1460
BIND 252251 215207 85.31 54 0 0 37044 30246
BIND_TRANSLATION 257681 250419 97.18 20546 0 0 7262 36228
BIOGRID 59042 58123 98.44 3199 0 0 919 57890
CORUM 15211 15208 99.98 0 0 0 3 5338
DIP 28066 27916 99.47 670 0 0 150 27169
HPIDB 7376 7370 99.92 0 0 0 6 2277
HPRD 123812 123812 100.00 13787 98296 121 0 9842
I2D_IMEX 1932 1932 100.00 0 0 0 0 448
INNATEDB 58745 58087 98.88 1 0 0 658 8627
INTACT 399851 399668 99.95 201 58 438 183 86241
INTCOMPLEX 6634 6634 100.00 0 0 0 0 3777
MATRIXDB 52533 52533 100.00 1 0 0 0 7068
MBINFO 1136 1136 100.00 0 0 0 0 274
MOLCON 862 862 100.00 0 0 0 0 275
MPACT 40349 40199 99.63 0 0 0 150 4995
MPIDB 3238 3090 95.43 0 0 0 148 930
MPPI 3568 3361 94.20 16 0 0 207 833
QUICKGO 118891 116735 98.19 0 0 0 2156 23728
REACTOME 283992 283839 99.95 640 0 0 153 5938
UNIPROTPP 25138 25138 100.00 1 0 0 0 5524
VIRUSHOST 100292 100273 99.98 4 0 0 19 10186
(All) 1845630 1796572 97.34 39120 98354 559 49058 135893
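
The "%" column can be checked against the counts in the same way. The sketch below also verifies that the assigned and unassigned counts add up to the total number of protein interactors. That relationship holds for the rows in the table above, but it is an observation from the numbers rather than a documented rule, and the function name is hypothetical.

```python
def assigned_percentage(protein_interactors, assigned, unassigned):
    """Recompute the '%' column of the ROG assignment table.

    Raises ValueError if assigned + unassigned does not equal the total,
    which is an assumption drawn from the table rows, not a stated rule.
    """
    if assigned + unassigned != protein_interactors:
        raise ValueError("assigned + unassigned does not match the total")
    return round(100 * assigned / protein_interactors, 2)


# BIND row from the table above: 252251 interactors, 215207 assigned, 37044 unassigned.
print(assigned_percentage(252251, 215207, 37044))  # 85.31, as in the BIND row
```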

Mapping score summary

See below for definitions of the mapping score codes.

BHF_UCL BIND BIND_TRANSLATION BIOGRID CORUM DIP HPIDB HPRD I2D_IMEX INNATEDB INTACT INTCOMPLEX MATRIXDB MBINFO MOLCON MPACT MPIDB MPPI QUICKGO REACTOME UNIPROTPP VIRUSHOST
P 5023 194804 46262 15190 7366 1914 58084 398235 6608 52165 1136 862 3082 113957 270664 25105 98051
P+IN 386
P+N 48
PD 128348 6863 5 2994
PD+IN 1
PD+L 38
PD+LQ 10170
PD+XQ 26
PDQ 28443
PGD 641 1742 1
PGD+L 6273 3190 6
PGD+X 2
PT 2813 1 5 7 30579 2 2778 839
PTD 86728 2 2 44
PTD+LQ 4042
PTDQ 2488
PTGD 15
PTGD+L 23
PTM 3
PU 7 27 13 4 18 2 681 26 367 6 12535 32 81
PU+L 17 1 154 1 640 4
PU+O 44
PU+X 610 2 5
PUD 13 7 145
PUD+L 7 5 13
PUD+X 60 162
PUT 4 12 2527 1293
PUT+L 21 41 1
PUT+O 14
PUTD 4
PUTD+L 9 3
PV 9
S 2 45 14032 85 11
S+L 4 293 525
S+N 3
S+O 305
S+X 263
SD 5418 2474 1
SD+L 263 328
SD+N 121
SD+O 11600
SD+X 1267
SGD 580
SGD+L 2529
SGD+O 15580
ST 4890 105 2 7093
ST+L 30 3311
ST+O 859
STD 722 6736
STD+L 9 498
STD+O 29855
STGD 1628
STGD+L 6556
STGD+O 39958
SUD 62
SUD+L 45 25
SUD+O 5
SUD+X 569
SUTD 23
SUTD+L 30 15
SUTD+O 134
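
Each score in the first column of this table is a string of single-character feature codes, so a score such as PTD+LQ can be split into its features. The sketch below is a hypothetical helper, not part of iRefIndex. Its short labels are paraphrases of the definitions on this page, and it assumes every character in a score is one of the defined codes.

```python
# Short labels paraphrased from the code definitions on this page
# (abbreviated for illustration; not official wording).
SCORE_FEATURES = {
    "P": "primary reference used",
    "S": "secondary reference used",
    "U": "secondary UniProt accession updated to primary",
    "V": "version information ignored",
    "T": "taxonomy identifier differed (tolerated)",
    "G": "EntrezGene identifier used",
    "D": "source database differed (tolerated)",
    "M": "typographical modification of a known accession",
    "+": "more than one assignment possible",
    "O": "ambiguity resolved by SEGUID of original sequence",
    "X": "ambiguity resolved by taxonomy identifier",
    "L": "ambiguity resolved by ranking, then longest sequence",
    "I": "NCBI GenInfo Identifier used",
    "E": "current accession or sequence retrieved with eUtils",
    "Y": "accession removed after the beta3 build",
    "N": "new SEGUID entry created",
    "Q": "'see-also' reference used",
}


def decode_score(score):
    """Return a list of (character, label) pairs for a mapping score string."""
    unknown = [ch for ch in score if ch not in SCORE_FEATURES]
    if unknown:
        raise ValueError(f"unknown score character(s): {unknown}")
    return [(ch, SCORE_FEATURES[ch]) for ch in score]


for ch, label in decode_score("PTD+LQ"):
    print(ch, label)
```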


Mapping score code definitions

Character Description of feature (when the value is 1)
D The source database (D) listed in the interaction record differs from what is expected for the given protein accession. In specific cases, this difference is tolerated and the assignment is made.
E The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after going through eUtils, sequence information was obtained from UniProt.
G The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.
L More than one assignment is possible (see + below), e.g. isoforms for a GeneID. In this situation, references are picked using a ranking system (RefSeq first, then UniProt). If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.)
M The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.
+ More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference, and this identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L (see their entries in this table).
N The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.
O More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record.
I The protein reference used was an NCBI GenInfo Identifier (I).
U The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.
T The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made.
V The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.
Q The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.
P The interaction record's primary (P) reference for the protein was used to make the assignment.
S One of the interaction record's secondary (S) references for the protein was used to make the assignment.
Y The protein reference pointed to an accession that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009).
X More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record.
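
Two of the rules above, the version handling under V and the ranking under L, can be sketched in a few lines. The class, function names, database labels and candidate accessions are hypothetical, and the tie-break by input order when two candidates have equal rank and length is an assumption. This is an illustration of the stated rules, not the iRefIndex implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    accession: str
    database: str  # hypothetical labels: "refseq", "uniprot", or anything else
    sequence: str


# Rule L: RefSeq is preferred first, then UniProt; other sources rank last.
DATABASE_RANK = {"refseq": 0, "uniprot": 1}


def strip_version(accession):
    """Rule V: drop a trailing '.<digits>' version, e.g. NP_012420.1 -> NP_012420."""
    base, dot, version = accession.rpartition(".")
    if base and dot and version.isdigit():
        return base
    return accession


def pick_candidate(candidates):
    """Rule L: best-ranked database first, then the longest sequence.

    min() returns the first best candidate, so any remaining tie is
    broken by input order (an assumption of this sketch).
    """
    if not candidates:
        raise ValueError("no candidates to choose from")
    return min(
        candidates,
        key=lambda c: (DATABASE_RANK.get(c.database, len(DATABASE_RANK)), -len(c.sequence)),
    )


# Hypothetical candidates for one ambiguous reference.
options = [
    Candidate("UNIPROT_A", "uniprot", "MKTAYIAKQRQISFVK"),
    Candidate("REFSEQ_A", "refseq", "MKTAYIAK"),
    Candidate("REFSEQ_B", "refseq", "MKTAYIAKQRQ"),
]
print(strip_version("NP_012420.1"))
print(pick_candidate(options).accession)
```

Here the longer UniProt sequence loses to both RefSeq candidates because database rank is compared before sequence length.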