Difference between revisions of "Statistics iRefIndex 9.0"

From irefindex
(Added complete file links.)
(Used proper links.)
 

Latest revision as of 13:45, 16 February 2012

Interactions available from major taxonomies

Top 15 uncorrected taxonomy groups in iRefIndex (Taxonomy identifiers as they appear in original source)

{|
! NCBI taxonomy identifier !! Scientific_name !! Number_of_interactions
|-
| 559292 || Saccharomyces cerevisiae S288c || 216326
|-
| 9606 || Homo sapiens || 145971
|-
| 7227 || Drosophila melanogaster || 47389
|-
| 4932 || Saccharomyces cerevisiae || 46966
|-
| 40674 || Mammalia || 36307
|-
| 10090 || Mus musculus || 21487
|-
| 83333 || Escherichia coli K-12 || 17673
|-
| 4896 || Schizosaccharomyces pombe || 15493
|-
| 6239 || Caenorhabditis elegans || 14020
|-
| 197 || Campylobacter jejuni || 12028
|-
| 3702 || Arabidopsis thaliana || 9911
|-
| 10116 || Rattus norvegicus || 6917
|-
| 562 || Escherichia coli || 5366
|-
| 632 || Yersinia pestis || 3823
|-
| 243276 || Treponema pallidum subsp. pallidum str. Nichols || 3642
|}

Full list: File:iRefIndex9 taxonomy summary.txt

Top 15 corrected taxonomy groups in iRefIndex (Taxonomy identifiers corrected using sequence database information)

{|
! NCBI taxonomy identifier !! Scientific_name !! Number_of_interactions
|-
| 559292 || Saccharomyces cerevisiae S288c || 225644
|-
| 9606 || Homo sapiens || 156006
|-
| 7227 || Drosophila melanogaster || 47388
|-
| 10090 || Mus musculus || 18312
|-
| 83333 || Escherichia coli K-12 || 17408
|-
| 284812 || Schizosaccharomyces pombe 972h- || 15693
|-
| 6239 || Caenorhabditis elegans || 14020
|-
| 197 || Campylobacter jejuni || 12028
|-
| 3702 || Arabidopsis thaliana || 9911
|-
| 10116 || Rattus norvegicus || 5383
|-
| 155864 || Escherichia coli O157:H7 str. EDL933 || 4953
|-
| 632 || Yersinia pestis || 3823
|-
| 243276 || Treponema pallidum subsp. pallidum str. Nichols || 3643
|-
| 1148 || Synechocystis sp. PCC 6803 || 3240
|-
| 1392 || Bacillus anthracis || 3090
|}

Full list: File:iRefIndex9 taxonomy summary corrected.txt

Interactions

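{|
| BIND || 62923
|-
| GRID || 24380 || 277531
|-
| DIP || 25785 || 39323 || 89715
|-
| INTACT || 25260 || 38185 || 39031 || 156451
|-
| MINT || 21992 || 41551 || 36676 || 47296 || 85755
|-
| HPRD || 1949 || 8718 || 1124 || 5716 || 4482 || 40488
|-
| OPHID || 2414 || 9264 || 1463 || 7578 || 6912 || 10286 || 47479
|-
| MPACT || 6480 || 8513 || 7019 || 6231 || 6480 || 0 || 0 || 13331
|-
| MPPI || 420 || 153 || 65 || 97 || 93 || 158 || 187 || 0 || 830
|-
| CORUM || 263 || 199 || 116 || 248 || 119 || 246 || 237 || 0 || 15 || 2607
|-
| BIND_TRANSLATION || 56109 || 24375 || 24813 || 24559 || 21875 || 2200 || 2722 || 6282 || 391 || 196 || 60227
|-
| INNATEDB || 357 || 1310 || 409 || 811 || 654 || 1036 || 1222 || 0 || 52 || 82 || 419 || 7000
|-
| MATRIXDB || 5 || 11 || 2 || 15 || 2 || 14 || 24 || 0 || 2 || 0 || 5 || 5 || 201
|-
| MPILIT || 24 || 0 || 85 || 114 || 32 || 0 || 0 || 0 || 0 || 0 || 24 || 0 || 0 || 745
|-
| MPIIMEX || 6 || 0 || 25 || 34 || 14 || 0 || 0 || 0 || 0 || 0 || 6 || 0 || 0 || 30 || 473
|-
!  !! BIND !! GRID !! DIP !! INTACT !! MINT !! HPRD !! OPHID !! MPACT !! MPPI !! CORUM !! BIND_TRANSLATION !! INNATEDB !! MATRIXDB !! MPILIT !! MPIIMEX
|-
|  || (5070) || (200591) || (27502) || (83929) || (18731) || (24349) || (28544) || (1120) || (221) || (1915) || (3055) || (4240) || (156) || (536) || (399)
|}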
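
The matrix above stores only the lower triangle, and its diagonal values match the "Unique RIGIDs" column of Table 5. The following Python sketch shows one way to look up a pair of sources regardless of order. It assumes that off-diagonal cells count interactions reported by both sources; the names `SOURCES`, `TRIANGLE`, `cell` and `shared_fraction` are illustrative, and only the first four sources are copied in.

```python
# Lower-triangular rows copied from the table above (first four sources only).
SOURCES = ["BIND", "GRID", "DIP", "INTACT"]
TRIANGLE = [
    [62923],
    [24380, 277531],
    [25785, 39323, 89715],
    [25260, 38185, 39031, 156451],
]


def cell(a, b):
    """Return the table cell for sources a and b, in either argument order."""
    i, j = SOURCES.index(a), SOURCES.index(b)
    if i < j:
        i, j = j, i  # only the lower triangle is stored
    return TRIANGLE[i][j]


def shared_fraction(a, b):
    """Fraction of a's diagonal total that also appears in the (a, b) cell.

    Assumption: the (a, b) cell counts interactions common to both sources.
    """
    return cell(a, b) / cell(a, a)


if __name__ == "__main__":
    print(cell("DIP", "INTACT"))
    print(round(shared_fraction("DIP", "INTACT"), 4))
```

The same lookup applies to the interactor matrix in the next section, since it uses the same triangular layout.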

Interactors

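{|
| BIND || 40897
|-
| GRID || 18036 || 34410
|-
| DIP || 17437 || 18640 || 29961
|-
| INTACT || 19107 || 23866 || 24585 || 53546
|-
| MINT || 16751 || 18749 || 19671 || 25563 || 31615
|-
| HPRD || 2920 || 5853 || 3489 || 6050 || 4539 || 9825
|-
| OPHID || 3397 || 6004 || 4220 || 7030 || 5398 || 6228 || 9574
|-
| MPACT || 4419 || 4625 || 4734 || 4936 || 4800 || 0 || 1 || 4979
|-
| MPPI || 705 || 498 || 479 || 638 || 565 || 316 || 425 || 0 || 865
|-
| CORUM || 2140 || 2449 || 2230 || 3239 || 2535 || 1859 || 2248 || 0 || 418 || 4365
|-
| BIND_TRANSLATION || 35150 || 17491 || 16835 || 18553 || 16180 || 2957 || 3331 || 4014 || 687 || 2001 || 37247
|-
| INNATEDB || 1685 || 2178 || 1813 || 2614 || 2137 || 1709 || 2112 || 0 || 359 || 1148 || 1687 || 3403
|-
| MATRIXDB || 115 || 111 || 88 || 138 || 116 || 111 || 144 || 0 || 18 || 52 || 114 || 89 || 221
|-
| MPILIT || 89 || 0 || 332 || 442 || 227 || 0 || 0 || 0 || 0 || 0 || 90 || 0 || 0 || 937
|-
| MPIIMEX || 32 || 0 || 111 || 129 || 65 || 0 || 0 || 0 || 0 || 0 || 30 || 0 || 0 || 92 || 473
|-
!  !! BIND !! GRID !! DIP !! INTACT !! MINT !! HPRD !! OPHID !! MPACT !! MPPI !! CORUM !! BIND_TRANSLATION !! INNATEDB !! MATRIXDB !! MPILIT !! MPIIMEX
|-
|  || (4387) || (6793) || (2208) || (16016) || (3969) || (1708) || (921) || (10) || (33) || (494) || (1401) || (244) || (34) || (366) || (282)
|}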

Summary of mapping interaction records to RIGs (Table 5)

{|
! Source !! Total records !! Protein-only interactors PPI !! Assigned to RIGID !! Unique RIGIDs
|-
| bind || 193648 || 93957 || 91245 (97.1136%) || 62923 (68.9605%)
|-
| grid || 416648 || 411219 || 410641 (99.8594%) || 277531 (67.5848%)
|-
| dip || 90994 || 90994 || 89910 (98.8087%) || 89715 (99.7831%)
|-
| intact || 184959 || 183032 || 182359 (99.6323%) || 156451 (85.7929%)
|-
| mint || 122775 || 122775 || 122269 (99.5879%) || 85755 (70.1363%)
|-
| HPRD || 83022 || 83022 || 83022 (100.0000%) || 40488 (48.7678%)
|-
| ophid || 73257 || 73257 || 73160 (99.8676%) || 47479 (64.8975%)
|-
| MPACT || 16504 || 16504 || 16296 (98.7397%) || 13331 (81.8054%)
|-
| MPPI || 1814 || 1814 || 1699 (93.6604%) || 830 (48.8523%)
|-
| CORUM || 2844 || 2844 || 2844 (100.0000%) || 2607 (91.6667%)
|-
| BIND_Translation || 192923 || 87081 || 83347 (95.7120%) || 60227 (72.2605%)
|-
| InnateDB || 14729 || 11476 || 11248 (98.0132%) || 7000 (62.2333%)
|-
| MatrixDB || 846 || 349 || 321 (91.9771%) || 201 (62.6168%)
|-
| mpilit || 745 || 745 || 745 (100.0000%) || 745 (100.0000%)
|-
| mpiimex || 473 || 473 || 473 (100.0000%) || 473 (100.0000%)
|-
| ALL || 1396181 || 1179542 || 1169579 (99.1554%) || 545743 (46.6615%)
|}
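
The percentages in Table 5 appear to be ratios of adjacent columns: the "Assigned to RIGID" percentage is the assigned count divided by the protein-only PPI count, and the "Unique RIGIDs" percentage is the unique count divided by the assigned count. The Python sketch below recomputes two rows under that assumption. The helper name `pct` and the `ROWS` layout are illustrative and not part of iRefIndex.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to four decimal places."""
    return round(100.0 * part / whole, 4)


# source: (protein-only PPI, assigned to RIGID, unique RIGIDs), from Table 5
ROWS = {
    "bind": (93957, 91245, 62923),
    "ALL": (1179542, 1169579, 545743),
}

if __name__ == "__main__":
    for source, (ppi, assigned, unique) in ROWS.items():
        # First value: share of protein-only PPIs that received a RIGID.
        # Second value: unique RIGIDs relative to assigned records.
        print(source, pct(assigned, ppi), pct(unique, assigned))
```

Read this way, a low "Unique RIGIDs" percentage means that many assigned records collapse onto the same RIGID.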

Assignment of protein interactors to ROGs (Table 3)

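{|
! Source !! Protein_Interactors !! Assigned !! % !! Arbitrary !! N_and_Y !! Unassigned !! Unique proteins
|-
| bind || 285482 || 272457 || 95.4375 || 0 || 9077 || 3930 || 40897
|-
| BIND_Translation || 264346 || 239976 || 90.7810 || 74 || 15390 || 8902 || 37247
|-
| CORUM || 12916 || 12909 || 99.9458 || 7 || 0 || 0 || 4365
|-
| dip || 30978 || 29436 || 95.0223 || 609 || 450 || 483 || 29961
|-
| grid || 45569 || 37348 || 81.9592 || 7948 || 15 || 258 || 34410
|-
| HPRD || 123812 || 103344 || 83.4685 || 20255 || 213 || 0 || 9825
|-
| InnateDB || 27209 || 26914 || 98.9158 || 0 || 0 || 295 || 3403
|-
| intact || 154359 || 151337 || 98.0422 || 36 || 2581 || 405 || 53546
|-
| MatrixDB || 1123 || 1077 || 95.9038 || 0 || 0 || 46 || 221
|-
| mint || 87509 || 83380 || 95.2816 || 51 || 3933 || 145 || 31615
|-
| MPACT || 40349 || 40121 || 99.4349 || 0 || 1 || 227 || 4979
|-
| mpiimex || 946 || 946 || 100.0000 || 0 || 0 || 0 || 473
|-
| mpilit || 1490 || 1487 || 99.7987 || 3 || 0 || 0 || 937
|-
| MPPI || 3628 || 3456 || 95.2591 || 0 || 42 || 130 || 865
|-
| ophid || 146423 || 145149 || 99.1299 || 265 || 1003 || 6 || 9574
|-
| All || 1226139 || 1149359 || 93.7381 || 29248 || 32705 || 14827 || 97139
|}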

ROG summary

Decimal_score Binary_flag String_score Score_class Proteins Percentage bind grid dip intact mint mpiimex mpilit HPRD ophid InnateDB MatrixDB MPACT BIND_Translation MPPI CORUM
786 000000001100010010 STO+ -1 8850 0.7218% 0 0 0 0 0 0 0 8850 0 0 0 0 0 0 0
1938 000000011110010010 STMOX+ -1 29 0.0024% 0 0 0 0 0 0 0 29 0 0 0 0 0 0 0
898 000000001110000010 SMO+ -1 21 0.0017% 0 0 0 17 0 0 0 4 0 0 0 0 0 0 0
131093 100000000000010101 PUTQ -1 5 0.0004% 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0
1922 000000011110000010 SMOX+ -1 2 0.0002% 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0
914 000000001110010010 STMO+ -1 2 0.0002% 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0
163905 101000000001000001 PDYQ -1 2 0.0002% 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0
163921 101000000001010001 PTDYQ -1 1 0.0001% 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
218370 110101010100000010 SXLENQ+ -1 1 0.0001% 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
1 000000000000000001 P 1 745949 60.8372% 155411 31550 0 150559 49666 932 1400 0 124701 26914 828 0 188079 3021 12888
2 000000000000000010 S 1 36416 2.9700% 0 65 22378 13 267 0 0 13128 0 0 0 0 565 0 0
131201 100000000010000001 PMQ 1 24630 2.0087% 0 0 0 0 0 0 0 0 0 0 0 0 24630 0 0
554 000000001000101010 SVGO 1 17303 1.4112% 0 0 0 0 0 0 0 17303 0 0 0 0 0 0 0
8194 000010000000000010 SI 1 12319 1.0047% 12319 0 0 0 0 0 0 0 0 0 0 0 0 0 0
65 000000000001000001 PD 1 7080 0.5774% 7079 0 0 0 1 0 0 0 0 0 0 0 0 0 0
130 000000000010000010 SM 1 6593 0.5377% 0 0 0 0 0 0 0 6593 0 0 0 0 0 0 0
41 000000000000101001 PVG 1 2223 0.1813% 0 2223 0 0 0 0 0 0 0 0 0 0 0 0 0
42 000000000000101010 SVG 1 1108 0.0904% 0 0 122 0 0 0 0 986 0 0 0 0 0 0 0
129 000000000010000001 PM 1 714 0.0582% 468 0 0 77 0 0 0 0 0 0 137 0 0 32 0
139265 100010000000000001 PIQ 1 372 0.0303% 0 0 0 0 0 0 0 0 0 0 0 0 372 0 0
10 000000000000001010 SV 1 43 0.0035% 0 0 5 3 35 0 0 0 0 0 0 0 0 0 0
8193 000010000000000001 PI 1 35 0.0029% 0 0 0 27 8 0 0 0 0 0 0 0 0 0 0
66 000000000001000010 SD 1 22 0.0018% 0 4 0 0 0 0 0 0 0 0 18 0 0 0 0
9 000000000000001001 PV 1 5 0.0004% 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0
5 000000000000000101 PU 2 21909 1.7868% 0 0 0 289 253 9 7 0 20314 0 0 10 684 322 21
16386 000100000000000010 SE 2 4888 0.3986% 4888 0 0 0 0 0 0 0 0 0 0 0 0 0 0
770 000000001100000010 SO+ 2 3478 0.2837% 0 0 0 0 0 0 0 3478 0 0 0 0 0 0 0
147458 100100000000000010 SEQ 2 2242 0.1829% 0 0 0 4 0 0 0 0 0 0 0 0 2238 0 0
6 000000000000000110 SU 2 194 0.0158% 0 1 147 27 5 0 0 13 0 0 0 0 1 0 0
16385 000100000000000001 PE 2 156 0.0127% 0 0 0 147 9 0 0 0 0 0 0 0 0 0 0
147457 100100000000000001 PEQ 2 55 0.0045% 0 0 0 0 0 0 0 0 0 0 0 0 55 0 0
773 000000001100000101 PUO+ 2 21 0.0017% 0 0 0 8 2 0 0 0 11 0 0 0 0 0 0
1797 000000011100000101 PUOX+ 2 4 0.0003% 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0
16514 000100000010000010 SME 2 3 0.0002% 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0
774 000000001100000110 SUO+ 2 1 0.0001% 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
17 000000000000010001 PT 3 156757 12.7846% 87968 3505 0 19 32901 4 79 0 118 0 2 30590 1524 47 0
18 000000000000010010 ST 3 46525 3.7944% 0 0 6773 1 18 0 0 32718 0 0 0 6994 21 0 0
146 000000000010010010 STM 3 16664 1.3591% 0 0 0 0 0 0 0 16664 0 0 0 0 0 0 0
131217 100000000010010001 PTMQ 3 4257 0.3472% 0 0 0 0 0 0 0 0 0 0 0 0 4257 0 0
81 000000000001010001 PTD 3 2567 0.2094% 2472 0 0 3 1 0 0 0 0 0 91 0 0 0 0
8210 000010000000010010 STI 3 872 0.0711% 872 0 0 0 0 0 0 0 0 0 0 0 0 0 0
145 000000000010010001 PTM 3 171 0.0139% 137 0 0 0 0 0 0 0 0 0 0 0 0 34 0
163985 101000000010010001 PTMYQ 3 52 0.0042% 0 0 0 0 0 0 0 0 0 0 0 0 52 0 0
16530 000100000010010010 STME 3 13 0.0011% 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8209 000010000000010001 PTI 3 13 0.0011% 0 0 0 13 0 0 0 0 0 0 0 0 0 0 0
82 000000000001010010 STD 3 10 0.0008% 0 0 0 0 9 0 0 0 0 0 1 0 0 0 0
139281 100010000000010001 PTIQ 3 7 0.0006% 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0
26 000000000000011010 SVT 3 1 0.0001% 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
16402 000100000000010010 STE 4 828 0.0675% 827 0 1 0 0 0 0 0 0 0 0 0 0 0 0
147474 100100000000010010 STEQ 4 411 0.0335% 0 0 0 2 0 0 0 0 0 0 0 0 409 0 0
22 000000000000010110 SUT 4 144 0.0117% 0 0 10 0 0 0 0 134 0 0 0 0 0 0 0
790 000000001100010110 SUTO+ 4 47 0.0038% 0 0 0 18 27 0 0 2 0 0 0 0 0 0 0
789 000000001100010101 PUTO+ 4 32 0.0026% 0 0 0 27 5 0 0 0 0 0 0 0 0 0 0
16401 000100000000010001 PTE 4 2 0.0002% 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0
5378 000001010100000010 SXL+ 5 18721 1.5268% 0 0 0 14 1 0 0 18706 0 0 0 0 0 0 0
131073 100000000000000001 PQ 5 16324 1.3313% 0 0 0 6 0 0 0 0 0 0 0 0 16318 0 0
4393 000001000100101001 PVGL+ 5 7931 0.6468% 0 7931 0 0 0 0 0 0 0 0 0 0 0 0 0
810 000000001100101010 SVGO+ 5 3440 0.2806% 0 0 0 0 0 0 0 3440 0 0 0 0 0 0 0
21 000000000000010101 PUT 5 2721 0.2219% 0 0 0 15 168 1 1 0 5 0 0 2527 4 0 0
4394 000001000100101010 SVGL+ 5 1650 0.1346% 0 0 112 0 0 0 0 1538 0 0 0 0 0 0 0
131089 100000000000010001 PTQ 5 859 0.0701% 0 0 0 47 0 0 0 0 0 0 0 0 812 0 0
4354 000001000100000010 SL+ 5 493 0.0402% 0 17 474 2 0 0 0 0 0 0 0 0 0 0 0
4357 000001000100000101 PUL+ 5 241 0.0197% 0 0 0 0 0 0 3 0 222 0 0 0 9 0 7
4373 000001000100010101 PUTL+ 5 74 0.0060% 0 0 0 8 3 0 0 0 4 0 0 0 59 0 0
5381 000001010100000101 PUXL+ 5 55 0.0045% 0 0 0 11 5 0 0 0 39 0 0 0 0 0 0
5386 000001010100001010 SVXL+ 5 43 0.0035% 0 0 0 1 42 0 0 0 0 0 0 0 0 0 0
4374 000001000100010110 SUTL+ 5 30 0.0024% 0 0 17 0 0 0 0 7 0 0 0 0 6 0 0
4358 000001000100000110 SUL+ 5 6 0.0005% 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0
5382 000001010100000110 SUXL+ 5 4 0.0003% 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0
32769 001000000000000001 PY 6 16102 1.3132% 3687 12 0 1963 3392 0 0 0 750 0 0 0 6293 5 0
65601 010000000001000001 PDN 6 8727 0.7117% 52 0 0 2 247 0 0 0 253 0 0 0 8168 5 0
81922 010100000000000010 SEN 6 4421 0.3606% 4421 0 0 0 0 0 0 0 0 0 0 0 0 0 0
65537 010000000000000001 PN 6 970 0.0791% 35 0 190 299 256 0 0 179 0 0 0 0 0 11 0
32833 001000000001000001 PDY 6 773 0.0630% 773 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32770 001000000000000010 SY 6 427 0.0348% 0 3 258 92 28 0 0 0 0 0 0 0 46 0 0
163969 101000000010000001 PMYQ 6 402 0.0328% 0 0 0 0 0 0 0 0 0 0 0 0 402 0 0
212993 110100000000000001 PENQ 6 293 0.0239% 0 0 0 0 0 0 0 0 0 0 0 0 293 0 0
73729 010010000000000001 PIN 6 204 0.0166% 0 0 0 204 0 0 0 0 0 0 0 0 0 0 0
32785 001000000000010001 PTY 6 164 0.0134% 93 0 0 14 0 0 0 0 0 0 0 0 57 0 0
65553 010000000000010001 PTN 6 38 0.0031% 0 0 0 4 0 0 0 34 0 0 0 0 0 0 0
196609 110000000000000001 PNQ 6 31 0.0025% 0 0 0 0 0 0 0 0 0 0 0 0 31 0 0
81921 010100000000000001 PEN 6 29 0.0024% 0 0 0 1 10 0 0 0 0 0 0 0 0 18 0
65617 010000000001010001 PTDN 6 23 0.0019% 0 0 0 0 0 0 0 0 0 0 0 0 23 0 0
196625 110000000000010001 PTNQ 6 22 0.0018% 0 0 0 0 0 0 0 0 0 0 0 0 22 0 0
81938 010100000000010010 STEN 6 14 0.0011% 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32786 001000000000010010 STY 6 3 0.0002% 0 0 2 0 0 0 0 0 0 0 0 1 0 0 0
32897 001000000010000001 PMY 6 2 0.0002% 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0
81986 010100000001000010 SDEN 6 1 0.0001% 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32913 001000000010010001 PTMY 6 1 0.0001% 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
163857 101000000000010001 PTYQ 6 1 0.0001% 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
40978 001010000000010010 STIY 6 1 0.0001% 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Scores (Table 2)

{|
! Character !! Description of feature (when the value is 1) !! Frequency
|-
| D || The source database (D) listed in the interaction record differs from what is expected for the given protein accession. In specific cases, this difference is tolerated and the assignment is made. || 19206 (1.5856%)
|-
| E || The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after going through eUtils, sequence information was obtained from UniProt. || 13357 (1.1027%)
|-
| G || The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment. || 33655 (2.7784%)
|-
| L || More than one assignment is possible (see +), e.g. isoforms for a gene ID. In such a situation, references are picked using a ranking system (first RefSeq, then UniProt). If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.) || 29249 (2.4147%)
|-
| M || The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made. || 53556 (4.4214%)
|-
| + || More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference. This identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L. || 45176 (3.7296%)
|-
| N || The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier. || 14774 (1.2197%)
|-
| O || More than one assignment is possible (see +). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record. || 33230 (2.7434%)
|-
| I || The protein reference used was an NCBI GenInfo Identifier (I). || 13823 (1.1412%)
|-
| U || The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment. || 25488 (2.1042%)
|-
| T || The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made. || 242211 (19.9961%)
|-
| V || The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420. || 33747 (2.786%)
|-
| Q || The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'. || 49967 (4.1251%)
|-
| P || The interaction record's primary (P) reference for the protein was used to make the assignment. || 1023006 (84.4559%)
|-
| S || One of the interaction record's secondary (S) references for the protein was used to make the assignment. || 188284 (15.5441%)
|-
| Y || The protein reference referred to an accession that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009). || 17931 (1.4803%)
|-
| X || More than one assignment is possible (see +). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record. || 18859 (1.5569%)
|}
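
The Decimal_score, Binary_flag and String_score columns of the ROG summary look like three views of one bit mask over the feature characters in this table. The Python sketch below reproduces the Binary_flag and String_score from a Decimal_score. The bit-to-character mapping in `BIT_TO_CHAR` is an inference from the ROG summary rows (for example 1 is P, 2 is S, 5 is PU), not something taken from iRefIndex documentation. Bit 11 never occurs in the rows listed and is left unmapped. The function names are illustrative.

```python
# Bit position -> feature character, inferred from the ROG summary rows.
# Assumption: this mapping is reverse-engineered from the table, and bit 11
# is not used by any listed row, so it has no entry here.
BIT_TO_CHAR = {
    0: "P", 1: "S", 2: "U", 3: "V", 4: "T", 5: "G", 6: "D", 7: "M",
    8: "+", 9: "O", 10: "X", 12: "L", 13: "I", 14: "E", 15: "Y",
    16: "N", 17: "Q",
}


def binary_flag(decimal_score):
    """Return the 18-character, zero-padded binary form of a Decimal_score."""
    return format(decimal_score, "018b")


def string_score(decimal_score):
    """Return the feature characters for the set bits.

    Characters follow ascending bit order, except that '+' is written last,
    which matches the strings shown in the ROG summary.
    """
    chars = [
        char
        for bit, char in sorted(BIT_TO_CHAR.items())
        if (decimal_score >> bit) & 1
    ]
    if "+" in chars:
        chars.remove("+")
        chars.append("+")
    return "".join(chars)


if __name__ == "__main__":
    for score in (786, 4393, 218370):
        print(score, binary_flag(score), string_score(score))
```

For example, 786 = 512 + 256 + 16 + 2 sets the bits for O, +, T and S, which corresponds to the STO+ row of the ROG summary.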