Difference between revisions of "Sources and Issues Next Release"


Latest revision as of 11:47, 26 February 2014

Note

This is a planning template for the next release. It does not correspond to a released product. See http://irefindex.org/ for the most recent release and related documentation. This page can be used to create the sources page. Check for xxx before copying and pasting to the appropriate sources page for the new release. Do not edit xxx in this page. Leave this page as a template. After making a new release page, update the general Sources for iRefIndex redirect page.

Last edited: 2014-02-26

Applies to iRefIndex release: xxx

Release date: xxx

Authors: Ian Donaldson

Database: iRefIndex (http://irefindex.org)

Organization: http://irefindex.org

Description: This file lists interaction and protein sequence related resources used for the current build of the iRefIndex. Statistics for the iRefIndex are available and include a breakdown of interactors and interactions from each data source.

Issues

Deprecated taxids appear in export for iRefWeb

See list from Yuri. Examples are 273, 510, 515, 591, 592, 601, 602, 677, 887, 1139, 1156, 1312...133899, 137208, 144556, 150147, 160268, 163106, 163653, 196590, 216593.

These appear in the 'interactor' and 'interaction_interactor_assignment' tables, but are not in the new taxonomy tables 'taxonomy_scientific' and 'names'. A random selection of these does not appear in the MITAB files, so the fault likely lies in the export script for iRefWeb.
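The check described above can be sketched as a set difference. This is only an illustration: it assumes the taxids used by the interactor tables and the taxids present in the new taxonomy tables have already been loaded as integers, and the function name is hypothetical rather than part of the iRefIndex code.

```python
def find_deprecated_taxids(used_taxids, known_taxids):
    """Return taxids referenced by interactors but missing from the taxonomy tables."""
    return sorted(set(used_taxids) - set(known_taxids))


# Illustrative input only: 273 and 510 are taken from the examples listed above;
# 9606 stands in for a taxid that is present in both places.
used = [9606, 273, 510, 273]
known = [9606]
print(find_deprecated_taxids(used, known))
```

Running the same comparison against the taxids found in the MITAB files would help confirm whether the fault is confined to the export script.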

BioGRID interaction record ids (pre-build issue)

Capture BioGRID interaction record ids so iRefWeb can link out to BioGRID.

The only interaction ids available from the BioGRID files are already being used and are also present in iRefWeb, such as...

<primaryRef db="grid" id="103" refType="identity" refTypeAc="MI:0356" dbAc="MI:0463" />

See Bugzilla:250.
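As a rough sketch of how such ids could be collected from a PSI-MI XML file, the following uses Python's standard xml.etree.ElementTree. The function name is hypothetical, and the sample wraps the element shown above in a made-up parent element. Tags are compared by local name on the assumption that real files may declare an XML namespace.

```python
import xml.etree.ElementTree as ET


def grid_primary_ids(xml_text):
    """Collect the id attribute of every primaryRef element whose db is "grid"."""
    root = ET.fromstring(xml_text)
    ids = []
    for element in root.iter():
        # Strip a possible "{namespace}" prefix before comparing the tag name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "primaryRef" and element.get("db") == "grid":
            ids.append(element.get("id"))
    return ids


# The <xref> wrapper is an assumption made so the fragment parses on its own.
sample = (
    '<xref><primaryRef db="grid" id="103" refType="identity" '
    'refTypeAc="MI:0356" dbAc="MI:0463" /></xref>'
)
print(grid_primary_ids(sample))
```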

MITAB/iRefScape canonicalization

Change this to choose the canonical sequence rather than the longest sequence (mapping score L). For example, for GeneID 84148 and 512564 the current choice unnecessarily separates BioGRID interaction data from interaction data from other databases.

See Bugzilla:255.


PDB identifiers

In previous releases we replaced the pipe character (|) in PDB identifiers with an underscore character (_). In this release, this is only done when there are multiple database:accession entries in a field; otherwise the | character is maintained as part of the PDB identifier. This is a regression and will be corrected in a future release.
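The intended behaviour (always replacing the pipe, regardless of how many entries a field holds) might look like the sketch below. The function name and the accession "1ABC|A" are made up for illustration; this is not the iRefIndex export code.

```python
def format_identifier_field(entries):
    """Join (database, accession) pairs into one pipe-separated MITAB field."""
    parts = []
    for database, accession in entries:
        if database == "pdb":
            # The pipe is also the entry separator, so it is always replaced,
            # whether the field holds one entry or several.
            accession = accession.replace("|", "_")
        parts.append(f"{database}:{accession}")
    return "|".join(parts) if parts else "-"


# Hypothetical accessions, used only to show the single and multiple entry cases.
print(format_identifier_field([("pdb", "1ABC|A")]))
print(format_identifier_field([("pdb", "1ABC|A"), ("pdb", "1ABC|B")]))
```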

IMEX identifiers

IMEx identifiers should be present in column 52 but appear to be missing. This is a regression and will be corrected in a future release. There are 6004 lines in release 10 with imex:... This number needs to be cross-checked before the next release. This is still an issue as of release 13.
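A cross-check of this kind could be sketched as follows. The sketch assumes tab-separated MITAB lines, 1-based column numbering, and header lines that start with "#". The function name and the identifier "imex:IM-1" are placeholders.

```python
def count_imex_lines(lines, column=52):
    """Count data lines whose given column (1-based) contains an imex: entry."""
    count = 0
    for line in lines:
        if line.startswith("#"):
            continue  # assumed header convention
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column and "imex:" in fields[column - 1]:
            count += 1
    return count


# Two synthetic 54-column rows; only the first carries a placeholder IMEx id.
with_imex = ["-"] * 54
with_imex[51] = "imex:IM-1"
without_imex = ["-"] * 54
rows = ["\t".join(with_imex), "\t".join(without_imex)]
print(count_imex_lines(rows))
```

Comparing this count with a count of lines that contain "imex:" in any column would show whether the identifiers are missing or merely misplaced.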

Compatibility with Java PSI parser needs to be improved

The files should be readable by the Java parser from psimi (https://code.google.com/p/psimi/downloads/detail?name=psimitab-1.8.3-distribution.zip), but there are at least a few examples where the files do not follow the specification:

-reserved characters are not quoted.

       For instance, in the file for human:
       taxid:11706(HIV-1 M:B_HXB2R)
       taxid:10299(Herpes simplex virus (type 1 / strain 17))
       go:GO:0005783|rigid:d//bz+DaMrbuxGA3i1Xe4hqlrXI|edgetype:X
       To conform to the standard, controlled terms should look like this:
       psi-mi:"MI:0496"(bait)

-empty columns need to be consistently filled with '-'. For example, column 15 in the human file.

-dates should be represented as yyyy/mm/dd but look like yyyy-mm-dd.

Thanks to Thomas Schmitt for pointing out these problems.
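The three fixes listed above could be sketched as small helper functions. All names here are hypothetical, and the set of reserved characters is an assumption based on the examples above rather than a quotation from the MITAB specification.

```python
from datetime import datetime

# Assumed reserved characters: pipe, parentheses, colon, tab and double quote.
RESERVED = set('|():\t"')


def quote(value):
    """Wrap a value in double quotes if it contains a reserved character."""
    if any(ch in RESERVED for ch in value):
        return '"' + value.replace('"', '\\"') + '"'
    return value


def format_entry(database, value, text=None):
    """Build a database:value(text) entry, quoting each part where needed."""
    entry = f"{quote(database)}:{quote(value)}"
    if text is not None:
        entry += f"({quote(text)})"
    return entry


def fill_empty(fields):
    """Replace empty or whitespace-only columns with '-'."""
    return [field if field.strip() else "-" for field in fields]


def reformat_date(date):
    """Convert yyyy-mm-dd to yyyy/mm/dd."""
    return datetime.strptime(date, "%Y-%m-%d").strftime("%Y/%m/%d")


print(format_entry("psi-mi", "MI:0496", "bait"))
print(format_entry("taxid", "11706", "HIV-1 M:B_HXB2R"))
print(fill_empty(["a", "", "-"]))
print(reformat_date("2011-10-03"))
```

Quoting each part separately keeps the colon between database and value unquoted, which matches the psi-mi:"MI:0496"(bait) form shown above.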


Various issues reported by Andrei Turinsky

There are a few remaining issues, as follows:

6630 interactors have obsolete Entrez Gene IDs shown (of which 3449 in Human). 
2696 RefSeq IDs (of which 935 are Human) have no Entrez ID shown in the MITAB, but such ID is actually known from NCBI maps - not sure whether your canonicalization process may recover these IDs. Only one such case remains for UniProt IDs (uniprotkb:P0CE96 a.k.a. YL156_YEAST actually has known genes 850851, 850856, 850858 but none of these are shown) - so the rest of UniProts have been resolved, which is great.


A minor thing: 6 interactors are shown with two different taxa due to different strains of either E. coli or yeast.
A minor thing: 5 interactors are sometimes shown with no taxon at all (4 human and 1 from A. fulgidus).
A noticeable percentage of PubMed IDs have been lost for some of the source DBs, with InnateDB and DIP having lost hundreds of publications. Human annotations are especially affected (e.g. Human DIP lost 266 PubMed IDs, or 16%; Human InnateDB lost 347, or 13%).
There are 17 obsolete PSI-MI IDs that appear in the MITAB files, of which 15 are detection methods and 2 interaction types (MI:0191 "aggregation" and MI:0218 "physical interaction"). Their listing is attached. Of these, MI:0229 is actually still valid (it's an alt_id) but should be replaced with MI:0809 -- see the last line in the attached list. 
Also, the detection method id "MI:0044" is not valid -- could be a typo? (it's not listed in the attached file). 

In column 13, for mpi-imex and mpi-lit, should the code "MI:0000" be changed to MI:0903?

In column 14, CORUM records are referenced by their publication ID, not by their complex ID.
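The obsolete PSI-MI ids mentioned in this list could be located with a simple scan such as the one below. It is only a sketch: the function name is hypothetical, the flagged set contains just the three ids named above (the attached listing is not reproduced here), and "MI:9999(placeholder)" is a made-up entry.

```python
import re

MI_ID = re.compile(r"MI:\d{4}")


def find_flagged_mi_ids(line, flagged):
    """Return the flagged PSI-MI ids that occur anywhere in a MITAB line."""
    return sorted({mi_id for mi_id in MI_ID.findall(line) if mi_id in flagged})


# Only the ids named in the report above; the full list is in the attachment.
flagged_ids = {"MI:0191", "MI:0218", "MI:0229"}
line = "MI:0218(physical interaction)\tMI:9999(placeholder)"
print(find_flagged_mi_ids(line, flagged_ids))
```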

Outdated Entrez Gene identifier

Some users have reported that retired Entrez Gene identifiers have appeared in release 10 that were correctly updated in release 9.

Examples

release 9:

uniprotkb:P21675|refseq:NP_620278|entrezgene/locuslink:6872|rogid:P0LoULOvon+Wp2G17lBlqn3Fo4E9606|irogid:3476704

release 10:

entrezgene/locuslink:100287968|entrezgene/locuslink:100291704|entrezgene/locuslink:1863|rogid:P0LoULOvon+Wp2G17lBlqn3Fo4E9606|irogid:3476704

Release 9 used the correct id for TAF1, 6872. In release 10, three outdated Entrez Gene ids are used instead; each of their records states: "This record was replaced with Gene ID: 6872".
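A possible post-processing step is sketched below. It assumes a mapping from retired to current Gene IDs is available; here the mapping is written out by hand from the example above. The function name is hypothetical, and the field is split on "|" without handling quoted values.

```python
import re

GENE_ENTRY = re.compile(r"^entrezgene/locuslink:(\d+)$")


def update_gene_ids(field, replaced_by):
    """Replace retired Entrez Gene ids in a field and drop resulting duplicates."""
    seen = []
    for entry in field.split("|"):
        match = GENE_ENTRY.match(entry)
        if match:
            current = replaced_by.get(match.group(1), match.group(1))
            entry = f"entrezgene/locuslink:{current}"
        if entry not in seen:
            seen.append(entry)
    return "|".join(seen)


# Mapping taken from the TAF1 example above: all three ids were replaced by 6872.
replaced_by = {"100287968": "6872", "100291704": "6872", "1863": "6872"}
release_10 = (
    "entrezgene/locuslink:100287968|entrezgene/locuslink:100291704|"
    "entrezgene/locuslink:1863|rogid:P0LoULOvon+Wp2G17lBlqn3Fo4E9606|irogid:3476704"
)
print(update_gene_ids(release_10, replaced_by))
```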

Build issues

Interaction related resources

{| {{table}} cellpadding="10" cellspacing="0" border="1"
| align="center" style="background:#f0f0f0;"|'''Source'''
| align="center" style="background:#f0f0f0;"|'''Format'''
| align="center" style="background:#f0f0f0;"|'''Location'''
| align="center" style="background:#f0f0f0;"|'''Version (date)'''
|-
| BIND ||Tab-delimited text file.||ftp://ftp.bind.ca/pub/BIND/data/bindflatfiles/bindindex/ (no longer available - see below).

20050525.complex2refs.txt

20050525.ints.txt

20050525.refs.txt

20050525.complexes.txt

20050525.labels.txt

20050525.complex2subunits.txt

These files are no longer available via ftp but are available from the authors. BIND archival content is now managed by Thomson Scientific. See http://bond.unleashedinformatics.com/ and http://bond.unleashedinformatics.com/downloads/data/BIND/

For historical purposes, a snapshot of the Blueprint web-site may be viewed at...

http://web.archive.org/web/20050204013426/www.blueprint.org/index.html

...via the internet archive at...

http://web.archive.org/web/*/http://www.blueprint.org
| 2005-05-25
|-
| BIND Translation ||PSI-MI 2.5||http://download.baderlab.org/BINDTranslation/release1_0/BINDTranslation_v1_xml_AllSpecies.tar.gz ||Version 1.0 (2010-12-15)
|-
| BioGRID||PSI-MI 2.5||http://thebiogrid.org/downloads/archives/Release%20Archive/BIOGRID-3.1.81/BIOGRID-ALL-3.1.81.psi25.zip ||Version 3.1.81 (2011-10-01)
|-
| CORUM||PSI-MI 2.5||http://mips.gsf.de/genre/proj/corum/index.html<br>http://mips.gsf.de/genre/export/sites/default/corum/allComplexes.psimi.zip || 2009-12-02
|-
| DIP||PSI-MI 2.5||http://dip.doe-mbi.ucla.edu/dip/Download.cgi
<br>dip20101010.mif25
<br>Note: date on last IMEx release file is from 2008
| 2010-10-10
|-
| HPRD ||PSI-MI 2.5||http://www.hprd.org/download<br>HPRD_PSIMI_041310.tar.gz||Release 9 (2010-04-13)
|-
| IntAct ||PSI-MI 2.5||ftp://ftp.ebi.ac.uk/pub/databases/intact/2011-09-29/psi25/pmidMIF25.zip|| 2011-09-29
|-
| MINT||PSI-MI 2.5|| ftp://mint.bio.uniroma2.it/pub/release/psi/current/psi25/pmid/|| 2010-12-21
|-
| MPACT||PSI-MI 2.5||ftp://ftpmips.gsf.de/yeast/PPI/mpact-complete.psi25.xml.gz || 2008-01-10
|-
| MPPI||PSI-MI 1.0||http://mips.gsf.de/proj/ppi/data/mppi.gz|| 2004-06-01 (from archive)
|-
| OPHID||PSI-MI 1.0||http://ophid.utoronto.ca/ophid/downloads.html (This service no longer available, please refer to http://ophid.utoronto.ca/ophidv2.201/)|| 2006-07-07
|-
| colspan="4" align="center" style="background:#f0f0f0;" | New for this release
|-
| InnateDB ||PSI-MI 2.5|| http://www.innatedb.com/download.jsp<br>Curated InnateDB Data ||2011-03-06
|-
| MPIDB||MITAB format file|| http://www.jcvi.org/mpidb (information)<br>
http://www.jcvi.org/mpidb/download.php (general downloads)<br>
http://www.jcvi.org/mpidb/interaction.php?dbsource=MPI-LIT (specific download for MPI-LIT)<br>
http://www.jcvi.org/mpidb/interaction.php?dbsource=MPI-IMEX (specific download for MPI-IMEX)
| Downloaded on 2011-10-03
|-
| MatrixDB||PSI-MI 2.5|| http://matrixdb.ibcp.fr/<br>MatrixDB_20100826.xml.zip || 2010-08-26 (timestamp)
|}

Sequence related resources (not updated yet)

{| {{table}} cellpadding="10" cellspacing="0" border="1"
| align="center" style="background:#f0f0f0;"|'''Source'''
| align="center" style="background:#f0f0f0;"|'''Format'''
| align="center" style="background:#f0f0f0;"|'''Location'''
| align="center" style="background:#f0f0f0;"|'''Version (date)'''
|-
| SEGUID||Tab-delimited text ||ftp://bioinformatics.anl.gov/seguid/<br>seguidannotation||2007-07-24 (timestamp)
|-
| UniProt||Text||http://www.uniprot.org/downloads<br>UniProtKB/Swiss-Prot (uniprot_sprot.dat.gz)
| rowspan="5" | UniProt Knowledgebase Release 2011_09 (2011-09-21) (Downloaded on 2011-10-04):<br>UniProtKB/Swiss-Prot <br>UniProtKB/TrEMBL <br>(from ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/reldate.txt)
|-
| UniProt||Text|| http://www.uniprot.org/downloads<br>UniProtKB/TrEMBL (uniprot_trembl.dat.gz)
|-
| UniProt, IsoForms||FASTA||http://www.uniprot.org/downloads uniprot_sprot_varsplic.fasta.gz
|-
| UniProt, SGD||Tab-delimited text file.||http://www.expasy.org/cgi-bin/lists?yeast.txt<br> Yeast (Saccharomyces cerevisiae): entries, gene names and cross-references to SGD
|-
| UniProt, FLY||Tab-delimited text file.||http://www.expasy.org/cgi-bin/lists?fly.txt<br> Drosophila: entries, gene names and cross-references to FlyBase.
|-
| NCBI, RefSeq||GenPept||ftp://ftp.ncbi.nih.gov/refseq/release/complete<br>see *.protein.gpff.gz files||Release 49 (2011-09-09) (Downloaded on 2011-10-04)<br>(from http://www.ncbi.nlm.nih.gov/refseq/)
|-
| NCBI, MMDB/PDB||Tab-delimited text ||ftp://ftp.ncbi.nih.gov/mmdb/pdbeast/table|| (Downloaded on 2011-10-04)
|-
| NCBI, PDB sequences||FASTA||ftp://ftp.ncbi.nih.gov/blast/db/FASTA/pdbaa.gz||(Downloaded on 2011-10-03)
|-
| NCBI Gene2Refseq||Tab-delimited text||ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/<br>gene2refseq.gz||(Downloaded on 2011-10-04)
|}
