README iRefIndex MITAB2.6 proposal for 7.0

Note

This is an expansion of the MITAB format that was proposed for use in iRefIndex 7.0 and subsequent releases; it does not correspond to any released product and is considered obsolete.

See README MITAB2.6 for iRefIndex 7.0 for the revised MITAB format eventually adopted for iRefIndex 7.0 and for future releases.
See http://irefindex.uio.no for links to the latest release and relevant README documentation.

This proposal is based on the experimental form of the iRefIndex MITAB format found at...

http://irefindex.uio.no/wiki/README_iRefIndex_expanded_MITAB_proposal

Look for xxx or To do notes for things that need to be changed to create a version specific form of this README.
Look for Change notes for items that differ significantly from the current MITAB format.

This format is based on recent changes agreed upon by the PSI-MI working group in Turku, Finland.

References:

Last edited: 2010-10-19

Applies to iRefIndex release: none

Release date: never incorporated into a release

Download location: not available

Authors: Ian Donaldson, Sabry Razick, Paul Boddie

Database: iRefIndex (http://irefindex.uio.no)

Organization: Biotechnology Centre of Oslo, University of Oslo (http://www.biotek.uio.no/)

Note: this distribution includes only those data that may be freely distributed under the copyright license of the source database. See Description below.

Description

This file describes the contents of the

xxx

directory and the format of the tab-delimited text files contained within. Each index file follows the PSI-MITAB2.6 format with additional columns for annotating edges and nodes. Assignment of source interaction records to redundant interaction groups is described at http://irefindex.uio.no. The PSI-MITAB2.6 format plus additional columns is described below.

Details on the build process are available from the publication PMID 18823568.

There are two sets of data: free and proprietary. The free version includes only those data that may be freely distributed under the copyright license of the source database. This includes data from BIND, BioGRID, IntAct, MINT, MPPI and OPHID.

iRefIndex also integrates data from CORUM, DIP, HPRD and MPact. This data is not distributed publicly, but may be made available to academic users under a collaborative agreement.

Contact ian.donaldson at biotek.uio.no if you are interested in using the iRefIndex database or would like your database included in the public release of the index.

Sources	http://irefindex.uio.no/wiki/Sources_iRefIndex_xxx
Statistics	http://irefindex.uio.no/wiki/Statistics_iRefIndex_xxx
Download location	ftp://ftp.no.embnet.org/irefindex/data/archive/xxx

Directory contents

`README`	pointer to this file at http://irefindex.uio.no/wiki/README_iRefIndex_MITAB_xxx
`Sources`	pointer to data files for this release at http://irefindex.uio.no/wiki/Sources_iRefIndex_xxx
`Statistics`	pointer to statistics for this release at http://irefindex.uio.no/wiki/Statistics_iRefIndex_xxx
`xxxx.mitab.mmddyyyy.txt.zip`	individual indices in PSI-MITAB2.6 format

iRefIndex data is distributed as a set of tab-delimited text files with names of the form xxxx.mitab.mmddyyyy.txt.zip where mmddyyyy represents the file's creation date.

The complete index is available as All.mitab.mmddyyyy.txt.zip.

Taxon specific data sets are also available for:

	Taxon Id
Homo sapiens	9606 (human)
Mus musculus	10090 (mouse)
Rattus norvegicus	10116 (brown rat)
Caenorhabditis elegans	6239 (nematode)
Drosophila melanogaster	7227 (fruit fly)
Saccharomyces cerevisiae	4932 (baker's yeast)
Escherichia coli	562 (E. coli)
Other	other
All	all

Taxon specific subsets of the data are named xxxx.mitab.mmddyyyy.txt.zip where xxxx is the taxonomy identifier of at least one of the interactors according to either the source interaction database or the sequence database record. Each zip compressed file contains a single text file with the corresponding name xxxx.mitab.mmddyyyy.txt.
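
The naming convention above can be unpacked mechanically. The following is a minimal sketch, assuming that the date stamp is always eight digits in month-day-year order as the mmddyyyy placeholder suggests; the function name `parse_index_name` and the file names in the usage note are hypothetical and are not part of any iRefIndex distribution.

```python
import re
from datetime import datetime

# Names of the form xxxx.mitab.mmddyyyy.txt.zip, or the .txt file inside the zip.
_NAME = re.compile(r"^(?P<subset>[^.]+)\.mitab\.(?P<stamp>\d{8})\.txt(?:\.zip)?$")

def parse_index_name(filename):
    """Return (subset, creation_date) for an iRefIndex MITAB file name.

    subset is the taxonomy identifier, "All" or "other", exactly as it
    appears in the name; creation_date is a datetime.date.
    """
    match = _NAME.match(filename)
    if match is None:
        raise ValueError("not an iRefIndex MITAB file name: %r" % filename)
    # Assumption: the stamp is month, day, four-digit year (mmddyyyy).
    created = datetime.strptime(match.group("stamp"), "%m%d%Y").date()
    return match.group("subset"), created
```

For instance, `parse_index_name("9606.mitab.10192010.txt.zip")` would yield the subset `"9606"` together with the date 19 October 2010.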

In some cases, interactors may belong to other taxa if a virus-host interaction is being represented or if a protein from another organism has been used to model a protein in the specified organism.

Taxonomy identifiers are provided in the data sets allowing these exceptions to be identified. The taxonomy identifiers listed are derived from the source protein sequence record. In some cases, this taxonomy identifier will be a child of the taxon listed in the file's title; for example, Escherichia coli K12 (taxonomy identifier 83333) will appear in the Escherichia coli (taxonomy identifier 562) file.

A description of the NCBI taxon identifiers is available at the following location:

http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy

The above data taxon division scheme leads to duplications; for instance, an interaction present in the mouse index could also appear in the human index if the interaction record lists protein sequence records from both human and mouse. The All.mitab.mmddyyyy file is a complete and non-redundant listing.

The data format and divisions provided in this initial release were chosen in the hopes that they would be immediately useful to the largest possible set of users. Other formats and divisions are possible and we welcome your input on future releases.

Changes from last version

Sabry new {

This version is comparable to the experimental centric version.
The way interactions involving more than one instance of a protein are represented has changed. This mostly affects homo-dimers, intramolecular interactions and homo-polymers. The change was made because an original source may provide an interaction between two isoforms of the same gene, which the canonicalization process would wrongly represent as a homo-interaction.

The new representation is as follows.

When only one molecule is provided as the interactor, both uidA and uidB will be this molecule and the edge type will be “Y” (see column 50).
When there are two molecules of the same type, or the iRefIndex canonicalization procedure maps them to the same molecule, both uidA and uidB will be this molecule and the edge type will be “X” (see column 50).
When there are more than two molecules of the same type, or the iRefIndex canonicalization procedure maps them to the same molecule, both uidA and uidB will be this molecule and the edge type will be “C” (see column 50). The bipartite representation is used in this case.

} xxx

Known Issues

xxx

Understanding the iRefIndex MITAB format

iRefIndex is distributed in PSI-MITAB format. Version 2.5 of the format was originally described in a PSI-MI paper (PMID 17925023). The following summary shows the columns defined by version 2.6 of the format plus columns added by iRefIndex (italicised) grouped by entity type:

Entity type	Principal columns	Other columns
Experiment	Method, author, pmids
Interaction	Before_C13N_rigid	interactionType, sourcedb, interactionIdentifiers, confidence, edgetype, numParticipants
Interactor	Final_ROGID_A, Final_ROGID_B	taxA, taxB, interactor_type_A, interactor_type_B, OriginalReferenceA, OriginalReferenceB, FinalReferenceA, FinalReferenceB
Canonical interaction	Checksum_Interaction, C13N_rig
Canonical interactor	Checksum_A, Checksum_B, irogA, irogB	uidA, uidB, altA, altB, aliasA, aliasB

Since this PSI-MITAB format allows for only two interactors to be described on each line, it is best suited for describing binary interaction data (where the original experiment, say yeast two hybrid, gives a binary readout). However, other PSI-MI XML source records will describe interactions involving only one interactor type (dimers or multimers) or they will contain associative (also known as "n-ary") interaction data from, for example, immunoprecipitation experiments where the exact interactions between any pair of interactors are unknown. These cases are problematic for the PSI-MITAB format. This document describes exactly how we use the MITAB format to describe these alternate (non-binary) interaction types.

What each line represents

Each line or row in the MITAB file represents a single interaction record from one primary data source describing an interaction involving the exact same set of proteins (as defined by their primary sequence and taxonomy identifiers).

Change

Previously, each line represented a group of interaction records.

A single interaction is described on a separate line since this allows us to convey additional information about each of the original source records. Users can still "collapse" or find all lines that describe an interaction between the same set of proteins by using the "RIG" (column 47) or "RIGID" (column 35 or 48). Rows with identical rigids (redundant interaction group identifiers) all describe interactions between the same set of proteins.
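
The "collapse" operation described above can be sketched as follows. This is only an illustration: the column number 48 comes from this README, while the function name `group_by_rigid` and the treatment of lines beginning with `#` as header lines are assumptions.

```python
from collections import defaultdict

RIGID_COLUMN = 48  # 1-based column number cited in this README

def group_by_rigid(lines):
    """Group MITAB rows by their redundant interaction group identifier.

    lines is any iterable of tab-delimited text lines. Returns a dict that
    maps each RIGID to the list of rows (each a list of fields) sharing it.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: blank lines and lines starting with '#' carry no data.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        groups[fields[RIGID_COLUMN - 1]].append(fields)
    return dict(groups)
```

Each value in the returned dictionary then holds every source record that describes an interaction between the same set of proteins.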

The natural keys for each interaction record in this group (that is, the record identifiers from the source database) are listed under interactionIdentifier (column 14). For example:

intact:EBI-761694

Change

Our surrogate (primary) key for a group of redundant interaction records (RIG) is no longer listed in column 14; only the source database record is listed in this column. The RIG identifier is now listed (by itself) in column 48 (and column 35 in canonical form).

The RIG identifier is a 27 character key that is derived from the ROGIDs of the interactors involved in the interaction record (see columns 41 and 42). The RIG identifier is listed (by itself) in column 48 for convenience. The ROGID is a SHA-1 digest of the protein interactor's primary amino acid sequence concatenated with the NCBI taxonomy identifier (see the paper for details).
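
The 27 character length is what one obtains when a 20-byte SHA-1 digest is base64-encoded and the single trailing padding character is dropped. The sketch below illustrates this. The base64 step, the sorting and concatenation of ROGIDs, and the names `digest27` and `rigid_sketch` are all assumptions made for illustration; the paper remains the authoritative description of how keys are actually built.

```python
import base64
import hashlib

def digest27(text):
    """Base64-encoded SHA-1 digest of text with the '=' padding removed.

    SHA-1 always yields 20 bytes, which base64 encodes as 27 characters
    followed by one '=' padding character.
    """
    raw = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(raw).decode("ascii").rstrip("=")

def rigid_sketch(rogids):
    """Hypothetical RIG key: digest of the sorted, concatenated ROGIDs.

    Sorting makes the key independent of the order in which interactors
    are listed; whether iRefIndex does exactly this is an assumption.
    """
    return digest27("".join(sorted(rogids)))
```

Under these assumptions, the same set of ROGIDs always produces the same key regardless of the order in which the interactors are supplied.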

Representation of interactions

Sometimes source interaction records in PSI-MI format only list one interactor. These are cases where either 1) an intramolecular interaction is being represented or 2) a multimer (3 or more) of some protein is being represented. These records are difficult to represent in the PSI-MITAB format because PSI-MITAB requires that each row (interaction) list two interactors. The way we handle this is to list the ROG identifier for the single interactor twice (once in each of columns 41 and 42) of the MITAB. The RIG identifier for these interactions will be the SHA-1 digest of the interactor’s ROG id (see column 48). These interactions are marked by a Y in column 50.

Note that column 50 may also contain a C. This indicates that the MITAB entry describes membership of a protein in some complex. These entries correspond to PSI-MI records where more than two interactors are listed (associative interaction data; a.k.a. n-ary data cf. binary data). In these cases, the first column holds the ROG identifier of the complex and the second column contains the ROG id of the protein. We refer to this method of representation as a bi-partite model since there are two kinds of nodes corresponding to complexes and proteins.

As an example, let’s say that a source interaction record contained interactors A, B and C found by affinity purification and mass-spec where a tagged version of protein A was used as the bait protein to perform the immunoprecipitation.

Then we would represent the complex in the MITAB file using three lines:

X-A
X-B
X-C

All three entries would have the same string in column 35 or 48 (the RIG id for the complex). All three entries would have a C in column 50.

Other databases take an interaction record with multiple interactors (n-ary data) and make a list of binary interactions (based on the spoke or matrix model) and then list these binary interactions in the MITAB. For the example above, using a spoke model to transform the data into a set of binary interactions, these data would be represented using two lines in the MITAB file:

A-B
A-C

Here A is chosen as the "hub" of the spoke model since it was the "bait" protein. For experimental systems that do not have "baits" and "preys" (such as X-ray crystallography), an arbitrary protein might be chosen as the bait.

Alternatively, a matrix model might be used to transform the n-ary data into a list of binary interactions. Here all pairwise combinations of interactors in the original n-ary data are represented as binary interactions. So, in the above example, the immunoprecipitated complex would be represented using three lines of the MITAB file: A-B, B-C, and A-C.

All three methods for representing n-ary data in a MITAB file (bi-partite, spoke, and matrix) are different representations of the same data.

We have chosen to use the bi-partite method of representation so that it is impossible to mistake spoke or matrix binary entries for true binary entries; the identifiers used for complexes will, of course, not appear in a protein database and any programme that tries to treat complex identifiers as though they were protein identifiers will fail. The method allows you to reconstruct the members of the original interaction record that describes a complex of proteins (say from an affinity purification experiment). From there, you can choose to make a spoke or matrix model by yourself if you want.
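
A minimal sketch of that reconstruction is given below, using the A, B, C example from above. The function names are hypothetical, and the input is assumed to be a list of (complex identifier, protein identifier) pairs already extracted from rows marked with a C in column 50.

```python
from collections import defaultdict
from itertools import combinations

def members_by_complex(bipartite_edges):
    """Collect the protein members of each complex from bipartite rows.

    bipartite_edges is an iterable of (complex_id, protein_id) pairs.
    Members are kept in order of first appearance, without duplicates.
    """
    members = defaultdict(list)
    for complex_id, protein_id in bipartite_edges:
        if protein_id not in members[complex_id]:
            members[complex_id].append(protein_id)
    return dict(members)

def spoke(members, hub):
    """Spoke model: pair the chosen hub (e.g. the bait) with every other member."""
    return [(hub, member) for member in members if member != hub]

def matrix(members):
    """Matrix model: every pairwise combination of members."""
    return list(combinations(members, 2))
```

With the three rows X-A, X-B and X-C, `members_by_complex` recovers the members A, B and C of complex X; `spoke` with A as the hub gives the pairs A-B and A-C, and `matrix` gives A-B, A-C and B-C.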

Users are advised that other databases will use spoke and matrix model representations of complexes. In these cases, column 50 will indicate this fact. The pairs of proteins found in these entries do not necessarily represent observations of real binary interactions: they merely represent membership in some larger list of proteins observed to be somehow associated.

For binary interaction data, column 50 will contain an X. Two protein interactor ROGIDs will be listed in columns 33 and 34 (and also in columns 41 and 42).

Canonical interactors and interactions

As indicated by the summary table given above, the MITAB format used by iRefIndex now contains information about interactors and interactions that use canonicalized information as described in the Canonicalization document. Since each line refers to a specific, observed interaction (column 48) and specific interactors (columns 41 and 42), information about the canonical groups involved in an interaction (columns 33 and 34, also columns 43 and 44) and the resulting canonical interaction (column 35, also column 47) provides an additional layer which can be used to group specific interactions.

Thus, in the file, a collection of interactions "labelled" with this additional layer of canonical information would resemble the following:

CA-CB A1-B1 CI I1
CA-CB A1-B2 CI I2
CA-CB A2-B2 CI I3

Here, CA is the canonical group for A1 and A2, and CB is the canonical group for B1 and B2. Since CA and CB remain the same for all of the specific interactions listed above, the canonical RIGID will also remain the same in the form of CI, even though the specific interactions between combinations of A1, A2, B1 and B2 produce the distinct RIGIDs I1, I2 and I3.

Canonical group coverage

Note that the MITAB file will not necessarily provide all members of a given canonical object group - that is, all ROGIDs corresponding to a given canonical ROGID - since the file only contains observed interactions. Although a ROGID may be mapped to a canonical ROGID, if the specific ROGID is never observed in an interaction, it will never be listed in this file. Consequently, any attempt to find the theoretical size of a canonical group - the number of proteins potentially represented by a particular canonical ROGID - will fail where such non-interacting ROGIDs exist. This file can only provide the size of a canonical group in terms of interacting proteins.

To find the size of a canonical group in terms of its interacting members, all distinct ROGIDs corresponding to a particular canonical ROGID can be collected, regardless of the interactions in which they participate.
By considering canonical interactions, the number of interacting members of a canonical group can be found for each canonical interaction. Note that this figure is typically less than the total number of interacting members for any given canonical group.
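
Both counts can be sketched as follows. The column numbers (33 and 34 for canonical ROGIDs, 41 and 42 for specific ROGIDs, 35 for the canonical interaction) are those cited in this README; the function names are hypothetical, and rows are assumed to be lists of fields already split on tabs.

```python
from collections import defaultdict

CANONICAL_ROGID_COLUMNS = (33, 34)  # canonical groups for interactors A and B
ROGID_COLUMNS = (41, 42)            # specific interactors A and B
CANONICAL_RIGID_COLUMN = 35         # canonical interaction

def interacting_members(rows):
    """Map each canonical ROGID to the set of ROGIDs observed for it."""
    members = defaultdict(set)
    for fields in rows:
        for canon_col, rog_col in zip(CANONICAL_ROGID_COLUMNS, ROGID_COLUMNS):
            members[fields[canon_col - 1]].add(fields[rog_col - 1])
    return dict(members)

def interacting_members_per_interaction(rows):
    """As above, but keyed by (canonical interaction, canonical ROGID)."""
    members = defaultdict(set)
    for fields in rows:
        canonical_rigid = fields[CANONICAL_RIGID_COLUMN - 1]
        for canon_col, rog_col in zip(CANONICAL_ROGID_COLUMNS, ROGID_COLUMNS):
            key = (canonical_rigid, fields[canon_col - 1])
            members[key].add(fields[rog_col - 1])
    return dict(members)
```

The size of a group is then the length of the corresponding set. Neither function can see ROGIDs that never appear in an interaction, so the sizes are in terms of interacting members only.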

License

Data released on this public ftp site are released under the Creative Commons Attribution License http://creativecommons.org/licenses/by/2.5/. This means that you are free to use, modify and redistribute these data for personal or commercial use so long as you provide appropriate credit. See next section.

iRefIndex data distributed on the FTP site includes only those data that may be freely distributed under the copyright license of the source database. This includes data from BIND, BioGRID, IntAct, MINT, MPPI and OPHID.

iRefIndex also integrates data from CORUM, DIP, HPRD and MPact. These data are not distributed publicly. These data may be made available to academic users under a collaborative agreement.

Contact ian.donaldson at biotek.uio.no if you are interested in using the iRefIndex database or would like your database included in the public release of the index.

Citation

Credit should include citing the iRefIndex paper (PMID 18823568) and any of the source databases upon which this resource is based. See http://irefindex.uio.no for appropriate citations.

Disclaimer

Data is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Description of PSI-MITAB2.6 file

Each line in this file represents either

an interaction between two proteins (binary interaction) or
the membership of a protein in some complex (complex membership) or
an interaction that involves only one protein type (multimer or self-interaction).

See column 50 for more details.
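
As a rough illustration of how the three kinds of line might be told apart, the sketch below tallies rows by the code found in column 50. The codes X, C and Y are those described in this README; the function name `count_edge_types` and the skipping of lines that start with `#` are assumptions.

```python
from collections import Counter

EDGE_TYPE_COLUMN = 50  # 1-based column number cited in this README

# Meanings of the column 50 codes as summarised in this README.
EDGE_TYPES = {
    "X": "binary interaction",
    "C": "complex membership",
    "Y": "multimer or self-interaction",
}

def count_edge_types(lines):
    """Count tab-delimited MITAB lines by the kind of edge they describe."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: blank lines and lines starting with '#' carry no data.
        if not line or line.startswith("#"):
            continue
        code = line.split("\t")[EDGE_TYPE_COLUMN - 1]
        counts[EDGE_TYPES.get(code, "unrecognised: " + code)] += 1
    return counts
```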

Column number: 1 (uidA)

Column type:	String
Description:	Unique identifier for the canonical group to which interactor A belongs.
Example:	uniprotkb:P23367

Notes

This column contains an identifier, taken from a major database, for a protein representing the canonical group to which interactor A belongs; that is, one identifier is selected from the list of identifiers used by all the sources to represent any member of the canonical group. The user should not assume that this identifier is the one participating in the interaction; it is merely the identifier selected to represent the canonical group. Because of the way original sources provide identifiers, this field may contain:

Identifiers in the wrong format (e.g. RefSeq:NP 036076 instead of RefSeq:NP_036076)
Version information (e.g. GenBank:AAN15193.1)
Wrong database (e.g. GenBank:NP_013133 instead of RefSeq:NP_013133)
Structural identifiers that do not refer to the full sequence ("PDB:1OCC|I")
Incomplete identifiers ("PDB:1KQ1| ")
Outdated or deleted identifiers (e.g. UniProt:Q9H233 instead of UniProt:Q29RF6)

It should also be noted that “OriginalReferenceA” (column 38) may or may not match what is given in this column.

However, a canonical group is guaranteed to always be represented by the same identifier.

Column number: 2 (uidB)

Column type:	String
Description:	Unique identifier for the canonical group to which interactor B belongs.
Example:	uniprotkb:P06722

Notes

See notes for column 1.

Column number: 3 (altA)

Column type:	Pipe-delimited set of strings
Description:	Alternative identifiers for interactor A
Example:	uniprotkb:P23367|refseq:NP_418591|entrezgene/locuslink:948691

Notes

Change

Previously, this column listed database identifiers for specific interactors. Since column 1 (and column 33) now refers to a canonical group, this column lists database identifiers for all members of that group.

Column 3 lists database names and accessions that belong to the same canonical group. Members of a canonical group do not all necessarily have the same sequence (although they all belong to the same or closely related taxa). Members of a canonical group may include splice isoform products from the same or related genes. One member of the canonical group is chosen to represent the entire group. The identifier for that canonical representative is listed in column 43.

Each pipe-delimited entry is a database_name:accession pair delimited by a colon. Database names are taken from the MI controlled vocabulary at the following location:

http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI

Database references listed in this column may include the following:

uniprotkb: The accessions this protein is known by in UniProt (http://www.uniprot.org/). More information regarding this protein can be retrieved using this accession from UniProt. See the AC line in the flat file. http://au.expasy.org/sprot/userman.html#AC_line.
refseq: If a protein accession exists in the RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/), that reference is indicated here. More information about this protein can be obtained from RefSeq using this accession.
entrezgene/locuslink: NCBI gene Identifiers for the gene encoding this protein. See ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq column GeneID given protein's accession.version
other: If none of the three identifier types are available then other databasename:accession pairs will be listed. These database names may not follow the MI controlled vocabulary.

Example:

emb:CAA44868.1|gb:AAA23715.1|gb:AAB02995.1|emb:CAA56736.1|uniprot:P24991

irefindex: If the node represents a complex, then the rogid for the complex will be listed here, such as the following:

irefindex:xBr9cTXgzPLNxsaKiYyHcoEm/DM
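
A field of this kind can be split into its database name and accession parts as sketched below. The function name `parse_pairs` is hypothetical, and treating an empty field, `-` or `NA` as "no identifiers" is an assumption rather than something stated for this column.

```python
def parse_pairs(field):
    """Split a pipe-delimited field into (database_name, accession) tuples.

    Only the first colon in each entry is treated as the delimiter, so
    accessions that themselves contain colons are kept whole.
    """
    # Assumption: these values mean that no identifiers are available.
    if field in ("", "-", "NA"):
        return []
    pairs = []
    for entry in field.split("|"):
        database, _, accession = entry.partition(":")
        pairs.append((database, accession))
    return pairs
```

Applied to the example for this column, `parse_pairs("uniprotkb:P23367|refseq:NP_418591|entrezgene/locuslink:948691")` gives three pairs whose database names are `uniprotkb`, `refseq` and `entrezgene/locuslink`.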

Column number: 4 (altB)

Column type:	Pipe-delimited set of strings
Description:	Alternative identifiers for interactor B
Example:	uniprotkb:P06722|refseq:NP_417308|entrezgene/locuslink:947299

Notes

See notes for column 3.

Column number: 5 (aliasA)

Column type:	Pipe-delimited set of strings
Description:	Aliases for interactor A
Example:	uniprotkb:MUTL_ECOLI|entrezgene/locuslink:mutL

Notes

Each pipe-delimited entry is a database name:alias pair delimited by a colon. Database names are taken from the PSI-MI controlled vocabulary at the following location:

http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI

Database names and sources listed in this column may include the following:

uniprotkb:entry name: the entry name given by UniProt. See the description for "Entry name" in the section of http://au.expasy.org/sprot/userman.html#ID_line concerning the "ID (IDentification)" line of the flat file
entrezgene/locuslink:symbol: the NCBI gene symbol for the gene encoding this protein. See the section in ftp://ftp.ncbi.nlm.nih.gov/gene/README for gene_info, specifically details for the Symbol column
irefindex:complex: If the node is a complex then irefindex:complex will be listed here.
NA: NA may be listed here if aliases are not available

Column number: 6 (aliasB)

Column type:	Pipe-delimited set of strings
Description:	Aliases for interactor B
Example:	uniprotkb:MUTH_ECOLI|entrezgene/locuslink:mutH

Notes

See notes for column 5.

Column number: 7 (Method)

Column type:	String
Description:	Interaction detection method
Example:	MI:0399(2h fragment pooling)

Notes

Change

Only a single method will appear in this column. Previously, multiple methods appeared.

Both the controlled vocabulary term identifier for the method (e.g. MI:0399) and the controlled vocabulary term short label in brackets (e.g. 2h fragment pooling) will appear in this column. See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers.

The interaction detection method is from the original record. Path for PSI-MI 2.5:

entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel/

Change

If a controlled vocabulary term identifier was not provided by the source database then an attempt was made to use the supplied short label to find the correct term identifier. If a term identifier could not be found, then MI:0000 will appear before the shortLabels.

NA or -1 may appear in place of a recognised shortLabel.

For example:

MI:0000(-1)
MI:0000(NA)
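
Values in this column can be taken apart with a small pattern, as sketched below. The four-digit form of the term identifier is inferred from the examples in this README, and the function names are hypothetical.

```python
import re

# A term identifier such as MI:0399 followed by a short label in brackets.
_TERM = re.compile(r"^(MI:\d{4})\((.*)\)$")

def parse_mi_term(value):
    """Return (term_identifier, short_label) for a value like 'MI:0399(label)'.

    If the value does not have that shape, the identifier is None and the
    whole value is returned as the label.
    """
    match = _TERM.match(value)
    if match is None:
        return None, value
    return match.group(1), match.group(2)

def is_unmapped(value):
    """True if no controlled vocabulary term could be found (MI:0000)."""
    return parse_mi_term(value)[0] == "MI:0000"
```

For example, `parse_mi_term("MI:0000(NA)")` separates the placeholder identifier `MI:0000` from the label `NA`, and `is_unmapped` reports such rows so that they can be handled separately.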

Column number: 8 (author)

Column type:	Pipe-delimited set of strings
Description:
Example:	hall-1999-1|hall-1999-2|mansour-2001-1|mansour-2001-2|hall-1999

Notes

According to the MITAB2.5 format, this column should contain a pipe-delimited list of author surnames for the publications in which the interaction has been shown.

Change

This column will usually include only one author name reference. However, some experimental evidence has secondary references, which may also be included here.

Column number: 9 (pmids)

Column type:	Pipe-delimited set of strings
Description:	PubMed Identifiers
Example:	pubmed:9880500|pubmed:11585365

Notes

This is a non-redundant list of PubMed identifiers pointing to literature that supports the interaction. According to MITAB2.5 format, this column should contain a pipe-delimited set of databaseName:identifier pairs such as pubmed:12345. The source database name is always pubmed.

Change

This column will usually include only one PubMed reference that describes where the experimental evidence is found. In some cases, secondary references will be included here.

The special value - may appear in place of the identifiers.

Column number: 10 (taxa)

Column type:	Pipe-delimited set of strings
Description:	Taxonomy identifier for canonical interactor A
Example:	taxid:83333(Escherichia coli K-12)

Notes

The NCBI taxonomy identifier listed here is that of the sequence record for the interactor and may be different than what is listed in the interaction record. See the methods section for more details. See also the NCBI taxonomy database at the following location:

http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy

According to the MITAB2.5 format, this column should contain a pipe delimited set of databaseName:identifier pairs such as taxid:12345. The source database name has been listed as taxid since it is always NCBI's taxonomy database. The value in this column will be NA if the interactor is a complex.

Column number: 11 (taxb)

Column type:	Pipe-delimited set of strings
Description:	Taxonomy identifier for canonical interactor B
Example:	taxid:83333(Escherichia coli K-12)

Notes

See notes for column 10.

Column number: 12 (interactionType)

Column type:	String
Description:	Interaction Type from controlled vocabulary or short label
Example:	MI:0218(physical interaction)

Notes

Change

Only one interaction type will be present in each line of the file (previously, multiple types were listed).

The interaction type is taken from the PSI-MI controlled vocabulary and represented as...

database:identifier(interaction type)

...when the identifier is available in the interaction record; otherwise the short label is used. Path for PSI-MI 2.5:

entrySet/entry/interactionList/interaction/interactionType/names/shortLabel

See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for interaction types.

Change

If the MI controlled vocabulary identifier was not provided by the source database, but a text description was provided, then an attempt was made to map the text to the correct controlled vocabulary term identifier. If this was not possible then MI:0000 is listed.

NA may be listed here if the interaction type is not available (meaning that we could not find the interaction type in the record provided by the source database).

Column number: 13 (sourcedb)

Column type:	String
Description:	Source databases containing this interaction
Example:	MI:0469(intact)

Notes

Taken from the PSI-MI controlled vocabulary and represented as...

database:identifier(source name)

See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for database sources.

Change

Only one source database will be listed in each row.

Column number: 14 (interactionIdentifier)

Column type:	String
Description:	source interaction database and accession
Example:	intact:EBI-761694

Notes

Each reference is presented as a database name:identifier pair.

Change

Only one source database reference will be listed in each row. The RIGID (from iRefIndex) is no longer listed in this column. See column 35 instead.

The source database names that appear in this column are taken from the PSI-MI controlled vocabulary at the following location (where possible):

http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI

If an interaction record identifier is not provided by the source database, this entry will appear as database-name:- with the identifier region replaced with a dash (-).

Column number: 15 (confidence)

Column type:	Pipe-delimited set of strings
Description:	Confidence scores
Example:	lpr:1|hpr:12|np:1

Notes

Each reference is presented as a scoreName:score pair. Three confidence scores are provided: lpr, hpr and np.

PubMed Identifiers (PMIDs) point to literature references that support an interaction. A PMID may be used to support more than one interaction.

The lpr score (lowest pmid re-use) is the lowest number of distinct interactions (RIGIDs, see column 35) that any one PMID (supporting the interaction in this row) is used to support. A value of one indicates that at least one of the PMIDs supporting this interaction has never been used to support any other interaction. This likely indicates that only one interaction was described by that reference and that the present interaction is not derived from high throughput methods.

The hpr score (highest pmid re-use) is the highest number of interactions (RIGIDs, see column 35) that any one PMID (supporting the interaction in this row) is used to support. A high value (e.g. greater than 50) indicates that one PMID describes at least 50 other interactions and it is more likely that high-throughput methods were used.

The np score (number pmids) is the total number of unique PMIDs used to support the interaction described in this row.

- may appear in the score field, indicating the absence of a score value.
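
The definitions above can be sketched in code as follows. This is an illustration under stated assumptions, not the procedure used to build the index: the function names are hypothetical, scores are assumed to be integers, and `rigids_by_pmid` is an assumed lookup from each PMID to the set of RIGIDs it supports across the whole data set.

```python
def parse_confidence(field):
    """Parse a value such as 'lpr:1|hpr:12|np:1' into a dict of scores.

    A score given as '-' (absent) is returned as None.
    """
    scores = {}
    for entry in field.split("|"):
        name, _, value = entry.partition(":")
        scores[name] = None if value == "-" else int(value)
    return scores

def confidence_scores(pmids, rigids_by_pmid):
    """Compute lpr, hpr and np for one interaction.

    pmids are the PubMed identifiers supporting the interaction;
    rigids_by_pmid maps every PMID to the set of RIGIDs it supports.
    """
    distinct = set(pmids)
    reuse = [len(rigids_by_pmid[pmid]) for pmid in distinct]
    return {
        "lpr": min(reuse),    # lowest PMID re-use
        "hpr": max(reuse),    # highest PMID re-use
        "np": len(distinct),  # number of unique PMIDs
    }
```

For an interaction supported by one PMID that reports a single interaction and another that reports three, the result would be an lpr of 1, an hpr of 3 and an np of 2.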

Change

COLUMNS PAST THIS POINT (16 - 31) ARE PART OF THE NEW PSI-MITAB 2.6 FORMAT

Column number: 16 (expansion)

Column type:	String
Description:	Model used to convert n-ary data into binary data for purpose of export in MITAB file
Example:	bipartite

Notes

For iRefIndex, this column will always contain either bipartite or none.

Other databases may use either spoke or matrix or none in this column.

See Understanding the iRefIndex MITAB format for an explanation.

Column number: 17 (biological_role_A)

Column type:	String
Description:	Biological role of interactor A
Example:	MI:0501(enzyme)

Notes

When provided by the source database, this includes single entries such as MI:0501(enzyme), MI:0502(enzyme target), MI:0580(electron acceptor), or MI:0499(unspecified role).

See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to browse possible values for biological role.

For complexes and when no role is specified this column will indicate an unspecified role.

Column number: 18 (biological_role_B)

Column type:	String
Description:	Biological role of interactor B
Example:	MI:0501(enzyme)

Notes

See notes for column 17.

Column number: 19 (experimental_role_A)

Column type:	String
Description:	Indicates the experimental role of the interactor (such as bait or prey).
Example:	MI:0496(bait)
Example:	MI:0498(prey)

Notes

This column indicates the experimental role (if any was provided by the source database) that was played by interactor A (columns 1, 33, 41).

See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to see definitions of bait and prey, as well as to browse other possible values of experimental role that may appear in this column for other databases.

For complexes and when no role is specified this column will contain the following:

MI:0499(unspecified role)

Column number: 20 (experimental_role_B)

Column type:	String
Description:	Indicates the experimental role of the interactor (such as bait or prey).
Example:	MI:0496(bait)
Example:	MI:0498(prey)

Notes

This column indicates the experimental role (if any) that was played by interactor B (columns 2, 34, 42).

See notes above for column 19.

Column number: 21 (interactor_type_A)

Column type:	String
Description:	describes the type of molecule that A is
Example:	MI:0326(protein)

Notes

For iRefIndex, this will always be one of...

MI:0326(protein)
MI:0315(protein complex)

Column number: 22 (interactor_type_B)

Column type:	String
Description:	describes the type of molecule that B is
Example:	MI:0326(protein)

Notes

See column 21.