Difference between revisions of "iRefIndex Data Preparation for iRefWeb"

From irefindex
(Added iRefWeb source_db note.)
(→‎Mapping RIGIDs: Added sorting, de-duplication, header notes.)
 
Line 132: Line 132:
 
  from join_2
 
  from join_2
 
  order by sourceid, experiment_uid
 
  order by sourceid, experiment_uid
  into outfile '/tmp/iRefIndex_interaction_interactors';
+
  into outfile '/tmp/output/iRefIndex_interaction_interactors';
  
 
This output file can then be processed using the <tt>make_legacy_mapping.py</tt> program:
 
This output file can then be processed using the <tt>make_legacy_mapping.py</tt> program:
  
  python make_legacy_mapping.py /tmp/iRefIndex_interaction_interactors /tmp/iRefIndex_interaction_rigids
+
  python make_legacy_mapping.py /tmp/output/iRefIndex_interaction_interactors /tmp/output/iRefIndex_interaction_rigids
  
 
(The program needs Python 2.5 or later since it uses the <tt>hashlib</tt> module. It could probably be made compatible with earlier Python versions if necessary.)
 
(The program needs Python 2.5 or later since it uses the <tt>hashlib</tt> module. It could probably be made compatible with earlier Python versions if necessary.)
  
The resulting output file (<tt>/tmp/iRefIndex_interaction_rigids</tt> in the above example) is a tab-separated file with correct RIGIDs in the first column and "legacy" RIGIDs in the second column.
+
The resulting output file (<tt>/tmp/iRefIndex_interaction_rigids</tt> in the above example) is a tab-separated file with correct RIGIDs in the first column and "legacy" RIGIDs in the second column. Since the output may include duplicate lines, the following step is advised:
 +
 
 +
sort -u /tmp/output/iRefIndex_interaction_rigids > /tmp/iRefIndex_interaction_rigids
 +
 
 +
A header can also be included in the file:
 +
 
 +
cat header_legacy_mapping.txt /tmp/iRefIndex_interaction_rigids > iRefIndex_interaction_rigids
  
 
== All iRefIndex Pages ==
 
== All iRefIndex Pages ==

Latest revision as of 13:04, 20 October 2011

A special iRefWeb database needs to be created and populated so that iRefWeb may publish iRefIndex data.

Creating a Database

Enter MySQL using a command like the following:

mysql -h <host> -u <admin> -p -A

The <admin> is the name of the user with administrative privileges. For example:

mysql -h myhost -u admin -p -A

Then create a database using commands of the following form:

create database <database>;
grant all privileges on <database>.* to '<username>'@'%';

For example, with <database> given as iRefWeb, <username> given as irefindex, and a substitution for <password>:

create database iRefWeb;
grant all privileges on iRefWeb.* to 'irefindex'@'%';

If difficulties occur granting privileges in this way, try the following statements:

grant select, insert, update, delete, create, drop, references, index, alter, create temporary tables, lock tables, execute, create view, show view, create routine, alter routine on <database>.* to '<username>'@'%';
grant process, file on *.* to '<username>'@'%';

You should choose the same user as the one used to build the iRefIndex database since that database will need to be accessed during this activity.

Accessing the Previous iRefWeb Database

NoteNote

This should probably be changed so that the process of building an iRefWeb database depends only on the iRefIndex database.

In order to populate the iRefWeb database, the previous version of the database needs to be referenced. Thus, a command of the following form may be required:

grant select on <old_irefweb_db>.* to '<username>'@'%';

Obtaining the SQL Scripts

Get the scripts from this location:

Using CVS with the appropriate CVSROOT setting, run the following command:

cvs co bioscape/bioscape/modules/interaction/Sabry/SQL_commands

The CVSROOT environment variable should be set to the following for this to work:

export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot

(The <username> should be replaced with your actual username.)

Updating the SQL Scripts

The make_iRefWeb.sql script needs to be updated so that each statement defining a source in the source_db table refers to the appropriate release version and date.

Preparing the SQL Scripts

The make_iRefWeb.sql and test_iRefWeb.sql scripts needs to be parameterised to refer to specific databases. The following commands should prepare a suitable version of the scripts given appropriate values for <actual_irefindex_db>, <actual_old_irefweb_db> and <actual_irefweb_db>:

sed -e 's/<irefindex_db>/<actual_irefindex_db>/g;s/<old_irefweb_db>/<actual_old_irefweb_db>/g' make_iRefWeb.sql > make_iRefWeb_specific.sql
sed -e 's/<irefweb_db>/<actual_irefweb_db>/g;s/<old_irefweb_db>/<actual_old_irefweb_db>/g' test_iRefWeb.sql > test_iRefWeb_specific.sql

For example, given the following substitutions...

Parameter Substitution Value
<irefindex_db> <actual_irefindex_db> iRefIndex_full_beta9
<old_irefweb_db> <actual_old_irefweb_db> iRefWeb8
<irefweb_db> <actual_irefweb_db> iRefWeb9

...the commands would be as follows:

sed -e 's/<irefindex_db>/iRefIndex_full_beta9/g;s/<old_irefweb_db>/iRefWeb8/g' make_iRefWeb.sql > make_iRefWeb_specific.sql
sed -e 's/<irefweb_db>/iRefWeb9/g;s/<old_irefweb_db>/iRefWeb8/g' test_iRefWeb.sql > test_iRefWeb_specific.sql

Running the SQL Scripts

In the SQL_commands directory, two scripts - preprocess_for_iRefWeb.sql and make_iRefWeb.sql - together provide a large number of SQL statements for the creation of iRefWeb data. The first of these scripts should be run as follows and any error conditions noted:

mysql -h <hostname> -u <username> -p -A -D <irefindex_database> < preprocess_for_iRefWeb.sql

The prepared version of the second script can then be run if no errors were experienced:

mysql -h <hostname> -u <username> -p -A -D <irefweb_database> < make_iRefWeb_specific.sql

Here, <irefweb_database> refers to the current iRefWeb database created above, whereas <irefindex_database> refers to the current iRefIndex database.

The built database can be tested by running the following:

mysql -h <hostname> -u <username> -p -A -D <irefweb_database> < test_iRefWeb_specific.sql

This will output a number of figures corresponding to changes in the number of interactions and interactors. Large deviations from the typical scale of change may indicate a problem in the generation or assignment of identifiers.

Generating the Database Dump

A database dump can be produced by running a command of the following form:

./dump_iRefWeb.sh <hostname> <username> <database>

Mapping RIGIDs

For iRefIndex 9.0, a table will be made available mapping the RIGIDs used in that release (and subsequent releases) to "legacy" RIGIDs used in earlier releases of iRefIndex. This can be produced by running the following SQL command in the iRefIndex database:

select sourceid, experiment_uid, Represent_rigid, Represent_rogid
from join_2
order by sourceid, experiment_uid
into outfile '/tmp/output/iRefIndex_interaction_interactors';

This output file can then be processed using the make_legacy_mapping.py program:

python make_legacy_mapping.py /tmp/output/iRefIndex_interaction_interactors /tmp/output/iRefIndex_interaction_rigids

(The program needs Python 2.5 or later since it uses the hashlib module. It could probably be made compatible with earlier Python versions if necessary.)

The resulting output file (/tmp/iRefIndex_interaction_rigids in the above example) is a tab-separated file with correct RIGIDs in the first column and "legacy" RIGIDs in the second column. Since the output may include duplicate lines, the following step is advised:

sort -u /tmp/output/iRefIndex_interaction_rigids > /tmp/iRefIndex_interaction_rigids

A header can also be included in the file:

cat header_legacy_mapping.txt /tmp/iRefIndex_interaction_rigids > iRefIndex_interaction_rigids

All iRefIndex Pages

Follow this link for a listing of all iRefIndex related pages (archived and current).