Bioscape Installation

From irefindex
Revision as of 13:30, 4 February 2009 by PaulBoddie (initial draft based on parts of README.txt)

Installation

Before installing, check the dependencies listed in the Dependencies section below. This document does not give detailed instructions for installing them. It is recommended that you use your system's package management tools, and perhaps install Bioscape itself from a suitable package, to save the time and effort of working through the installation process manually. For those who wish to install Bioscape from the source code distribution, the procedure is given below.

Installation from Source Code

From the top-level directory of the source distribution, Bioscape can be installed as follows:

  python setup.py install

You may need to be a privileged user to run the above command. If you do not have administrative or superuser rights, choose an alternative installation location instead. For example:

  python setup.py install --prefix=/home/user/software/usr

Change the location to suit your system's conventions and your own preferences. Once the software is installed, you may also need to tell your system where to find the installed programs and libraries. This is usually done with environment variables. For the above example, you could add the following definitions to your shell's configuration (such as ~/.bashrc):

  export PATH=${PATH}:/home/user/software/usr/bin
  export PYTHONPATH=${PYTHONPATH}:/home/user/software/usr/lib/python2.3/site-packages

The exact form of the PYTHONPATH definition depends on your system. This applies particularly to the Python version in the path (2.3 above) and to the name of the library directory (lib above, which may be lib64 on some 64-bit systems).
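Rather than guessing the Python version and library directory, you can ask Python where modules installed under a given prefix would go. The following is a minimal sketch for a modern Python 3 interpreter; the sysconfig module is not available in Python 2.3 or 2.4. It assumes that setup.py uses the standard posix_prefix layout for --prefix installs, which holds on most Unix-like systems, although some distributions patch this. The function name shell_exports is hypothetical.

```python
import os
import sysconfig


def shell_exports(prefix):
    """Return shell 'export' lines for software installed under prefix.

    The site-packages path is computed for the interpreter running this
    code, using the standard 'posix_prefix' installation scheme (an
    assumption about how setup.py lays out --prefix installs).
    """
    site_packages = sysconfig.get_path(
        "purelib",
        scheme="posix_prefix",
        vars={"base": prefix, "platbase": prefix},
    )
    bin_dir = os.path.join(prefix, "bin")
    return [
        "export PATH=${PATH}:%s" % bin_dir,
        "export PYTHONPATH=${PYTHONPATH}:%s" % site_packages,
    ]


if __name__ == "__main__":
    for line in shell_exports("/home/user/software/usr"):
        print(line)
```

Run the sketch with the same interpreter that Bioscape will use. The version component of the path comes from whichever Python executes it.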

Dependency Configuration

Some dependencies need preparatory work before Bioscape can use them, even when they are installed from pre-built packages. Brief details of this work are given below.

PostgreSQL

You must initialise a PostgreSQL "database cluster" for Bioscape. A cluster is the data directory holding the databases managed by a single server instance. This is typically done using commands such as the following:

  mkdir -p /home/user/software/var/lib/pgsql
  initdb -D /home/user/software/var/lib/pgsql

Setting the PGDATA environment variable to this directory saves you from having to specify it (with the -D option) in later PostgreSQL commands such as pg_ctl.
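PostgreSQL's own tools fall back to PGDATA when -D is not given. The sketch below mimics that lookup for your own helper scripts. It also checks whether initdb has already been run, by looking for the PG_VERSION file that initdb writes at the top of every cluster it creates. The function names are hypothetical and are not part of Bioscape or PostgreSQL.

```python
import os


def cluster_directory(path=None):
    """Return the cluster directory: the given path, or else $PGDATA."""
    directory = path or os.environ.get("PGDATA")
    if not directory:
        raise ValueError("no directory given and PGDATA is not set")
    return directory


def is_initialised(directory):
    """Report whether initdb appears to have run in directory.

    initdb writes a PG_VERSION file at the top level of each new cluster.
    """
    return os.path.isfile(os.path.join(directory, "PG_VERSION"))


if __name__ == "__main__":
    try:
        directory = cluster_directory()
    except ValueError as error:
        print(error)
    else:
        state = "initialised" if is_initialised(directory) else "not initialised"
        print("%s: %s" % (directory, state))
```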

To improve PostgreSQL's performance, consider replacing the postgresql.conf file in the database cluster directory with the version found in the docs/database directory of the Bioscape distribution.
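A sketch of this step is shown below. It keeps a copy of the file that initdb generated, so the change is easy to undo. It assumes that distribution_dir is the top-level directory of the unpacked Bioscape source and that the server is stopped while the file is replaced. The helper name and the backup name postgresql.conf.orig are arbitrary choices.

```python
import os
import shutil


def install_tuned_config(distribution_dir, cluster_dir):
    """Copy Bioscape's postgresql.conf into a cluster, keeping a backup.

    The cluster's original file is copied to postgresql.conf.orig only if
    no backup exists yet, so repeated runs never overwrite the original.
    """
    source = os.path.join(distribution_dir, "docs", "database", "postgresql.conf")
    target = os.path.join(cluster_dir, "postgresql.conf")
    backup = target + ".orig"
    if os.path.exists(target) and not os.path.exists(backup):
        shutil.copy2(target, backup)
    shutil.copyfile(source, target)
```

Restart the server afterwards so that all settings take effect. A configuration file written for one PostgreSQL release may contain settings that other releases reject, so check the server log if it fails to start.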

Configuration

Before use, the distribution must be configured for the environment in which the software will operate. This is most conveniently done by running the configuration program:

  python bioscape_configure.py

The configuration program reads the bioscape.cfg.in template and produces a bioscape.cfg configuration file specific to your environment. Alternatively, copy bioscape.cfg.in to bioscape.cfg and edit the file manually.
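For the manual approach, the main risk is overwriting a bioscape.cfg that has already been edited. The following minimal sketch copies the template only when no bioscape.cfg exists yet. The function name copy_template is hypothetical, and the directory argument is assumed to be the distribution directory containing bioscape.cfg.in.

```python
import os
import shutil


def copy_template(directory=".", template="bioscape.cfg.in", target="bioscape.cfg"):
    """Copy the configuration template so that it can be edited by hand.

    An existing target file is never overwritten, so local edits survive.
    Returns True if a new file was created, False otherwise.
    """
    source = os.path.join(directory, template)
    destination = os.path.join(directory, target)
    if os.path.exists(destination):
        return False
    shutil.copyfile(source, destination)
    return True
```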

Once the bioscape.cfg file has been produced, it may be left in a "working directory" where all Bioscape-related tasks will be performed, or it can be copied or moved to your home directory; for example:

  mv bioscape.cfg /home/user
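The text above does not say which copy is used if bioscape.cfg exists both in the working directory and in your home directory. The sketch below assumes that the working directory takes precedence. This is a common convention, but it is not confirmed here. The sketch can help you see which file a script run from a given directory would pick up under that assumption. find_config is a hypothetical name, not a Bioscape function.

```python
import os


def find_config(name="bioscape.cfg", working_dir=None, home_dir=None):
    """Return the path of the first configuration file found, or None.

    Assumed search order: the working directory, then the home directory.
    """
    candidates = [
        working_dir if working_dir is not None else os.getcwd(),
        home_dir if home_dir is not None else os.path.expanduser("~"),
    ]
    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```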

See below for advice on setting database parameters in the configuration.

Useful Configuration Value Groups

The following groups of settings may be useful when configuring the software for a particular environment. The group below configures Bioscape to use PostgreSQL:

  Setting              Value
  database_system      pgsql
  jdbc_database_url    jdbc:postgresql://localhost/bioscape
  jdbc_driver_class    org.postgresql.Driver
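The jdbc_database_url value uses the PostgreSQL JDBC driver's jdbc:postgresql://host[:port]/database form. The sketch below extracts the host, port and database name from such a URL, for example to compare them with other database settings. It handles only that form. The driver also accepts shorter forms such as jdbc:postgresql:database, which this sketch rejects. parse_jdbc_url is a hypothetical helper.

```python
from urllib.parse import urlsplit


def parse_jdbc_url(url):
    """Split a PostgreSQL JDBC URL into (host, port, database).

    Only the jdbc:postgresql://host[:port]/database form is handled; the
    port defaults to PostgreSQL's standard 5432 when it is omitted.
    """
    prefix = "jdbc:"
    if not url.startswith(prefix + "postgresql://"):
        raise ValueError("not a PostgreSQL JDBC URL: %r" % url)
    parts = urlsplit(url[len(prefix):])
    database = parts.path.lstrip("/")
    if not database:
        raise ValueError("no database name in %r" % url)
    return parts.hostname or "localhost", parts.port or 5432, database
```

For the value in the table above, the result would be ("localhost", 5432, "bioscape").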

Database Configuration

Certain modules (or packages) within the distribution require database support. This is best configured using the database configuration program:

  python bioscape_dbconfigure.py

The database configuration program can list the modules (or packages) that require database support. It can also prepare and apply the specific table and data definitions for each of them.

Quick Start

Use the quick start program to initialise Bioscape as quickly as possible:

  bioscape_quickstart.py -t quickstart

Or, from the distribution directory:

  python scripts/bioscape_quickstart.py -t quickstart

The program supports a range of "targets". Running it without any arguments (that is, omitting the -t quickstart given above) lists some of these targets.

Dependencies

Bioscape has the following basic dependencies:

  Package          Release information         Purpose
  Python           Tested with 2.3.6, 2.4.4    Runs most of the software (see note 1)
  PyLucene         Tested with 2.0.0, 2.1.0-2  Indexes textual documents
  CMDsyntax        0.91                        Processes command line options
  XSLTools         0.6                         Produces the Web interface
  WebStack         1.3                         Produces the Web interface
  libxml2dom       0.4.6                       Required by XSLTools
  libxslt          Tested with 1.1.20          Required by XSLTools
  libxml2          Tested with 2.6.27          Required by libxml2dom
  PostgreSQL       Tested with 8.1.9           Storage of information (see note 2)
  pyPgSQL          Tested with 2.5.1           Database access
  egenix-mx-base   Tested with 3.0.0           Required by pyPgSQL

Notes:

  1. Python releases in the 2.3 series earlier than 2.3.5 have threading issues which are exposed by PyLucene, causing deadlocks. PyLucene also has compatibility issues with gcj. It is recommended that PyLucene be compiled with gcj 3.4.6, possibly together with a suitable version of Python (such as 2.3.5, or 2.4.4 or later).
  2. PostgreSQL is currently the only supported database system.

Optional: to collect words from WordNet, the following dependencies apply:

  Package          Release information         Purpose
  WordNet          3.0                         Provides the WordNet database
  pywordnet        2.0.1                       A Python interface to WordNet

Alternative: to use Bioscape with LingPipe, the following dependencies apply:

  Package                 Release information         Purpose
  Jython                  Tested with 2.2a1           Used to run LingPipe-related software
  LingPipe                Tested with 2.3.0           Sentence splitting in textual documents
  Lucene                  Tested with 2.0.0           Indexes textual documents
  PostgreSQL JDBC Driver  Tested with 8.1-407 JDBC 3  Database access (if PostgreSQL is used); required by Jython

Optional: the following dependencies are related to improving the software:

  Package          Release information         Purpose
  Epydoc           Tested with 3.0a3           API documentation generation
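The Python note in the first table can be checked automatically, for example at the start of a site-specific wrapper script. The sketch below is not part of Bioscape, and check_python_version is a hypothetical name. It uses only features that should also be available in the old interpreters it is meant to check. It tests only the documented rule (2.3 releases before 2.3.5). It does not check whether a release is one of the tested versions.

```python
import sys
import warnings


def check_python_version(version=None):
    """Warn about Python releases known to deadlock with PyLucene.

    version is a (major, minor, micro) tuple; by default the running
    interpreter's version is used. Returns True if the version looks safe.
    """
    if version is None:
        version = tuple(sys.version_info[:3])
    # Releases in the 2.3 series before 2.3.5 have threading issues.
    if version[:2] == (2, 3) and version < (2, 3, 5):
        warnings.warn("Python %d.%d.%d may deadlock with PyLucene; "
                      "use 2.3.5 or later" % version)
        return False
    return True
```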

Bundled Resources

The following resources are currently bundled with the software:

english.words ftp://ftp.cs.cornell.edu/pub/smart/
abbreviations.txt A combination of the following, plus additional terms. Where appropriate, fragments of abbreviations are incorporated in the list in place of the full abbreviations:
official.txt A combination of files from the downloadable archive found at the following location:

http://www.dcs.shef.ac.uk/research/ilash/Moby/mwords.html

The following files from the archive were concatenated and sorted, with duplicate and multiple-word entries removed:

113809of.fic 4160offi.cia

The following command was used to prepare the file:

  cat 113809of.fic 4160offi.cia | sort | uniq > official.txt

According to a notice at the following location, the Moby lexicon project has been placed in the public domain:

http://www.dcs.shef.ac.uk/research/ilash/Moby/
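The shell pipeline used for official.txt sorts the entries and removes duplicates, but does not itself drop multiple-word entries. The following Python sketch performs all three steps described for official.txt. It assumes that a multiple-word entry is one containing a space and that the Moby files can be read as Latin-1. prepare_word_list is a hypothetical name. Its output is not guaranteed to match the bundled file byte for byte, because sort orders lines according to the current locale, whereas Python's sorted() orders by code point, as sort does with LC_ALL=C.

```python
def prepare_word_list(input_paths, output_path):
    """Concatenate word lists, drop multi-word entries, sort and de-duplicate.

    Text mode handles CR, LF and CRLF line endings alike; entries are
    written one per line in code point order.
    """
    words = set()
    for path in input_paths:
        with open(path, encoding="latin-1") as handle:
            for line in handle:
                entry = line.strip()
                # Skip blank lines and entries made of several words.
                if entry and " " not in entry:
                    words.add(entry)
    with open(output_path, "w", encoding="latin-1") as handle:
        for word in sorted(words):
            handle.write(word + "\n")
```

For the files named above, the call would be prepare_word_list(["113809of.fic", "4160offi.cia"], "official.txt").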

wordnet.txt A list of distinct nouns, verbs, adjectives and adverbs from the WordNet 3.0 database, prepared using the bioscape_get_wordnet.py script. See the docs/licences/LICENSE-WordNet file for copyright and licensing information.
common_english.txt A dictionary of common English word tokens, processed from the common_english file by taking the text after the "." field separator and stripping surrounding whitespace. The original file was retrieved from the following location:

http://pir.georgetown.edu/pirwww/iprolink/protname.shtml
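The processing of common_english can be sketched as follows. The layout of the original file is not described here. The sketch therefore assumes that each entry sits on its own line and that the token follows the first "." on that line, so any later "." characters stay in the token. extract_tokens is a hypothetical helper, not the script used to build the bundled file.

```python
def extract_tokens(lines):
    """Yield the stripped text after the first "." on each line.

    Lines without a "." separator, or with nothing after it, are skipped.
    """
    for line in lines:
        _, separator, text = line.partition(".")
        text = text.strip()
        if separator and text:
            yield text
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.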

adjectives.txt Animal adjectives. See the docs/licences/adjectives.txt file for the permissive licensing details.

Additional Resources

Entrez Gene 
http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene
Entrez Taxonomy 
http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy
NCBI PubMed 
http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed