Bioscape Distribution

From irefindex
Revision as of 17:36, 16 February 2009 by PaulBoddie

Please note that this documentation covers an unreleased product and is for internal use only.


Distribution Structure

The distribution consists of the following elements:

Software

A Python package called bioscape which consists of a number of subpackages. Each subpackage contains modules and directories containing resources.

Subpackage                 Purpose                                       Notes
bioscape.modules.chebi     chemical/molecule data aggregation            [1]
bioscape.modules.gene      gene data aggregation                         [1]
bioscape.modules.pubmed    PubMed abstract aggregation                   [1]
bioscape.modules.taxonomy  taxonomy data aggregation                     [1]
bioscape.modules.text      text mining                                   [1]
bioscape.utils.database    utilities for accessing databases             [2]
bioscape.utils.files       utilities for managing filesystem resources   [2]
bioscape.utils.ftp         utilities for accessing FTP resources         [2]
bioscape.utils.index       utilities for manipulating text indexes       [2]
bioscape.utils.templates   templating utilities                          [2]
bioscape.config            configuration management                      [3]
bioscape.constants         constants employed throughout Bioscape        [3]

Notes:
[1] These subpackages are related to activities and can be considered as functional modules.
[2] These subpackages are focused on functionality which may be employed in a number of activities and contain modules for particular groups of functions.
[3] Some modules exist to provide global services to the software.

Configuration

A properties file (bioscape.cfg) located in the top-level directory configures the system and is accessed via the bioscape.config subpackage mentioned above.
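As an illustration of what reading such a file involves, the following is a minimal sketch which assumes that the file consists of simple name=value lines with # comments. The actual format of bioscape.cfg and the interface of bioscape.config are not described on this page, so the function name parse_properties and the setting names shown are hypothetical.

```python
def parse_properties(text):
    """Parse 'name=value' lines into a dict, skipping blank lines and '#' comments.

    Assumption: a simple properties format; the real bioscape.cfg may differ.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a name=value line: %r" % line)
        settings[name.strip()] = value.strip()
    return settings


# Hypothetical setting names, for illustration only.
example = """
# database settings
dbsystem=pgsql
dbname = bioscape
"""

settings = parse_properties(example)
print(settings["dbsystem"])
```

In the software itself, settings should be obtained through bioscape.config rather than by parsing the file directly.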

Adding New Modules

Take the following steps to add a new activity module to the distribution and to integrate it into the various mechanisms.

  1. The new module should be inserted as a new directory under bioscape/modules.
  2. If the new module is written in Python and is to be usable as a genuine component in Bioscape, there must be an __init__.py file in the directory.
  3. Any database initialisation or finalisation templates should be placed in an sql subdirectory of the new module directory.
    1. Initialisation templates should have names of the form activity-dbsystem.sql.in. For example:
      geneparse-pgsql.sql.in
    2. Finalisation templates should have names of the form drop-activity-dbsystem.sql.in (since they typically drop resources from the database). For example:
      drop-geneparse-pgsql.sql.in
    3. In template names activity is the name of the activity for which the template defines database resources; dbsystem is a database name chosen from the list of acceptable values in the bioscape.cfg.in file.
    4. Typically, activity names should be the same as the Python module or class (in other programming languages) which requires or populates the described database resources. For example...
       geneparse-pgsql.sql.in
       drop-geneparse-pgsql.sql.in
      ...both describe operations on resources which are populated by the geneparse Python module (bioscape/modules/gene/geneparse.py).
    5. Because the initialisation of the database must usually be performed by applying the templates in a specific order, a dependencies.txt file must be added to the sql directory. This file contains a list of template activities in the order of initialisation. For example:
       bionames
       index
       search
      This indicates that bionames should be applied first and search last in any initialisation of the database, whereas search should be applied first and bionames last in any finalisation of the database.
    6. Any maintenance-related templates must be documented in a tasks.txt file in the sql directory containing those templates. For example:
      bionames
      This indicates that bionames supports the backup and restore maintenance tasks.
  4. Any special configuration settings for the module should be added to the bioscape.cfg.in template. Before the module is used, the bioscape.cfg file specific to any particular installation of the software must then be prepared again from this template.
  5. To ensure that the module is installed, it should be added to the setup.py file in the packages list. For example:
     packages=[
         ...
         "bioscape.modules.newmodule",
         # Add new modules here.
         ]

    If any resource files (such as database descriptions) are provided within the module, an entry should be made in the data_files list. For example:

     data_files=[
         ...
         data_dir("bioscape.modules.newmodule", "sql", ["*.in", "*.txt"]),
         # Add new module resources here.
         ]
    The result of these modifications should be the successful installation of the new module in any installation of Bioscape.
  6. The module should be mentioned in this file and in other parts of the documentation in examples or lists of modules.
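
The template naming and ordering rules in step 3 can be sketched as follows. This is an illustration only, not the code Bioscape uses to process templates: the function names are hypothetical, and the sketch assumes that dependencies.txt lists one activity per line, as in the example above.

```python
def template_names(activity, dbsystem):
    """Return the (initialisation, finalisation) template names for an activity."""
    init = "%s-%s.sql.in" % (activity, dbsystem)
    return init, "drop-" + init


def parse_dependencies(text):
    """Return the activities from dependencies.txt content, in file order.

    Assumption: one activity name per line; blank lines are ignored.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def initialisation_templates(activities, dbsystem):
    """Initialisation applies templates in the listed order."""
    return [template_names(a, dbsystem)[0] for a in activities]


def finalisation_templates(activities, dbsystem):
    """Finalisation applies the drop templates in the reverse order."""
    return [template_names(a, dbsystem)[1] for a in reversed(activities)]


activities = parse_dependencies("bionames\nindex\nsearch\n")
print(initialisation_templates(activities, "pgsql"))
print(finalisation_templates(activities, "pgsql"))
```

With the dependencies.txt example from step 3, initialisation starts with bionames-pgsql.sql.in and finalisation starts with drop-search-pgsql.sql.in.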

Generating API Documentation

The tools directory contains a program which can be run to generate API documentation and to put such documentation in a special apidocs directory at the root of the distribution:

  python tools/apidocs.py

The generated documentation is principally useful as a reference to the API, rather than as a resource illustrating the architecture of the system or as a guide to writing new components.