Position Paper for Semantic Web Life Sciences Workshop

J. Hunter (Distributed Systems Technology CRC), M. A. Ragan (ARC Centre in Bioinformatics and Institute for Molecular Bioscience), S. Little (Dept. ITEE)
The University of Queensland

1. Introduction to the Visible Cell Project

The "Visible Cell" project is a research project under way at the ARC Centre in Bioinformatics at the University of Queensland. Its aim is to significantly advance our understanding of the mammalian cell by synthesising physical data, models, mathematical and statistical simulations, and bioinformatics data. A single cell contains tens of thousands of molecules, each interacting with other molecules in complex ways that are not yet fully understood. If we can understand, visualise, model, simulate and predict how normal cells behave, we will be that much closer to understanding how abnormal cells, such as cancer cells, behave. The ability to model and understand the interactions of biomolecules within cells will accelerate the design, discovery and development of biomolecules such as drugs, vaccines, protein therapeutics and gene therapies. It will also help address important contemporary problems such as directing stem cell differentiation.

The objective of the Visible Cell project is to provide a visualisation environment that seamlessly embeds macromolecular structures, networks and quantitative simulations based on mathematical and complex-system models into a 3D mammalian cell reconstructed from high-resolution tomograms and electron micrographs. Using physical information gained by techniques such as high-resolution tomography, NMR, and electron and X-ray crystallography, it is possible to provide scientists with a dynamic 3D visualisation environment for hypothesis testing and integration of new discoveries. The challenge is to manage, integrate and assimilate the large amounts of information associated with the multiscale physical data, the related highly complex bioinformatics data and the mathematical and statistical simulations.

Instrument measurements provide microscopic image data describing the physical geometry and location of sub-cellular components. These empirically derived geometries provide the setting for mathematical simulation. The output of computer algorithms that predict the precise 3D structures of, for example, an array of proteins can be compared directly with experimentally obtained results. Complex, high-resolution models and simulation systems can assist in the prediction of phenomena such as protein-protein interactions. Appropriate semantic representation of the images will allow tools to automatically access relevant biological databases and literature from around the world, and to integrate this data and information within the simulations and visualisations, which map the data spatially and temporally onto the model. Virtual reality environments may offer scientists even further immersion in the three-dimensional cell through the use of haptics, providing new mechanisms for discovery.

2. Key Problems/Challenges

There are four main phases in developing a virtual Visible Cell environment that will enable distributed teams of scientists to better understand cell physiology and the behaviour of cells under different circumstances. Each phase has its own challenges and requirements:

1. Develop an underlying 3D spatial matrix from tomographic images

2. Fit proteins into the matrix

Different scenarios which need to be handled include:

  1. protein or macromolecular assembly images available from microscopy, and protein structure known experimentally (X-ray crystallography or NMR);
  2. protein or macromolecular assembly images available from microscopy, but protein structure not known experimentally;
  3. protein or macromolecular assembly images not available from microscopy, but protein structure known experimentally;
  4. protein or macromolecular assembly images not available from microscopy, and protein structure not known experimentally.

3. The Visible Cell as a dynamic modelling environment

4. The Visible Cell as an environment for data exploration, hypothesis formulation and testing
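
The four scenarios listed under phase 2 amount to a two-by-two decision over the available evidence: whether a microscopy image exists, and whether the structure is known experimentally. The following is a minimal sketch of that classification, not part of the project's software. The names `ProteinEvidence`, `FittingScenario` and `classify` are hypothetical and introduced here purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class FittingScenario(Enum):
    """Hypothetical labels for the four scenarios listed under phase 2."""
    IMAGE_AND_STRUCTURE = 1   # image available, structure known
    IMAGE_ONLY = 2            # image available, structure not known
    STRUCTURE_ONLY = 3        # image not available, structure known
    NEITHER = 4               # image not available, structure not known


@dataclass(frozen=True)
class ProteinEvidence:
    """Illustrative record of what is known about one protein or assembly."""
    name: str
    has_microscopy_image: bool
    has_experimental_structure: bool  # e.g. X-ray crystallography or NMR


def classify(evidence: ProteinEvidence) -> FittingScenario:
    """Map the two evidence flags onto one of the four scenarios."""
    if evidence.has_microscopy_image:
        if evidence.has_experimental_structure:
            return FittingScenario.IMAGE_AND_STRUCTURE
        return FittingScenario.IMAGE_ONLY
    if evidence.has_experimental_structure:
        return FittingScenario.STRUCTURE_ONLY
    return FittingScenario.NEITHER
```

A fitting pipeline could, under this assumption, dispatch on the returned scenario to choose a different placement strategy for each case.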

3. Progress to Date

Progress has been made in a number of areas. This work can be leveraged, extended and refined to develop new tools and services required by projects such as the Visible Cell project:

4. Conclusions and Requirements

The Visible Cell framework needs to integrate physical data, simulation data and bioinformatics data in order to construct 3D models of cells and cellular processes, with the capability to extract, record and reuse new information and ideas. Consequently, it requires mechanisms for storing, indexing, searching, accessing, retrieving, sharing, reusing, tracking and integrating resources, which may include:

Providing a semantically rich framework for the Visible Cell project will depend on the availability of semantic descriptions of these resources. Semantic descriptions will reduce subjectivity, enhance resource discovery and interoperability, and allow sophisticated semantic querying and knowledge mining. Semantic inferencing rules offer the potential to generate high-level semantic descriptions of cell components from automatically extracted low-level features and to correlate data from across disciplines, media types and formats. However, projects such as the Visible Cell project will require both extensions to existing Semantic Web technologies and the development of new ones. More specifically, they will require tools and services that include:

References

[1] Jane Hunter. "Adding Multimedia to the Semantic Web - Building an MPEG-7 Ontology" International Semantic Web Working Symposium (SWWS). Stanford. July 2001.

[2] Jane Hunter. "Enhancing the Semantic Interoperability of Multimedia through a Core Ontology" IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Conceptual and Dynamical Aspects of Multimedia Content Description. February 2003.

[3] Jane Hunter and Suzanne Little. "A Framework to enable the Semantic Inferencing and Querying of Multimedia Content" International Journal of Web Engineering and Technology (IJWET), Special Issue on the Semantic Web. To appear, 2005.

[4] Suzanne Little and Jane Hunter. "Rules-By-Example - a Novel Approach to Semantic Indexing and Querying of Images" 3rd International Semantic Web Conference (ISWC2004). Hiroshima, Japan, November 2004.

[5] Jane Hunter, Katya Falkovych and Suzanne Little. "Next Generation Search Interfaces - Interactive Data Exploration and Hypothesis Testing" 8th European Conference on Digital Libraries (ECDL2004). Bath, UK, September 2004.