Detailed inferences about the mechanisms behind normal and abnormal biological function can be assembled by integrating complementary types of high-throughput molecular information, digitized histology data, in vivo imaging data and clinical information. An increasing fraction of clinical studies employ multiple complementary sources of biomedical information. Such studies aim to target treatments better by predicting how various subclasses of patients will respond to a given treatment. Studies that generate complementary sets of clinical, molecular, pathology and imaging data in a coordinated manner are sometimes referred to as "deep integrative clinical studies". Ideally, deep integrative clinical studies result in well-defined conclusions about how pathological processes or treatments alter systems biology pathways, patterns of protein expression and biological structures.
Over the next few years, my principal research objective will be to develop principles, techniques and tools that biomedical researchers can use to assemble a coherent biomedical picture by integrating information from multiple complementary data sources. My approach is to develop knowledge and data management middleware so that investigators can explore different ways of synthesizing information from multiple disparate data sources. This middleware will allow researchers to generate and test biomedically meaningful hypotheses.
One step towards this goal has been my team’s development of caGrid, a strongly typed, service-oriented “computational grid” software architecture. The meaning of the information in each data source is described using a semantic modeling scheme consisting of controlled vocabularies and the Unified Modeling Language (UML). Computational services are modeled in an analogous manner. Standardized data layouts corresponding to the semantic models are described in XML. caGrid is designed to support the composition of semantically modeled data sources and computational services.
Another step has been my team’s development of imaging informatics tools, algorithms and techniques. Radiology and pathology images are key components of baseline disease classification; images are also increasingly used as biomarkers to assess treatment response. Reproducible and standardized methods of image analysis and quantification are crucial components of deep integrative studies. My group’s efforts in this area include the development of methods for labeling anatomic and microanatomic structures found in images, along with techniques for sharing this labeling information in a computational grid architecture. Members of my research group have also been active in developing image segmentation and classification algorithms designed (to varying degrees) to help automate the image labeling process.
A third step towards enabling deep integrative clinical studies has been the development of effective and efficient support for demand-driven, high-end computing. Many existing image and bioinformatics analyses are compute-intensive, data-intensive, or both. Even more computationally intensive are the algorithms being developed to carry out coordinated analyses of complementary datasets, and the analyses designed to interpret increasingly high-resolution imaging datasets.