Visual Analytics of Life Science Data

The analysis of gene expression experiments on microarrays is a challenging scientific endeavor, since it involves the scalable processing of very large, heterogeneous, incomplete, potentially conflicting, and potentially dynamic data. The relevant information (the genes active in a biological process) is very difficult to extract, and doing so requires the support of automated extraction algorithms based on statistical computing. Unfortunately, applying these statistical measures without supervision does not guarantee that the relevant information is extracted successfully; their results require critical consideration themselves. Hence, powerful visualization and interaction methods are of central relevance.


The scope of this project is the application of the visual analytics paradigm to the analysis of complex microarray datasets. It combines novel visual exploration and interaction methods with advanced statistical computing to extract the relevant information from potentially huge datasets generated by high-throughput methods such as microarrays. Furthermore, methods from perception research will be applied to create a perception-sensitive processing pipeline, including a psychophysical evaluation study of the resulting methodology. Preliminary results already indicate that this avenue will lead to an effective analysis and enable profound insight into the application domain. This visual analytics approach, which combines visualization, interaction, data integration, and statistics for large high-throughput experiments in molecular biology, aims at an innovative contribution with a sustainable and significant impact on the life sciences.


The focus of the work so far has been on the efficient visualization of, and the effective interaction with, these types of data, aiming at the visual exploration of volumetric datasets. In this context, we designed the SignatureSpace approach, which provides methods for the visual exploration and analysis of volume datasets by combining information visualization methodology with methods from scientific visualization. Because it was designed for volumetric datasets, the SignatureSpace approach can visually represent a large number of data points. This makes it particularly useful for large bioinformatics datasets, which are still smaller than most volumetric datasets. This was demonstrated in the SPRAY system, an application of the SignatureSpace approach to microarray data. A typical application of SPRAY is the question of how physical exercise influences the immune response of the human body. Gene expression experiments on saliva samples from the participants of a half marathon provided information on a superset of genes, and the SPRAY system exposed the biologically active genes using a combination of visual and statistical computing. This example is shown in Figure 1, where the information is plotted in a parallel coordinate system. The eight left-most axes represent the gene expression values, while the ten right-most axes represent information derived using a variety of statistical correction methods. This combined visualization allows for a more effective and comprehensive visual analysis.

 

Figure 1: Gene expression data (8 dimensions) and derived statistical data (10 dimensions) are visualized together.
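
To make the idea of plotting measured and derived values side by side more concrete, the following is a minimal sketch, not the SPRAY implementation. The text does not name the statistical correction methods, so Bonferroni and Benjamini-Hochberg adjustments are assumed here purely as common examples. The toy data, the function names, and the min-max scaling of each axis are likewise hypothetical choices for illustration.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values with monotonicity enforced."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is the
    # minimum over all equal-or-larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def normalize_axes(rows):
    """Min-max scale every column to [0, 1] so all axes share one vertical range."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has no spread; place it in the middle of the axis.
        scaled_columns.append([0.5 if span == 0 else (v - lo) / span for v in col])
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical toy data: 4 genes, 3 expression values each, plus a raw p-value.
expression = [
    [2.1, 2.4, 2.2],
    [5.0, 7.5, 9.1],
    [1.0, 1.1, 0.9],
    [3.3, 6.0, 8.2],
]
raw_p = [0.20, 0.01, 0.80, 0.03]

# Expression axes on the left, derived statistical axes on the right.
rows = [
    expr + [bonf, bh]
    for expr, bonf, bh in zip(expression, bonferroni(raw_p), benjamini_hochberg(raw_p))
]
axes = normalize_axes(rows)  # one polyline per gene, one value per axis
```

Each row of `axes` corresponds to one gene and could be drawn as a single polyline across the parallel axes. Scaling every axis independently is only one possible design choice; it keeps expression values and adjusted p-values visually comparable even though their raw ranges differ.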