
Introduction
Oncomine includes datasets from more than 100 different institutions. The data were generated on several platforms, including Affymetrix GeneChips, commercial and home-brew cDNA microarrays, and commercial spotted oligonucleotide arrays from Agilent and other vendors. Affymetrix data are single-channel, whereas cDNA and spotted oligonucleotide data are dual-channel. For all dual-channel Oncomine datasets, the first channel is the experimental sample and the second channel is a common reference used in every experiment within that dataset. Because Oncomine integrates diverse datasets and attempts to visualize and meta-analyze data from independent platforms and laboratories, we apply a simple primary normalization procedure uniformly to all datasets. Also, because the goal is to include the broadest spectrum of profiling data, we accept heterogeneous pre-processing across datasets when pre-processed data are all that is available. For this reason, we do not directly compare expression values across independent datasets; instead, each dataset is analyzed on its own, and multi-dataset questions are addressed by meta-analysis of the per-dataset results.
Pre-processing
Affymetrix
When raw .cel files are available, we perform RMA (Robust Multi-array Average) normalization. When .cel files are not available, we include datasets pre-processed by RMA or MAS5. When MAS5 data are incorporated, we include all expression values and ignore absent/present calls.
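RMA combines three steps on probe-level data: background correction, quantile normalization across arrays, and median-polish summarization of probes into probe sets. As a rough illustration of the middle step only, the sketch below quantile-normalizes plain lists of intensities (one list per chip) so that every chip ends up with the same value distribution. The function name `quantile_normalize` is hypothetical and this is not Oncomine's code; real pipelines (for example, `rma()` in the Bioconductor affy package) work on probe-level data read from .cel files and handle ties more carefully.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length arrays (one per chip).

    Each value is replaced by the mean, across all arrays, of the values
    holding the same rank. Ties are broken by original position, which is
    a simplification of what full RMA implementations do.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    # Sort every array independently
    sorted_cols = [sorted(a) for a in arrays]
    # Reference distribution: mean across arrays at each rank
    reference = [sum(col[i] for col in sorted_cols) / len(arrays) for i in range(n)]
    result = []
    for a in arrays:
        # Positions of this array's values in ascending order
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        result.append(out)
    return result
```

After this step, every array contains exactly the same set of values, only in a different order, which is what makes chips comparable before summarization.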
cDNA
cDNA data are typically collected as background-subtracted ratios between the experimental channel and the common reference channel. In some cases, the contributing laboratory has already applied global loess or print-tip loess normalization. If raw .gpr files are available, we perform global loess normalization ourselves.
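Global loess normalization removes intensity-dependent dye bias by fitting a smooth curve of the log-ratio M against the average log intensity A over all spots on an array and subtracting that curve; print-tip loess instead fits one curve per print-tip group. The pure-Python sketch below shows the global variant under stated assumptions: the inputs are background-subtracted, positive experimental and reference intensities; the local fit is a tricube-weighted linear regression without the robustness iterations of full lowess; and `span=0.4` is an arbitrary illustrative choice. All function names are hypothetical.

```python
import math


def ma_values(exp, ref):
    """Convert background-subtracted intensities to (M, A) lists.

    M = log2(exp / ref) is the log-ratio; A is the mean log2 intensity.
    Assumes both channels are already background-subtracted and positive.
    """
    if any(e <= 0 or r <= 0 for e, r in zip(exp, ref)):
        raise ValueError("intensities must be positive after background subtraction")
    m = [math.log2(e / r) for e, r in zip(exp, ref)]
    a = [0.5 * (math.log2(e) + math.log2(r)) for e, r in zip(exp, ref)]
    return m, a


def loess_fit(x, y, span=0.4):
    """Local linear fit with tricube weights, evaluated at every x.

    Simplified: no robustness iterations, O(n^2 log n) run time.
    """
    n = len(x)
    k = max(2, math.ceil(span * n))  # number of neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth = distance from x0 to its k-th nearest neighbour
        h = sorted(abs(xi - x0) for xi in x)[min(k, n) - 1] or 1e-12
        # Tricube weights; points at or beyond the bandwidth get weight 0
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (x0 - xm))
    return fitted


def global_loess_normalize(m, a, span=0.4):
    """Subtract the intensity-dependent trend (M fitted on A) from every M."""
    trend = loess_fit(a, m, span)
    return [mi - ti for mi, ti in zip(m, trend)]
```

For real data, an established implementation such as limma's `normalizeWithinArrays()` (methods `"loess"` and `"printtiploess"`) is preferable to this quadratic-time sketch.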
Primary Normalization
Oncomine utilizes a simple global normalization strategy applied to all datasets regardless of the platform or the pre-processing method. Normalization is performed per microarray experiment.
- All expression values or ratios are log2-transformed.
- The median of each microarray (experiment) is set to zero by subtracting that median from every value.
- The standard deviation of each microarray's values is scaled to 1 by dividing every value by that standard deviation.
- The normalized values are stored in the database, used in all Oncomine analyses (differential expression, co-expression, etc.), and shown in Oncomine's graphical representations.
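The three numerical steps of primary normalization can be sketched for a single array as follows. Assumptions not stated in the text: input values are positive and on the linear (unlogged) scale, and the standard deviation is the sample SD (`statistics.stdev`); the document does not say which SD estimator is used. The function name `primary_normalize` is hypothetical.

```python
import math
import statistics


def primary_normalize(values):
    """Normalize the values of one microarray experiment.

    1. log2-transform each expression value or ratio,
    2. subtract the array's median so the median becomes 0,
    3. divide by the array's standard deviation so the SD becomes 1.
    """
    logged = [math.log2(v) for v in values]
    med = statistics.median(logged)
    centered = [v - med for v in logged]
    sd = statistics.stdev(centered)  # assumption: sample SD, not population SD
    return [v / sd for v in centered]
```

Because normalization is performed per experiment, a function like this would be applied separately to each array's values; dividing by the SD after centering leaves the median at zero.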
Visualization Normalization
Heatmaps
An additional color normalization is performed per reporter to generate heatmap intensities, although the primary normalization values are shown on mouse-over of heatmap cells. By default, the color normalization scales each reporter's mean expression value to 0 and its standard deviation to 1 (z-score). Some heatmaps allow one to switch the reporter color normalization from z-score to median only, which subtracts each reporter's median without rescaling. Because values are then not divided by each reporter's own standard deviation, this option allows one to compare heatmap cells between reporters more directly.
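The two per-reporter color normalizations can be sketched on one heatmap row (one reporter's primary-normalized values across samples). The function name and the `mode` values are hypothetical, and using the sample SD for the z-score is an assumption.

```python
import statistics


def heatmap_colors(row, mode="zscore"):
    """Color-scale one reporter's primary-normalized values.

    mode="zscore": subtract the row mean and divide by the row SD.
    mode="median": subtract the row median only, with no rescaling, so
    color differences keep the units of the primary-normalized values.
    """
    if mode == "zscore":
        mean = statistics.mean(row)
        sd = statistics.stdev(row)  # assumption: sample SD
        return [(v - mean) / sd for v in row]
    if mode == "median":
        med = statistics.median(row)
        return [v - med for v in row]
    raise ValueError(f"unknown mode: {mode!r}")
```

In z-score mode, a reporter with a small spread and one with a large spread both fill the full color range; median-only mode preserves the difference in spread, which is why it suits comparisons between reporters.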
Box Plots
Multi-profile box plots, generated either by using check-boxes to select multiple profiles in the gene module or by creating a box plot from the meta module, include an additional reporter normalization that scales the mean to zero for each profile. This step allows data from independent profiles to be graphed on a common scale and emphasizes relative changes. Primary normalization values can be obtained by clicking the box plot to generate a single-profile bar graph. Single-profile box plots have no additional reporter normalization.
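The per-profile mean-centering used for multi-profile box plots can be sketched as follows. The input layout (a dictionary from a profile name to one reporter's primary-normalized values in that profile) and the function name are hypothetical.

```python
import statistics


def center_profiles(profiles):
    """Mean-center one reporter's values separately within each profile.

    Each profile's list is shifted so that its mean becomes 0; the spread
    within a profile is left unchanged.
    """
    centered = {}
    for name, vals in profiles.items():
        mu = statistics.mean(vals)
        centered[name] = [v - mu for v in vals]
    return centered
```

Since only a shift is applied, relative differences between sample classes within a profile are preserved, while offsets between independent profiles are removed.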
Bar Graphs & Scatter Plots
No additional reporter normalization is performed.