Identification of Prognostic Genes by Combining Information across Different Institutions and Array Platforms

Keith Baggerly
Section of Bioinformatics
Department of Biostatistics and Applied Mathematics
M.D. Anderson Cancer Center

Abstract: Identifying genes that predispose individuals to cancer initiation, or that are associated with better clinical outcomes, has been the goal of many published microarray studies. In many cases both the raw array data and the clinical data are available on the web, so it is natural to ask whether we can improve our identification of "important" genes by combining information across studies. Such meta-analysis is often difficult, in large part because of the vagaries of processing from lab to lab and from platform to platform, which make mappings between numerical scores approximate at best. In this talk, we address the meta-analysis problem by defining a new mapping between different types of Affymetrix platforms. Using this mapping, we reexamine the raw data from two studies of lung cancer (Beer et al., Nat. Med., 2002, and Bhattacharjee et al., PNAS, 2001). Whereas the original studies sought genes associated with poor clinical outcome, we narrow our attention further to genes that have predictive value over and above that obtained from readily available clinical covariates such as age and smoking history. We find that using multiple datasets does increase our ability to find interesting genes, by increasing the sample size, and that the resulting lists contain a high proportion of genes known to be important in cancer progression. Almost none of the genes we found were described in the original papers.
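As a rough illustration of what "predictive value over and above clinical covariates" can mean, the sketch below tests one gene at a time by comparing a model that uses only clinical covariates with a model that also includes the gene's expression, using a likelihood ratio test. The abstract does not say which model was used. Because outcomes in these studies are survival times, a Cox model would be a natural choice; here a logistic regression on a hypothetical binary poor-outcome indicator is substituted so the code stays self-contained. All function names and the simulated data (standardized age, a smoker flag, one informative gene, one noise gene) are hypothetical.

```python
# Sketch only: logistic regression stands in for a survival model,
# and all names and data here are hypothetical.
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log1pexp(t):
    """Stable log(1 + exp(t))."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic regression; returns (coefficients, log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information X^T W X
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        ll += yi * eta - log1pexp(eta)
    return beta, ll


def adjusted_gene_test(clinical, gene, y):
    """Likelihood ratio test for one gene, adjusting for clinical covariates."""
    base = [[1.0] + c for c in clinical]                      # intercept + clinical
    full = [[1.0] + c + [g] for c, g in zip(clinical, gene)]  # intercept + clinical + gene
    _, ll0 = fit_logistic(base, y)
    _, ll1 = fit_logistic(full, y)
    stat = max(2.0 * (ll1 - ll0), 0.0)
    pval = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
    return stat, pval


def simulate(seed=0, n=200):
    """Hypothetical data: standardized age, smoker flag, informative and noise genes."""
    rng = random.Random(seed)
    clinical, informative, noise, y = [], [], [], []
    for _ in range(n):
        age = rng.gauss(0.0, 1.0)
        smoker = 1.0 if rng.random() < 0.6 else 0.0
        g = rng.gauss(0.0, 1.0)
        eta = -0.5 + 0.3 * age + 0.5 * smoker + 1.5 * g  # only g is "prognostic"
        y.append(1.0 if rng.random() < sigmoid(eta) else 0.0)
        clinical.append([age, smoker])
        informative.append(g)
        noise.append(rng.gauss(0.0, 1.0))
    return clinical, informative, noise, y


if __name__ == "__main__":
    clinical, informative, noise, y = simulate()
    for name, gene in [("informative", informative), ("noise", noise)]:
        stat, pval = adjusted_gene_test(clinical, gene, y)
        print(f"{name}: LR = {stat:.2f}, p = {pval:.3g}")
```

When data from several institutions are pooled, one option (again an assumption, not a description of the talk's method) is to add a study indicator to the clinical covariates, so that systematic differences between sites are not credited to the gene.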