In gene expression studies where gene-level p-values have been calculated, Fisher's method for pooling p-values (here referred to as MLP for "mean log p") can be used to identify predefined gene sets that are enriched in the sense that the genes that comprise them have comparatively low p-values. Since gene-level p-values tend not to follow a uniform distribution even in situations that could be regarded as null, a permutation procedure is the most effective way to assess significance. However, this may prove computationally burdensome if a large number of analyses need to be done. In this article, we derive a highly accurate approximation to the permutation p-value that can be used to assess the significance of Fisher's test statistic in a computationally efficient manner. In addition, we show the superiority of this approach compared to methods based on the (regular or weighted) Kolmogorov-Smirnov statistic, which is the basis of the popular GSEA method for gene set enrichment analysis, and Fisher's exact test, which is the basis of several other gene set analysis modalities such as Ingenuity, GoMiner, MAPPFinder, and EASE. We also explore some simple but novel variations of the MLP and find that one of them, MLQ, essentially Fisher's method based on FDR-adjusted p-values or q-values, has comparable performance to MLP for small gene set sizes, but for large gene set sizes, offers noticeable improvement over MLP.
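To make the baseline concrete, the following is a minimal sketch of the MLP statistic and the brute-force permutation p-value that the approximation is meant to replace. It is not the authors' implementation. It assumes the statistic is the mean of -log(p) over the genes in a set, and that the null distribution is built by drawing random gene sets of the same size from all genes; the function names, the toy data, and the add-one correction in the p-value are illustrative choices, not taken from the article.

```python
import numpy as np

def mlp_statistic(p_values):
    """Mean of -log(p) over a gene set (assumed form of the MLP statistic)."""
    p = np.asarray(p_values, dtype=float)
    # Assumes all p-values are strictly positive; a zero would give infinity.
    return float(np.mean(-np.log(p)))

def permutation_p_value(all_p, set_idx, n_perm=2000, seed=0):
    """Permutation p-value for one gene set.

    all_p   : gene-level p-values for every gene in the study
    set_idx : integer indices of the genes belonging to the gene set
    """
    rng = np.random.default_rng(seed)
    all_p = np.asarray(all_p, dtype=float)
    set_idx = np.asarray(set_idx)
    observed = mlp_statistic(all_p[set_idx])
    k = set_idx.size
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Random gene set of the same size, drawn without replacement.
        draw = rng.choice(all_p.size, size=k, replace=False)
        null[b] = mlp_statistic(all_p[draw])
    # Add-one correction keeps the estimate strictly above zero.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

# Toy data (hypothetical): 1000 genes, the first 20 given very small p-values.
toy_rng = np.random.default_rng(1)
toy_p = toy_rng.uniform(size=1000)
toy_p[:20] = toy_rng.uniform(0.0, 0.01, size=20)

p_enriched = permutation_p_value(toy_p, np.arange(20))
p_random = permutation_p_value(toy_p, np.arange(500, 520))
print(p_enriched, p_random)
```

The loop costs one resampling pass per permutation per gene set, which is the computational burden that motivates a closed-form approximation when many gene sets or many analyses are involved.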
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Pharmaceutical Science
- Mean log p