Adjustment and Combination of Independent Discrete p-values

A summary and discussion of a new optimal transport framework for combining discrete p-values proposed by Gonzalo Contador and Zheyang Wu.
Categories: Optimal Transport, Statistics

Author: Alex Yuan

In the typical data-analysis framework, one takes a set of p-values \(\{P_i\}_{i=1}^n\), each representing the evidence obtained from a single hypothesis test, and combines them into a summary statistic. This summary statistic is used to assess the overall evidence against the “global” or “intersection” null hypothesis, which states that all individual hypotheses in the family are simultaneously true. Combining p-values lets one evaluate the collective significance across multiple tests more effectively: a single test may have limited power, so pooling evidence from several modestly small p-values can provide stronger evidence than any one of them alone.

Consider a general family of statistics:

\[ T= \sum_{i=1}^nG^{-1}(P_i) \quad \text{or} \quad T= \sum_{i=1}^nG^{-1}(1-P_i), \] where \(G\) is a strictly increasing CDF. Classical examples are Fisher’s and Pearson’s combinations, which correspond to \(G\) being the CDF of the \(\chi^2_2\) distribution. Other examples include George’s, Stouffer’s, and Edgington’s combinations, which correspond to \(G\) being the CDF of the logistic\((0,1)\), normal\((0,1)\), and uniform\((0,1)\) distributions, respectively.
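As a quick illustration (a minimal sketch, not the paper's method; it assumes continuous uniform\((0,1)\) p-values under the null), these combinations can be computed directly from inverse CDFs in scipy.stats:

```python
import numpy as np
from scipy import stats

def combine_pvalues_G(p, G_ppf):
    """Combination statistic T = sum_i G^{-1}(1 - p_i) for a strictly
    increasing CDF G; large values of T count against the global null."""
    p = np.asarray(p)
    return np.sum(G_ppf(1.0 - p))

p = np.array([0.01, 0.20, 0.03, 0.50])

# Fisher: G = chi^2_2 CDF, so G^{-1}(1 - p) = -2 log p
t_fisher = combine_pvalues_G(p, stats.chi2(df=2).ppf)
# Stouffer: G = standard normal CDF
t_stouffer = combine_pvalues_G(p, stats.norm.ppf)
# George: G = logistic(0, 1) CDF, so G^{-1}(1 - p) = log((1 - p) / p)
t_george = combine_pvalues_G(p, stats.logistic.ppf)

# Sanity check: Fisher's statistic matches its familiar -2 * sum(log p) form.
assert np.isclose(t_fisher, -2 * np.log(p).sum())
```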

To test the “global” or “intersection” null hypothesis, one compares the observed value of \(T\) to the quantiles of its null distribution at some significance level \(\alpha\). Specifically, we reject \(H_0\) if \(T=\sum_{j=1}^nG^{-1}(P_j)\) is at most the \(\alpha\)-quantile of the null distribution, or if \(T=\sum_{j=1}^nG^{-1}(1-P_j)\) is at least the \((1-\alpha)\)-quantile. In the statistics literature it is typically assumed that the underlying p-values are continuous and, moreover, that the \(P_j\) are iid uniform\((0,1)\) under the null. In that case, since \(G\) is a strictly increasing CDF, both \(G^{-1}(P_j)\) and \(G^{-1}(1-P_j)\) follow the distribution with CDF \(G\) (by the inverse probability integral transform), so the null distribution of \(T\) is that of a sum of \(n\) iid draws from \(G\).
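For concreteness, here is a small Monte Carlo sketch of this rejection rule for Fisher's combination, assuming iid uniform\((0,1)\) p-values under the global null (for Fisher the null distribution is also available exactly, as \(\chi^2_{2n}\)):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha, n_sim = 5, 0.05, 200_000

# Simulate the null distribution of T = sum_j G^{-1}(1 - P_j) for Fisher,
# i.e., T = -2 * sum(log P_j) with P_j iid uniform(0, 1).
T_null = -2 * np.log(rng.uniform(size=(n_sim, n))).sum(axis=1)
crit = np.quantile(T_null, 1 - alpha)        # (1 - alpha)-quantile of T

# For Fisher the null law is exactly chi^2 with 2n degrees of freedom.
print(crit, stats.chi2(df=2 * n).ppf(1 - alpha))  # the two should be close

p_obs = np.array([0.04, 0.02, 0.30, 0.01, 0.15])
T_obs = -2 * np.log(p_obs).sum()
print("reject global null:", T_obs >= crit)
```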

In practice, this is usually not the case, as the data and the resulting test statistics tend to be discrete. Discrete tests typically have a type I error rate below the nominal significance level \(\alpha\); that is, they are conservative. A related pitfall is that the resulting p-values are not uniformly distributed under the null hypothesis: they are supported on a discrete set of values and satisfy \(\Pr(P_j \leq t) \leq t\) for all \(t \in [0,1]\), making them stochastically larger than a uniform\((0,1)\) random variable.
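A one-sided exact binomial test makes both issues visible; the short sketch below (a toy example, not from the paper) enumerates the finitely many attainable p-values and shows the actual type I error falling below the nominal level:

```python
import numpy as np
from scipy import stats

# One-sided binomial test of H0: theta = 0.5 with N = 10 trials.
# The p-value P(X >= x_obs) takes at most N + 1 distinct values.
N, theta0 = 10, 0.5
x = np.arange(N + 1)
pvals = stats.binom.sf(x - 1, N, theta0)   # P(X >= x) under H0
probs = stats.binom.pmf(x, N, theta0)      # null probability of each outcome

alpha = 0.05
type1 = probs[pvals <= alpha].sum()        # actual rejection probability
print(sorted(set(np.round(pvals, 4))))     # the discrete support of the p-value
print(f"actual type I error {type1:.4f} <= nominal {alpha}")  # conservative
```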

Switching gears for a bit, I will detail some of the optimal transport background needed for this framework. Optimal transport has been around since 1781 but has recently grown into a full-fledged mathematical field with applications in partial differential equations, Riemannian geometry, probability, and, more recently, statistics and machine learning. Historically, the Wasserstein distance, one of the central objects of study in optimal transport, has been used in statistics to quantify the rate of convergence of empirical probability measures \(\mu_n\) to their limit \(\mu\).

In 1781, Gaspard Monge formulated the original optimal transport problem, which can be stated as: how can one transport a given pile of sand to fill a given ditch so as to minimize the cost of transporting the sand? While this problem is easy to formulate mathematically, a valid transport map may fail to exist. About two centuries later, Leonid Kantorovich introduced a relaxation of Monge’s problem.
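To make this concrete, Monge’s problem (in its standard modern formulation) asks for a measurable map \(T\) pushing \(\mu\) forward to \(\nu\), written \(T_{\#}\mu = \nu\) (meaning \(\nu(A) = \mu(T^{-1}(A))\) for every Borel set \(A\)), that minimizes the total cost: \[ \inf_{T:\, T_{\#}\mu = \nu} \int c(x, T(x))\, \mu(dx). \] The constraint set can be empty: if \(\mu\) is a point mass and \(\nu\) is not, no map can split \(\mu\)’s mass, so no valid \(T\) exists.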

To state Kantorovich’s problem, we first need the notion of a coupling. Let \(\mu\) and \(\nu\) be two probability measures on \(\mathbb{R}^d\). A coupling \(\gamma\) of these two distributions is a joint distribution on \(\mathbb{R}^d \times \mathbb{R}^d\) whose first marginal is \(\mu\) and whose second marginal is \(\nu\); that is, for any Borel set \(A \subset \mathbb{R}^d\), \[\gamma(A \times \mathbb{R}^d) =\mu(A) \quad \text{and} \quad \gamma(\mathbb{R}^d \times A) = \nu(A).\] Note that random variables \(X \sim \mu\) and \(Y \sim \nu\) need not be defined on the same probability space; the term coupling comes from the fact that \(\gamma\) forces them to live on the same probability space by specifying their probabilistic dependence.
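As a small numerical illustration (a toy example with discrete distributions standing in for \(\mu\) and \(\nu\)), one can verify the marginal constraints directly:

```python
import numpy as np

# Toy discrete distributions on {0, 1, 2}, standing in for mu and nu.
mu = np.array([0.2, 0.5, 0.3])
nu = np.array([0.4, 0.4, 0.2])

# The independent (product) coupling gamma_ij = mu_i * nu_j always exists.
gamma = np.outer(mu, nu)

# Row sums recover the first marginal mu; column sums recover nu.
assert np.allclose(gamma.sum(axis=1), mu)
assert np.allclose(gamma.sum(axis=0), nu)

# Couplings are far from unique: any nonnegative matrix with these row
# and column sums describes a different joint dependence of X and Y.
```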

Let \(\Gamma_{\mu,\nu}\) denote the collection of couplings of \(\mu\) and \(\nu\), and let \(c: \mathbb{R}^d \times \mathbb{R}^d \to [0,\infty)\) be a measurable cost function. The general Kantorovich formulation of the optimal transport problem is \[ \inf_{\gamma \in \Gamma_{\mu, \nu}}\int c(x,y)\, \gamma(dx, dy). \] Unlike Monge’s problem, the infimum is taken over a set that is never empty, since the product coupling \(\mu \otimes \nu\) always belongs to \(\Gamma_{\mu,\nu}\).
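For discrete measures, the Kantorovich problem is a finite linear program. Here is a minimal sketch using scipy.optimize.linprog with a squared-distance cost (the support points and weights are made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical discrete measures: mu on points x with weights a,
# nu on points y with weights b (each summing to 1).
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5])
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.6, 0.4])

m, n = len(x), len(y)
# Cost matrix C_ij = c(x_i, y_j) with c(x, y) = (x - y)^2.
C = (x[:, None] - y[None, :]) ** 2

# Marginal constraints on the coupling gamma (flattened row-major):
# row sums of gamma equal a, column sums equal b.
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0   # sum_j gamma_ij = a_i
for j in range(n):
    A_eq[m + j, j::n] = 1.0            # sum_i gamma_ij = b_j
b_eq = np.concatenate([a, b])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
gamma = res.x.reshape(m, n)            # an optimal coupling
print("optimal transport cost:", res.fun)
```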

References

Chewi, S., Niles-Weed, J., & Rigollet, P. (2025). Statistical Optimal Transport. Lecture Notes in Mathematics. Springer.

Contador, G., & Wu, Z. (2025). Optimal Adjustment and Combination of Independent Discrete p-Values. arXiv preprint arXiv:2508.02647.