Scalable Nyström-Based Kernel Two-Sample Testing with Permutations

By Nova Solari | 2025-09-26

Kernel two-sample tests, such as the Maximum Mean Discrepancy (MMD), are powerful tools for detecting distributional differences between samples in high dimensions. Yet, a naive implementation requires computing and storing large Gram matrices, which becomes prohibitive as data sizes grow. The Nyström method offers a principled way to build a compact, low-rank surrogate of the kernel matrix, enabling scalable testing without sacrificing too much statistical power.

Foundations: why kernel two-sample testing matters

At its core, the kernel two-sample test assesses whether two samples X and Y come from the same underlying distribution. The MMD statistic measures differences in mean embeddings in a reproducing kernel Hilbert space (RKHS). When the null hypothesis is true, a permutation procedure provides a user-friendly way to approximate the null distribution of the statistic and compute p-values. This approach is model-free, makes few distributional assumptions, and adapts to complex, nonlinear differences that may be invisible to classical tests.
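Concretely, writing the two samples as X = {x_1, …, x_{n_x}} and Y = {y_1, …, y_{n_y}} with kernel k, a standard unbiased estimate of the squared MMD is:

```latex
\widehat{\mathrm{MMD}}^2_u
  = \frac{1}{n_x(n_x - 1)} \sum_{i \neq j} k(x_i, x_j)
  + \frac{1}{n_y(n_y - 1)} \sum_{i \neq j} k(y_i, y_j)
  - \frac{2}{n_x n_y} \sum_{i=1}^{n_x} \sum_{j=1}^{n_y} k(x_i, y_j)
```

Under the null, this statistic concentrates near zero; the permutation procedure below calibrates how far from zero is "far enough."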

The challenge isn’t just accuracy; it’s achieving that accuracy at scale. Nyström approximations let us keep the power of kernel tests while handling millions of observations.

Nyström approximation: how it speeds things up

The Nyström method constructs a low-rank surrogate of the full kernel matrix by selecting a subset of m landmark points from the data. Let X be the full sample with n points and S be the selected landmarks (|S| = m). Compute:

- C, the n × m cross-kernel matrix with entries C_{ij} = k(x_i, s_j), and
- W, the m × m kernel matrix among the landmarks, with entries W_{ij} = k(s_i, s_j).

Then K ≈ K̃ = C W^{-1} C^T serves as a low-rank approximation of the full kernel matrix (with W^{-1} replaced by the pseudo-inverse when W is rank-deficient). In practice, this surrogate yields an approximate feature map in a finite-dimensional space, derived from the eigendecomposition of W. With it, the infinite-dimensional RKHS operations reduce to O(nm) computations rather than O(n^2), a substantial saving for large n. The result is a scalable path to estimating MMD without ever forming the full Gram matrix.
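A minimal sketch of this construction in NumPy, assuming a Gaussian kernel (the function names here are illustrative, not from a library):

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Pairwise Gaussian kernel: k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def nystrom_features(X, landmarks, sigma, eps=1e-10):
    # C: n x m cross-kernel matrix; W: m x m kernel among landmarks.
    C = gaussian_kernel(X, landmarks, sigma)
    W = gaussian_kernel(landmarks, landmarks, sigma)
    # Eigendecompose W and form Phi = C U diag(lam^{-1/2}), so that
    # Phi Phi^T = C W^{-1} C^T, the Nystrom surrogate of K.
    lam, U = np.linalg.eigh(W)
    keep = lam > eps  # drop near-zero eigendirections for numerical stability
    return C @ U[:, keep] / np.sqrt(lam[keep])
```

Each row of the returned matrix is a finite-dimensional feature vector whose inner products reproduce the Nyström-approximated kernel, so downstream MMD computations only touch n × m arrays.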

Permutation testing with Nyström

Permutation testing remains a natural way to obtain p-values under the null hypothesis of equal distributions. With the Nyström surrogate in hand, you:

1. Map both samples through the approximate feature map and compute the observed MMD statistic from the difference of feature means.
2. Pool the two samples, randomly permute the labels, and recompute the statistic on each permuted split.
3. Report the p-value as the fraction of permuted statistics at least as large as the observed one.

The accuracy of the permutation test hinges on the stability of the Nyström approximation. In practice, a modest number of landmarks often yields robust results, and the permutation procedure helps absorb any residual approximation bias by anchoring the null distribution directly to the data.
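The steps above can be sketched as follows, assuming the Nyström feature maps have already been computed; the function name and the add-one p-value correction are implementation choices, not mandated by the text:

```python
import numpy as np

def nystrom_mmd_perm_test(phi_x, phi_y, n_perm=200, seed=0):
    """Permutation p-value for MMD^2 computed on Nystrom features.

    phi_x, phi_y: feature matrices for the two samples (one row per point).
    """
    rng = np.random.default_rng(seed)

    def mmd2(fx, fy):
        # Squared distance between empirical mean embeddings.
        d = fx.mean(axis=0) - fy.mean(axis=0)
        return d @ d

    observed = mmd2(phi_x, phi_y)
    pooled = np.vstack([phi_x, phi_y])
    n = len(phi_x)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))      # shuffle the pooled labels
        count += mmd2(pooled[idx[:n]], pooled[idx[n:]]) >= observed
    # Add-one correction keeps the p-value strictly valid under the null.
    return (count + 1) / (n_perm + 1)
```

Because each permuted statistic reuses the same n × d feature matrix, every permutation costs O(nd) rather than O(n^2), which is what makes a few hundred permutations affordable at scale.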

Practical guidelines for effective deployment

- Choose the number of landmarks m to balance cost and power; increasing m until the test statistic stabilizes is a pragmatic default.
- Select landmarks uniformly at random as a baseline; leverage-score or k-means-based selection can help when the kernel spectrum decays slowly.
- Set the kernel bandwidth with a data-driven rule such as the median heuristic on pairwise distances.
- Use enough permutations (a few hundred to a few thousand) to resolve the significance level you care about, and apply the standard add-one correction to keep p-values valid.
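Two common defaults, bandwidth selection and landmark sampling, can be sketched in code. This assumes a Gaussian kernel; both helper names are illustrative, and the median heuristic is subsampled here to keep it cheap:

```python
import numpy as np

def median_heuristic_sigma(Z, max_pairs=2000, seed=0):
    # Set the Gaussian bandwidth to the median pairwise distance,
    # estimated from a random subset of pairs so the cost stays O(max_pairs).
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(Z), size=max_pairs)
    j = rng.integers(0, len(Z), size=max_pairs)
    d = np.linalg.norm(Z[i] - Z[j], axis=1)
    return np.median(d[d > 0])  # ignore accidental self-pairs

def sample_landmarks(Z, m, seed=0):
    # Uniform subsampling without replacement; leverage-score sampling is a
    # common refinement when the kernel spectrum decays slowly.
    rng = np.random.default_rng(seed)
    return Z[rng.choice(len(Z), size=m, replace=False)]
```

When comparing two samples, apply the bandwidth rule to the pooled data so both samples are embedded with the same kernel.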

Extensions and caveats

The Nyström-based approach is compatible with a variety of kernels beyond the standard Gaussian family, including specialized kernels for structured data. When the eigenvalue spectrum of the kernel matrix decays slowly, you may need a larger m to preserve power. Be mindful of potential biases introduced by subsampling landmarks; the permutation framework helps mitigate this, but explicit bias analysis can provide further reassurance in critical applications.

For extremely large-scale problems, consider a two-stage strategy: (1) use Nyström with a moderate m to obtain an initial test, and (2) zoom in on regions or subsets of data where differences appear strongest, applying the same methodology to focused comparisons. This preserves scalability while maintaining interpretability of findings.

Putting it all together

Scalable Nyström-based kernel two-sample testing with permutations marries the power of kernel methods with a practical pathway to handle datasets that were previously out of reach. By trading a controlled amount of approximation for substantial gains in speed, you retain robust detection of distributional differences across high-dimensional spaces while keeping computation within practical bounds. When your goal is to compare large populations, monitor shifts with confidence, and do so efficiently, this approach offers a compelling balance of rigor and scalability.