Testing Utilities
Goodness of Fit Testing
This module implements goodness of fit tests for checking agreement between distributions’ .sample() and .log_prob() methods. The main functions return a goodness of fit p-value gof, which for good data should be Uniform(0,1)-distributed and for bad data should be close to zero. To use this returned number in tests, set a global variable TEST_FAILURE_RATE to something smaller than 1 / number of tests in your suite, then in each test assert gof > TEST_FAILURE_RATE. For example:
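from pyro.distributions.testing.gof import auto_goodness_of_fit

TEST_FAILURE_RATE = 1 / 20  # For 1 in 20 chance of spurious failure.

def test_my_distribution():
    d = MyDistribution()  # Placeholder for the distribution under test.
    samples = d.sample([10000])
    probs = d.log_prob(samples).exp()  # Densities at the sampled points.
    gof = auto_goodness_of_fit(samples, probs)
    assert gof > TEST_FAILURE_RATE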
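The advice on choosing TEST_FAILURE_RATE follows from the p-value being Uniform(0,1) under a correct implementation: each test then fails spuriously with probability TEST_FAILURE_RATE, and a union bound limits the chance of any spurious failure across the suite. The arithmetic can be sketched as follows; the helper names and the suite size of 50 are hypothetical and not part of this module.

```python
def suite_failure_bound(num_tests, test_failure_rate):
    """Union bound on P(at least one spurious failure) across the suite."""
    return min(1.0, num_tests * test_failure_rate)


def suite_failure_if_independent(num_tests, test_failure_rate):
    """Exact probability when the tests' p-values are independent."""
    return 1.0 - (1.0 - test_failure_rate) ** num_tests


# Hypothetical suite of 50 goodness of fit tests with a 1e-3 threshold.
bound = suite_failure_bound(50, 1e-3)
exact = suite_failure_if_independent(50, 1e-3)
print(f"union bound: {bound:.4f}, independent tests: {exact:.4f}")
```

A threshold well below 1 / number of tests keeps the bound small, at the cost of making each individual test less sensitive to real discrepancies.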
This module is a port of the goftests library.
- multinomial_goodness_of_fit(probs, counts, *, total_count=None, plot=False)
Pearson’s chi^2 test, on possibly truncated data. https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test
- Parameters
probs (torch.Tensor) – Vector of probabilities.
counts (torch.Tensor) – Vector of counts.
total_count (int) – Optional total count in case data is truncated, otherwise None.
plot (bool) – Whether to print a histogram. Defaults to False.
- Returns
p-value of truncated multinomial sample.
- Return type
float
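As an illustration of the underlying test, here is a minimal sketch of the textbook Pearson chi^2 statistic on untruncated counts, using only the standard library. It is not this module's implementation and may differ from it in details such as truncation handling. The function names are hypothetical, all probabilities are assumed positive, and the survival function uses a closed form that holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_dof(x, dof):
    """Chi-squared survival function, exact for even degrees of freedom."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Accumulate sum_{i < dof/2} (x/2)^i / i!
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def pearson_p_value(probs, counts):
    """Textbook Pearson chi^2 test on untruncated counts."""
    total = sum(counts)
    stat = sum((c - total * p) ** 2 / (total * p) for p, c in zip(probs, counts))
    return chi2_sf_even_dof(stat, len(probs) - 1)


probs = [0.2, 0.3, 0.5]
good = pearson_p_value(probs, [200, 300, 500])  # counts match expectations
bad = pearson_p_value(probs, [500, 300, 200])   # counts are reversed
print(good, bad)
```

Counts that match their expectations give a p-value of 1, while badly mismatched counts drive it toward zero.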
- unif01_goodness_of_fit(samples, *, plot=False)
Bin uniformly distributed samples and apply Pearson’s chi^2 test.
- Parameters
samples (torch.Tensor) – A vector of real-valued samples from a candidate distribution that should be Uniform(0, 1)-distributed.
plot (bool) – Whether to print a histogram. Defaults to False.
- Returns
Goodness of fit, as a p-value.
- Return type
float
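The binning step can be sketched with the standard library as below. The choice of 10 equal-width bins and both function names are assumptions made for illustration; how this module picks its bins is not stated here.

```python
def bin_unif01(samples, num_bins):
    """Count how many samples in [0, 1] fall into each equal-width bin."""
    counts = [0] * num_bins
    for u in samples:
        if not 0.0 <= u <= 1.0:
            raise ValueError(f"sample outside [0, 1]: {u}")
        # Clamp so that u == 1.0 lands in the last bin.
        counts[min(int(u * num_bins), num_bins - 1)] += 1
    return counts


def chi2_statistic_uniform(counts):
    """Pearson statistic against equal expected counts in every bin."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


n = 1000
grid = [(i + 0.5) / n for i in range(n)]  # perfectly even spread
skewed = [u * u for u in grid]            # squaring piles mass near zero
stat_grid = chi2_statistic_uniform(bin_unif01(grid, 10))
stat_skewed = chi2_statistic_uniform(bin_unif01(skewed, 10))
print(stat_grid, stat_skewed)
```

The evenly spread samples give a statistic of zero, while the skewed samples give a very large one, which a chi^2 survival function would turn into a p-value near zero.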
- exp_goodness_of_fit(samples, plot=False)
Transform exponentially distributed samples to a Uniform(0,1) distribution and assess goodness of fit via a binned Pearson’s chi^2 test.
- Parameters
samples (torch.Tensor) – A vector of real-valued samples from a candidate distribution that should be Exponential(1)-distributed.
plot (bool) – Whether to print a histogram. Defaults to False.
- Returns
Goodness of fit, as a p-value.
- Return type
float
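One way to carry out such a transform is the probability integral transform: pushing Exponential(1) samples through their CDF, 1 - exp(-x), yields Uniform(0,1) values. The sketch below assumes that choice (the survival function exp(-x) would work equally well) and uses a hypothetical function name; which variant this module uses is not stated here.

```python
import math


def exp_to_unif01(samples):
    """Map Exponential(1) samples to [0, 1) through the CDF 1 - exp(-x)."""
    out = []
    for x in samples:
        if x < 0:
            raise ValueError(f"Exponential samples must be nonnegative: {x}")
        out.append(-math.expm1(-x))  # 1 - exp(-x), accurate for small x
    return out


# Exponential(1) quantiles at levels 0.1, 0.5, 0.9 map back to those levels.
levels = (0.1, 0.5, 0.9)
quantiles = [-math.log(1.0 - q) for q in levels]
uniforms = exp_to_unif01(quantiles)
print(uniforms)
```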
- density_goodness_of_fit(samples, probs, plot=False)
Transform arbitrary continuous samples to a Uniform(0,1) distribution and assess goodness of fit via a binned Pearson’s chi^2 test.
- Parameters
samples (torch.Tensor) – A vector of real-valued samples from a distribution.
probs (torch.Tensor) – A vector of probability densities evaluated at those samples.
plot (bool) – Whether to print a histogram. Defaults to False.
- Returns
Goodness of fit, as a p-value.
- Return type
float
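The description above does not say how the samples are transformed. One standard approach, shown here purely as an assumption about the general idea rather than as this module's method, uses normalized spacings: if the supplied densities are correct, the gap between neighboring sorted samples, multiplied by the sample count and the local density, is approximately Exponential(1)-distributed. All names in the sketch are hypothetical.

```python
import math
import random


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def scaled_spacings(samples, probs):
    """Rescale gaps between sorted samples by the local density.

    If probs really is the density of samples, each result is roughly
    an Exponential(1) draw.
    """
    pairs = sorted(zip(samples, probs))
    n = len(pairs)
    out = []
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        local_density = 0.5 * (p0 + p1)  # average density across the gap
        out.append(n * local_density * (x1 - x0))
    return out


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(5000)]
probs = [normal_pdf(x) for x in samples]
spacings = scaled_spacings(samples, probs)
mean_spacing = sum(spacings) / len(spacings)
print(f"mean scaled spacing: {mean_spacing:.3f}")  # should be near 1
```

The rescaled spacings could then be checked against Exponential(1), for example by mapping them to Uniform(0,1) and binning.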
- vector_density_goodness_of_fit(samples, probs, *, dim=None, plot=False)
Transform arbitrary multivariate continuous samples to a Uniform(0,1) distribution via the nearest neighbor distribution [1,2,3] and assess goodness of fit via a binned Pearson’s chi^2 test.
- [1] Peter J. Bickel and Leo Breiman (1983)
“Sums of Functions of Nearest Neighbor Distances, Moment Bounds, Limit Theorems and a Goodness of Fit Test” https://projecteuclid.org/download/pdf_1/euclid.aop/1176993668
- [2] Mike Williams (2010)
“How good are your fits? Unbinned multivariate goodness-of-fit tests in high energy physics.” https://arxiv.org/abs/1006.3019
- [3] Nearest Neighbour Distribution
https://en.wikipedia.org/wiki/Nearest_neighbour_distribution
- Parameters
samples (torch.Tensor) – A tensor of real-vector-valued samples from a distribution.
probs (torch.Tensor) – A vector of probability densities evaluated at those samples.
dim (int) – Optional dimension of the submanifold on which data lie. Defaults to samples.shape[-1].
plot (bool) – Whether to print a histogram. Defaults to False.
- Returns
Goodness of fit, as a p-value.
- Return type
float
- auto_goodness_of_fit(samples, probs, *, dim=None, plot=False)
Dispatch on sample dimension and delegate to either density_goodness_of_fit() or vector_density_goodness_of_fit().
- Parameters
samples (torch.Tensor) – A tensor of samples stacked on their leftmost dimension.
probs (torch.Tensor) – A vector of probability densities evaluated at those samples.
dim (int) – Optional manifold dimension, defaults to samples.shape[1:].numel().
plot (bool) – Whether to print a histogram. Defaults to False.
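The default for dim is the number of elements in one sample, that is, the product of every dimension size after the leading one. A small sketch of that arithmetic, using plain tuples as a stand-in for torch.Size and a hypothetical helper name:

```python
import math


def default_dim(shape):
    """Product of all sizes after the leading (sample) dimension."""
    return math.prod(shape[1:])


print(default_dim((10000,)))       # scalar samples -> 1
print(default_dim((10000, 3)))     # 3-vectors -> 3
print(default_dim((10000, 2, 2)))  # 2x2 matrices -> 4
```

An empty trailing shape gives 1, which corresponds to scalar samples.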