Distributions¶

PyTorch Distributions¶

Most distributions in Pyro are thin wrappers around PyTorch distributions. For details on the PyTorch distribution interface, see torch.distributions.distribution.Distribution. For differences between the Pyro and PyTorch interfaces, see TorchDistributionMixin.

Bernoulli¶

class Bernoulli(probs=None, logits=None, validate_args=None)¶: Wraps torch.distributions.bernoulli.Bernoulli with TorchDistributionMixin.

Beta¶

class Beta(concentration1, concentration0, validate_args=None)[source]¶: Wraps torch.distributions.beta.Beta with TorchDistributionMixin.

Binomial¶

class Binomial(total_count=1, probs=None, logits=None, validate_args=None)[source]¶: Wraps torch.distributions.binomial.Binomial with TorchDistributionMixin.

Categorical¶

class Categorical(probs=None, logits=None, validate_args=None)[source]¶: Wraps torch.distributions.categorical.Categorical with TorchDistributionMixin.

Cauchy¶

class Cauchy(loc, scale, validate_args=None)¶: Wraps torch.distributions.cauchy.Cauchy with TorchDistributionMixin.

Chi2¶

class Chi2(df, validate_args=None)¶: Wraps torch.distributions.chi2.Chi2 with TorchDistributionMixin.

ContinuousBernoulli¶

class ContinuousBernoulli(probs=None, logits=None, lims=(0.499, 0.501), validate_args=None)¶: Wraps torch.distributions.continuous_bernoulli.ContinuousBernoulli with TorchDistributionMixin.

Dirichlet¶

class Dirichlet(concentration, validate_args=None)[source]¶: Wraps torch.distributions.dirichlet.Dirichlet with TorchDistributionMixin.

Exponential¶

class Exponential(rate, validate_args=None)¶: Wraps torch.distributions.exponential.Exponential with TorchDistributionMixin.

ExponentialFamily¶

class ExponentialFamily(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)¶: Wraps torch.distributions.exp_family.ExponentialFamily with TorchDistributionMixin.

FisherSnedecor¶

class FisherSnedecor(df1, df2, validate_args=None)¶: Wraps torch.distributions.fishersnedecor.FisherSnedecor with TorchDistributionMixin.

Gamma¶

class Gamma(concentration, rate, validate_args=None)[source]¶: Wraps torch.distributions.gamma.Gamma with TorchDistributionMixin.

Geometric¶

class Geometric(probs=None, logits=None, validate_args=None)[source]¶: Wraps torch.distributions.geometric.Geometric with TorchDistributionMixin.

Gumbel¶

class Gumbel(loc, scale, validate_args=None)¶: Wraps torch.distributions.gumbel.Gumbel with TorchDistributionMixin.

HalfCauchy¶

class HalfCauchy(scale, validate_args=None)¶: Wraps torch.distributions.half_cauchy.HalfCauchy with TorchDistributionMixin.

HalfNormal¶

class HalfNormal(scale, validate_args=None)¶: Wraps torch.distributions.half_normal.HalfNormal with TorchDistributionMixin.

Independent¶

class Independent(base_distribution, reinterpreted_batch_ndims, validate_args=None)[source]¶: Wraps torch.distributions.independent.Independent with TorchDistributionMixin.

Kumaraswamy¶

class Kumaraswamy(concentration1, concentration0, validate_args=None)¶: Wraps torch.distributions.kumaraswamy.Kumaraswamy with TorchDistributionMixin.

LKJCholesky¶

class LKJCholesky(dim, concentration=1.0, validate_args=None)¶: Wraps torch.distributions.lkj_cholesky.LKJCholesky with TorchDistributionMixin.

Laplace¶

class Laplace(loc, scale, validate_args=None)¶: Wraps torch.distributions.laplace.Laplace with TorchDistributionMixin.

LogNormal¶

class LogNormal(loc, scale, validate_args=None)[source]¶: Wraps torch.distributions.log_normal.LogNormal with TorchDistributionMixin.

LogisticNormal¶

class LogisticNormal(loc, scale, validate_args=None)¶: Wraps torch.distributions.logistic_normal.LogisticNormal with TorchDistributionMixin.

LowRankMultivariateNormal¶

class LowRankMultivariateNormal(loc, cov_factor, cov_diag, validate_args=None)[source]¶: Wraps torch.distributions.lowrank_multivariate_normal.LowRankMultivariateNormal with TorchDistributionMixin.

MixtureSameFamily¶

class MixtureSameFamily(mixture_distribution, component_distribution, validate_args=None)¶: Wraps torch.distributions.mixture_same_family.MixtureSameFamily with TorchDistributionMixin.

Multinomial¶

class Multinomial(total_count=1, probs=None, logits=None, validate_args=None)[source]¶: Wraps torch.distributions.multinomial.Multinomial with TorchDistributionMixin.

MultivariateNormal¶

class MultivariateNormal(loc, covariance_matrix=None, precision_matrix=None, scale_tril=None, validate_args=None)[source]¶: Wraps torch.distributions.multivariate_normal.MultivariateNormal with TorchDistributionMixin.

NegativeBinomial¶

class NegativeBinomial(total_count, probs=None, logits=None, validate_args=None)¶: Wraps torch.distributions.negative_binomial.NegativeBinomial with TorchDistributionMixin.

Normal¶

class Normal(loc, scale, validate_args=None)[source]¶: Wraps torch.distributions.normal.Normal with TorchDistributionMixin.

OneHotCategorical¶

class OneHotCategorical(probs=None, logits=None, validate_args=None)[source]¶: Wraps torch.distributions.one_hot_categorical.OneHotCategorical with TorchDistributionMixin.

OneHotCategoricalStraightThrough¶

class OneHotCategoricalStraightThrough(probs=None, logits=None, validate_args=None)¶: Wraps torch.distributions.one_hot_categorical.OneHotCategoricalStraightThrough with TorchDistributionMixin.

Pareto¶

class Pareto(scale, alpha, validate_args=None)¶: Wraps torch.distributions.pareto.Pareto with TorchDistributionMixin.

Poisson¶

class Poisson(rate, validate_args=None)¶: Wraps torch.distributions.poisson.Poisson with TorchDistributionMixin.

RelaxedBernoulli¶

class RelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)¶: Wraps torch.distributions.relaxed_bernoulli.RelaxedBernoulli with TorchDistributionMixin.

RelaxedOneHotCategorical¶

class RelaxedOneHotCategorical(temperature, probs=None, logits=None, validate_args=None)¶: Wraps torch.distributions.relaxed_categorical.RelaxedOneHotCategorical with TorchDistributionMixin.

StudentT¶

class StudentT(df, loc=0.0, scale=1.0, validate_args=None)¶: Wraps torch.distributions.studentT.StudentT with TorchDistributionMixin.

TransformedDistribution¶

class TransformedDistribution(base_distribution, transforms, validate_args=None)¶: Wraps torch.distributions.transformed_distribution.TransformedDistribution with TorchDistributionMixin.

Uniform¶

class Uniform(low, high, validate_args=None)[source]¶: Wraps torch.distributions.uniform.Uniform with TorchDistributionMixin.

VonMises¶

class VonMises(loc, concentration, validate_args=None)¶: Wraps torch.distributions.von_mises.VonMises with TorchDistributionMixin.

Weibull¶

class Weibull(scale, concentration, validate_args=None)¶: Wraps torch.distributions.weibull.Weibull with TorchDistributionMixin.

Pyro Distributions¶

Abstract Distribution¶

class Distribution[source]¶

Bases: object

Base class for parameterized probability distributions.

Distributions in Pyro are stochastic function objects with sample() and log_prob() methods. Distributions are stochastic functions with fixed parameters:

d = dist.Bernoulli(param)
x = d()                                # Draws a random sample.
p = d.log_prob(x)                      # Evaluates log probability of x.
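
A runnable variant of the snippet above, sketched with torch.distributions.Bernoulli (the class that Pyro's Bernoulli wraps). The values in param are made up for illustration, and d.sample() stands in for the Pyro call syntax d(), which plain PyTorch distributions do not provide.

```python
import torch
from torch.distributions import Bernoulli

torch.manual_seed(0)
param = torch.tensor([0.2, 0.9])   # hypothetical success probabilities
d = Bernoulli(probs=param)
x = d.sample()                     # one random draw per parameter
p = d.log_prob(x)                  # elementwise log probability of x
```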

Implementing New Distributions:

Derived classes must implement the methods: sample(), log_prob().

Examples:

Take a look at the examples to see how they interact with inference algorithms.

has_rsample = False¶

has_enumerate_support = False¶

__call__(*args, **kwargs)[source]¶

Samples a random value (just an alias for .sample(*args, **kwargs)).

For tensor distributions, the returned tensor should have the same .shape as the parameters.

Returns:	A random value.
Return type:	torch.Tensor

sample(*args, **kwargs)[source]¶

Samples a random value.

For tensor distributions, the returned tensor should have the same .shape as the parameters, unless otherwise noted.

Parameters:	sample_shape (torch.Size) – the size of the iid batch to be drawn from the distribution.
Returns:	A random value or batch of random values (if parameters are batched). The shape of the result should be `self.shape()`.
Return type:	torch.Tensor

log_prob(x, *args, **kwargs)[source]¶

Evaluates log probability densities for each of a batch of samples.

Parameters:	x (torch.Tensor) – A single value or a batch of values batched along axis 0.
Returns:	log probability densities as a one-dimensional `Tensor` with same batch size as value and params. The shape of the result should be `self.batch_size`.
Return type:	torch.Tensor

score_parts(x, *args, **kwargs)[source]¶

Computes ingredients for stochastic gradient estimators of ELBO.

The default implementation is correct both for non-reparameterized and for fully reparameterized distributions. Partially reparameterized distributions should override this method to compute correct .score_function and .entropy_term parts.

Setting .has_rsample on a distribution instance will determine whether inference engines like SVI use reparameterized samplers or the score function estimator.

Parameters:	x (torch.Tensor) – A single value or batch of values.
Returns:	A ScoreParts object containing parts of the ELBO estimator.
Return type:	ScoreParts
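
To make the two estimator families concrete, the following sketch (plain PyTorch, not the score_parts() implementation) estimates the gradient of E[z**2] with respect to the mean of a unit-variance Normal, whose exact value is 2 * mu. The cost function, sample count, and variable names are illustrative assumptions.

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)
num_samples = 200_000
mu = torch.tensor(1.0, requires_grad=True)
d = Normal(mu, 1.0)

# Reparameterized (pathwise) estimator: differentiate through the sample.
z = d.rsample((num_samples,))
(grad_pathwise,) = torch.autograd.grad((z ** 2).mean(), mu)

# Score function estimator: sample() is detached, so the gradient
# flows only through log_prob.
z = d.sample((num_samples,))
surrogate = (z ** 2 * d.log_prob(z)).mean()
(grad_score,) = torch.autograd.grad(surrogate, mu)
```

Both estimates should land near 2.0; the score function estimate is typically the noisier of the two.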

enumerate_support(expand=True)[source]¶

Returns a representation of the parametrized distribution’s support, along the first dimension. This is implemented only by discrete distributions.

Note that this returns support values of all the batched RVs in lock-step, rather than the full cartesian product.

Parameters:	expand (bool) – whether to expand the result to a tensor of shape `(n,) + batch_shape + event_shape`. If false, the return value has unexpanded shape `(n,) + (1,)*len(batch_shape) + event_shape` which can be broadcasted to the full shape.
Returns:	An iterator over the distribution’s discrete support.
Return type:	iterator
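
The lock-step behaviour and the effect of expand can be seen in a small sketch using torch.distributions.Categorical directly (assumed here to behave like the Pyro wrapper). With a batch of three variables that each have two outcomes, the result has 2 rows rather than the 2**3 = 8 rows of a full cartesian product.

```python
import torch
from torch.distributions import Categorical

# batch_shape (3,), two categories per batch element
d = Categorical(probs=torch.tensor([[0.1, 0.9], [0.5, 0.5], [0.3, 0.7]]))

full = d.enumerate_support(expand=True)    # shape (n,) + batch_shape
lazy = d.enumerate_support(expand=False)   # shape (n,) + (1,) * len(batch_shape)
```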

conjugate_update(other)[source]¶

EXPERIMENTAL Creates an updated distribution fusing information from another compatible distribution. This is supported by only a few conjugate distributions.

This should satisfy the equation:

fg, log_normalizer = f.conjugate_update(g)
assert f.log_prob(x) + g.log_prob(x) == fg.log_prob(x) + log_normalizer

Note this is equivalent to funsor.ops.add on Funsor distributions, but we return a lazy sum (updated, log_normalizer) because PyTorch distributions must be normalized. Thus conjugate_update() should commute with dist_to_funsor() and tensor_to_funsor():

dist_to_funsor(f) + dist_to_funsor(g)
  == dist_to_funsor(fg) + tensor_to_funsor(log_normalizer)

Parameters:	other – A distribution representing `p(data|latent)` but normalized over `latent` rather than `data`. Here `latent` is a candidate sample from `self` and `data` is a ground observation of unrelated type.
Returns:	a pair `(updated,log_normalizer)` where `updated` is an updated distribution of type `type(self)`, and `log_normalizer` is a `Tensor` representing the normalization factor.
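
As a worked instance of the equation above, the sketch below fuses two Beta densities over the same latent variable. The helper beta_conjugate_update is hypothetical and hand-written for illustration in plain PyTorch; it is not the conjugate_update() method itself.

```python
import torch
from torch.distributions import Beta

def log_beta_fn(a, b):
    """log of the Beta function B(a, b)."""
    return torch.lgamma(a) + torch.lgamma(b) - torch.lgamma(a + b)

def beta_conjugate_update(f, g):
    """Hypothetical helper: multiply two Beta densities and renormalize."""
    fg = Beta(f.concentration1 + g.concentration1 - 1,
              f.concentration0 + g.concentration0 - 1)
    log_normalizer = (log_beta_fn(fg.concentration1, fg.concentration0)
                      - log_beta_fn(f.concentration1, f.concentration0)
                      - log_beta_fn(g.concentration1, g.concentration0))
    return fg, log_normalizer

f = Beta(torch.tensor(2.0), torch.tensor(3.0))
g = Beta(torch.tensor(4.0), torch.tensor(1.5))
fg, log_normalizer = beta_conjugate_update(f, g)

x = torch.tensor([0.1, 0.5, 0.9])
lhs = f.log_prob(x) + g.log_prob(x)
rhs = fg.log_prob(x) + log_normalizer
```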

has_rsample_(value)[source]¶

Force reparameterized or detached sampling on a single distribution instance. This sets the .has_rsample attribute in-place.

This is useful to instruct inference algorithms to avoid reparameterized gradients for variables that discontinuously determine downstream control flow.

Parameters:	value (bool) – Whether samples will be pathwise differentiable.
Returns:	self
Return type:	Distribution

rv¶

EXPERIMENTAL Switch to the Random Variable DSL for applying transformations to random variables. Supports either chaining operations or arithmetic operator overloading.

Example usage:

# This should be equivalent to an Exponential distribution.
Uniform(0, 1).rv.log().neg().dist

# These two distributions Y1, Y2 should be the same
X = Uniform(0, 1).rv
Y1 = X.mul(4).pow(0.5).sub(1).abs().neg().dist
Y2 = (-abs((4*X)**(0.5) - 1)).dist

Returns:	A RandomVariable object (pyro.contrib.randomvariable.random_variable.RandomVariable) wrapping this distribution.
Return type:	RandomVariable
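
The first example can be cross-checked without the Random Variable DSL. The sketch below builds -log(U) from torch.distributions transforms (one assumed way to express it outside the DSL, not the DSL's internals) and compares its density with Exponential(1).

```python
import torch
from torch.distributions import Exponential, TransformedDistribution, Uniform
from torch.distributions.transforms import AffineTransform, ExpTransform

# Y = -log(U) with U ~ Uniform(0, 1): apply log (the inverse of exp), then negate.
neg_log_uniform = TransformedDistribution(
    Uniform(0.0, 1.0),
    [ExpTransform().inv, AffineTransform(loc=0.0, scale=-1.0)],
)

y = torch.tensor([0.1, 1.0, 3.0])
log_p = neg_log_uniform.log_prob(y)
reference = Exponential(1.0).log_prob(y)
```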

TorchDistributionMixin¶

class TorchDistributionMixin[source]¶

Bases: pyro.distributions.distribution.Distribution

Mixin to provide Pyro compatibility for PyTorch distributions.

You should instead use TorchDistribution for new distribution classes.

This is mainly useful for wrapping existing PyTorch distributions for use in Pyro. Derived classes must first inherit from torch.distributions.distribution.Distribution and then inherit from TorchDistributionMixin.

__call__(sample_shape=torch.Size([]))[source]¶

Samples a random value.

This is reparameterized whenever possible, calling rsample() for reparameterized distributions and sample() for non-reparameterized distributions.

Parameters:	sample_shape (torch.Size) – the size of the iid batch to be drawn from the distribution.
Returns:	A random value or batch of random values (if parameters are batched). The shape of the result should be self.shape().
Return type:	torch.Tensor

event_dim¶

Returns:	Number of dimensions of individual events.
Return type:	int

shape(sample_shape=torch.Size([]))[source]¶

The tensor shape of samples from this distribution.

Samples are of shape:

d.shape(sample_shape) == sample_shape + d.batch_shape + d.event_shape

Parameters:	sample_shape (torch.Size) – the size of the iid batch to be drawn from the distribution.
Returns:	Tensor shape of samples.
Return type:	torch.Size

classmethod infer_shapes(**arg_shapes)[source]¶

Infers batch_shape and event_shape given shapes of args to __init__().

Note

This assumes distribution shape depends only on the shapes of tensor inputs, not on the data contained in those inputs.

Parameters:	**arg_shapes – Keywords mapping name of input arg to `torch.Size` or tuple representing the sizes of each tensor input.
Returns:	A pair `(batch_shape, event_shape)` of the shapes of a distribution that would be created with input args of the given shapes.
Return type:	tuple

expand(batch_shape, _instance=None)[source]¶

Returns a new ExpandedDistribution instance with batch dimensions expanded to batch_shape.

Parameters:	batch_shape (tuple) – batch shape to expand to. _instance – unused argument for compatibility with `torch.distributions.Distribution.expand()`
Returns:	an instance of ExpandedDistribution.
Return type:	`ExpandedDistribution`

expand_by(sample_shape)[source]¶

Expands a distribution by adding sample_shape to the left side of its batch_shape.

To expand internal dims of self.batch_shape from 1 to something larger, use expand() instead.

Parameters:	sample_shape (torch.Size) – The size of the iid batch to be drawn from the distribution.
Returns:	An expanded version of this distribution.
Return type:	`ExpandedDistribution`
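
The difference between expand() and expand_by() can be sketched with a plain torch.distributions.Normal. The expand_by helper below is a hypothetical stand-in written from the description above; the Pyro methods return an ExpandedDistribution rather than a plain PyTorch distribution.

```python
import torch
from torch.distributions import Normal

d = Normal(torch.zeros(3, 1), torch.ones(3, 1))   # batch_shape (3, 1)

# expand(): grow an existing size-1 batch dim to a larger size.
widened = d.expand(torch.Size([3, 4]))

def expand_by(dist, sample_shape):
    """Hypothetical helper: prepend sample_shape to the batch_shape."""
    return dist.expand(torch.Size(sample_shape) + dist.batch_shape)

stacked = expand_by(d, (5,))
```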

reshape(sample_shape=None, extra_event_dims=None)[source]¶

to_event(reinterpreted_batch_ndims=None)[source]¶

Reinterprets the n rightmost dimensions of this distribution’s batch_shape as event dims, adding them to the left side of event_shape.

Example:

>>> [d1.batch_shape, d1.event_shape]
[torch.Size([2, 3]), torch.Size([4, 5])]
>>> d2 = d1.to_event(1)
>>> [d2.batch_shape, d2.event_shape]
[torch.Size([2]), torch.Size([3, 4, 5])]
>>> d3 = d1.to_event(2)
>>> [d3.batch_shape, d3.event_shape]
[torch.Size([]), torch.Size([2, 3, 4, 5])]

Parameters:	reinterpreted_batch_ndims (int) – The number of batch dimensions to reinterpret as event dimensions. May be negative to remove dimensions from a `pyro.distributions.torch.Independent`. If None, convert all dimensions to event dimensions.
Returns:	A reshaped version of this distribution.
Return type:	`pyro.distributions.torch.Independent`
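
The shapes in the example above can be reproduced with torch.distributions.Independent, the class that the documented return type wraps. This is a sketch in plain PyTorch; the base distribution and the names d1, d2, d3 are chosen to mirror the example and are otherwise arbitrary.

```python
import torch
from torch.distributions import Independent, Normal

base = Normal(torch.zeros(2, 3, 4, 5), torch.ones(2, 3, 4, 5))  # batch_shape (2, 3, 4, 5)
d1 = Independent(base, 2)   # batch_shape (2, 3), event_shape (4, 5)
d2 = Independent(d1, 1)     # plays the role of d1.to_event(1)
d3 = Independent(d1, 2)     # plays the role of d1.to_event(2)

# Event dims are summed out when scoring a sample.
log_p = d2.log_prob(d2.sample())
```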

independent(reinterpreted_batch_ndims=None)[source]¶

mask(mask)[source]¶

Masks a distribution by a boolean or boolean-valued tensor that is broadcastable to the distribution’s batch_shape.

Parameters:	mask (bool or torch.Tensor) – A boolean or boolean-valued tensor.
Returns:	A masked copy of this distribution.
Return type:	`MaskedDistribution`
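
The effect of a mask on scoring can be sketched in plain PyTorch, under the assumption that masked-out batch elements contribute a log probability of zero. The tensors below are made-up illustration data, and torch.where stands in for the MaskedDistribution machinery.

```python
import torch
from torch.distributions import Normal

d = Normal(torch.zeros(4), torch.ones(4))
x = torch.tensor([0.0, 1.0, -2.0, 0.5])
mask = torch.tensor([True, False, True, False])

log_p = d.log_prob(x)
# Assumed semantics: elements where mask is False score as zero.
masked_log_p = torch.where(mask, log_p, torch.zeros_like(log_p))
```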

TorchDistribution¶

class TorchDistribution(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)[source]¶

Bases: torch.distributions.distribution.Distribution, pyro.distributions.torch_distribution.TorchDistributionMixin

Base class for PyTorch-compatible distributions with Pyro support.

This should be the base class for almost all new Pyro distributions.

Note

Parameters and data should be of type Tensor and all methods return type Tensor unless otherwise noted.

Tensor Shapes:

TorchDistributions provide a method .shape() for the tensor shape of samples:

x = d.sample(sample_shape)
assert x.shape == d.shape(sample_shape)

Pyro follows the same distribution shape semantics as PyTorch. It distinguishes between three different roles for tensor shapes of samples:

sample shape corresponds to the shape of the iid samples drawn from the distribution. This is taken as an argument by the distribution’s sample method.
batch shape corresponds to non-identical (independent) parameterizations of the distribution, inferred from the distribution’s parameter shapes. This is fixed for a distribution instance.
event shape corresponds to the event dimensions of the distribution, which is fixed for a distribution class. These are collapsed when we try to score a sample from the distribution via d.log_prob(x).

These shapes are related by the equation:

assert d.shape(sample_shape) == sample_shape + d.batch_shape + d.event_shape

Distributions provide a vectorized log_prob() method that evaluates the log probability density of each event in a batch independently, returning a tensor of shape sample_shape + d.batch_shape:

x = d.sample(sample_shape)
assert x.shape == d.shape(sample_shape)
log_p = d.log_prob(x)
assert log_p.shape == sample_shape + d.batch_shape
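
A concrete instance of these shape rules, sketched with torch.distributions.MultivariateNormal (which follows the same shape semantics). The sizes 7, 5, and 3 are arbitrary, and since .shape() is a Pyro addition, the sum sample_shape + batch_shape + event_shape is written out explicitly.

```python
import torch
from torch.distributions import MultivariateNormal

loc = torch.zeros(5, 3)                       # 5 parameterizations of a 3-dim event
d = MultivariateNormal(loc, scale_tril=torch.eye(3))
sample_shape = torch.Size([7])

x = d.sample(sample_shape)                    # shape (7, 5, 3)
log_p = d.log_prob(x)                         # event dim collapsed: shape (7, 5)
```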

Implementing New Distributions:

Derived classes must implement the methods sample() (or rsample() if .has_rsample == True) and log_prob(), and must implement the properties batch_shape and event_shape. Discrete classes may also implement the enumerate_support() method to improve gradient estimates and set .has_enumerate_support = True.

expand(batch_shape, _instance=None)¶

Returns a new ExpandedDistribution instance with batch dimensions expanded to batch_shape.

Parameters:	batch_shape (tuple) – batch shape to expand to. _instance – unused argument for compatibility with `torch.distributions.Distribution.expand()`
Returns:	an instance of ExpandedDistribution.
Return type:	`ExpandedDistribution`

AffineBeta¶

class AffineBeta(concentration1, concentration0, loc, scale, validate_args=None)[source]¶

Bases: pyro.distributions.torch.TransformedDistribution

Beta distribution scaled by scale and shifted by loc:

X ~ Beta(concentration1, concentration0)
f(X) = loc + scale * X
Y = f(X) ~ AffineBeta(concentration1, concentration0, loc, scale)

Parameters:	concentration1 (float or torch.Tensor) – 1st concentration parameter (alpha) for the Beta distribution. concentration0 (float or torch.Tensor) – 2nd concentration parameter (beta) for the Beta distribution. loc (float or torch.Tensor) – location parameter. scale (float or torch.Tensor) – scale parameter.
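
The construction Y = loc + scale * X can be sketched with torch.distributions building blocks. This is not the AffineBeta class itself (which additionally clamps samples, as noted for rsample() and sample()); the parameter values are arbitrary.

```python
import torch
from torch.distributions import Beta, TransformedDistribution
from torch.distributions.transforms import AffineTransform

concentration1, concentration0 = torch.tensor(2.0), torch.tensor(5.0)
loc, scale = torch.tensor(-1.0), torch.tensor(4.0)

base = Beta(concentration1, concentration0)
affine_beta = TransformedDistribution(base, AffineTransform(loc, scale))

torch.manual_seed(0)
y = affine_beta.sample((1000,))               # values lie in [loc, loc + scale]

# Change of variables: log p_Y(y) = log p_X((y - loc) / scale) - log(scale)
y0 = torch.tensor([-0.5, 0.0, 2.0])
manual = base.log_prob((y0 - loc) / scale) - torch.log(scale)
```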

arg_constraints = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}¶

concentration0¶

concentration1¶

expand(batch_shape, _instance=None)[source]¶

high¶

static infer_shapes(concentration1, concentration0, loc, scale)[source]¶

loc¶

low¶

mean¶

rsample(sample_shape=torch.Size([]))[source]¶: Generates a sample from Beta distribution and applies AffineTransform. Additionally clamps the output in order to avoid NaN and Inf values in the gradients.

sample(sample_shape=torch.Size([]))[source]¶: Generates a sample from Beta distribution and applies AffineTransform. Additionally clamps the output in order to avoid NaN and Inf values in the gradients.

sample_size¶

scale¶

support¶

variance¶

AVFMultivariateNormal¶

class AVFMultivariateNormal(loc, scale_tril, control_var)[source]¶

Bases: pyro.distributions.torch.MultivariateNormal

Multivariate normal (Gaussian) distribution with transport equation inspired control variates (adaptive velocity fields).

A distribution over vectors in which all the elements have a joint Gaussian density.

Parameters:

loc (torch.Tensor) – D-dimensional mean vector.
scale_tril (torch.Tensor) – Cholesky of Covariance matrix; D x D matrix.
control_var (torch.Tensor) – 2 x L x D tensor that parameterizes the control variate; L is an arbitrary positive integer. This parameter needs to be learned (i.e. adapted) to achieve lower variance gradients. In a typical use case this parameter will be adapted concurrently with the loc and scale_tril that define the distribution.

Example usage:

# Assumes D, loc (shape (D,)), and scale_tril (shape (D, D)) are defined beforehand.
control_var = torch.full((2, 1, D), 0.1, requires_grad=True)
opt_cv = torch.optim.Adam([control_var], lr=0.1, betas=(0.5, 0.999))

for _ in range(1000):
    d = AVFMultivariateNormal(loc, scale_tril, control_var)
    z = d.rsample()
    cost = torch.pow(z, 2.0).sum()
    cost.backward()
    opt_cv.step()
    opt_cv.zero_grad()

arg_constraints = {'control_var': Real(), 'loc': Real(), 'scale_tril': LowerTriangular()}¶

rsample(sample_shape=torch.Size([]))[source]¶

BetaBinomial¶

class BetaBinomial(concentration1, concentration0, total_count=1, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Compound distribution comprising a beta-binomial pair. The probability of success (probs for the Binomial distribution) is unknown and randomly drawn from a Beta distribution prior to a certain number of Bernoulli trials given by total_count.

Parameters:	concentration1 (float or torch.Tensor) – 1st concentration parameter (alpha) for the Beta distribution. concentration0 (float or torch.Tensor) – 2nd concentration parameter (beta) for the Beta distribution. total_count (float or torch.Tensor) – Number of Bernoulli trials.
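
The resulting probability mass function has a closed form in terms of the Beta function. The sketch below evaluates it with the standard library only; the function names and the parameter values are illustrative and not part of the BetaBinomial class.

```python
import math

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_log_prob(k, total_count, concentration1, concentration0):
    """log C(n, k) + log B(k + alpha, n - k + beta) - log B(alpha, beta)."""
    log_choose = (math.lgamma(total_count + 1) - math.lgamma(k + 1)
                  - math.lgamma(total_count - k + 1))
    return (log_choose
            + log_beta(k + concentration1, total_count - k + concentration0)
            - log_beta(concentration1, concentration0))

total_count, a, b = 10, 2.0, 3.0
pmf = [math.exp(beta_binomial_log_prob(k, total_count, a, b))
       for k in range(total_count + 1)]
mean = sum(k * p for k, p in enumerate(pmf))   # equals total_count * a / (a + b)
```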

approx_log_prob_tol = 0.0¶

arg_constraints = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0), 'total_count': IntegerGreaterThan(lower_bound=0)}¶

concentration0¶

concentration1¶

enumerate_support(expand=True)[source]¶

expand(batch_shape, _instance=None)[source]¶

has_enumerate_support = True¶

log_prob(value)[source]¶

mean¶

sample(sample_shape=())[source]¶

support¶

variance¶

CoalescentTimes¶

class CoalescentTimes(leaf_times, rate=1.0, *, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Distribution over sorted coalescent times given irregularly sampled leaf_times and constant population size.

Sample values will be sorted sets of binary coalescent times. Each sample value will have cardinality value.size(-1) = leaf_times.size(-1) - 1, so that phylogenies are complete binary trees. This distribution can thus be batched over multiple samples of phylogenies given fixed (number of) leaf times, e.g. over phylogeny samples from BEAST or MrBayes.

References

[1] J.F.C. Kingman (1982): “On the Genealogy of Large Populations” Journal of Applied Probability
[2] J.F.C. Kingman (1982): “The Coalescent” Stochastic Processes and their Applications

Parameters:	leaf_times (torch.Tensor) – Vector of times of sampling events, i.e. leaf nodes in the phylogeny. These can be arbitrary real numbers with arbitrary order and duplicates. rate (torch.Tensor) – Base coalescent rate (pairwise rate of coalescence) under a constant population size model. Defaults to 1.

arg_constraints = {'leaf_times': Real(), 'rate': GreaterThan(lower_bound=0.0)}¶

log_prob(value)[source]¶

sample(sample_shape=torch.Size([]))[source]¶

support¶

CoalescentTimesWithRate¶

class CoalescentTimesWithRate(leaf_times, rate_grid, *, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Distribution over coalescent times given irregularly sampled leaf_times and piecewise constant coalescent rates defined on a regular time grid.

This assumes a piecewise constant base coalescent rate specified on time intervals (-inf,1], [1,2], …, [T-1,inf), where T = rate_grid.size(-1). Leaves may be sampled at arbitrary real times, but are commonly sampled in the interval [0, T].

Sample values will be sorted sets of binary coalescent times. Each sample value will have cardinality value.size(-1) = leaf_times.size(-1) - 1, so that phylogenies are complete binary trees. This distribution can thus be batched over multiple samples of phylogenies given fixed (number of) leaf times, e.g. over phylogeny samples from BEAST or MrBayes.

This distribution implements log_prob() but not .sample().

ConditionalDistribution¶

class ConditionalDistribution[source]¶

Bases: abc.ABC

condition(context)[source]¶

Return type:	torch.distributions.Distribution

ConditionalTransformedDistribution¶

class ConditionalTransformedDistribution(base_dist, transforms)[source]¶

Bases: pyro.distributions.conditional.ConditionalDistribution

clear_cache()[source]¶

condition(context)[source]¶

Delta¶

class Delta(v, log_density=0.0, event_dim=0, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Degenerate discrete distribution (a single point).

Discrete distribution that assigns probability one to the single element in its support. A Delta distribution parameterized by a random choice should not be used with MCMC-based inference, as doing so produces incorrect results.

Parameters:	v (torch.Tensor) – The single support element. log_density (torch.Tensor) – An optional density for this Delta. This is useful to keep the class of `Delta` distributions closed under differentiable transformation. event_dim (int) – Optional event dimension, defaults to zero.

arg_constraints = {'log_density': Real(), 'v': Dependent()}¶

expand(batch_shape, _instance=None)[source]¶

has_rsample = True¶

log_prob(x)[source]¶

mean¶

rsample(sample_shape=torch.Size([]))[source]¶

support¶

variance¶

DirichletMultinomial¶

class DirichletMultinomial(concentration, total_count=1, is_sparse=False, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Compound distribution comprising a Dirichlet-multinomial pair. The probability of classes (probs for the Multinomial distribution) is unknown and randomly drawn from a Dirichlet distribution prior to a certain number of Categorical trials given by total_count.

Parameters:	concentration (float or torch.Tensor) – concentration parameter (alpha) for the Dirichlet distribution. total_count (int or torch.Tensor) – number of Categorical trials. is_sparse (bool) – Whether to assume value is mostly zero when computing `log_prob()`, which can speed up computation when data is sparse.

arg_constraints = {'concentration': IndependentConstraint(GreaterThan(lower_bound=0.0), 1), 'total_count': IntegerGreaterThan(lower_bound=0)}¶

concentration¶

expand(batch_shape, _instance=None)[source]¶

static infer_shapes(concentration, total_count=())[source]¶

log_prob(value)[source]¶

mean¶

sample(sample_shape=())[source]¶

support¶

variance¶

DiscreteHMM¶

class DiscreteHMM(initial_logits, transition_logits, observation_dist, validate_args=None, duration=None)[source]¶

Bases: pyro.distributions.hmm.HiddenMarkovModel

Hidden Markov Model with discrete latent state and arbitrary observation distribution. This uses [1] to parallelize over time, achieving O(log(time)) parallel complexity.

The event_shape of this distribution includes time on the left:

event_shape = (num_steps,) + observation_dist.event_shape

This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_logits and observation_dist. However, because time is included in this distribution’s event_shape, the homogeneous+homogeneous case will have a broadcastable event_shape with num_steps = 1, allowing log_prob() to work with arbitrary length data:

# homogeneous + homogeneous case:
event_shape = (1,) + observation_dist.event_shape

References:

[1] Simo Sarkka, Angel F. Garcia-Fernandez (2019): “Temporal Parallelization of Bayesian Filters and Smoothers” https://arxiv.org/pdf/1905.13002.pdf

Parameters:

initial_logits (Tensor) – A logits tensor for an initial categorical distribution over latent states. Should have rightmost size state_dim and be broadcastable to batch_shape + (state_dim,).
transition_logits (Tensor) – A logits tensor for transition conditional distributions between latent states. Should have rightmost shape (state_dim, state_dim) (old, new), and be broadcastable to batch_shape + (num_steps, state_dim, state_dim).
observation_dist (Distribution) – A conditional distribution of observed data conditioned on latent state. The .batch_shape should have rightmost size state_dim and be broadcastable to batch_shape + (num_steps, state_dim). The .event_shape may be arbitrary.
duration (int) – Optional size of the time axis event_shape[0]. This is required when sampling from homogeneous HMMs whose parameters are not expanded along the time axis.

arg_constraints = {'initial_logits': Real(), 'transition_logits': Real()}¶

expand(batch_shape, _instance=None)[source]¶

filter(value)[source]¶

Compute posterior over final state given a sequence of observations.

Parameters:	value (Tensor) – A sequence of observations.
Returns:	A posterior distribution over latent states at the final time step. `result.logits` can then be used as `initial_logits` in a sequential Pyro model for prediction.
Return type:	Categorical
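
For reference, log_prob() and filter() can be sketched with a sequential forward algorithm in log space. This is a plain O(time) loop, not the parallel-scan method of [1], and it assumes the convention that one transition is applied before each observation, including the first. The function name and the random stand-in for observation_dist.log_prob(value) are hypothetical.

```python
import torch

def hmm_forward(initial_logits, transition_logits, obs_log_probs):
    """Return (log p(x), normalized logits over the final latent state)."""
    init = initial_logits.log_softmax(-1)
    trans = transition_logits.log_softmax(-1)   # rows: old state, columns: new state
    alpha = init
    for obs in obs_log_probs:                   # obs[k] = log p(x_t | z_t = k)
        # Sum out the old state, then weight by the observation likelihood.
        alpha = torch.logsumexp(alpha.unsqueeze(-1) + trans, dim=0) + obs
    log_prob = torch.logsumexp(alpha, dim=-1)
    return log_prob, alpha - log_prob

torch.manual_seed(0)
state_dim, num_steps = 3, 4
initial_logits = torch.randn(state_dim)
transition_logits = torch.randn(state_dim, state_dim)
obs_log_probs = torch.randn(num_steps, state_dim)  # stand-in observation scores

log_prob, filtered_logits = hmm_forward(initial_logits, transition_logits, obs_log_probs)
```

The second return value corresponds to what filter() describes: a posterior over the final latent state that could seed a later prediction step.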

log_prob(value)[source]¶

support¶

EmpiricalDistribution¶

class Empirical(samples, log_weights, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Empirical distribution associated with the sampled data. The shape of log_weights must match the leftmost shape of samples. Samples are aggregated along the aggregation_dim, which is the rightmost dim of log_weights.

Example:

>>> emp_dist = Empirical(torch.randn(2, 3, 10), torch.ones(2, 3))
>>> emp_dist.batch_shape
torch.Size([2])
>>> emp_dist.event_shape
torch.Size([10])

>>> single_sample = emp_dist.sample()
>>> single_sample.shape
torch.Size([2, 10])
>>> batch_sample = emp_dist.sample((100,))
>>> batch_sample.shape
torch.Size([100, 2, 10])

>>> emp_dist.log_prob(single_sample).shape
torch.Size([2])
>>> # Vectorized samples cannot be scored by log_prob.
>>> with pyro.validation_enabled():
...     emp_dist.log_prob(batch_sample).shape
Traceback (most recent call last):
...
ValueError: ``value.shape`` must be torch.Size([2, 10])

Parameters:	samples (torch.Tensor) – samples from the empirical distribution. log_weights (torch.Tensor) – log weights (optional) corresponding to the samples.
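
The role of log_weights and of the aggregation dimension can be sketched in plain PyTorch. The code below illustrates the semantics described above (a weighted mean and a weighted draw along the rightmost dim of log_weights); it is not the Empirical implementation, and the weight values are made up.

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)
samples = torch.randn(2, 3, 10)     # batch of 2, 3 stored samples each, event_shape (10,)
log_weights = torch.tensor([[0.0, 0.0, 0.0], [0.0, 1.0, 2.0]])

# Normalize the weights along the aggregation dim.
weights = log_weights.softmax(dim=-1)                    # shape (2, 3)
mean = (weights.unsqueeze(-1) * samples).sum(dim=1)      # shape (2, 10)

# A draw picks one stored sample per batch element, proportionally to its weight.
idx = Categorical(logits=log_weights).sample()           # shape (2,)
draw = samples[torch.arange(2), idx]                     # shape (2, 10)
```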

arg_constraints = {}¶

enumerate_support(expand=True)[source]¶: See pyro.distributions.torch_distribution.TorchDistribution.enumerate_support()

event_shape¶: See pyro.distributions.torch_distribution.TorchDistribution.event_shape()

has_enumerate_support = True¶

log_prob(value)[source]¶

Returns the log of the probability mass function evaluated at value. Note that this currently only supports scoring values with empty sample_shape.

Parameters:	value (torch.Tensor) – scalar or tensor value to be scored.

log_weights¶

mean¶: See pyro.distributions.torch_distribution.TorchDistribution.mean()

sample(sample_shape=torch.Size([]))[source]¶: See pyro.distributions.torch_distribution.TorchDistribution.sample()

sample_size¶

Number of samples that constitute the empirical distribution.

Returns:	number of samples collected.
Return type:	int

support = Real()¶

variance¶: See pyro.distributions.torch_distribution.TorchDistribution.variance()

ExtendedBetaBinomial¶

class ExtendedBetaBinomial(concentration1, concentration0, total_count=1, validate_args=None)[source]¶

Bases: pyro.distributions.conjugate.BetaBinomial

EXPERIMENTAL BetaBinomial distribution extended so that its logical support is the entire set of integers and total_count may be an arbitrary integer. Numerical support is still the integer interval [0, total_count].

arg_constraints = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0), 'total_count': Integer}¶

log_prob(value)[source]¶

support = Integer¶

ExtendedBinomial¶

class ExtendedBinomial(total_count=1, probs=None, logits=None, validate_args=None)[source]¶

Bases: pyro.distributions.torch.Binomial

EXPERIMENTAL Binomial distribution extended so that its logical support is the entire set of integers and total_count may be any integer. Numerical support is still the integer interval [0, total_count].
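One way to read the distinction between logical and numerical support is that values outside [0, total_count] are scored with zero probability instead of being rejected. The following pure-Python sketch illustrates that reading for scalar inputs; it does not call Pyro, and the function name extended_binomial_log_prob is hypothetical.

```python
import math

def extended_binomial_log_prob(value, total_count, probs):
    """Binomial log pmf that returns -inf outside [0, total_count].

    Assumption: out-of-range values (including every value when
    total_count is negative) get log probability -inf instead of an error.
    """
    if value < 0 or value > total_count:
        return -math.inf
    log_binom = (
        math.lgamma(total_count + 1)
        - math.lgamma(value + 1)
        - math.lgamma(total_count - value + 1)
    )
    return log_binom + value * math.log(probs) + (total_count - value) * math.log1p(-probs)

inside = extended_binomial_log_prob(2, total_count=5, probs=0.3)
outside = extended_binomial_log_prob(7, total_count=5, probs=0.3)
```

Here inside is a finite log probability and outside is -inf, so the mass on the interval [0, total_count] still sums to one.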

arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0), 'total_count': Integer}¶

log_prob(value)[source]¶

support = Integer¶

FoldedDistribution¶

class FoldedDistribution(base_dist, validate_args=None)[source]¶

Bases: pyro.distributions.torch.TransformedDistribution

Equivalent to TransformedDistribution(base_dist, AbsTransform()), but additionally supports log_prob().

Parameters:	base_dist (Distribution) – The distribution to reflect.
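Folding maps both x and -x to the same nonnegative value, so the folded density at a value is the sum of the base density at that value and at its negation. The following is a minimal pure-Python sketch of that identity for a Normal base distribution; it does not call Pyro, and the helper names normal_log_prob and folded_normal_log_prob are hypothetical.

```python
import math

def normal_log_prob(x, loc, scale):
    """Log density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2 * math.pi)

def folded_normal_log_prob(value, loc, scale):
    """Log density of |X| for X ~ Normal(loc, scale), evaluated at value."""
    if value < 0:
        return -math.inf
    a = normal_log_prob(value, loc, scale)
    b = normal_log_prob(-value, loc, scale)
    m = max(a, b)
    # Numerically stable log(exp(a) + exp(b)).
    return m + math.log(math.exp(a - m) + math.exp(b - m))

lp = folded_normal_log_prob(0.7, loc=0.0, scale=1.0)
```

For a base distribution that is symmetric about zero, the result is the base log density plus log(2).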

expand(batch_shape, _instance=None)[source]¶

log_prob(value)[source]¶

support = GreaterThan(lower_bound=0.0)¶

GammaGaussianHMM¶

class GammaGaussianHMM(scale_dist, initial_dist, transition_matrix, transition_dist, observation_matrix, observation_dist, validate_args=None, duration=None)[source]¶

Bases: pyro.distributions.hmm.HiddenMarkovModel

Hidden Markov Model in which the joint distribution of initial state, hidden state, and observed state is a MultivariateStudentT distribution, along the lines of references [2] and [3]. This adapts [1] to parallelize over time to achieve O(log(time)) parallel complexity.

This GammaGaussianHMM class corresponds to the generative model:

s = Gamma(df/2, df/2).sample()
z = scale(initial_dist, s).sample()
x = []
for t in range(num_steps):
    z = z @ transition_matrix + scale(transition_dist, s).sample()
    x.append(z @ observation_matrix + scale(observation_dist, s).sample())

where scale(mvn(loc, precision), s) := mvn(loc, s * precision).

The event_shape of this distribution includes time on the left:

event_shape = (num_steps,) + observation_dist.event_shape

This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_dist and observation_dist. However, because time is included in this distribution’s event_shape, the homogeneous+homogeneous case will have a broadcastable event_shape with num_steps = 1, allowing log_prob() to work with arbitrary length data:

event_shape = (1, obs_dim)  # homogeneous + homogeneous case

References:

[1] Simo Sarkka, Angel F. Garcia-Fernandez (2019): “Temporal Parallelization of Bayesian Filters and Smoothers” https://arxiv.org/pdf/1905.13002.pdf
[2] F. J. Giron and J. C. Rojano (1994): “Bayesian Kalman filtering with elliptically contoured errors”
[3] Filip Tronarp, Toni Karvonen, and Simo Sarkka (2019): “Student’s t-filters for noise scale estimation” https://users.aalto.fi/~ssarkka/pub/SPL2019.pdf

Variables:

hidden_dim (int) – The dimension of the hidden state.
obs_dim (int) – The dimension of the observed state.

Parameters:

scale_dist (Gamma) – Prior of the mixing distribution.
initial_dist (MultivariateNormal) – A distribution with unit scale mixing over initial states. This should have batch_shape broadcastable to self.batch_shape. This should have event_shape (hidden_dim,).
transition_matrix (Tensor) – A linear transformation of hidden state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, hidden_dim) where the rightmost dims are ordered (old, new).
transition_dist (MultivariateNormal) – A process noise distribution with unit scale mixing. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim,).
observation_matrix (Tensor) – A linear transformation from hidden to observed state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, obs_dim).
observation_dist (MultivariateNormal) – An observation noise distribution with unit scale mixing. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (obs_dim,).
duration (int) – Optional size of the time axis event_shape[0]. This is required when sampling from homogeneous HMMs whose parameters are not expanded along the time axis.

arg_constraints = {}¶

expand(batch_shape, _instance=None)[source]¶

filter(value)[source]¶

Compute posteriors over the multiplier and the final state given a sequence of observations. The posterior is a pair of Gamma and MultivariateNormal distributions (i.e. a GammaGaussian instance).

Parameters:	value (Tensor) – A sequence of observations.
Returns:	A pair of posterior distributions over the mixing and the latent state at the final time step.
Return type:	a tuple of pyro.distributions.Gamma and pyro.distributions.MultivariateNormal

log_prob(value)[source]¶

support = IndependentConstraint(Real(), 2)¶

GammaPoisson¶

class GammaPoisson(concentration, rate, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Compound distribution comprising a Gamma-Poisson pair, also referred to as a Gamma-Poisson mixture. The rate parameter of the Poisson distribution is unknown and is drawn randomly from a Gamma distribution.

Note

This can be treated as an alternate parametrization of the NegativeBinomial(total_count, probs) distribution, with concentration = total_count and rate = (1 - probs) / probs.

Parameters:

concentration (float or torch.Tensor) – shape parameter (alpha) of the Gamma distribution.
rate (float or torch.Tensor) – rate parameter (beta) for the Gamma distribution.
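The correspondence with NegativeBinomial stated in the note can be checked numerically. The sketch below is pure Python and does not call Pyro or PyTorch; both function names are hypothetical, and the negative binomial is written under the assumption that probs is the per-trial success probability and the variable counts successes, which is the convention that makes rate = (1 - probs) / probs hold.

```python
import math

def gamma_poisson_log_prob(k, concentration, rate):
    """Log pmf of a Poisson whose rate is drawn from Gamma(concentration, rate)."""
    return (
        math.lgamma(concentration + k) - math.lgamma(concentration) - math.lgamma(k + 1)
        + concentration * math.log(rate / (1.0 + rate))
        - k * math.log(1.0 + rate)
    )

def negative_binomial_log_prob(k, total_count, probs):
    """Log pmf of k successes before total_count failures (assumed convention)."""
    return (
        math.lgamma(total_count + k) - math.lgamma(total_count) - math.lgamma(k + 1)
        + total_count * math.log1p(-probs)
        + k * math.log(probs)
    )

total_count, probs = 3.5, 0.3
concentration, rate = total_count, (1.0 - probs) / probs
max_gap = max(
    abs(gamma_poisson_log_prob(k, concentration, rate)
        - negative_binomial_log_prob(k, total_count, probs))
    for k in range(20)
)
```

Under these assumptions max_gap is at the level of floating-point rounding.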

arg_constraints = {'concentration': GreaterThan(lower_bound=0.0), 'rate': GreaterThan(lower_bound=0.0)}¶

concentration¶

expand(batch_shape, _instance=None)[source]¶

log_prob(value)[source]¶

mean¶

rate¶

sample(sample_shape=())[source]¶

support = IntegerGreaterThan(lower_bound=0)¶

variance¶

GaussianHMM¶

class GaussianHMM(initial_dist, transition_matrix, transition_dist, observation_matrix, observation_dist, validate_args=None, duration=None)[source]¶

Bases: pyro.distributions.hmm.HiddenMarkovModel

Hidden Markov Model with Gaussians for initial, transition, and observation distributions. This adapts [1] to parallelize over time to achieve O(log(time)) parallel complexity; however, it differs in that it tracks the log normalizer to ensure log_prob() is differentiable.

This corresponds to the generative model:

z = initial_dist.sample()
x = []
for t in range(num_steps):
    z = z @ transition_matrix + transition_dist.sample()
    x.append(z @ observation_matrix + observation_dist.sample())
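The generative model above can be simulated without any library to make the shapes concrete. This is a pure-Python sketch under simplifying assumptions: all noise is zero-mean isotropic Gaussian, vectors are plain lists, and the names vec_mat and simulate_hmm are hypothetical helpers that are not part of Pyro.

```python
import random

def vec_mat(v, m):
    """Row vector times matrix: result[j] = sum_i v[i] * m[i][j]."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def simulate_hmm(num_steps, transition_matrix, observation_matrix,
                 init_scale, trans_scale, obs_scale, rng):
    """Draw one observation sequence of shape (num_steps, obs_dim)."""
    hidden_dim = len(transition_matrix)
    z = [rng.gauss(0.0, init_scale) for _ in range(hidden_dim)]
    x = []
    for _ in range(num_steps):
        # Hidden update: z @ transition_matrix plus process noise.
        z = [zi + rng.gauss(0.0, trans_scale) for zi in vec_mat(z, transition_matrix)]
        # Observation: z @ observation_matrix plus observation noise.
        x.append([yi + rng.gauss(0.0, obs_scale) for yi in vec_mat(z, observation_matrix)])
    return x

rng = random.Random(0)
transition_matrix = [[1.0, 0.1], [0.0, 0.9]]  # (hidden_dim, hidden_dim), ordered (old, new)
observation_matrix = [[1.0], [0.0]]           # (hidden_dim, obs_dim)
x = simulate_hmm(5, transition_matrix, observation_matrix, 1.0, 0.1, 0.5, rng)
```

The result has 5 rows of length 1, matching an event_shape of (num_steps, obs_dim).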

The event_shape of this distribution includes time on the left:

event_shape = (num_steps,) + observation_dist.event_shape

This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_dist and observation_dist. However, because time is included in this distribution’s event_shape, the homogeneous+homogeneous case will have a broadcastable event_shape with num_steps = 1, allowing log_prob() to work with arbitrary length data:

event_shape = (1, obs_dim)  # homogeneous + homogeneous case

References:

[1] Simo Sarkka, Angel F. Garcia-Fernandez (2019): “Temporal Parallelization of Bayesian Filters and Smoothers” https://arxiv.org/pdf/1905.13002.pdf

Variables:

hidden_dim (int) – The dimension of the hidden state.
obs_dim (int) – The dimension of the observed state.

Parameters:

initial_dist (MultivariateNormal) – A distribution over initial states. This should have batch_shape broadcastable to self.batch_shape. This should have event_shape (hidden_dim,).
transition_matrix (Tensor) – A linear transformation of hidden state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, hidden_dim) where the rightmost dims are ordered (old, new).
transition_dist (MultivariateNormal) – A process noise distribution. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim,).
observation_matrix (Tensor) – A linear transformation from hidden to observed state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, obs_dim).
observation_dist (MultivariateNormal or Normal) – An observation noise distribution. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (obs_dim,).
duration (int) – Optional size of the time axis event_shape[0]. This is required when sampling from homogeneous HMMs whose parameters are not expanded along the time axis.

arg_constraints = {}¶

conjugate_update(other)[source]¶

EXPERIMENTAL Creates an updated GaussianHMM fusing information from another compatible distribution.

This should satisfy:

fg, log_normalizer = f.conjugate_update(g)
assert f.log_prob(x) + g.log_prob(x) == fg.log_prob(x) + log_normalizer

Parameters:	other (MultivariateNormal or Normal) – A distribution representing `p(data|self.probs)` but normalized over `self.probs` rather than `data`.
Returns:	a pair `(updated, log_normalizer)` where `updated` is an updated `GaussianHMM`, and `log_normalizer` is a `Tensor` representing the normalization factor.
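The identity that conjugate_update should satisfy is easiest to see in a scalar analogue: the product of two univariate Normal densities in x is another Normal density times a constant. The sketch below is pure Python and only illustrates that identity; it is not the GaussianHMM implementation, and representing each distribution as a (loc, scale) pair is an assumption made for brevity.

```python
import math

def normal_log_prob(x, loc, scale):
    """Log density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2 * math.pi)

def conjugate_update(f, g):
    """Fuse two (loc, scale) Normals; returns ((loc, scale), log_normalizer)."""
    (m1, s1), (m2, s2) = f, g
    precision = 1.0 / s1 ** 2 + 1.0 / s2 ** 2
    loc = (m1 / s1 ** 2 + m2 / s2 ** 2) / precision
    scale = precision ** -0.5
    # The leftover constant is a Normal density in the difference of the means.
    log_normalizer = normal_log_prob(m1, m2, math.sqrt(s1 ** 2 + s2 ** 2))
    return (loc, scale), log_normalizer

f, g = (0.0, 1.0), (2.0, 0.5)
fg, log_normalizer = conjugate_update(f, g)
x = 0.8
lhs = normal_log_prob(x, *f) + normal_log_prob(x, *g)
rhs = normal_log_prob(x, *fg) + log_normalizer
```

Here lhs and rhs agree up to rounding for any x, which is the scalar form of the assertion stated above.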

expand(batch_shape, _instance=None)[source]¶

filter(value)[source]¶

Compute posterior over final state given a sequence of observations.

Parameters:	value (Tensor) – A sequence of observations.
Returns:	A posterior distribution over latent states at the final time step. `result` can then be used as `initial_dist` in a sequential Pyro model for prediction.
Return type:	MultivariateNormal

has_rsample = True¶

log_prob(value)[source]¶

prefix_condition(data)[source]¶

EXPERIMENTAL Given self has event_shape == (t+f, d) and data x of shape batch_shape + (t, d), compute a conditional distribution of event_shape (f, d). Typically t is the number of training time steps, f is the number of forecast time steps, and d is the data dimension.

Parameters:	data (Tensor) – data of dimension at least 2.

rsample(sample_shape=torch.Size([]))[source]¶

rsample_posterior(value, sample_shape=torch.Size([]))[source]¶: EXPERIMENTAL Sample from the latent state conditioned on observation.

support = IndependentConstraint(Real(), 2)¶

GaussianMRF¶

class GaussianMRF(initial_dist, transition_dist, observation_dist, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Temporal Markov Random Field with Gaussian factors for initial, transition, and observation distributions. This adapts [1] to parallelize over time to achieve O(log(time)) parallel complexity; however, it differs in that it tracks the log normalizer to ensure log_prob() is differentiable.

The event_shape of this distribution includes time on the left:

event_shape = (num_steps,) + observation_dist.event_shape

This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_dist and observation_dist. However, because time is included in this distribution’s event_shape, the homogeneous+homogeneous case will have a broadcastable event_shape with num_steps = 1, allowing log_prob() to work with arbitrary length data:

event_shape = (1, obs_dim)  # homogeneous + homogeneous case

References:

[1] Simo Sarkka, Angel F. Garcia-Fernandez (2019): “Temporal Parallelization of Bayesian Filters and Smoothers” https://arxiv.org/pdf/1905.13002.pdf

Variables:

hidden_dim (int) – The dimension of the hidden state.
obs_dim (int) – The dimension of the observed state.

Parameters:

initial_dist (MultivariateNormal) – A distribution over initial states. This should have batch_shape broadcastable to self.batch_shape. This should have event_shape (hidden_dim,).
transition_dist (MultivariateNormal) – A joint distribution factor over a pair of successive time steps. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim + hidden_dim,) (old+new).
observation_dist (MultivariateNormal) – A joint distribution factor over a hidden and an observed state. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim + obs_dim,).

arg_constraints = {}¶

expand(batch_shape, _instance=None)[source]¶

log_prob(value)[source]¶

support¶

GaussianScaleMixture¶

class GaussianScaleMixture(coord_scale, component_logits, component_scale)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Mixture of Normal distributions with zero mean and diagonal covariance matrices.

That is, this distribution is a mixture with K components, where each component distribution is a D-dimensional Normal distribution with zero mean and a D-dimensional diagonal covariance matrix. The K different covariance matrices are controlled by the parameters coord_scale and component_scale. That is, the covariance matrix of the k’th component is given by

Sigma_ii = (component_scale_k * coord_scale_i) ** 2 (i = 1, …, D)

where component_scale_k is a positive scale factor and coord_scale_i are positive scale parameters shared between all K components. The mixture weights are controlled by a K-dimensional vector of softmax logits, component_logits. This distribution implements pathwise derivatives for samples from the distribution. This distribution does not currently support batched parameters.

See reference [1] for details on the implementations of the pathwise derivative. Please consider citing this reference if you use the pathwise derivative in your research.

[1] Pathwise Derivatives for Multivariate Distributions, Martin Jankowiak & Theofanis Karaletsos. arXiv:1806.01856

Note that this distribution supports both even and odd dimensions, but the former should have slightly higher precision, since it doesn’t use any erfs in the backward call. Also note that this distribution does not support D = 1.

Parameters:

coord_scale (torch.Tensor) – D-dimensional vector of scales
component_logits (torch.Tensor) – K-dimensional vector of logits
component_scale (torch.Tensor) – K-dimensional vector of scale multipliers
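As a concrete reading of the covariance formula Sigma_ii = (component_scale_k * coord_scale_i) ** 2, the pure-Python sketch below builds the K x D table of diagonal variances from the two scale parameters. It does not call Pyro, and the function name component_variances is hypothetical.

```python
def component_variances(coord_scale, component_scale):
    """Return a K x D table whose entry [k][i] is (component_scale[k] * coord_scale[i]) ** 2."""
    return [[(cs_k * s_i) ** 2 for s_i in coord_scale] for cs_k in component_scale]

coord_scale = [1.0, 2.0, 0.5]   # D = 3, shared by all components
component_scale = [1.0, 3.0]    # K = 2, one multiplier per component
variances = component_variances(coord_scale, component_scale)
# variances == [[1.0, 4.0, 0.25], [9.0, 36.0, 2.25]]
```

Each row is the diagonal of one component's covariance matrix, so the components differ only by an overall scale factor.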

arg_constraints = {'component_logits': Real(), 'component_scale': GreaterThan(lower_bound=0.0), 'coord_scale': GreaterThan(lower_bound=0.0)}¶

has_rsample = True¶

log_prob(value)[source]¶

rsample(sample_shape=torch.Size([]))[source]¶

ImproperUniform¶

class ImproperUniform(support, batch_shape, event_shape)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Improper distribution with zero log_prob() and undefined sample().

This is useful for transforming a model from generative DAG form to factor graph form for use in HMC. For example, the following are equal in distribution:

# Version 1. a generative dag
x = pyro.sample("x", Normal(0, 1))
y = pyro.sample("y", Normal(x, 1))
z = pyro.sample("z", Normal(y, 1))

# Version 2. a factor graph
xyz = pyro.sample("xyz", ImproperUniform(constraints.real, (), (3,)))
x, y, z = xyz.unbind(-1)
pyro.sample("x", Normal(0, 1), obs=x)
pyro.sample("y", Normal(x, 1), obs=y)
pyro.sample("z", Normal(y, 1), obs=z)

Note this distribution errors when sample() is called. To create a similar distribution that instead samples from a specified distribution consider using .mask(False) as in:

xyz = dist.Normal(0, 1).expand([3]).to_event(1).mask(False)

Parameters:

support (Constraint) – The support of the distribution.
batch_shape (torch.Size) – The batch shape.
event_shape (torch.Size) – The event shape.

arg_constraints = {}¶

expand(batch_shape, _instance=None)[source]¶

log_prob(value)[source]¶

sample(sample_shape=torch.Size([]))[source]¶

support¶

IndependentHMM¶

class IndependentHMM(base_dist)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Wrapper class to treat a batch of independent univariate HMMs as a single multivariate distribution. This converts distribution shapes as follows:

            .batch_shape          .event_shape
base_dist   shape + (obs_dim,)    (duration, 1)
result      shape                 (duration, obs_dim)

Parameters:	base_dist (HiddenMarkovModel) – A base hidden Markov model instance.

arg_constraints = {}¶

duration¶

expand(batch_shape, _instance=None)[source]¶

has_rsample¶

log_prob(value)[source]¶

rsample(sample_shape=torch.Size([]))[source]¶

support¶

InverseGamma¶

class InverseGamma(concentration, rate, validate_args=None)[source]¶

Bases: pyro.distributions.torch.TransformedDistribution

Creates an inverse-gamma distribution parameterized by concentration and rate.

X ~ Gamma(concentration, rate)
Y = 1/X ~ InverseGamma(concentration, rate)

Parameters:

concentration (torch.Tensor) – the concentration parameter (i.e. alpha).
rate (torch.Tensor) – the rate parameter (i.e. beta).
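The relationship Y = 1/X can be illustrated by sampling with the standard library alone. This is a pure-Python sketch that does not call Pyro; the function name sample_inverse_gamma is hypothetical, and the comparison value rate / (concentration - 1) is the standard inverse-gamma mean, valid only for concentration > 1.

```python
import random

def sample_inverse_gamma(concentration, rate, rng):
    """Draw Y = 1 / X with X ~ Gamma(concentration, rate)."""
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    x = rng.gammavariate(concentration, 1.0 / rate)
    return 1.0 / x

rng = random.Random(0)
concentration, rate = 5.0, 2.0
draws = [sample_inverse_gamma(concentration, rate, rng) for _ in range(50000)]
empirical_mean = sum(draws) / len(draws)
analytic_mean = rate / (concentration - 1.0)
```

With these settings the empirical mean should land close to the analytic value of 0.5.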

arg_constraints = {'concentration': GreaterThan(lower_bound=0.0), 'rate': GreaterThan(lower_bound=0.0)}¶

concentration¶

expand(batch_shape, _instance=None)[source]¶

has_rsample = True¶

rate¶

support = GreaterThan(lower_bound=0.0)¶

LinearHMM¶

class LinearHMM(initial_dist, transition_matrix, transition_dist, observation_matrix, observation_dist, validate_args=None, duration=None)[source]¶

Bases: pyro.distributions.hmm.HiddenMarkovModel

Hidden Markov Model with linear dynamics and observations and arbitrary noise for initial, transition, and observation distributions. Each of those distributions can be e.g. MultivariateNormal or Independent of Normal, StudentT, or Stable. Additionally the observation distribution may be constrained, e.g. LogNormal.

This corresponds to the generative model:

z = initial_dist.sample()
x = []
for t in range(num_steps):
    z = z @ transition_matrix + transition_dist.sample()
    y = z @ observation_matrix + obs_base_dist.sample()
    x.append(obs_transform(y))

where observation_dist is split into obs_base_dist and an optional obs_transform (defaulting to the identity).

This implements a reparameterized rsample() method but does not implement a log_prob() method. Derived classes may implement log_prob().

Inference without log_prob() can be performed using either reparameterization with LinearHMMReparam or likelihood-free algorithms such as EnergyDistance. Note that while stable processes generally require a common shared stability parameter \(\alpha\), this distribution and the above inference algorithms allow heterogeneous stability parameters.

The event_shape of this distribution includes time on the left:

event_shape = (num_steps,) + observation_dist.event_shape

This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_dist and observation_dist. However at least one of the distributions or matrices must be expanded to contain the time dimension.

Variables:

hidden_dim (int) – The dimension of the hidden state.
obs_dim (int) – The dimension of the observed state.

Parameters:

initial_dist – A distribution over initial states. This should have batch_shape broadcastable to self.batch_shape. This should have event_shape (hidden_dim,).
transition_matrix (Tensor) – A linear transformation of hidden state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, hidden_dim) where the rightmost dims are ordered (old, new).
transition_dist – A distribution over process noise. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim,).
observation_matrix (Tensor) – A linear transformation from hidden to observed state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, obs_dim).
observation_dist – An observation noise distribution. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (obs_dim,).
duration (int) – Optional size of the time axis event_shape[0]. This is required when sampling from homogeneous HMMs whose parameters are not expanded along the time axis.

arg_constraints = {}¶

expand(batch_shape, _instance=None)[source]¶

has_rsample = True¶

log_prob(value)[source]¶

rsample(sample_shape=torch.Size([]))[source]¶

support¶

LKJ¶

class LKJ(dim, concentration=1.0, validate_args=None)[source]¶

Bases: pyro.distributions.torch.TransformedDistribution

LKJ distribution for correlation matrices. The distribution is controlled by the concentration parameter \(\eta\), which makes the probability of the correlation matrix \(M\) proportional to \(\det(M)^{\eta - 1}\). Because of that, when concentration == 1, we have a uniform distribution over correlation matrices.

When concentration > 1, the distribution favors samples with large determinant. This is useful when we know a priori that the underlying variables are not correlated. When concentration < 1, the distribution favors samples with small determinant. This is useful when we know a priori that some underlying variables are correlated.

Parameters:

dim (int) – dimension of the matrices
concentration (float or torch.Tensor) – concentration/shape parameter of the distribution (often referred to as eta)

References

[1] Generating random correlation matrices based on vines and extended onion method, Daniel Lewandowski, Dorota Kurowicka, Harry Joe
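The effect of the concentration parameter is easy to see in two dimensions, where a correlation matrix [[1, r], [r, 1]] has determinant 1 - r**2. The pure-Python sketch below evaluates the unnormalized log density only; it does not call Pyro, and the function name is hypothetical.

```python
import math

def lkj_unnormalized_log_density_2x2(r, concentration):
    """log of det(M) ** (concentration - 1) for M = [[1, r], [r, 1]], with -1 < r < 1."""
    det = 1.0 - r * r
    return (concentration - 1.0) * math.log(det)

weak, strong = 0.0, 0.9  # two candidate correlation values
flat = [lkj_unnormalized_log_density_2x2(r, 1.0) for r in (weak, strong)]
favors_weak = [lkj_unnormalized_log_density_2x2(r, 2.0) for r in (weak, strong)]
favors_strong = [lkj_unnormalized_log_density_2x2(r, 0.5) for r in (weak, strong)]
```

With concentration 1 both values score the same, with concentration 2 the uncorrelated matrix scores higher, and with concentration 0.5 the strongly correlated matrix scores higher.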

arg_constraints = {'concentration': GreaterThan(lower_bound=0.0)}¶

expand(batch_shape, _instance=None)[source]¶

mean¶

support = CorrMatrix()¶

LKJCorrCholesky¶

class LKJCorrCholesky(d, eta, validate_args=None)[source]¶: Bases: pyro.distributions.torch.LKJCholesky

MaskedDistribution¶

class MaskedDistribution(base_dist, mask)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Masks a distribution by a boolean tensor that is broadcastable to the distribution’s batch_shape.

In the special case where mask is False, computation of log_prob(), score_parts(), and kl_divergence() is skipped, and constant zero values are returned instead.

Parameters:	mask (torch.Tensor or bool) – A boolean or boolean-valued tensor.

arg_constraints = {}¶

conjugate_update(other)[source]¶: EXPERIMENTAL.

enumerate_support(expand=True)[source]¶

expand(batch_shape, _instance=None)[source]¶

has_enumerate_support¶

has_rsample¶

log_prob(value)[source]¶

mean¶

rsample(sample_shape=torch.Size([]))[source]¶

sample(sample_shape=torch.Size([]))[source]¶

score_parts(value)[source]¶

support¶

variance¶

MaskedMixture¶

class MaskedMixture(mask, component0, component1, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

A masked deterministic mixture of two distributions.

This is useful when the mask is sampled from another distribution, possibly correlated across the batch. Often the mask can be marginalized out via enumeration.

Example:

change_point = pyro.sample("change_point",
                           dist.Categorical(torch.ones(len(data) + 1)),
                           infer={'enumerate': 'parallel'})
mask = torch.arange(len(data), dtype=torch.long) >= change_point
with pyro.plate("data", len(data)):
    pyro.sample("obs", MaskedMixture(mask, dist1, dist2), obs=data)

Parameters:

mask (torch.Tensor) – A boolean tensor toggling between `component0` and `component1`.
component0 (pyro.distributions.TorchDistribution) – a distribution for batch elements `mask == False`.
component1 (pyro.distributions.TorchDistribution) – a distribution for batch elements `mask == True`.

arg_constraints = {}¶

expand(batch_shape)[source]¶

has_rsample¶

log_prob(value)[source]¶

mean[source]¶

rsample(sample_shape=torch.Size([]))[source]¶

sample(sample_shape=torch.Size([]))[source]¶

support¶

variance[source]¶

MixtureOfDiagNormals¶

class MixtureOfDiagNormals(locs, coord_scale, component_logits)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Mixture of Normal distributions with arbitrary means and arbitrary diagonal covariance matrices.

That is, this distribution is a mixture with K components, where each component distribution is a D-dimensional Normal distribution with a D-dimensional mean parameter and a D-dimensional diagonal covariance matrix. The K different component means are gathered into the K x D dimensional parameter locs and the K different scale parameters are gathered into the K x D dimensional parameter coord_scale. The mixture weights are controlled by a K-dimensional vector of softmax logits, component_logits. This distribution implements pathwise derivatives for samples from the distribution.

See reference [1] for details on the implementations of the pathwise derivative. Please consider citing this reference if you use the pathwise derivative in your research. Note that this distribution does not support dimension D = 1.

[1] Pathwise Derivatives for Multivariate Distributions, Martin Jankowiak & Theofanis Karaletsos. arXiv:1806.01856

Parameters:

locs (torch.Tensor) – K x D mean matrix
coord_scale (torch.Tensor) – K x D scale matrix
component_logits (torch.Tensor) – K-dimensional vector of softmax logits
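The density described above is a weighted sum of K diagonal Normal densities, with weights given by the softmax of component_logits. The pure-Python sketch below evaluates its log for a single D-dimensional value using nested lists; it does not call Pyro or implement the pathwise derivative, and the helper names are hypothetical.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def normal_log_prob(x, loc, scale):
    """Log density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2 * math.pi)

def mixture_log_prob(value, locs, coord_scale, component_logits):
    """Log density of a K-component mixture of D-dimensional diagonal Normals."""
    log_norm = logsumexp(component_logits)
    terms = []
    for k, logit in enumerate(component_logits):
        # A diagonal Normal factorizes over coordinates, so log densities add.
        lp = sum(normal_log_prob(v, m, s) for v, m, s in zip(value, locs[k], coord_scale[k]))
        terms.append(logit - log_norm + lp)
    return logsumexp(terms)

locs = [[0.0, 0.0], [3.0, -1.0]]          # K x D
coord_scale = [[1.0, 1.0], [0.5, 2.0]]    # K x D
component_logits = [0.0, 1.0]             # K
lp = mixture_log_prob([0.2, -0.3], locs, coord_scale, component_logits)
```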

arg_constraints = {'component_logits': Real(), 'coord_scale': GreaterThan(lower_bound=0.0), 'locs': Real()}¶

expand(batch_shape, _instance=None)[source]¶

has_rsample = True¶

log_prob(value)[source]¶

rsample(sample_shape=torch.Size([]))[source]¶

MixtureOfDiagNormalsSharedCovariance¶

class MixtureOfDiagNormalsSharedCovariance(locs, coord_scale, component_logits)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Mixture of Normal distributions with diagonal covariance matrices.

That is, this distribution is a mixture with K components, where each component distribution is a D-dimensional Normal distribution with a D-dimensional mean parameter loc and a D-dimensional diagonal covariance matrix specified by a scale parameter coord_scale. The K different component means are gathered into the parameter locs and the scale parameter is shared between all K components. The mixture weights are controlled by a K-dimensional vector of softmax logits, component_logits. This distribution implements pathwise derivatives for samples from the distribution.

See reference [1] for details on the implementations of the pathwise derivative. Please consider citing this reference if you use the pathwise derivative in your research. Note that this distribution does not support dimension D = 1.

[1] Pathwise Derivatives for Multivariate Distributions, Martin Jankowiak & Theofanis Karaletsos. arXiv:1806.01856

Parameters:

locs (torch.Tensor) – K x D mean matrix
coord_scale (torch.Tensor) – shared D-dimensional scale vector
component_logits (torch.Tensor) – K-dimensional vector of softmax logits

arg_constraints = {'component_logits': Real(), 'coord_scale': GreaterThan(lower_bound=0.0), 'locs': Real()}¶

expand(batch_shape, _instance=None)[source]¶

has_rsample = True¶

log_prob(value)[source]¶

rsample(sample_shape=torch.Size([]))[source]¶

MultivariateStudentT¶

class MultivariateStudentT(df, loc, scale_tril, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Creates a multivariate Student’s t-distribution parameterized by degrees of freedom df, mean loc and scale scale_tril.

Parameters:

df (Tensor) – degrees of freedom
loc (Tensor) – mean of the distribution
scale_tril (Tensor) – scale of the distribution, which is a lower triangular matrix with positive diagonal entries

arg_constraints = {'df': GreaterThan(lower_bound=0.0), 'loc': IndependentConstraint(Real(), 1), 'scale_tril': LowerCholesky()}¶

covariance_matrix[source]¶

expand(batch_shape, _instance=None)[source]¶

has_rsample = True¶

static infer_shapes(df, loc, scale_tril)[source]¶

log_prob(value)[source]¶

mean¶

precision_matrix[source]¶

rsample(sample_shape=torch.Size([]))[source]¶

scale_tril[source]¶

support = IndependentConstraint(Real(), 1)¶

variance¶

OMTMultivariateNormal¶

class OMTMultivariateNormal(loc, scale_tril)[source]¶

Bases: pyro.distributions.torch.MultivariateNormal

Multivariate normal (Gaussian) distribution with OMT gradients w.r.t. both parameters. Note the gradient computation w.r.t. the Cholesky factor has cost O(D^3), although the resulting gradient variance is generally expected to be lower.

A distribution over vectors in which all the elements have a joint Gaussian density.

Parameters:

loc (torch.Tensor) – Mean.
scale_tril (torch.Tensor) – Cholesky factor of the covariance matrix.

arg_constraints = {'loc': Real(), 'scale_tril': LowerTriangular()}¶

rsample(sample_shape=torch.Size([]))[source]¶

OneOneMatching¶

class OneOneMatching(logits, *, bp_iters=None, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Random perfect matching from N sources to N destinations where each source matches exactly one destination and each destination matches exactly one source.

Samples are represented as long tensors of shape (N,) taking values in {0,...,N-1} and satisfying the above one-one constraint. The log probability of a sample v is the sum of edge logits, up to the log partition function log Z:

\[\log p(v) = \sum_s \text{logits}[s, v[s]] - \log Z\]

Exact computations are expensive. To enable tractable approximations, set a number of belief propagation iterations via the bp_iters argument. The log_partition_function() and log_prob() methods use a Bethe approximation [1,2,3,4].

References:

[1] Michael Chertkov, Lukas Kroc, Massimo Vergassola (2008): “Belief propagation and beyond for particle tracking” https://arxiv.org/pdf/0806.1199.pdf
[2] Bert Huang, Tony Jebara (2009): “Approximating the Permanent with Belief Propagation” https://arxiv.org/pdf/0908.1769.pdf
[3] Pascal O. Vontobel (2012): “The Bethe Permanent of a Non-Negative Matrix” https://arxiv.org/pdf/1107.4196.pdf
[4] M Chertkov, AB Yedidia (2013): “Approximating the permanent with fractional belief propagation” http://www.jmlr.org/papers/volume14/chertkov13a/chertkov13a.pdf

Parameters:

logits (Tensor) – An `(N, N)`-shaped tensor of edge logits.
bp_iters (int) – Optional number of belief propagation iterations. If unspecified or `None`, expensive exact algorithms will be used.
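For very small N the log probability formula above can be evaluated exactly by enumerating all N! perfect matchings. The pure-Python sketch below does this by brute force; it does not call Pyro and is not the exact algorithm the library uses, and the function names are hypothetical.

```python
import itertools
import math

def log_partition(logits):
    """Exact log Z by summing over all N! perfect matchings (small N only)."""
    n = len(logits)
    scores = [sum(logits[s][v[s]] for s in range(n))
              for v in itertools.permutations(range(n))]
    m = max(scores)
    return m + math.log(sum(math.exp(sc - m) for sc in scores))

def matching_log_prob(v, logits):
    """log p(v) = sum_s logits[s][v[s]] - log Z."""
    return sum(logits[s][v[s]] for s in range(len(v))) - log_partition(logits)

logits = [[0.0, 1.0, -1.0],
          [0.5, 0.0, 2.0],
          [1.5, -0.5, 0.0]]
total = sum(math.exp(matching_log_prob(v, logits))
            for v in itertools.permutations(range(3)))
```

Here total sums the probabilities of all six matchings and equals one up to rounding. The factorial growth of the enumeration is why an approximation such as bp_iters becomes necessary for larger N.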

arg_constraints = {'logits': Real()}¶

enumerate_support(expand=True)[source]¶

has_enumerate_support = True¶

log_partition_function[source]¶

log_prob(value)[source]¶

mode()[source]¶: Computes a maximum probability matching.

Note

This requires the lap package and runs on CPU.

sample(sample_shape=torch.Size([]))[source]¶

support¶

OneTwoMatching¶

class OneTwoMatching(logits, *, bp_iters=None, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Random matching from 2*N sources to N destinations where each source matches exactly one destination and each destination matches exactly two sources.

Samples are represented as long tensors of shape (2*N,) taking values in {0,...,N-1} and satisfying the above one-two constraint. The log probability of a sample v is the sum of edge logits, up to the log partition function log Z:

\[\log p(v) = \sum_s \text{logits}[s, v[s]] - \log Z\]

Exact computations are expensive. To enable tractable approximations, set a number of belief propagation iterations via the bp_iters argument. The log_partition_function() and log_prob() methods use a Bethe approximation [1,2,3,4].

References:

[1] Michael Chertkov, Lukas Kroc, Massimo Vergassola (2008): “Belief propagation and beyond for particle tracking” https://arxiv.org/pdf/0806.1199.pdf
[2] Bert Huang, Tony Jebara (2009): “Approximating the Permanent with Belief Propagation” https://arxiv.org/pdf/0908.1769.pdf
[3] Pascal O. Vontobel (2012): “The Bethe Permanent of a Non-Negative Matrix” https://arxiv.org/pdf/1107.4196.pdf
[4] M Chertkov, AB Yedidia (2013): “Approximating the permanent with fractional belief propagation” http://www.jmlr.org/papers/volume14/chertkov13a/chertkov13a.pdf

Parameters:

logits (Tensor) – A `(2 * N, N)`-shaped tensor of edge logits.
bp_iters (int) – Optional number of belief propagation iterations. If unspecified or `None`, expensive exact algorithms will be used.
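The one-two constraint can be made concrete by enumerating every valid assignment for a small N. The pure-Python sketch below filters all assignments of 2*N sources to N destinations, keeping those in which each destination appears exactly twice, and then normalizes exactly. It does not call Pyro, and the function names are hypothetical.

```python
import itertools
import math
from collections import Counter

def one_two_matchings(num_destinations):
    """Yield tuples v of length 2*N in which each destination occurs exactly twice."""
    n = num_destinations
    for v in itertools.product(range(n), repeat=2 * n):
        counts = Counter(v)
        if all(counts[d] == 2 for d in range(n)):
            yield v

def one_two_log_prob(v, logits):
    """log p(v) = sum_s logits[s][v[s]] - log Z, with Z computed by enumeration."""
    n = len(logits[0])
    scores = [sum(logits[s][u[s]] for s in range(2 * n)) for u in one_two_matchings(n)]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(sc - m) for sc in scores))
    return sum(logits[s][v[s]] for s in range(2 * n)) - log_z

logits = [[0.0, 1.0], [0.5, -0.5], [2.0, 0.0], [-1.0, 0.3]]  # shape (2 * N, N) with N = 2
support = list(one_two_matchings(2))
total = sum(math.exp(one_two_log_prob(v, logits)) for v in support)
```

For N = 2 the support contains six assignments, and their probabilities sum to one up to rounding.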

arg_constraints = {'logits': Real()}¶

enumerate_support(expand=True)[source]¶

has_enumerate_support = True¶

log_partition_function[source]¶

log_prob(value)[source]¶

mode()[source]¶: Computes a maximum probability matching.

Note

This requires the lap package and runs on CPU.

sample(sample_shape=torch.Size([]))[source]¶

support¶

OrderedLogistic¶

class OrderedLogistic(predictor, cutpoints, validate_args=None)[source]¶

Bases: pyro.distributions.torch.Categorical

Alternative parametrization of the distribution over a categorical variable.

Instead of the typical parametrization of a categorical variable in terms of the probability mass of the individual categories p, this provides an alternative that is useful in specifying ordered categorical models. This accepts a vector of cutpoints which are an ordered vector of real numbers denoting baseline cumulative log-odds of the individual categories, and a model vector predictor which modifies the baselines for each sample individually.

These cumulative log-odds are then transformed into a discrete cumulative probability distribution, which is finally differenced to return the probability mass matrix p that specifies the categorical distribution.
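The construction can be sketched for a single scalar predictor in plain Python. The sign convention used here (cutpoint minus predictor inside the sigmoid) is an assumption for illustration, and `ordered_logistic_probs` is a hypothetical helper, not part of the library.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ordered_logistic_probs(predictor, cutpoints):
    # Cumulative probabilities P(category <= k) at each cutpoint,
    # padded with 0 on the left and 1 on the right.
    cumulative = [0.0] + [sigmoid(c - predictor) for c in cutpoints] + [1.0]
    # Differencing adjacent cumulative values gives the probability masses.
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]

probs = ordered_logistic_probs(predictor=0.3, cutpoints=[-1.0, 0.0, 2.0])
```

Three cutpoints yield four categories. Because the cutpoints are increasing, every difference is positive and the masses sum to one.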

Parameters:

predictor (Tensor) – A tensor of predictor variables of arbitrary shape. The output shape of non-batched samples from this distribution will be the same shape as predictor.
cutpoints (Tensor) – A tensor of cutpoints that are used to determine the cumulative probability of each entry in predictor belonging to a given category. The first cutpoints.ndim-1 dimensions must be broadcastable to predictor, and the -1 dimension is monotonically increasing.

arg_constraints = {'cutpoints': OrderedVector(), 'predictor': Real()}¶

expand(batch_shape, _instance=None)[source]¶

ProjectedNormal¶

class ProjectedNormal(concentration, *, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Projected isotropic normal distribution of arbitrary dimension.

This distribution over directional data is qualitatively similar to the von Mises and von Mises-Fisher distributions, but permits tractable variational inference via reparametrized gradients.
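A minimal sketch of the idea behind a projected normal, assuming the usual construction: draw from an isotropic normal centered at `concentration`, then divide by the Euclidean norm. Whether the library's `rsample()` follows exactly these steps is an assumption; the helper name is hypothetical.

```python
import math
import random

def sample_projected_normal(concentration, rng):
    # Draw from a unit-variance normal centered at `concentration`.
    draw = [c + rng.gauss(0.0, 1.0) for c in concentration]
    # Project the draw onto the unit sphere.
    norm = math.sqrt(sum(v * v for v in draw))
    return [v / norm for v in draw]

rng = random.Random(0)
direction = sample_projected_normal([0.0, 0.0, 3.0], rng)
```

Since the sample is a smooth function of the noise and the parameter, gradients can flow through it, which is what makes reparametrized inference possible.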

To use this distribution with autoguides, use poutine.reparam with a ProjectedNormalReparam reparametrizer in the model, e.g.:

@poutine.reparam(config={"direction": ProjectedNormalReparam()})
def model():
    direction = pyro.sample("direction",
                            ProjectedNormal(torch.zeros(3)))
    ...

Note

This implements log_prob() only for dimensions {2,3}.

[1] D. Hernandez-Stumpfhauser, F.J. Breidt, M.J. van der Woerd (2017): “The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference” https://projecteuclid.org/euclid.ba/1453211962

arg_constraints = {'concentration': IndependentConstraint(Real(), 1)}¶

expand(batch_shape, _instance=None)[source]¶

has_rsample = True¶

static infer_shapes(concentration)[source]¶

log_prob(value)[source]¶

mean¶: Note this is the mean in the sense of a centroid in the submanifold that minimizes expected squared geodesic distance.

mode¶

rsample(sample_shape=torch.Size([]))[source]¶

support = Sphere¶

RelaxedBernoulliStraightThrough¶

class RelaxedBernoulliStraightThrough(temperature, probs=None, logits=None, validate_args=None)[source]¶

Bases: pyro.distributions.torch.RelaxedBernoulli

An implementation of RelaxedBernoulli with a straight-through gradient estimator.

This distribution has the following properties:

The samples returned by the rsample() method are discrete/quantized.
The log_prob() method returns the log probability of the relaxed/unquantized sample using the GumbelSoftmax distribution.
In the backward pass the gradient of the sample with respect to the parameters of the distribution uses the relaxed/unquantized sample.

References:

[1] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh
[2] Categorical Reparameterization with Gumbel-Softmax, Eric Jang, Shixiang Gu, Ben Poole

log_prob(value)[source]¶: See pyro.distributions.torch.RelaxedBernoulli.log_prob()

rsample(sample_shape=torch.Size([]))[source]¶: See pyro.distributions.torch.RelaxedBernoulli.rsample()

RelaxedOneHotCategoricalStraightThrough¶

class RelaxedOneHotCategoricalStraightThrough(temperature, probs=None, logits=None, validate_args=None)[source]¶

Bases: pyro.distributions.torch.RelaxedOneHotCategorical

An implementation of RelaxedOneHotCategorical with a straight-through gradient estimator.

This distribution has the following properties:

The samples returned by the rsample() method are discrete/quantized.
The log_prob() method returns the log probability of the relaxed/unquantized sample using the GumbelSoftmax distribution.
In the backward pass the gradient of the sample with respect to the parameters of the distribution uses the relaxed/unquantized sample.

References:

[1] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh
[2] Categorical Reparameterization with Gumbel-Softmax, Eric Jang, Shixiang Gu, Ben Poole

log_prob(value)[source]¶: See pyro.distributions.torch.RelaxedOneHotCategorical.log_prob()

rsample(sample_shape=torch.Size([]))[source]¶: See pyro.distributions.torch.RelaxedOneHotCategorical.rsample()

Rejector¶

class Rejector(propose, log_prob_accept, log_scale, *, batch_shape=None, event_shape=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Rejection sampled distribution given an acceptance rate function.

Parameters:

propose (Distribution) – A proposal distribution that samples batched proposals via `propose()`. `rsample()` supports a `sample_shape` arg only if `propose()` supports a `sample_shape` arg.
log_prob_accept (callable) – A callable that inputs a batch of proposals and returns a batch of log acceptance probabilities.
log_scale – Total log probability of acceptance.

arg_constraints = {}¶

has_rsample = True¶

log_prob(x)[source]¶

rsample(sample_shape=torch.Size([]))[source]¶

score_parts(x)[source]¶

SpanningTree¶

class SpanningTree(edge_logits, sampler_options=None, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Distribution over spanning trees on a fixed number V of vertices.

A tree is represented as torch.LongTensor edges of shape (V-1,2) satisfying the following properties:

The edges constitute a tree, i.e. are connected and cycle free.
Each edge (v1,v2) = edges[e] is sorted, i.e. v1 < v2.
The entire tensor is sorted in colexicographic order.

Use validate_edges() to verify edges are correctly formed.

The edge_logits tensor has one entry for each of the V*(V-1)//2 edges in the complete graph on V vertices, where edges are each sorted and the edge order is colexicographic:

(0,1), (0,2), (1,2), (0,3), (1,3), (2,3), (0,4), (1,4), (2,4), ...

This ordering corresponds to the size-independent pairing function:

k = v1 + v2 * (v2 - 1) // 2

where k is the rank of the edge (v1,v2) in the complete graph. To convert a matrix of edge logits to the linear representation used here:

assert my_matrix.shape == (V, V)
i, j = make_complete_graph(V)
edge_logits = my_matrix[i, j]
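The pairing function can be checked against the colexicographic edge order with a few lines of plain Python. The helpers `edge_rank` and `colex_edges` are hypothetical names used only for this sketch.

```python
import itertools

def edge_rank(v1, v2):
    # Rank of the sorted edge (v1, v2) in colexicographic order.
    assert v1 < v2
    return v1 + v2 * (v2 - 1) // 2

def colex_edges(num_vertices):
    # All sorted edges of the complete graph, ordered by the larger
    # endpoint first and the smaller endpoint second.
    edges = itertools.combinations(range(num_vertices), 2)
    return sorted(edges, key=lambda e: (e[1], e[0]))

edges = colex_edges(5)
ranks = [edge_rank(v1, v2) for v1, v2 in edges]
```

The rank of an edge does not depend on V, so adding a vertex only appends new edges to the end of the list.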

Parameters:

edge_logits (torch.Tensor) – A tensor of length V*(V-1)//2 containing logits (aka negative energies) of all edges in the complete graph on V vertices. See above comment for edge ordering.
sampler_options (dict) – An optional dict of sampler options including: mcmc_steps defaulting to a single MCMC step (which is pretty good); initial_edges defaulting to a cheap approximate sample; backend one of “python” or “cpp”, defaulting to “python”.

arg_constraints = {'edge_logits': Real()}¶

edge_mean¶

Computes marginal probabilities of each edge being active.

Note

This is similar to other distributions’ .mean() method, but with a different shape because this distribution’s values are not encoded as binary matrices.

Returns:	A symmetric square `(V,V)`-shaped matrix with values in `[0,1]` denoting the marginal probability of each edge being in a sampled value.
Return type:	Tensor

enumerate_support(expand=True)[source]¶: This is implemented for trees with up to 6 vertices (and 5 edges).

has_enumerate_support = True¶

log_partition_function[source]¶

log_prob(edges)[source]¶

mode¶

Returns:	The maximum weight spanning tree.
Return type:	Tensor

sample(sample_shape=torch.Size([]))[source]¶

This sampler is implemented using MCMC run for a small number of steps after being initialized by a cheap approximate sampler. This sampler is approximate and cubic time. This is faster than the classic Aldous-Broder sampler [1,2], especially for graphs with large mixing time. Recent research [3,4] proposes samplers that run in sub-matrix-multiply time but are more complex to implement.

References

[1] Generating random spanning trees, Andrei Broder (1989)
[2] The Random Walk Construction of Uniform Spanning Trees and Uniform Labelled Trees, David J. Aldous (1990)
[3] Sampling Random Spanning Trees Faster than Matrix Multiplication, David Durfee, Rasmus Kyng, John Peebles, Anup B. Rao, Sushant Sachdeva (2017) https://arxiv.org/abs/1611.07451
[4] An almost-linear time algorithm for uniform random spanning tree generation, Aaron Schild (2017) https://arxiv.org/abs/1711.06455

support = IntegerGreaterThan(lower_bound=0)¶

validate_edges(edges)[source]¶

Validates a batch of edges tensors, as returned by sample() or enumerate_support() or as input to log_prob().

Parameters:	edges (torch.LongTensor) – A batch of edges.
Raises:	ValueError
Returns:	None

Stable¶

class Stable(stability, skew, scale=1.0, loc=0.0, coords='S0', validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Levy \(\alpha\)-stable distribution. See [1] for a review.

This uses Nolan’s parametrization [2] of the loc parameter, which is required for continuity and differentiability. This corresponds to the notation \(S^0_\alpha(\beta,\sigma,\mu_0)\) of [1], where \(\alpha\) = stability, \(\beta\) = skew, \(\sigma\) = scale, and \(\mu_0\) = loc. To instead use the S parameterization as in scipy, pass coords="S", but BEWARE this is discontinuous at stability=1 and has poor geometry for inference.

This implements a reparametrized sampler rsample(), but does not implement log_prob(). Inference can be performed using either likelihood-free algorithms such as EnergyDistance, or reparameterization via the reparam() handler with one of the reparameterizers LatentStableReparam, SymmetricStableReparam, or StableReparam, e.g.:

with poutine.reparam(config={"x": StableReparam()}):
    pyro.sample("x", Stable(stability, skew, scale, loc))

[1] S. Borak, W. Hardle, R. Weron (2005): Stable distributions. https://edoc.hu-berlin.de/bitstream/handle/18452/4526/8.pdf
[2] J.P. Nolan (1997): Numerical calculation of stable densities and distribution functions.
[3] Rafal Weron (1996): On the Chambers-Mallows-Stuck Method for Simulating Skewed Stable Random Variables.
[4] J.P. Nolan (2017): Stable Distributions: Models for Heavy Tailed Data. http://fs2.american.edu/jpnolan/www/stable/chap1.pdf

Parameters:

stability (Tensor) – Levy stability parameter \(\alpha\in(0,2]\).
skew (Tensor) – Skewness \(\beta\in[-1,1]\).
scale (Tensor) – Scale \(\sigma > 0\). Defaults to 1.
loc (Tensor) – Location \(\mu_0\) when using Nolan’s S0 parametrization [2], or \(\mu\) when using the S parameterization. Defaults to 0.
coords (str) – Either “S0” (default) to use Nolan’s continuous S0 parametrization, or “S” to use the discontinuous parameterization.

arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0), 'skew': Interval(lower_bound=-1, upper_bound=1), 'stability': Interval(lower_bound=0, upper_bound=2)}¶

expand(batch_shape, _instance=None)[source]¶

has_rsample = True¶

log_prob(value)[source]¶

mean¶

rsample(sample_shape=torch.Size([]))[source]¶

support = Real()¶

variance¶

TruncatedPolyaGamma¶

class TruncatedPolyaGamma(prototype, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

This is a PolyaGamma(1, 0) distribution truncated to have finite support in the interval (0, 2.5). See [1] for details. As a consequence of the truncation the log_prob method is only accurate to about six decimal places. In addition the provided sampler is a rough approximation that is only meant to be used in contexts where sample accuracy is not important (e.g. in initialization). Broadly, this implementation is only intended for usage in cases where good approximations of the log_prob are sufficient, as is the case e.g. in HMC.

Parameters:	prototype (tensor) – A prototype tensor of arbitrary shape used to determine the dtype and device returned by sample and log_prob.

References

[1] ‘Bayesian inference for logistic models using Polya-Gamma latent variables’: Nicholas G. Polson, James G. Scott, Jesse Windle.

arg_constraints = {}¶

expand(batch_shape, _instance=None)[source]¶

has_rsample = False¶

log_prob(value)[source]¶

num_gamma_variates = 8¶

num_log_prob_terms = 7¶

sample(sample_shape=())[source]¶

support = Interval(lower_bound=0.0, upper_bound=2.5)¶

truncation_point = 2.5¶

Unit¶

class Unit(log_factor, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Trivial nonnormalized distribution representing the unit type.

The unit type has a single value with no data, i.e. value.numel() == 0.

This is used for pyro.factor() statements.

arg_constraints = {'log_factor': Real()}¶

expand(batch_shape, _instance=None)[source]¶

log_prob(value)[source]¶

sample(sample_shape=torch.Size([]))[source]¶

support = Real()¶

VonMises3D¶

class VonMises3D(concentration, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Spherical von Mises distribution.

This implementation combines the direction parameter and concentration parameter into a single combined parameter that contains both direction and magnitude. The value arg is represented in cartesian coordinates: it must be a normalized 3-vector that lies on the 2-sphere.

See VonMises for a 2D polar coordinate cousin of this distribution. See ProjectedNormal for a qualitatively similar distribution that implements more functionality.

Currently only log_prob() is implemented.

Parameters:	concentration (torch.Tensor) – A combined location-and-concentration vector. The direction of this vector is the location, and its magnitude is the concentration.
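For intuition, the standard von Mises-Fisher log density on the 2-sphere can be written directly in terms of the combined vector. This is a textbook formula sketched in plain Python, with a hypothetical function name; it assumes a nonzero concentration vector and is not a copy of the library's implementation.

```python
import math

def von_mises_3d_log_prob(concentration, value):
    # The magnitude of the combined vector is the concentration kappa.
    kappa = math.sqrt(sum(c * c for c in concentration))
    # Normalizer of the density on the 2-sphere: kappa / (4 pi sinh(kappa)).
    log_normalizer = math.log(kappa) - math.log(4 * math.pi * math.sinh(kappa))
    # The exponent is the dot product of the combined vector with the value.
    return log_normalizer + sum(c * x for c, x in zip(concentration, value))

concentration = [0.0, 0.0, 2.0]
at_mode = von_mises_3d_log_prob(concentration, [0.0, 0.0, 1.0])
opposite = von_mises_3d_log_prob(concentration, [0.0, 0.0, -1.0])
```

The log density at the mode exceeds the log density at the antipodal point by twice the concentration magnitude.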

arg_constraints = {'concentration': Real()}¶

expand(batch_shape)[source]¶

log_prob(value)[source]¶

support = Sphere¶

ZeroInflatedDistribution¶

class ZeroInflatedDistribution(base_dist, *, gate=None, gate_logits=None, validate_args=None)[source]¶

Bases: pyro.distributions.torch_distribution.TorchDistribution

Generic Zero Inflated distribution.

This can be used directly or can be used as a base class as e.g. for ZeroInflatedPoisson and ZeroInflatedNegativeBinomial.

Parameters:

base_dist (TorchDistribution) – the base distribution.
gate (torch.Tensor) – probability of extra zeros given via a Bernoulli distribution.
gate_logits (torch.Tensor) – logits of extra zeros given via a Bernoulli distribution.
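The effect of the gate on the log probability can be sketched with a Poisson base distribution in plain Python. The function names are hypothetical; the mixture shown is the usual zero-inflation construction, in which an extra zero is emitted with probability `gate` and the base distribution is used otherwise.

```python
import math

def poisson_log_pmf(k, rate):
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def zero_inflated_poisson_log_prob(k, rate, gate):
    base = poisson_log_pmf(k, rate)
    if k == 0:
        # A zero can come from the gate or from the base distribution.
        return math.log(gate + (1.0 - gate) * math.exp(base))
    # A nonzero value can only come from the base distribution.
    return math.log1p(-gate) + base

p_zero = math.exp(zero_inflated_poisson_log_prob(0, rate=2.0, gate=0.3))
total = sum(
    math.exp(zero_inflated_poisson_log_prob(k, rate=2.0, gate=0.3))
    for k in range(60)
)
```

The probability of zero is inflated relative to the base Poisson, while the total mass still sums to one.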

arg_constraints = {'gate': Interval(lower_bound=0.0, upper_bound=1.0), 'gate_logits': Real()}¶

expand(batch_shape, _instance=None)[source]¶

gate[source]¶

gate_logits[source]¶

log_prob(value)[source]¶

mean[source]¶

sample(sample_shape=torch.Size([]))[source]¶

support¶

variance[source]¶

ZeroInflatedNegativeBinomial¶

class ZeroInflatedNegativeBinomial(total_count, *, probs=None, logits=None, gate=None, gate_logits=None, validate_args=None)[source]¶

Bases: pyro.distributions.zero_inflated.ZeroInflatedDistribution

A Zero Inflated Negative Binomial distribution.

Parameters:

total_count (float or torch.Tensor) – non-negative number of negative Bernoulli trials.
probs (torch.Tensor) – Event probabilities of success in the half open interval [0, 1).
logits (torch.Tensor) – Event log-odds for probabilities of success.
gate (torch.Tensor) – probability of extra zeros.
gate_logits (torch.Tensor) – logits of extra zeros.

arg_constraints = {'gate': Interval(lower_bound=0.0, upper_bound=1.0), 'gate_logits': Real(), 'logits': Real(), 'probs': HalfOpenInterval(lower_bound=0.0, upper_bound=1.0), 'total_count': GreaterThanEq(lower_bound=0)}¶

logits¶

probs¶

support = IntegerGreaterThan(lower_bound=0)¶

total_count¶

ZeroInflatedPoisson¶

class ZeroInflatedPoisson(rate, *, gate=None, gate_logits=None, validate_args=None)[source]¶

Bases: pyro.distributions.zero_inflated.ZeroInflatedDistribution

A Zero Inflated Poisson distribution.

Parameters:

rate (torch.Tensor) – rate of the Poisson distribution.
gate (torch.Tensor) – probability of extra zeros.
gate_logits (torch.Tensor) – logits of extra zeros.

arg_constraints = {'gate': Interval(lower_bound=0.0, upper_bound=1.0), 'gate_logits': Real(), 'rate': GreaterThan(lower_bound=0.0)}¶

rate¶

support = IntegerGreaterThan(lower_bound=0)¶

Transforms¶

ConditionalTransform¶

class ConditionalTransform[source]¶

Bases: abc.ABC

condition(context)[source]¶

Return type:	torch.distributions.Transform

CholeskyTransform¶

class CholeskyTransform(cache_size=0)[source]¶

Bases: torch.distributions.transforms.Transform

Transform via the mapping \(y = cholesky(x)\), where x is a positive definite matrix.

bijective = True¶

codomain = LowerCholesky()¶

domain = PositiveDefinite()¶

log_abs_det_jacobian(x, y)[source]¶

CorrLCholeskyTransform¶

class CorrLCholeskyTransform(cache_size=0)[source]¶

Bases: torch.distributions.transforms.Transform

Transforms a vector into the cholesky factor of a correlation matrix.

The input should have shape [batch_shape] + [d * (d-1)/2]. The output will have shape [batch_shape] + [d, d].

References:

[1] Cholesky Factors of Correlation Matrices. Stan Reference Manual v2.18, Section 10.12.

bijective = True¶

codomain = CorrCholesky()¶

domain = IndependentConstraint(Real(), 1)¶

log_abs_det_jacobian(x, y)[source]¶

CorrMatrixCholeskyTransform¶

class CorrMatrixCholeskyTransform(cache_size=0)[source]¶

Bases: pyro.distributions.transforms.cholesky.CholeskyTransform

Transform via the mapping \(y = cholesky(x)\), where x is a correlation matrix.

bijective = True¶

codomain = CorrCholesky()¶

domain = CorrMatrix()¶

log_abs_det_jacobian(x, y)[source]¶

DiscreteCosineTransform¶

class DiscreteCosineTransform(dim=-1, smooth=0.0, cache_size=0)[source]¶

Bases: torch.distributions.transforms.Transform

Discrete Cosine Transform of type-II.

This uses dct() and idct() to compute orthonormal DCT and inverse DCT transforms. The jacobian is 1.
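To see why the Jacobian is 1, it helps to write out an orthonormal type-II DCT by hand. The sketch below is a slow pure-Python stand-in with hypothetical names (`dct_ortho`, `idct_ortho`), not the library's `dct()` and `idct()`, and it ignores the `smooth` parameter.

```python
import math

def _scale(k, n):
    # Orthonormal scaling: sqrt(1/n) for k = 0, sqrt(2/n) otherwise.
    return math.sqrt((1.0 if k == 0 else 2.0) / n)

def dct_ortho(x):
    n = len(x)
    return [
        _scale(k, n) * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]

def idct_ortho(y):
    # The inverse of an orthogonal matrix is its transpose.
    n = len(y)
    return [
        sum(_scale(k, n) * y[k] * math.cos(math.pi * (t + 0.5) * k / n) for k in range(n))
        for t in range(n)
    ]

x = [1.0, -2.0, 0.5, 3.0]
y = dct_ortho(x)
x_back = idct_ortho(y)
```

An orthogonal map preserves Euclidean norms, so its determinant has absolute value 1 and the log-abs-det-Jacobian is zero.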

Parameters:

dim (int) – Dimension along which to transform. Must be negative. This is an absolute dim counting from the right.
smooth (float) – Smoothing parameter. When 0, this transforms white noise to white noise; when 1 this transforms Brownian noise to white noise; when -1 this transforms violet noise to white noise; etc. Any real number is allowed. https://en.wikipedia.org/wiki/Colors_of_noise.

bijective = True¶

codomain¶

domain¶

forward_shape(shape)[source]¶

inverse_shape(shape)[source]¶

log_abs_det_jacobian(x, y)[source]¶

with_cache(cache_size=1)[source]¶

ELUTransform¶

class ELUTransform(cache_size=0)[source]¶

Bases: torch.distributions.transforms.Transform

Bijective transform via the mapping \(y = \text{ELU}(x)\).

bijective = True¶

codomain = GreaterThan(lower_bound=0.0)¶

domain = Real()¶

log_abs_det_jacobian(x, y)[source]¶

sign = 1¶

HaarTransform¶

class HaarTransform(dim=-1, flip=False, cache_size=0)[source]¶

Bases: torch.distributions.transforms.Transform

Discrete Haar transform.

This uses haar_transform() and inverse_haar_transform() to compute (orthonormal) Haar and inverse Haar transforms. The jacobian is 1. For sequences with length T not a power of two, this implementation is equivalent to a block-structured Haar transform in which block sizes decrease by factors of one half from left to right.

Parameters:

dim (int) – Dimension along which to transform. Must be negative. This is an absolute dim counting from the right.
flip (bool) – Whether to flip the time axis before applying the Haar transform. Defaults to False.

bijective = True¶

codomain¶

domain¶

forward_shape(shape)[source]¶

inverse_shape(shape)[source]¶

log_abs_det_jacobian(x, y)[source]¶

with_cache(cache_size=1)[source]¶

LeakyReLUTransform¶

class LeakyReLUTransform(cache_size=0)[source]¶

Bases: torch.distributions.transforms.Transform

Bijective transform via the mapping \(y = \text{LeakyReLU}(x)\).

bijective = True¶

codomain = GreaterThan(lower_bound=0.0)¶

domain = Real()¶

log_abs_det_jacobian(x, y)[source]¶

sign = 1¶

LowerCholeskyAffine¶

class LowerCholeskyAffine(loc, scale_tril, cache_size=0)[source]¶

Bases: torch.distributions.transforms.Transform

A bijection of the form,

\(\mathbf{y} = \mathbf{L} \mathbf{x} + \mathbf{r}\)

where \(\mathbf{L}\) is a lower triangular matrix and \(\mathbf{r}\) is a vector.

Parameters:

loc (torch.tensor) – the fixed D-dimensional vector to shift the input by.
scale_tril (torch.tensor) – the D x D lower triangular matrix used in the transformation.
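The affine map, its inverse, and its log-determinant can be sketched with nested lists in plain Python. The function names are hypothetical; the point is that a triangular matrix is inverted by forward substitution and that its determinant is the product of its diagonal entries.

```python
import math

def affine_forward(loc, scale_tril, x):
    # y = L x + r, using only the lower triangle of L.
    return [
        loc[i] + sum(scale_tril[i][j] * x[j] for j in range(i + 1))
        for i in range(len(loc))
    ]

def affine_inverse(loc, scale_tril, y):
    # Solve L x = y - r by forward substitution.
    x = []
    for i in range(len(loc)):
        residual = y[i] - loc[i] - sum(scale_tril[i][j] * x[j] for j in range(i))
        x.append(residual / scale_tril[i][i])
    return x

def affine_log_abs_det(scale_tril):
    # log |det L| is the sum of the log absolute diagonal entries.
    return sum(math.log(abs(row[i])) for i, row in enumerate(scale_tril))

loc = [1.0, -1.0]
scale_tril = [[2.0, 0.0], [1.0, 3.0]]
y = affine_forward(loc, scale_tril, [0.5, 0.25])
x_back = affine_inverse(loc, scale_tril, y)
```

Here the log-determinant equals log(2 * 3), independent of the input.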

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

domain = IndependentConstraint(Real(), 1)¶

log_abs_det_jacobian(x, y)[source]¶: Calculates the elementwise determinant of the log Jacobian, i.e. log(abs(dy/dx)).

volume_preserving = False¶

with_cache(cache_size=1)[source]¶

Normalize¶

class Normalize(p=2, cache_size=0)[source]¶

Bases: torch.distributions.transforms.Transform

Safely project a vector onto the sphere wrt the p norm. This avoids the singularity at zero by mapping to the vector [1, 0, 0, ..., 0].
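A minimal sketch of the safe projection in plain Python, with a hypothetical helper name. How the library detects the zero vector (exact comparison or a small threshold) is not stated here, so the exact comparison below is an assumption.

```python
def normalize(x, p=2):
    # p-norm of the vector.
    norm = sum(abs(v) ** p for v in x) ** (1.0 / p)
    if norm == 0.0:
        # Avoid dividing by zero: map the origin to a fixed unit vector.
        return [1.0] + [0.0] * (len(x) - 1)
    return [v / norm for v in x]

unit = normalize([3.0, 4.0])
fallback = normalize([0.0, 0.0, 0.0])
```

The map is not bijective, since every positive multiple of a vector lands on the same point of the sphere.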

bijective = False¶

codomain = Sphere¶

domain = IndependentConstraint(Real(), 1)¶

with_cache(cache_size=1)[source]¶

OrderedTransform¶

class OrderedTransform(cache_size=0)[source]¶

Bases: torch.distributions.transforms.Transform

Transforms a real vector into an ordered vector.

Specifically, enforces monotonically increasing order on the last dimension of a given tensor via the transformation \(y_0 = x_0\), \(y_i = x_0 + \sum_{1 \le j \le i} \exp(x_j)\) for \(i \ge 1\).
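One way to obtain such an ordering, sketched in plain Python under the assumption that the first coordinate is kept and each later coordinate contributes a positive increment through the exponential. The function names are hypothetical.

```python
import math
from itertools import accumulate

def ordered_forward(x):
    # Keep x[0]; every later entry becomes a strictly positive increment.
    increments = [x[0]] + [math.exp(v) for v in x[1:]]
    return list(accumulate(increments))

def ordered_inverse(y):
    # Undo the cumulative sum, then take logs of the increments.
    return [y[0]] + [math.log(b - a) for a, b in zip(y, y[1:])]

def ordered_log_abs_det(x):
    # The Jacobian is triangular with diagonal (1, exp(x_1), ..., exp(x_n)).
    return sum(x[1:])

x = [-0.5, 0.0, -2.0, 1.0]
y = ordered_forward(x)
x_back = ordered_inverse(y)
```

Because every increment is positive, the output is strictly increasing for any real input.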

bijective = True¶

codomain = OrderedVector()¶

domain = IndependentConstraint(Real(), 1)¶

log_abs_det_jacobian(x, y)[source]¶

Permute¶

class Permute(permutation, *, dim=-1, cache_size=1)[source]¶

Bases: torch.distributions.transforms.Transform

A bijection that reorders the input dimensions, that is, multiplies the input by a permutation matrix. This is useful in between AffineAutoregressive transforms to increase the flexibility of the resulting distribution and stabilize learning. Whilst not being an autoregressive transform, the log absolute determinant of the Jacobian is easily calculable as 0. Note that reordering the input dimension between two layers of AffineAutoregressive is not equivalent to reordering the dimension inside the MADE networks that those IAFs use; using a Permute transform results in a distribution with more flexibility.

Example usage:

>>> from pyro.nn import AutoRegressiveNN
>>> from pyro.distributions.transforms import AffineAutoregressive, Permute
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> iaf1 = AffineAutoregressive(AutoRegressiveNN(10, [40]))
>>> ff = Permute(torch.randperm(10, dtype=torch.long))
>>> iaf2 = AffineAutoregressive(AutoRegressiveNN(10, [40]))
>>> flow_dist = dist.TransformedDistribution(base_dist, [iaf1, ff, iaf2])
>>> flow_dist.sample()  # doctest: +SKIP

Parameters:

permutation (torch.LongTensor) – a permutation ordering that is applied to the inputs.
dim (int) – the tensor dimension to permute. This value must be negative and defines the event dim as abs(dim).
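The relationship between a permutation and its inverse can be sketched with plain lists. The helpers below are hypothetical, and the indexing convention (output position i takes input element permutation[i]) is an assumption for illustration.

```python
def permute(x, permutation):
    # Output position i takes the input element at index permutation[i].
    return [x[i] for i in permutation]

def inverse_permutation(permutation):
    # inv[source] = position, so that permuting twice restores the input.
    inv = [0] * len(permutation)
    for position, source in enumerate(permutation):
        inv[source] = position
    return inv

permutation = [2, 0, 3, 1]
x = [10, 20, 30, 40]
y = permute(x, permutation)
x_back = permute(y, inverse_permutation(permutation))
```

A permutation only moves coordinates around, so volumes are unchanged and the log-abs-det-Jacobian is zero.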

bijective = True¶

codomain¶

domain¶

inv_permutation[source]¶

log_abs_det_jacobian(x, y)[source]¶: Calculates the elementwise determinant of the log Jacobian, i.e. log(abs([dy_0/dx_0, …, dy_{N-1}/dx_{N-1}])). Note that this type of transform is not autoregressive, so the log Jacobian is not the sum of the previous expression. However, it turns out it’s always 0 (since the determinant is -1 or +1), and so returning a vector of zeros works.

volume_preserving = True¶

with_cache(cache_size=1)[source]¶

SoftplusLowerCholeskyTransform¶

class SoftplusLowerCholeskyTransform(cache_size=0)[source]¶

Bases: torch.distributions.transforms.Transform

Transform from unconstrained matrices to lower-triangular matrices with nonnegative diagonal entries. This is useful for parameterizing positive definite matrices in terms of their Cholesky factorization.

codomain = LowerCholesky()¶

domain = IndependentConstraint(Real(), 2)¶

SoftplusTransform¶

class SoftplusTransform(cache_size=0)[source]¶

Bases: torch.distributions.transforms.Transform

Transform via the mapping \(\text{Softplus}(x) = \log(1 + \exp(x))\).
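A numerically careful version of the mapping, its inverse, and its log-derivative can be sketched in plain Python. The function names are hypothetical; the identities used are that the derivative of softplus is the sigmoid, and that log sigmoid(x) = -softplus(-x).

```python
import math

def softplus(x):
    # Stable form of log(1 + exp(x)) that does not overflow for large x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softplus_inverse(y):
    # log(exp(y) - 1), rewritten as y + log(1 - exp(-y)) for stability.
    return y + math.log(-math.expm1(-y))

def softplus_log_abs_det(x):
    # d/dx softplus(x) = sigmoid(x), and log sigmoid(x) = -softplus(-x).
    return -softplus(-x)

y = softplus(0.0)
x_back = softplus_inverse(softplus(-3.0))
```

For large inputs softplus is nearly the identity, and the stable form returns the input itself rather than overflowing.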

bijective = True¶

codomain = GreaterThan(lower_bound=0.0)¶

domain = Real()¶

log_abs_det_jacobian(x, y)[source]¶

sign = 1¶

TransformModules¶

AffineAutoregressive¶

class AffineAutoregressive(autoregressive_nn, log_scale_min_clip=-5.0, log_scale_max_clip=3.0, sigmoid_bias=2.0, stable=False)[source]¶

Bases: pyro.distributions.torch_transform.TransformModule

An implementation of the bijective transform of Inverse Autoregressive Flow (IAF), using by default Eq (10) from Kingma et al., 2016,

\(\mathbf{y} = \mu_t + \sigma_t\odot\mathbf{x}\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, \(\mu_t,\sigma_t\) are calculated from an autoregressive network on \(\mathbf{x}\), and \(\sigma_t>0\).

If the stable keyword argument is set to True then the transformation used is,

\(\mathbf{y} = \sigma_t\odot\mathbf{x} + (1-\sigma_t)\odot\mu_t\)

where \(\sigma_t\) is restricted to \((0,1)\). This variant of IAF is claimed by the authors to be more numerically stable than one using Eq (10), although in practice it leads to a restriction on the distributions that can be represented, presumably since the input is restricted to rescaling by a number on \((0,1)\).

Together with TransformedDistribution this provides a way to create richer variational approximations.

Example usage:

>>> from pyro.nn import AutoRegressiveNN
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = AffineAutoregressive(AutoRegressiveNN(10, [40]))
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP

The inverse of the Bijector is required when, e.g., scoring the log density of a sample with TransformedDistribution. This implementation caches the inverse of the Bijector when its forward operation is called, e.g., when sampling from TransformedDistribution. However, if the cached value isn’t available, either because it was overwritten during sampling a new value or an arbitrary value is being scored, it will calculate it manually. Note that this is an operation that scales as O(D) where D is the input dimension, and so should be avoided for large dimensional uses. So in general, it is cheap to sample from IAF and score a value that was sampled by IAF, but expensive to score an arbitrary value.
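The asymmetry between the two directions can be seen in a toy version written in plain Python. The conditioner below is a hypothetical stand-in for an autoregressive network; it only illustrates the dependency structure, not the library's implementation.

```python
import math

def toy_conditioner(prefix):
    # Hypothetical autoregressive rule: the (mean, log_scale) of the next
    # coordinate depends only on the preceding inputs.
    s = sum(prefix)
    return 0.5 * s, 0.1 * math.tanh(s)

def iaf_forward(x):
    # All of x is known, so each output could be computed independently.
    y = []
    for i in range(len(x)):
        mean, log_scale = toy_conditioner(x[:i])
        y.append(mean + math.exp(log_scale) * x[i])
    return y

def iaf_inverse(y):
    # x[i] needs x[:i], so the inputs are recovered one coordinate at a time.
    x = []
    for i in range(len(y)):
        mean, log_scale = toy_conditioner(x)
        x.append((y[i] - mean) / math.exp(log_scale))
    return x

def iaf_log_abs_det(x):
    # The Jacobian is triangular; its log-determinant sums the log scales.
    return sum(toy_conditioner(x[:i])[1] for i in range(len(x)))

x = [0.3, -1.2, 0.8]
y = iaf_forward(x)
x_back = iaf_inverse(y)
```

In the inverse, the conditioner must be evaluated once per coordinate in sequence, which is the source of the O(D) cost for scoring an arbitrary value.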

Parameters:

autoregressive_nn (callable) – an autoregressive neural network whose forward call returns a real-valued mean and logit-scale as a tuple
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the autoregressive NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the autoregressive NN
sigmoid_bias (float) – A term to add to the logit of the input when using the stable transform.
stable (bool) – When true, uses the alternative “stable” version of the transform (see above).

References:

[1] Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling. Improving Variational Inference with Inverse Autoregressive Flow. [arXiv:1606.04934]

[2] Danilo Jimenez Rezende, Shakir Mohamed. Variational Inference with Normalizing Flows. [arXiv:1505.05770]

[3] Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle. MADE: Masked Autoencoder for Distribution Estimation. [arXiv:1502.03509]

autoregressive = True¶

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

domain = IndependentConstraint(Real(), 1)¶

log_abs_det_jacobian(x, y)[source]¶: Calculates the elementwise determinant of the log Jacobian

sign = 1¶

AffineCoupling¶

class AffineCoupling(split_dim, hypernet, *, dim=-1, log_scale_min_clip=-5.0, log_scale_max_clip=3.0)[source]¶

Bases: pyro.distributions.torch_transform.TransformModule

An implementation of the affine coupling layer of RealNVP (Dinh et al., 2017) that uses the bijective transform,

\(\mathbf{y}_{1:d} = \mathbf{x}_{1:d}\) \(\mathbf{y}_{(d+1):D} = \mu + \sigma\odot\mathbf{x}_{(d+1):D}\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, e.g. \(\mathbf{x}_{1:d}\) represents the first \(d\) elements of the inputs, and \(\mu,\sigma\) are shift and scale parameters calculated as the output of a function inputting only \(\mathbf{x}_{1:d}\).

That is, the first \(d\) components remain unchanged, and the subsequent \(D-d\) are shifted and scaled by a function of the previous components.

Together with TransformedDistribution this provides a way to create richer variational approximations.

Example usage:

>>> from pyro.nn import DenseNN
>>> input_dim = 10
>>> split_dim = 6
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [input_dim-split_dim, input_dim-split_dim]
>>> hypernet = DenseNN(split_dim, [10*input_dim], param_dims)
>>> transform = AffineCoupling(split_dim, hypernet)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP

The inverse of the Bijector is required when, e.g., scoring the log density of a sample with TransformedDistribution. This implementation caches the inverse of the Bijector when its forward operation is called, e.g., when sampling from TransformedDistribution. However, if the cached value isn’t available, either because it was overwritten during sampling a new value or an arbitrary value is being scored, it will calculate it manually.

This is an operation that scales as O(1), i.e. constant in the input dimension. So in general, it is cheap to sample and score (an arbitrary value) from AffineCoupling.
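A toy coupling layer in plain Python shows why both directions are cheap. The hypernet below is a hypothetical stand-in that maps the unchanged half to a shift and a log scale for the other half; it is a sketch of the idea, not the library's code.

```python
import math

def toy_hypernet(x1, out_dim):
    # Hypothetical network: shift and log scale depend only on x1.
    s = sum(x1)
    mean = [0.1 * s * (j + 1) for j in range(out_dim)]
    log_scale = [0.05 * math.tanh(s) * (j + 1) for j in range(out_dim)]
    return mean, log_scale

def coupling_forward(x, split_dim):
    x1, x2 = x[:split_dim], x[split_dim:]
    mean, log_scale = toy_hypernet(x1, len(x2))
    y2 = [m + math.exp(ls) * v for m, ls, v in zip(mean, log_scale, x2)]
    return x1 + y2

def coupling_inverse(y, split_dim):
    # y1 equals x1, so the same shift and scale are recomputed in one pass.
    y1, y2 = y[:split_dim], y[split_dim:]
    mean, log_scale = toy_hypernet(y1, len(y2))
    x2 = [(v - m) * math.exp(-ls) for m, ls, v in zip(mean, log_scale, y2)]
    return y1 + x2

x = [0.2, -0.4, 1.0, 0.5, -1.5]
y = coupling_forward(x, split_dim=3)
x_back = coupling_inverse(y, split_dim=3)
```

Since the first part passes through unchanged, the inverse needs a single hypernet evaluation, unlike the coordinate-by-coordinate inverse of an autoregressive transform.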

Parameters:

split_dim (int) – Zero-indexed dimension \(d\) upon which to perform input/output split for transformation.
hypernet (callable) – a neural network whose forward call returns a real-valued mean and logit-scale as a tuple. The input should have final dimension split_dim and the output final dimension input_dim-split_dim for each member of the tuple.
dim (int) – the tensor dimension on which to split. This value must be negative and defines the event dim as abs(dim).
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the hypernet
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the hypernet

References:

[1] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. ICLR 2017.

bijective = True¶

codomain¶

domain¶

log_abs_det_jacobian(x, y)[source]¶: Calculates the elementwise determinant of the log jacobian

BatchNorm¶

class BatchNorm(input_dim, momentum=0.1, epsilon=1e-05)[source]¶

Bases: pyro.distributions.torch_transform.TransformModule

A type of batch normalization that can be used to stabilize training in normalizing flows. The inverse operation is defined as

\(x = (y - \hat{\mu}) \oslash \sqrt{\hat{\sigma^2}} \otimes \gamma + \beta\)

that is, the standard batch norm equation, where \(x\) is the input, \(y\) is the output, \(\gamma,\beta\) are learnable parameters, and \(\hat{\mu}\)/\(\hat{\sigma^2}\) are smoothed running averages of the sample mean and variance, respectively. The constraint \(\gamma>0\) is enforced to ease calculation of the log-det-Jacobian term.

This is an element-wise transform, and when applied to a vector, learns two parameters (\(\gamma,\beta\)) for each dimension of the input.

When the module is set to training mode, the moving averages of the sample mean and variance are updated every time the inverse operator is called, e.g., when a normalizing flow scores a minibatch with the log_prob method.

Also, when the module is set to training mode, the sample mean and variance on the current minibatch are used in place of the smoothed averages, \(\hat{\mu}\) and \(\hat{\sigma^2}\), for the inverse operator. For this reason it is not the case that \(x=g(g^{-1}(x))\) during training, i.e., that the inverse operation is the inverse of the forward one.
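The mismatch between the two modes can be seen in a minimal pure-Python sketch for a single input dimension. This is not Pyro's implementation: the helpers `bn_inverse` and `bn_forward`, the toy minibatch, and the running-average values are assumptions for illustration, and the sketch assumes that the forward operation always uses the smoothed averages.

```python
import math

def bn_inverse(y, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """x = (y - mean) / sqrt(var + eps) * gamma + beta, as in the equation above."""
    return [(v - mean) / math.sqrt(var + eps) * gamma + beta for v in y]

def bn_forward(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Algebraic inverse of bn_inverse when given the same mean and variance."""
    return [(v - beta) / gamma * math.sqrt(var + eps) + mean for v in x]

batch = [1.0, 2.0, 4.0, 5.0]
batch_mean = sum(batch) / len(batch)
batch_var = sum((v - batch_mean) ** 2 for v in batch) / len(batch)

# Hypothetical smoothed running averages that lag behind the current batch.
running_mean, running_var = 0.5, 2.0

# Evaluation mode: both directions use the running averages, so they invert.
x_eval = bn_inverse(batch, running_mean, running_var)
roundtrip_eval = bn_forward(x_eval, running_mean, running_var)

# Training mode: the inverse uses minibatch statistics while the forward
# operation uses the running averages, so the round trip is not the identity.
x_train = bn_inverse(batch, batch_mean, batch_var)
roundtrip_train = bn_forward(x_train, running_mean, running_var)
```

In this sketch `roundtrip_eval` reproduces `batch` up to floating-point error, whereas `roundtrip_train` does not.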

Example usage:

>>> from pyro.nn import AutoRegressiveNN
>>> from pyro.distributions.transforms import AffineAutoregressive
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> iafs = [AffineAutoregressive(AutoRegressiveNN(10, [40])) for _ in range(2)]
>>> bn = BatchNorm(10)
>>> flow_dist = dist.TransformedDistribution(base_dist, [iafs[0], bn, iafs[1]])
>>> flow_dist.sample()  # doctest: +SKIP

Parameters:

input_dim (int) – the dimension of the input
momentum (float) – momentum parameter for updating moving averages
epsilon (float) – small number to add to variances to ensure numerical stability

References:

[1] Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In International Conference on Machine Learning, 2015. https://arxiv.org/abs/1502.03167

[2] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density Estimation using Real NVP. In International Conference on Learning Representations, 2017. https://arxiv.org/abs/1605.08803

[3] George Papamakarios, Theo Pavlakou, and Iain Murray. Masked Autoregressive Flow for Density Estimation. In Neural Information Processing Systems, 2017. https://arxiv.org/abs/1705.07057

bijective = True¶

codomain = Real()¶

constrained_gamma¶

domain = Real()¶

log_abs_det_jacobian(x, y)[source]¶: Calculates the elementwise determinant of the log Jacobian, dx/dy

BlockAutoregressive¶

class BlockAutoregressive(input_dim, hidden_factors=[8, 8], activation='tanh', residual=None)[source]¶

Bases: pyro.distributions.torch_transform.TransformModule

An implementation of Block Neural Autoregressive Flow (block-NAF) (De Cao et al., 2019) bijective transform. Block-NAF uses a similar transformation to deep dense NAF, building the autoregressive NN into the structure of the transform, in a sense.

Together with TransformedDistribution this provides a way to create richer variational approximations.

Example usage:

>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> naf = BlockAutoregressive(input_dim=10)
>>> pyro.module("my_naf", naf)  # doctest: +SKIP
>>> naf_dist = dist.TransformedDistribution(base_dist, [naf])
>>> naf_dist.sample()  # doctest: +SKIP

The inverse operation is not implemented. This would require numerical inversion, e.g., using a root finding method - a possibility for a future implementation.

Parameters:

input_dim (int) – The dimensionality of the input and output variables.
hidden_factors (list) – Hidden layer i has hidden_factors[i] hidden units per input dimension. This corresponds to both \(a\) and \(b\) in De Cao et al. (2019). The elements of hidden_factors must be integers.
activation (string) – Activation function to use. One of ‘ELU’, ‘LeakyReLU’, ‘sigmoid’, or ‘tanh’.
residual (string) – Type of residual connections to use. Choices are “None”, “normal” for \(\mathbf{y}+f(\mathbf{y})\), and “gated” for \(\alpha\mathbf{y} + (1 - \alpha)f(\mathbf{y})\) for learnable parameter \(\alpha\).

References:

[1] Nicola De Cao, Ivan Titov, Wilker Aziz. Block Neural Autoregressive Flow. [arXiv:1904.04676]

autoregressive = True¶

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

domain = IndependentConstraint(Real(), 1)¶

log_abs_det_jacobian(x, y)[source]¶: Calculates the elementwise determinant of the log jacobian

ConditionalAffineAutoregressive¶

class ConditionalAffineAutoregressive(autoregressive_nn, **kwargs)[source]¶

Bases: pyro.distributions.conditional.ConditionalTransformModule

An implementation of the bijective transform of Inverse Autoregressive Flow (IAF) that conditions on an additional context variable and uses, by default, Eq (10) from Kingma et al., 2016,

\(\mathbf{y} = \mu_t + \sigma_t\odot\mathbf{x}\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, \(\mu_t,\sigma_t\) are calculated from an autoregressive network on \(\mathbf{x}\) and context \(\mathbf{z}\in\mathbb{R}^M\), and \(\sigma_t>0\).

If the stable keyword argument is set to True then the transformation used is,

\(\mathbf{y} = \sigma_t\odot\mathbf{x} + (1-\sigma_t)\odot\mu_t\)

where \(\sigma_t\) is restricted to \((0,1)\). This variant of IAF is claimed by the authors to be more numerically stable than one using Eq (10), although in practice it leads to a restriction on the distributions that can be represented, presumably since the input is restricted to rescaling by a number on \((0,1)\).
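The two update rules can be compared side by side in a small pure-Python sketch. The function names and the fixed values of \(\mu_t\) and \(\sigma_t\) are hypothetical; in the transform itself they come from the autoregressive network. Because \(\mu_t,\sigma_t\) depend only on earlier inputs, the Jacobian is triangular with \(\sigma_t\) on the diagonal in both variants, so the log-det-Jacobian is the sum of \(\log\sigma_t\).

```python
import math

def affine_update(x, mu, sigma):
    """Default rule, Eq (10): y = mu + sigma * x with sigma > 0."""
    return [m + s * v for v, m, s in zip(x, mu, sigma)]

def stable_update(x, mu, sigma):
    """'Stable' rule: y = sigma * x + (1 - sigma) * mu with sigma in (0, 1)."""
    return [s * v + (1.0 - s) * m for v, m, s in zip(x, mu, sigma)]

def log_abs_det(sigma):
    """Sum of log(sigma_t), valid for both rules since dy_t/dx_t = sigma_t."""
    return sum(math.log(s) for s in sigma)

# Toy values standing in for the network outputs (assumed for illustration).
x = [2.0, -1.0]
mu = [0.5, 0.5]
sigma = [0.25, 0.5]

y_affine = affine_update(x, mu, sigma)
y_stable = stable_update(x, mu, sigma)
ldj = log_abs_det(sigma)
```

In the stable rule each output is a convex combination of \(x_t\) and \(\mu_t\), which is one way to see the restriction mentioned above: the input can only be shrunk towards \(\mu_t\), never expanded.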

Together with ConditionalTransformedDistribution this provides a way to create richer variational approximations.

Example usage:

>>> from pyro.nn import ConditionalAutoRegressiveNN
>>> input_dim = 10
>>> context_dim = 4
>>> batch_size = 3
>>> hidden_dims = [10*input_dim, 10*input_dim]
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> hypernet = ConditionalAutoRegressiveNN(input_dim, context_dim, hidden_dims)
>>> transform = ConditionalAffineAutoregressive(hypernet)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
... [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))  # doctest: +SKIP

The inverse of the Bijector is required when, e.g., scoring the log density of a sample with TransformedDistribution. This implementation caches the inverse of the Bijector when its forward operation is called, e.g., when sampling from TransformedDistribution. However, if the cached value isn’t available, either because it was overwritten during sampling a new value or an arbitrary value is being scored, it will calculate it manually. Note that this is an operation that scales as O(D) where D is the input dimension, and so should be avoided for large dimensional uses. So in general, it is cheap to sample from IAF and score a value that was sampled by IAF, but expensive to score an arbitrary value.
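Why the uncached inverse costs O(D) sequential steps can be sketched in pure Python. The function `toy_params` is a hypothetical stand-in for the autoregressive network, returning \((\mu_t,\sigma_t)\) from the preceding inputs only; it is not part of Pyro.

```python
import math

def toy_params(prefix):
    """Hypothetical autoregressive 'network': (mu_t, sigma_t) from x_1..x_{t-1}."""
    s = sum(prefix)
    return 0.5 * s, math.exp(0.3 * math.tanh(s))

def forward(x):
    # All of x is known up front, so every (mu_t, sigma_t) can be computed
    # independently; a masked autoregressive NN does this in a single pass.
    y = []
    for t in range(len(x)):
        mu, sigma = toy_params(x[:t])
        y.append(mu + sigma * x[t])
    return y

def inverse(y):
    # x_t requires (mu_t, sigma_t), which depend on x_1..x_{t-1}. Those must
    # be recovered first, so the loop is inherently sequential: D steps.
    x = []
    for t in range(len(y)):
        mu, sigma = toy_params(x)
        x.append((y[t] - mu) / sigma)
    return x

x = [0.3, -1.2, 0.8, 2.0]
y = forward(x)
x_recovered = inverse(y)
```

Each iteration of `inverse` corresponds to one evaluation of the network, which is why scoring an arbitrary value is expensive in high dimensions while scoring a cached sample is not.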

Parameters:

autoregressive_nn (nn.Module) – an autoregressive neural network whose forward call returns a real-valued mean and logit-scale as a tuple
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the autoregressive NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the autoregressive NN
sigmoid_bias (float) – A term to add to the logit of the input when using the stable transform.
stable (bool) – When true, uses the alternative “stable” version of the transform (see above).

References:

[1] Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling. Improving Variational Inference with Inverse Autoregressive Flow. [arXiv:1606.04934]

[2] Danilo Jimenez Rezende, Shakir Mohamed. Variational Inference with Normalizing Flows. [arXiv:1505.05770]

[3] Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle. MADE: Masked Autoencoder for Distribution Estimation. [arXiv:1502.03509]

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

condition(context)[source]¶: Conditions on a context variable, returning a non-conditional transform of type AffineAutoregressive.

domain = IndependentConstraint(Real(), 1)¶

ConditionalAffineCoupling¶

class ConditionalAffineCoupling(split_dim, hypernet, **kwargs)[source]¶

Bases: pyro.distributions.conditional.ConditionalTransformModule

An implementation of the affine coupling layer of RealNVP (Dinh et al., 2017) that conditions on an additional context variable and uses the bijective transform,

\(\mathbf{y}_{1:d} = \mathbf{x}_{1:d}\) \(\mathbf{y}_{(d+1):D} = \mu + \sigma\odot\mathbf{x}_{(d+1):D}\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, e.g. \(\mathbf{x}_{1:d}\) represents the first \(d\) elements of the inputs, and \(\mu,\sigma\) are shift and scale parameters calculated as the output of a function that takes \(\mathbf{x}_{1:d}\) and a context variable \(\mathbf{z}\in\mathbb{R}^M\) as input.

That is, the first \(d\) components remain unchanged, and the subsequent \(D-d\) are shifted and scaled by a function of the previous components.

Together with ConditionalTransformedDistribution this provides a way to create richer variational approximations.

Example usage:

>>> from pyro.nn import ConditionalDenseNN
>>> input_dim = 10
>>> split_dim = 6
>>> context_dim = 4
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [input_dim-split_dim, input_dim-split_dim]
>>> hypernet = ConditionalDenseNN(split_dim, context_dim, [10*input_dim],
... param_dims)
>>> transform = ConditionalAffineCoupling(split_dim, hypernet)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
... [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))  # doctest: +SKIP

The inverse of the Bijector is required when, e.g., scoring the log density of a sample with ConditionalTransformedDistribution. This implementation caches the inverse of the Bijector when its forward operation is called, e.g., when sampling from ConditionalTransformedDistribution. However, if the cached value isn’t available, either because it was overwritten while sampling a new value or because an arbitrary value is being scored, it will calculate it manually.

This is an operation that scales as O(1), i.e. constant in the input dimension. So in general, it is cheap to sample and score (an arbitrary value) from ConditionalAffineCoupling.
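The constant cost of the inverse follows from the first \(d\) components passing through unchanged: the parameters can be recomputed from the output with a single call. A minimal pure-Python sketch, in which `toy_hypernet` is a hypothetical stand-in for the hypernetwork (not a Pyro class) returning a mean and a log-scale for each of the remaining \(D-d\) components:

```python
import math

def toy_hypernet(x_first, context, n_out):
    """Hypothetical hypernetwork: (mu, log_sigma) from x_{1:d} and the context z."""
    s = sum(x_first) + sum(context)
    mu = [0.1 * (i + 1) * s for i in range(n_out)]
    log_sigma = [math.tanh(s + i) for i in range(n_out)]
    return mu, log_sigma

def coupling_forward(x, context, d):
    mu, log_sigma = toy_hypernet(x[:d], context, len(x) - d)
    tail = [m + math.exp(ls) * v for v, m, ls in zip(x[d:], mu, log_sigma)]
    # The log-det-Jacobian is the sum of the log-scales.
    return x[:d] + tail, sum(log_sigma)

def coupling_inverse(y, context, d):
    # y[:d] equals x[:d], so one hypernetwork call recovers mu and sigma.
    mu, log_sigma = toy_hypernet(y[:d], context, len(y) - d)
    tail = [(v - m) * math.exp(-ls) for v, m, ls in zip(y[d:], mu, log_sigma)]
    return y[:d] + tail

x = [0.5, -0.2, 1.0, 0.3, -0.7]
z = [0.1, 0.4]
y, log_det = coupling_forward(x, z, d=2)
x_recovered = coupling_inverse(y, z, d=2)
```

Both directions call the hypernetwork exactly once, in contrast to the \(D\) sequential calls needed to invert an autoregressive transform.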

Parameters:

split_dim (int) – Zero-indexed dimension \(d\) upon which to perform input/output split for transformation.
hypernet (callable) – A neural network whose forward call returns a real-valued mean and logit-scale as a tuple. The input should have final dimension split_dim and the output final dimension input_dim-split_dim for each member of the tuple. The network also inputs a context variable as a keyword argument in order to condition the output upon it.
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the NN

References:

Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. ICLR 2017.

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

condition(context)[source]¶: See pyro.distributions.conditional.ConditionalTransformModule.condition()

domain = IndependentConstraint(Real(), 1)¶

ConditionalGeneralizedChannelPermute¶

class ConditionalGeneralizedChannelPermute(nn, channels=3, permutation=None)[source]¶

Bases: pyro.distributions.conditional.ConditionalTransformModule

A bijection that generalizes a permutation on the channels of a batch of 2D images in \([\ldots,C,H,W]\) format, conditioning on an additional context variable. Specifically this transform performs the operation,

\(\mathbf{y} = \text{torch.nn.functional.conv2d}(\mathbf{x}, W)\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, and \(W\sim C\times C\times 1\times 1\) is the filter matrix for a 1x1 convolution with \(C\) input and output channels.

Ignoring the final two dimensions, \(W\) is restricted to be the matrix product,

\(W = PLU\)

where \(P\sim C\times C\) is a permutation matrix on the channel dimensions, and \(LU\sim C\times C\) is an invertible product of a lower triangular and an upper triangular matrix that is the output of an NN with input \(z\in\mathbb{R}^{M}\) representing the context variable to condition on.

The input \(\mathbf{x}\) and output \(\mathbf{y}\) both have shape […,C,H,W], where C is the number of channels set at initialization.
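A 1x1 convolution applies the same \(C\times C\) matrix to the channel vector at every pixel, so with \(W=PLU\) the log-absolute-determinant per pixel is the sum of \(\log|U_{ii}|\), and the total for an image is that value times \(H\times W\). The sketch below checks this in pure Python for \(C=3\). The matrices are arbitrary illustrative values and `channel_mix` is a hypothetical helper standing in for the convolution.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

P = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]                       # permutation
L = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [-0.3, 0.2, 1.0]]    # unit lower triangular
U = [[2.0, 0.1, -0.4], [0.0, -0.5, 0.3], [0.0, 0.0, 1.5]]   # upper triangular
W = matmul(P, matmul(L, U))

def channel_mix(image, weight):
    """Apply a C x C matrix to the channel vector at each pixel of a [C][H][W] image."""
    C, H, Wd = len(image), len(image[0]), len(image[0][0])
    return [[[sum(weight[o][c] * image[c][h][w] for c in range(C))
              for w in range(Wd)] for h in range(H)] for o in range(C)]

image = [[[float(c + h + w) for w in range(2)] for h in range(2)] for c in range(3)]
out = channel_mix(image, W)

# |det P| = 1 and det L = 1, so only the diagonal of U contributes.
log_det_per_pixel = sum(math.log(abs(U[i][i])) for i in range(3))
log_det_image = 2 * 2 * log_det_per_pixel  # H * W pixels share the same matrix
```

Parameterizing \(W\) through its LU factors therefore avoids computing a determinant explicitly.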

This operation was introduced in [1] for Glow normalizing flow, and is also known as 1x1 invertible convolution. It appears in other notable work such as [2,3], and corresponds to the class tfp.bijectors.MatvecLU of TensorFlow Probability.

Example usage:

>>> from pyro.nn.dense_nn import DenseNN
>>> context_dim = 5
>>> batch_size = 3
>>> channels = 3
>>> base_dist = dist.Normal(torch.zeros(channels, 32, 32),
... torch.ones(channels, 32, 32))
>>> hidden_dims = [context_dim*10, context_dim*10]
>>> nn = DenseNN(context_dim, hidden_dims, param_dims=[channels*channels])
>>> transform = ConditionalGeneralizedChannelPermute(nn, channels=channels)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
... [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size])) # doctest: +SKIP

Parameters:

nn – a function inputting the context variable and outputting real-valued parameters of dimension \(C^2\).
channels (int) – Number of channel dimensions in the input.

[1] Diederik P. Kingma, Prafulla Dhariwal. Glow: Generative Flow with Invertible 1x1 Convolutions. [arXiv:1807.03039]

[2] Ryan Prenger, Rafael Valle, Bryan Catanzaro. WaveGlow: A Flow-based Generative Network for Speech Synthesis. [arXiv:1811.00002]

[3] Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. [arXiv:1906.04032]

bijective = True¶

codomain = IndependentConstraint(Real(), 3)¶

condition(context)[source]¶: See pyro.distributions.conditional.ConditionalTransformModule.condition()

domain = IndependentConstraint(Real(), 3)¶

ConditionalHouseholder¶

class ConditionalHouseholder(input_dim, nn, count_transforms=1)[source]¶

Bases: pyro.distributions.conditional.ConditionalTransformModule

Represents multiple applications of the Householder bijective transformation conditioning on an additional context. A single Householder transformation takes the form,

\(\mathbf{y} = (I - 2*\frac{\mathbf{u}\mathbf{u}^T}{||\mathbf{u}||^2})\mathbf{x}\)

where \(\mathbf{x}\) are the inputs with dimension \(D\), \(\mathbf{y}\) are the outputs, and \(\mathbf{u}\in\mathbb{R}^D\) is the output of a function, e.g. a NN, with input \(z\in\mathbb{R}^{M}\) representing the context variable to condition on.

The transformation represents the reflection of \(\mathbf{x}\) through the plane passing through the origin with normal \(\mathbf{u}\).
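The reflection property can be checked numerically with a short pure-Python sketch. The function name `householder` and the vectors are illustrative assumptions; in the conditional transform \(\mathbf{u}\) would be produced from the context variable. A reflection is its own inverse and preserves the Euclidean norm, so the log-absolute-determinant of its Jacobian is zero.

```python
def householder(x, u):
    """y = (I - 2 u u^T / ||u||^2) x, computed without forming the matrix."""
    dot_ux = sum(ui * xi for ui, xi in zip(u, x))
    norm_sq = sum(ui * ui for ui in u)
    return [xi - 2.0 * dot_ux / norm_sq * ui for xi, ui in zip(x, u)]

x = [1.0, 2.0, -1.0]
u = [0.0, 3.0, 4.0]

y = householder(x, u)
x_back = householder(y, u)  # applying the same reflection again undoes it

norm_sq_x = sum(v * v for v in x)
norm_sq_y = sum(v * v for v in y)
```

The component of \(\mathbf{x}\) along \(\mathbf{u}\) changes sign while the component orthogonal to \(\mathbf{u}\) is left untouched.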

\(D\) applications of this transformation are able to transform i.i.d. standard Gaussian noise into a Gaussian variable with an arbitrary covariance matrix. With \(K<D\) transformations, one is able to approximate a full-rank Gaussian distribution using a linear transformation of rank \(K\).

Together with ConditionalTransformedDistribution this provides a way to create richer variational approximations.

Example usage:

>>> from pyro.nn.dense_nn import DenseNN
>>> input_dim = 10
>>> context_dim = 5
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [input_dim]
>>> hypernet = DenseNN(context_dim, [50, 50], param_dims)
>>> transform = ConditionalHouseholder(input_dim, hypernet)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
... [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size])) # doctest: +SKIP

Parameters:

input_dim (int) – the dimension of the input (and output) variable.
nn (callable) – a function inputting the context variable and outputting real-valued parameters of dimension \(D\), one such vector per application of the transformation.
count_transforms (int) – number of applications of Householder transformation to apply.

References:

[1] Jakub M. Tomczak, Max Welling. Improving Variational Auto-Encoders using Householder Flow. [arXiv:1611.09630]

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

condition(context)[source]¶: See pyro.distributions.conditional.ConditionalTransformModule.condition()

domain = IndependentConstraint(Real(), 1)¶

ConditionalMatrixExponential¶

class ConditionalMatrixExponential(input_dim, nn, iterations=8, normalization='none', bound=None)[source]¶

Bases: pyro.distributions.conditional.ConditionalTransformModule

A dense matrix exponential bijective transform (Hoogeboom et al., 2020) that conditions on an additional context variable with equation,

\(\mathbf{y} = \exp(M)\mathbf{x}\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, \(\exp(\cdot)\) represents the matrix exponential, and \(M\in\mathbb{R}^D\times\mathbb{R}^D\) is the output of a neural network conditioning on a context variable \(\mathbf{z}\) for input dimension \(D\). In general, \(M\) is not required to be invertible.

Due to the favourable mathematical properties of the matrix exponential, the transform has an exact inverse and a log-determinant-Jacobian that scales in time-complexity as \(O(D)\). Both the forward and reverse operations are approximated with a truncated power series. For numerical stability, the norm of \(M\) can be restricted with the normalization keyword argument.
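These properties follow from two identities: \(\exp(M)^{-1}=\exp(-M)\) and \(\log|\det\exp(M)|=\mathrm{tr}(M)\), the latter being a sum of \(D\) diagonal entries. A pure-Python sketch with a truncated power series illustrates both. The helper `mat_exp` and the matrix values are assumptions for illustration, not Pyro's implementation.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_exp(M, terms=8):
    """Truncated series I + M + M^2/2! + ... + M^terms/terms!."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]  # holds M^k / k!, starting at k = 0
    for k in range(1, terms + 1):
        power = [[v / k for v in row] for row in matmul(power, M)]
        result = [[result[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return result

M = [[0.1, 0.2], [-0.3, 0.05]]
neg_M = [[-v for v in row] for row in M]

E = mat_exp(M)          # forward operation uses exp(M)
E_inv = mat_exp(neg_M)  # inverse uses exp(-M); no linear solve is needed
product = matmul(E, E_inv)

det_E = E[0][0] * E[1][1] - E[0][1] * E[1][0]
trace_M = M[0][0] + M[1][1]
```

The smaller the norm of \(M\), the faster the series converges, which is why bounding the norm reduces the number of terms required.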

Example usage:

>>> from pyro.nn.dense_nn import DenseNN
>>> input_dim = 10
>>> context_dim = 5
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [input_dim*input_dim]
>>> hypernet = DenseNN(context_dim, [50, 50], param_dims)
>>> transform = ConditionalMatrixExponential(input_dim, hypernet)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
... [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size])) # doctest: +SKIP

Parameters:

input_dim (int) – the dimension of the input (and output) variable.
iterations (int) – the number of terms to use in the truncated power series that approximates matrix exponentiation.
normalization (string) – One of [‘none’, ‘weight’, ‘spectral’] normalization that selects what type of normalization to apply to the weight matrix. weight corresponds to weight normalization (Salimans and Kingma, 2016) and spectral to spectral normalization (Miyato et al, 2018).
bound (float) – a bound on either the weight or spectral norm, when either of those two types of regularization are chosen by the normalization argument. A lower value for this results in fewer required terms of the truncated power series to closely approximate the exact value of the matrix exponential.

References:

[1] Emiel Hoogeboom, Victor Garcia Satorras, Jakub M. Tomczak, Max Welling. The Convolution Exponential and Generalized Sylvester Flows. [arXiv:2006.01910]
[2] Tim Salimans, Diederik P. Kingma. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks. [arXiv:1602.07868]
[3] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida. Spectral Normalization for Generative Adversarial Networks. ICLR 2018.

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

condition(context)[source]¶: See pyro.distributions.conditional.ConditionalTransformModule.condition()

domain = IndependentConstraint(Real(), 1)¶

ConditionalNeuralAutoregressive¶

class ConditionalNeuralAutoregressive(autoregressive_nn, **kwargs)[source]¶

Bases: pyro.distributions.conditional.ConditionalTransformModule

An implementation of the deep Neural Autoregressive Flow (NAF) bijective transform of the “IAF flavour” conditioning on an additional context variable that can be used for sampling and scoring samples drawn from it (but not arbitrary ones).

Example usage:

>>> from pyro.nn import ConditionalAutoRegressiveNN
>>> input_dim = 10
>>> context_dim = 5
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> arn = ConditionalAutoRegressiveNN(input_dim, context_dim, [40],
... param_dims=[16]*3)
>>> transform = ConditionalNeuralAutoregressive(arn, hidden_units=16)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
... [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))  # doctest: +SKIP

The inverse operation is not implemented. This would require numerical inversion, e.g., using a root finding method - a possibility for a future implementation.

Parameters:

autoregressive_nn (nn.Module) – an autoregressive neural network whose forward call returns a tuple of three real-valued tensors, whose last dimension is the input dimension, and whose penultimate dimension is equal to hidden_units.
hidden_units (int) – the number of hidden units to use in the NAF transformation (see Eq (8) in reference)
activation (string) – Activation function to use. One of ‘ELU’, ‘LeakyReLU’, ‘sigmoid’, or ‘tanh’.

Reference:

[1] Chin-Wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville. Neural Autoregressive Flows. [arXiv:1804.00779]

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

condition(context)[source]¶: Conditions on a context variable, returning a non-conditional transform of type NeuralAutoregressive.

domain = IndependentConstraint(Real(), 1)¶

ConditionalPlanar¶

class ConditionalPlanar(nn)[source]¶

Bases: pyro.distributions.conditional.ConditionalTransformModule

A conditional ‘planar’ bijective transform using the equation,

\(\mathbf{y} = \mathbf{x} + \mathbf{u}\tanh(\mathbf{w}^T\mathbf{x}+b)\)

where \(\mathbf{x}\) are the inputs with dimension \(D\), \(\mathbf{y}\) are the outputs, and the pseudo-parameters \(b\in\mathbb{R}\), \(\mathbf{u}\in\mathbb{R}^D\), and \(\mathbf{w}\in\mathbb{R}^D\) are the output of a function, e.g. a NN, with input \(z\in\mathbb{R}^{M}\) representing the context variable to condition on. For this to be an invertible transformation, the condition \(\mathbf{w}^T\mathbf{u}>-1\) is enforced.
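As a worked illustration, the sketch below follows the planar flow of Rezende and Mohamed, in which the argument of \(\tanh\) is \(\mathbf{w}^T\mathbf{x}+b\), and checks the closed-form log-det-Jacobian, \(\log|1+(1-\tanh^2(\mathbf{w}^T\mathbf{x}+b))\,\mathbf{w}^T\mathbf{u}|\), against a finite-difference Jacobian. The pseudo-parameters are fixed toy values, treated as if already produced from a context variable, and all function names are hypothetical.

```python
import math

def planar(x, u, w, b):
    a = math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [xi + ui * a for xi, ui in zip(x, u)]

def planar_log_abs_det(x, u, w, b):
    a = math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
    wu = sum(wi * ui for wi, ui in zip(w, u))
    return math.log(abs(1.0 + (1.0 - a * a) * wu))

def numeric_det_2d(f, x, h=1e-6):
    """Determinant of the 2x2 Jacobian of f at x via central differences."""
    cols = []
    for j in range(2):
        hi, lo = list(x), list(x)
        hi[j] += h
        lo[j] -= h
        f_hi, f_lo = f(hi), f(lo)
        cols.append([(f_hi[i] - f_lo[i]) / (2 * h) for i in range(2)])
    # cols[j][i] holds d f_i / d x_j
    return cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]

u, w, b = [0.5, -0.2], [1.0, 0.3], 0.1  # w.u = 0.44 > -1, so the map is invertible
x = [0.4, -0.6]

y = planar(x, u, w, b)
analytic = planar_log_abs_det(x, u, w, b)
numeric = math.log(abs(numeric_det_2d(lambda v: planar(v, u, w, b), x)))
```

Because the Jacobian is the identity plus a rank-one term, its determinant costs only a dot product to evaluate.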

Together with ConditionalTransformedDistribution this provides a way to create richer variational approximations.

Example usage:

>>> from pyro.nn.dense_nn import DenseNN
>>> input_dim = 10
>>> context_dim = 5
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [1, input_dim, input_dim]
>>> hypernet = DenseNN(context_dim, [50, 50], param_dims)
>>> transform = ConditionalPlanar(hypernet)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
... [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size])) # doctest: +SKIP

The inverse of this transform does not possess an analytical solution and is left unimplemented. However, the inverse is cached when the forward operation is called during sampling, and so samples drawn using the planar transform can be scored.

Parameters:

nn (callable) – a function inputting the context variable and outputting a triplet of real-valued parameters of dimensions \((1, D, D)\).

References:

[1] Danilo Jimenez Rezende, Shakir Mohamed. Variational Inference with Normalizing Flows. [arXiv:1505.05770]

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

condition(context)[source]¶: See pyro.distributions.conditional.ConditionalTransformModule.condition()

domain = IndependentConstraint(Real(), 1)¶

ConditionalRadial¶

class ConditionalRadial(nn)[source]¶

Bases: pyro.distributions.conditional.ConditionalTransformModule

A conditional ‘radial’ bijective transform using the equation,

\(\mathbf{y} = \mathbf{x} + \beta h(\alpha,r)(\mathbf{x} - \mathbf{x}_0)\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, and \(\alpha\in\mathbb{R}^+\), \(\beta\in\mathbb{R}\), and \(\mathbf{x}_0\in\mathbb{R}^D\), are the output of a function, e.g. a NN, with input \(z\in\mathbb{R}^{M}\) representing the context variable to condition on. The input dimension is \(D\), \(r=||\mathbf{x}-\mathbf{x}_0||_2\), and \(h(\alpha,r)=1/(\alpha+r)\). For this to be an invertible transformation, the condition \(\beta>-\alpha\) is enforced.
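A small pure-Python sketch makes the geometry concrete: the transform moves each point along the ray from \(\mathbf{x}_0\), scaling its distance by \(1+\beta h(\alpha,r)\). The log-det-Jacobian used below, \((D-1)\log(1+\beta h)+\log(1+\beta h+\beta h'r)\) with \(h'=-1/(\alpha+r)^2\), is the expression given by Rezende and Mohamed. The parameter values are toy assumptions standing in for the output of the function of the context variable, and the function names are hypothetical.

```python
import math

def radial(x, x0, alpha, beta):
    diff = [xi - ci for xi, ci in zip(x, x0)]
    r = math.sqrt(sum(d * d for d in diff))
    h = 1.0 / (alpha + r)
    return [xi + beta * h * d for xi, d in zip(x, diff)]

def radial_log_abs_det(x, x0, alpha, beta):
    dim = len(x)
    r = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, x0)))
    h = 1.0 / (alpha + r)
    h_prime = -1.0 / (alpha + r) ** 2
    return ((dim - 1) * math.log(1.0 + beta * h)
            + math.log(1.0 + beta * h + beta * h_prime * r))

x0 = [0.0, 0.0]
alpha, beta = 1.0, 0.5  # beta > -alpha, so the transform is invertible
x = [3.0, 4.0]          # r = 5, h = 1/6

y = radial(x, x0, alpha, beta)
dist_x = math.sqrt(sum((a - c) ** 2 for a, c in zip(x, x0)))
dist_y = math.sqrt(sum((a - c) ** 2 for a, c in zip(y, x0)))
log_det = radial_log_abs_det(x, x0, alpha, beta)
```

With these values the distance from \(\mathbf{x}_0\) grows by the factor \(13/12\); a negative \(\beta\) would contract points towards \(\mathbf{x}_0\) instead.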

Example usage:

>>> from pyro.nn.dense_nn import DenseNN
>>> input_dim = 10
>>> context_dim = 5
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [input_dim, 1, 1]
>>> hypernet = DenseNN(context_dim, [50, 50], param_dims)
>>> transform = ConditionalRadial(hypernet)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
... [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size])) # doctest: +SKIP

The inverse of this transform does not possess an analytical solution and is left unimplemented. However, the inverse is cached when the forward operation is called during sampling, and so samples drawn using the radial transform can be scored.

Parameters:

nn (callable) – a function inputting the context variable and outputting a triplet of real-valued parameters of dimensions \((D, 1, 1)\).

References:

[1] Danilo Jimenez Rezende, Shakir Mohamed. Variational Inference with Normalizing Flows. [arXiv:1505.05770]

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

condition(context)[source]¶: See pyro.distributions.conditional.ConditionalTransformModule.condition()

domain = IndependentConstraint(Real(), 1)¶

ConditionalSpline¶

class ConditionalSpline(nn, input_dim, count_bins, bound=3.0, order='linear')[source]¶

Bases: pyro.distributions.conditional.ConditionalTransformModule

An implementation of the element-wise rational spline bijections of linear and quadratic order (Durkan et al., 2019; Dolatabadi et al., 2020) conditioning on an additional context variable.

Rational splines are functions that are comprised of segments that are the ratio of two polynomials. For instance, for the \(d\)-th dimension and the \(k\)-th segment on the spline, the function will take the form,

\(y_d = \frac{\alpha^{(k)}(x_d)}{\beta^{(k)}(x_d)},\)

where \(\alpha^{(k)}\) and \(\beta^{(k)}\) are two polynomials of order \(d\) whose parameters are the output of a function, e.g. a NN, with input \(z\in\mathbb{R}^{M}\) representing the context variable to condition on. For \(d=1\), we say that the spline is linear, and for \(d=2\), quadratic. The spline is constructed on the specified bounding box, \([-K,K]\times[-K,K]\), with the identity function used elsewhere.

Rational splines offer an excellent combination of functional flexibility whilst maintaining a numerically stable inverse that is of the same computational and space complexities as the forward operation. This element-wise transform permits the accurate representation of complex univariate distributions.
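The claim about the inverse can be illustrated on a single linear rational segment, \(y=(ax+b)/(cx+d)\), whose inverse is another linear rational function. This is only a sketch of one segment with hand-picked coefficients: it does not reproduce the knot-based parameterization of the spline transforms documented here, and the function names are hypothetical.

```python
def rational_linear(x, a, b, c, d):
    """Ratio of two order-1 polynomials."""
    return (a * x + b) / (c * x + d)

def rational_linear_inverse(y, a, b, c, d):
    # Solve y * (c x + d) = a x + b for x, giving x = (b - d y) / (c y - a).
    return (b - d * y) / (c * y - a)

# Coefficients chosen so that a*d - b*c > 0 and c*x + d > 0 on [0, 1],
# which makes the segment strictly increasing there.
a, b, c, d = 2.0, 0.0, 0.5, 1.0

xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [rational_linear(x, a, b, c, d) for x in xs]
xs_recovered = [rational_linear_inverse(y, a, b, c, d) for y in ys]
```

Both directions cost a handful of arithmetic operations per element, with no iterative root finding.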

Example usage:

>>> from pyro.nn.dense_nn import DenseNN
>>> input_dim = 10
>>> context_dim = 5
>>> batch_size = 3
>>> count_bins = 8
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [input_dim * count_bins, input_dim * count_bins,
... input_dim * (count_bins - 1), input_dim * count_bins]
>>> hypernet = DenseNN(context_dim, [50, 50], param_dims)
>>> transform = ConditionalSpline(hypernet, input_dim, count_bins)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
... [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size])) # doctest: +SKIP

Parameters:

input_dim (int) – Dimension of the input vector. This is required so we know how many parameters to store.
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.

References:

Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. NeurIPS 2019.

Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie. Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.

bijective = True¶

codomain = Real()¶

condition(context)[source]¶: See pyro.distributions.conditional.ConditionalTransformModule.condition()

domain = Real()¶

ConditionalSplineAutoregressive¶

class ConditionalSplineAutoregressive(input_dim, autoregressive_nn, **kwargs)[source]¶

Bases: pyro.distributions.conditional.ConditionalTransformModule

An implementation of the autoregressive layer with rational spline bijections of linear and quadratic order (Durkan et al., 2019; Dolatabadi et al., 2020) that conditions on an additional context variable. Rational splines are functions that are comprised of segments that are the ratio of two polynomials (see Spline).

The autoregressive layer uses the transformation,

\(y_d = g_{\theta_d}(x_d)\ \ \ d=1,2,\ldots,D\)

where \(\mathbf{x}=(x_1,x_2,\ldots,x_D)\) are the inputs, \(\mathbf{y}=(y_1,y_2,\ldots,y_D)\) are the outputs, \(g_{\theta_d}\) is an elementwise rational monotonic spline with parameters \(\theta_d\), and \(\theta=(\theta_1,\theta_2,\ldots,\theta_D)\) is the output of a conditional autoregressive NN inputting \(\mathbf{x}\) and conditioning on the context variable \(\mathbf{z}\).

Example usage:

>>> from pyro.nn import ConditionalAutoRegressiveNN
>>> input_dim = 10
>>> count_bins = 8
>>> context_dim = 5
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> hidden_dims = [input_dim * 10, input_dim * 10]
>>> param_dims = [count_bins, count_bins, count_bins - 1, count_bins]
>>> hypernet = ConditionalAutoRegressiveNN(input_dim, context_dim, hidden_dims,
... param_dims=param_dims)
>>> transform = ConditionalSplineAutoregressive(input_dim, hypernet,
... count_bins=count_bins)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
... [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))  # doctest: +SKIP

Parameters:

input_dim (int) – Dimension of the input vector. Despite operating element-wise, this is required so we know how many parameters to store.
autoregressive_nn (callable) – an autoregressive neural network whose forward call returns tuple of the spline parameters
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.

References:

Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. NeurIPS 2019.

Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie. Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

condition(context)[source]¶: Conditions on a context variable, returning a non-conditional transform of type SplineAutoregressive.

domain = IndependentConstraint(Real(), 1)¶

ConditionalTransformModule¶

class ConditionalTransformModule(*args, **kwargs)[source]¶

Bases: pyro.distributions.conditional.ConditionalTransform, torch.nn.modules.module.Module

Conditional transforms with learnable parameters such as normalizing flows should inherit from this class rather than ConditionalTransform so they are also a subclass of Module and inherit all the useful methods of that class.

GeneralizedChannelPermute¶

class GeneralizedChannelPermute(channels=3, permutation=None)[source]¶

Bases: pyro.distributions.transforms.generalized_channel_permute.ConditionedGeneralizedChannelPermute, pyro.distributions.torch_transform.TransformModule

A bijection that generalizes a permutation on the channels of a batch of 2D images in \([\ldots,C,H,W]\) format. Specifically this transform performs the operation,

\(\mathbf{y} = \text{torch.nn.functional.conv2d}(\mathbf{x}, W)\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, and \(W\sim C\times C\times 1\times 1\) is the filter matrix for a 1x1 convolution with \(C\) input and output channels.

Ignoring the final two dimensions, \(W\) is restricted to be the matrix product,

\(W = PLU\)

where \(P\sim C\times C\) is a permutation matrix on the channel dimensions, \(L\sim C\times C\) is a lower triangular matrix with ones on the diagonal, and \(U\sim C\times C\) is an upper triangular matrix. \(W\) is initialized to a random orthogonal matrix. \(P\) is then held fixed, and \(L\) and \(U\) are the learnable parameters.
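One consequence of the \(W = PLU\) factorization is that the log-absolute-determinant of \(W\) can be read off the diagonal of \(U\), since \(|\det P| = 1\) and \(\det L = 1\). The following is a minimal pure-Python sketch of that identity for \(C=3\); the matrix values and helper names (matmul, det3) are illustrative assumptions and are not taken from Pyro's implementation.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Illustrative values for C = 3 channels (assumed, not Pyro's initialization).
P = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]                 # fixed permutation matrix
L = [[1, 0, 0], [0.5, 1, 0], [-0.2, 0.3, 1]]          # unit lower triangular
U = [[2.0, 0.1, -0.4], [0, -1.5, 0.7], [0, 0, 0.5]]   # upper triangular

W = matmul(P, matmul(L, U))

# Direct computation from the full matrix.
log_abs_det_direct = math.log(abs(det3(W)))
# Cheap computation that only touches the diagonal of U.
log_abs_det_cheap = sum(math.log(abs(U[i][i])) for i in range(3))
```

Because the same \(W\) is applied at every spatial location of a 1x1 convolution, the log-determinant for one \([C,H,W]\) image is this per-pixel quantity multiplied by \(H \cdot W\).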

The input \(\mathbf{x}\) and output \(\mathbf{y}\) both have shape […,C,H,W], where C is the number of channels set at initialization.

This operation was introduced in [1] for the Glow normalizing flow, and is also known as the invertible 1x1 convolution. It appears in other notable work such as [2,3], and corresponds to the class tfp.bijectors.MatvecLU of TensorFlow Probability.

Example usage:

>>> channels = 3
>>> base_dist = dist.Normal(torch.zeros(channels, 32, 32),
... torch.ones(channels, 32, 32))
>>> inv_conv = GeneralizedChannelPermute(channels=channels)
>>> flow_dist = dist.TransformedDistribution(base_dist, [inv_conv])
>>> flow_dist.sample()  # doctest: +SKIP

Parameters:	channels (int) – Number of channel dimensions in the input.

[1] Diederik P. Kingma, Prafulla Dhariwal. Glow: Generative Flow with Invertible 1x1 Convolutions. [arXiv:1807.03039]

[2] Ryan Prenger, Rafael Valle, Bryan Catanzaro. WaveGlow: A Flow-based Generative Network for Speech Synthesis. [arXiv:1811.00002]

[3] Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. [arXiv:1906.04032]

bijective = True¶

codomain = IndependentConstraint(Real(), 3)¶

domain = IndependentConstraint(Real(), 3)¶

Householder¶

class Householder(input_dim, count_transforms=1)[source]¶

Bases: pyro.distributions.transforms.householder.ConditionedHouseholder, pyro.distributions.torch_transform.TransformModule

Represents multiple applications of the Householder bijective transformation. A single Householder transformation takes the form,

\(\mathbf{y} = (I - 2*\frac{\mathbf{u}\mathbf{u}^T}{||\mathbf{u}||^2})\mathbf{x}\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, and the learnable parameters are \(\mathbf{u}\in\mathbb{R}^D\) for input dimension \(D\).

The transformation represents the reflection of \(\mathbf{x}\) through the plane passing through the origin with normal \(\mathbf{u}\).
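As a minimal sketch of this reflection (plain Python, independent of Pyro; the function names and the vectors below are illustrative assumptions), the following applies a single Householder transformation and checks two of its properties: it preserves the Euclidean norm, and it is its own inverse.

```python
import math

def householder(u, x):
    """Reflect x through the hyperplane through the origin with normal u."""
    uu = sum(ui * ui for ui in u)                 # ||u||^2
    ux = sum(ui * xi for ui, xi in zip(u, x))     # u^T x
    return [xi - 2.0 * ux / uu * ui for ui, xi in zip(u, x)]

def norm(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(vi * vi for vi in v))

u = [1.0, 2.0, -1.0]    # stands in for the learnable parameter
x = [0.5, -1.0, 3.0]    # input

y = householder(u, x)        # forward pass
x_back = householder(u, y)   # reflecting twice recovers the input
```

Norm preservation is the reason the transform is volume preserving: its Jacobian is an orthogonal matrix, so the log-absolute-determinant is zero.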

\(D\) applications of this transformation are able to transform i.i.d. standard Gaussian noise into a Gaussian variable with an arbitrary covariance matrix. With \(K<D\) transformations, one is able to approximate a full-rank Gaussian distribution using a linear transformation of rank \(K\).

Together with TransformedDistribution this provides a way to create richer variational approximations.

Example usage:

>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = Householder(10, count_transforms=5)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP

Parameters:

input_dim (int) – the dimension of the input (and output) variable.
count_transforms (int) – number of applications of Householder transformation to apply.

References:

[1] Jakub M. Tomczak, Max Welling. Improving Variational Auto-Encoders using Householder Flow. [arXiv:1611.09630]

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

domain = IndependentConstraint(Real(), 1)¶

reset_parameters()[source]¶

volume_preserving = True¶

MatrixExponential¶

class MatrixExponential(input_dim, iterations=8, normalization='none', bound=None)[source]¶

Bases: pyro.distributions.transforms.matrix_exponential.ConditionedMatrixExponential, pyro.distributions.torch_transform.TransformModule

A dense matrix exponential bijective transform (Hoogeboom et al., 2020) with equation,

\(\mathbf{y} = \exp(M)\mathbf{x}\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, \(\exp(\cdot)\) represents the matrix exponential, and the learnable parameters are \(M\in\mathbb{R}^{D\times D}\) for input dimension \(D\). In general, \(M\) is not required to be invertible.

Due to the favourable mathematical properties of the matrix exponential, the transform has an exact inverse and a log-determinant-Jacobian that scales in time-complexity as \(O(D)\). Both the forward and reverse operations are approximated with a truncated power series. For numerical stability, the norm of \(M\) can be restricted with the normalization keyword argument.
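The properties above follow from two standard identities: \(\exp(M)^{-1} = \exp(-M)\) and \(\log|\det\exp(M)| = \mathrm{tr}(M)\), the latter being a sum of \(D\) diagonal entries. The sketch below illustrates both with a truncated power series in plain Python. The helper names, the 2x2 matrix and the choice of 12 terms are assumptions made for illustration, and the code is not Pyro's implementation.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(a, x):
    """Multiply a matrix by a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def expm_series(m, terms):
    """Approximate exp(m) by the truncated series sum_{k=0}^{terms-1} m^k / k!."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]                              # m^0 / 0!
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]   # m^k / k!
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result

M = [[0.1, 0.3], [-0.2, 0.05]]            # stands in for the learnable matrix
neg_M = [[-v for v in row] for row in M]
x = [1.0, -2.0]

y = matvec(expm_series(M, 12), x)             # forward: y = exp(M) x
x_back = matvec(expm_series(neg_M, 12), y)    # inverse: x = exp(-M) y

# log|det exp(M)| equals the trace of M, which costs O(D) to compute.
log_abs_det = M[0][0] + M[1][1]
E = expm_series(M, 12)
log_abs_det_check = math.log(abs(E[0][0] * E[1][1] - E[0][1] * E[1][0]))
```

The smaller the norm of \(M\), the fewer series terms are needed for a given accuracy, which is the motivation for bounding the norm.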

Example usage:

>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = MatrixExponential(10)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP

Parameters:

input_dim (int) – the dimension of the input (and output) variable.
iterations (int) – the number of terms to use in the truncated power series that approximates matrix exponentiation.
normalization (string) – One of [‘none’, ‘weight’, ‘spectral’] normalization that selects what type of normalization to apply to the weight matrix. weight corresponds to weight normalization (Salimans and Kingma, 2016) and spectral to spectral normalization (Miyato et al, 2018).
bound (float) – a bound on either the weight or spectral norm, when either of those two types of regularization are chosen by the normalization argument. A lower value for this results in fewer required terms of the truncated power series to closely approximate the exact value of the matrix exponential.

References:

[1] Emiel Hoogeboom, Victor Garcia Satorras, Jakub M. Tomczak, Max Welling. The Convolution Exponential and Generalized Sylvester Flows. [arXiv:2006.01910]
[2] Tim Salimans, Diederik P. Kingma. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks. [arXiv:1602.07868]
[3] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida. Spectral Normalization for Generative Adversarial Networks. ICLR 2018.

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

domain = IndependentConstraint(Real(), 1)¶

reset_parameters()[source]¶

NeuralAutoregressive¶

class NeuralAutoregressive(autoregressive_nn, hidden_units=16, activation='sigmoid')[source]¶

Bases: pyro.distributions.torch_transform.TransformModule

An implementation of the deep Neural Autoregressive Flow (NAF) bijective transform of the “IAF flavour” that can be used for sampling and scoring samples drawn from it (but not arbitrary ones).

Example usage:

>>> from pyro.nn import AutoRegressiveNN
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> arn = AutoRegressiveNN(10, [40], param_dims=[16]*3)
>>> transform = NeuralAutoregressive(arn, hidden_units=16)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP

The inverse operation is not implemented. This would require numerical inversion, e.g., using a root finding method - a possibility for a future implementation.

Parameters:

autoregressive_nn (nn.Module) – an autoregressive neural network whose forward call returns a tuple of three real-valued tensors, whose last dimension is the input dimension, and whose penultimate dimension is equal to hidden_units.
hidden_units (int) – the number of hidden units to use in the NAF transformation (see Eq (8) in reference)
activation (string) – Activation function to use. One of ‘ELU’, ‘LeakyReLU’, ‘sigmoid’, or ‘tanh’.

Reference:

[1] Chin-Wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville. Neural Autoregressive Flows. [arXiv:1804.00779]

autoregressive = True¶

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

domain = IndependentConstraint(Real(), 1)¶

eps = 1e-08¶

log_abs_det_jacobian(x, y)[source]¶: Calculates the elementwise logarithm of the absolute determinant of the Jacobian

Planar¶

class Planar(input_dim)[source]¶

Bases: pyro.distributions.transforms.planar.ConditionedPlanar, pyro.distributions.torch_transform.TransformModule

A ‘planar’ bijective transform with equation,

\(\mathbf{y} = \mathbf{x} + \mathbf{u}\tanh(\mathbf{w}^T\mathbf{x}+b)\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, and the learnable parameters are \(b\in\mathbb{R}\), \(\mathbf{u}\in\mathbb{R}^D\), \(\mathbf{w}\in\mathbb{R}^D\) for input dimension \(D\). For this to be an invertible transformation, the condition \(\mathbf{w}^T\mathbf{u}>-1\) is enforced.
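For intuition, the Jacobian of the planar map is a rank-one update of the identity, so by the matrix determinant lemma its determinant is \(1 + (1-\tanh^2(\mathbf{w}^T\mathbf{x}+b))\,\mathbf{w}^T\mathbf{u}\), which stays positive when \(\mathbf{w}^T\mathbf{u}>-1\). The following plain-Python sketch checks this closed form against a finite-difference Jacobian for \(D=2\). The parameter values and helper names are illustrative assumptions, and Pyro's internal implementation may differ.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def planar_forward(x, u, w, b):
    """y = x + u * tanh(w^T x + b)."""
    a = math.tanh(dot(w, x) + b)
    return [xi + ui * a for xi, ui in zip(x, u)]

def planar_log_abs_det(x, u, w, b):
    """Closed form: log|1 + (1 - tanh^2(w^T x + b)) * w^T u|."""
    a = math.tanh(dot(w, x) + b)
    return math.log(abs(1.0 + (1.0 - a * a) * dot(w, u)))

def numeric_log_abs_det_2d(f, x, h=1e-6):
    """log|det J| for a 2D map, with J estimated by central differences."""
    cols = []
    for j in range(2):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(p - m) / (2 * h) for p, m in zip(f(xp), f(xm))])
    # cols[j][i] approximates d y_i / d x_j
    det = cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]
    return math.log(abs(det))

# Illustrative parameters with w^T u = 0.26 > -1.
u, w, b = [0.5, -0.3], [1.0, 0.8], 0.1
x = [0.2, -0.4]

analytic = planar_log_abs_det(x, u, w, b)
numeric = numeric_log_abs_det_2d(lambda v: planar_forward(v, u, w, b), x)
```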

Together with TransformedDistribution this provides a way to create richer variational approximations.

Example usage:

>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = Planar(10)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP

The inverse of this transform does not possess an analytical solution and is left unimplemented. However, the inverse is cached when the forward operation is called during sampling, and so samples drawn using the planar transform can be scored.

Parameters:	input_dim (int) – the dimension of the input (and output) variable.

References:

[1] Danilo Jimenez Rezende, Shakir Mohamed. Variational Inference with Normalizing Flows. [arXiv:1505.05770]

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

domain = IndependentConstraint(Real(), 1)¶

reset_parameters()[source]¶

Polynomial¶

class Polynomial(autoregressive_nn, input_dim, count_degree, count_sum)[source]¶

Bases: pyro.distributions.torch_transform.TransformModule

An autoregressive bijective transform as described in Jaini et al. (2019) applying the following equation element-wise,

\(y_n = c_n + \int^{x_n}_0\sum^K_{k=1}\left(\sum^R_{r=0}a^{(n)}_{r,k}u^r\right)du\)

where \(x_n\) is the \(n\)-th input, and \(\left\{a^{(n)}_{r,k}\in\mathbb{R}\right\}\) are learnable parameters that are the output of an autoregressive NN inputting \(x_{\prec n}=\{x_1,x_2,\ldots,x_{n-1}\}\).

Together with TransformedDistribution this provides a way to create richer variational approximations.

Example usage:

>>> from pyro.nn import AutoRegressiveNN
>>> input_dim = 10
>>> count_degree = 4
>>> count_sum = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [(count_degree + 1)*count_sum]
>>> arn = AutoRegressiveNN(input_dim, [input_dim*10], param_dims)
>>> transform = Polynomial(arn, input_dim=input_dim, count_degree=count_degree,
... count_sum=count_sum)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP

The inverse of this transform does not possess an analytical solution and is left unimplemented. However, the inverse is cached when the forward operation is called during sampling, and so samples drawn using a polynomial transform can be scored.

Parameters:

autoregressive_nn (nn.Module) – an autoregressive neural network whose forward call returns a tensor of real-valued numbers of size (batch_size, (count_degree+1)*count_sum, input_dim)
count_degree (int) – The degree of the polynomial to use for each element-wise transformation.
count_sum (int) – The number of polynomials to sum in each element-wise transformation.

References:

[1] Priyank Jaini, Kira A. Selby, Yaoliang Yu. Sum-of-Squares Polynomial Flow. [arXiv:1905.02325]

autoregressive = True¶

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

domain = IndependentConstraint(Real(), 1)¶

log_abs_det_jacobian(x, y)[source]¶: Calculates the elementwise logarithm of the absolute determinant of the Jacobian

reset_parameters()[source]¶

Radial¶

class Radial(input_dim)[source]¶

Bases: pyro.distributions.transforms.radial.ConditionedRadial, pyro.distributions.torch_transform.TransformModule

A ‘radial’ bijective transform using the equation,

\(\mathbf{y} = \mathbf{x} + \beta h(\alpha,r)(\mathbf{x} - \mathbf{x}_0)\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, and the learnable parameters are \(\alpha\in\mathbb{R}^+\), \(\beta\in\mathbb{R}\), \(\mathbf{x}_0\in\mathbb{R}^D\), for input dimension \(D\), \(r=||\mathbf{x}-\mathbf{x}_0||_2\), \(h(\alpha,r)=1/(\alpha+r)\). For this to be an invertible transformation, the condition \(\beta>-\alpha\) is enforced.
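As an illustration of the radial map, its Jacobian has eigenvalue \(1+\beta h\) with multiplicity \(D-1\) and eigenvalue \(1+\beta h+\beta h' r\) once, where \(h'=-1/(\alpha+r)^2\). The sketch below evaluates that closed form in plain Python and compares it with a finite-difference Jacobian for \(D=2\). Parameter values and helper names are illustrative assumptions, not Pyro's implementation.

```python
import math

def radial_forward(x, x0, alpha, beta):
    """y = x + beta * h(alpha, r) * (x - x0), with h = 1 / (alpha + r)."""
    diff = [xi - ci for xi, ci in zip(x, x0)]
    r = math.sqrt(sum(d * d for d in diff))
    h = 1.0 / (alpha + r)
    return [xi + beta * h * di for xi, di in zip(x, diff)]

def radial_log_abs_det(x, x0, alpha, beta):
    """(D-1) * log|1 + beta*h| + log|1 + beta*h + beta*h'*r|."""
    diff = [xi - ci for xi, ci in zip(x, x0)]
    r = math.sqrt(sum(d * d for d in diff))
    h = 1.0 / (alpha + r)
    h_prime = -1.0 / (alpha + r) ** 2
    d = len(x)
    return ((d - 1) * math.log(abs(1.0 + beta * h))
            + math.log(abs(1.0 + beta * h + beta * h_prime * r)))

def numeric_log_abs_det_2d(f, x, step=1e-6):
    """log|det J| for a 2D map, with J estimated by central differences."""
    cols = []
    for j in range(2):
        xp, xm = list(x), list(x)
        xp[j] += step
        xm[j] -= step
        cols.append([(p - m) / (2 * step) for p, m in zip(f(xp), f(xm))])
    det = cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]
    return math.log(abs(det))

# Illustrative parameters with beta > -alpha; here r is (numerically) 1.
alpha, beta, x0 = 1.0, 0.5, [0.1, -0.2]
x = [0.7, 0.6]

analytic = radial_log_abs_det(x, x0, alpha, beta)
numeric = numeric_log_abs_det_2d(lambda v: radial_forward(v, x0, alpha, beta), x)
```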

Example usage:

>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = Radial(10)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP

The inverse of this transform does not possess an analytical solution and is left unimplemented. However, the inverse is cached when the forward operation is called during sampling, and so samples drawn using the radial transform can be scored.

Parameters:	input_dim (int) – the dimension of the input (and output) variable.

References:

[1] Danilo Jimenez Rezende, Shakir Mohamed. Variational Inference with Normalizing Flows. [arXiv:1505.05770]

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

domain = IndependentConstraint(Real(), 1)¶

reset_parameters()[source]¶

Spline¶

class Spline(input_dim, count_bins=8, bound=3.0, order='linear')[source]¶

Bases: pyro.distributions.transforms.spline.ConditionedSpline, pyro.distributions.torch_transform.TransformModule

An implementation of the element-wise rational spline bijections of linear and quadratic order (Durkan et al., 2019; Dolatabadi et al., 2020). Rational splines are functions that are comprised of segments that are the ratio of two polynomials. For instance, for the \(d\)-th dimension and the \(k\)-th segment on the spline, the function will take the form,

\(y_d = \frac{\alpha^{(k)}(x_d)}{\beta^{(k)}(x_d)},\)

where \(\alpha^{(k)}\) and \(\beta^{(k)}\) are two polynomials of order \(d\). For \(d=1\), we say that the spline is linear, and for \(d=2\), quadratic. The spline is constructed on the specified bounding box, \([-K,K]\times[-K,K]\), with the identity function used elsewhere.

Rational splines combine functional flexibility with a numerically stable inverse that has the same computational and space complexity as the forward operation. This element-wise transform permits the accurate representation of complex univariate distributions.

Example usage:

>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = Spline(10, count_bins=4, bound=3.)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP

Parameters:

input_dim (int) – Dimension of the input vector. This is required so we know how many parameters to store.
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.

References:

Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. NeurIPS 2019.

Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie. Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.

bijective = True¶

codomain = Real()¶

domain = Real()¶

SplineAutoregressive¶

class SplineAutoregressive(input_dim, autoregressive_nn, count_bins=8, bound=3.0, order='linear')[source]¶

Bases: pyro.distributions.torch_transform.TransformModule

An implementation of the autoregressive layer with rational spline bijections of linear and quadratic order (Durkan et al., 2019; Dolatabadi et al., 2020). Rational splines are functions that are comprised of segments that are the ratio of two polynomials (see Spline).

The autoregressive layer uses the transformation,

\(y_d = g_{\theta_d}(x_d)\ \ \ d=1,2,\ldots,D\)

where \(\mathbf{x}=(x_1,x_2,\ldots,x_D)\) are the inputs, \(\mathbf{y}=(y_1,y_2,\ldots,y_D)\) are the outputs, \(g_{\theta_d}\) is an elementwise rational monotonic spline with parameters \(\theta_d\), and \(\theta=(\theta_1,\theta_2,\ldots,\theta_D)\) is the output of an autoregressive NN inputting \(\mathbf{x}\).
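To illustrate the autoregressive structure only, the sketch below replaces the rational spline \(g_{\theta_d}\) with a much simpler monotonic map, an affine function with positive scale, and replaces the autoregressive NN with a toy function of the preceding inputs \(x_{<d}\). Both substitutions are assumptions made for brevity. What carries over is the pattern: the forward pass can compute every \(\theta_d\) from \(\mathbf{x}\) directly, the inverse must recover \(x_1, x_2, \ldots\) one at a time, and the log-determinant is a sum of elementwise terms because the Jacobian is triangular.

```python
import math

def conditioner(prefix):
    """Toy stand-in for an autoregressive NN: parameters depend only on x_{<d}."""
    log_scale = 0.1 * sum(prefix)
    shift = 0.5 * sum(p * p for p in prefix)
    return log_scale, shift

def forward(x):
    """y_d = exp(s_d) * x_d + t_d, where (s_d, t_d) = conditioner(x_{<d})."""
    y, log_det = [], 0.0
    for d in range(len(x)):
        s, t = conditioner(x[:d])
        y.append(math.exp(s) * x[d] + t)
        log_det += s                 # triangular Jacobian: sum of log-scales
    return y, log_det

def inverse(y):
    """Invert sequentially: x_d needs the already recovered x_{<d}."""
    x = []
    for d in range(len(y)):
        s, t = conditioner(x)        # x currently holds x_{<d}
        x.append((y[d] - t) * math.exp(-s))
    return x

x = [0.3, -1.2, 0.8, 2.0]
y, log_det = forward(x)
x_back = inverse(y)
```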

Example usage:

>>> from pyro.nn import AutoRegressiveNN
>>> input_dim = 10
>>> count_bins = 8
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> hidden_dims = [input_dim * 10, input_dim * 10]
>>> param_dims = [count_bins, count_bins, count_bins - 1, count_bins]
>>> hypernet = AutoRegressiveNN(input_dim, hidden_dims, param_dims=param_dims)
>>> transform = SplineAutoregressive(input_dim, hypernet, count_bins=count_bins)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP

Parameters:

input_dim (int) – Dimension of the input vector. Despite operating element-wise, this is required so we know how many parameters to store.
autoregressive_nn (callable) – an autoregressive neural network whose forward call returns a tuple of the spline parameters
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.

References:

Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. NeurIPS 2019.

Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie. Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.

autoregressive = True¶

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

domain = IndependentConstraint(Real(), 1)¶

log_abs_det_jacobian(x, y)[source]¶: Calculates the elementwise logarithm of the absolute determinant of the Jacobian

SplineCoupling¶

class SplineCoupling(input_dim, split_dim, hypernet, count_bins=8, bound=3.0, order='linear', identity=False)[source]¶

Bases: pyro.distributions.torch_transform.TransformModule

An implementation of the coupling layer with rational spline bijections of linear and quadratic order (Durkan et al., 2019; Dolatabadi et al., 2020). Rational splines are functions that are comprised of segments that are the ratio of two polynomials (see Spline).

The spline coupling layer uses the transformation,

\(\mathbf{y}_{1:d} = g_\theta(\mathbf{x}_{1:d})\) \(\mathbf{y}_{(d+1):D} = h_\phi(\mathbf{x}_{(d+1):D};\mathbf{x}_{1:d})\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, e.g. \(\mathbf{x}_{1:d}\) represents the first \(d\) elements of the inputs, \(g_\theta\) is either the identity function or an elementwise rational monotonic spline with parameters \(\theta\), and \(h_\phi\) is a conditional elementwise spline, conditioning on the first \(d\) elements.
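The coupling structure can be sketched without any spline machinery. In the plain-Python example below, \(g_\theta\) is taken to be the identity, and \(h_\phi\) is replaced by an elementwise affine map whose parameters come from a toy function of \(\mathbf{x}_{1:d}\). These stand-ins, and all names in the code, are assumptions made for illustration. The point is that the inverse is as cheap as the forward pass, because \(\mathbf{y}_{1:d}=\mathbf{x}_{1:d}\) lets the same parameters be recomputed from the output.

```python
import math

def hypernet(x1, out_dim):
    """Toy stand-in for the hypernet: maps x_{1:d} to per-element parameters."""
    total = sum(x1)
    log_scales = [0.1 * (i + 1) * total for i in range(out_dim)]
    shifts = [total - i for i in range(out_dim)]
    return log_scales, shifts

def coupling_forward(x, d):
    x1, x2 = x[:d], x[d:]
    s, t = hypernet(x1, len(x2))
    y2 = [math.exp(si) * xi + ti for si, ti, xi in zip(s, t, x2)]
    return list(x1) + y2, sum(s)     # outputs and log|det J|

def coupling_inverse(y, d):
    y1, y2 = y[:d], y[d:]
    s, t = hypernet(y1, len(y2))     # valid because y_{1:d} equals x_{1:d}
    x2 = [(yi - ti) * math.exp(-si) for si, ti, yi in zip(s, t, y2)]
    return list(y1) + x2

x = [0.4, -0.1, 1.5, 2.0, -0.7]
split = 2
y, log_det = coupling_forward(x, split)
x_back = coupling_inverse(y, split)
```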

Example usage:

>>> from pyro.nn import DenseNN
>>> input_dim = 10
>>> split_dim = 6
>>> count_bins = 8
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [(input_dim - split_dim) * count_bins,
... (input_dim - split_dim) * count_bins,
... (input_dim - split_dim) * (count_bins - 1),
... (input_dim - split_dim) * count_bins]
>>> hypernet = DenseNN(split_dim, [10*input_dim], param_dims)
>>> transform = SplineCoupling(input_dim, split_dim, hypernet)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP

Parameters:

input_dim (int) – Dimension of the input vector. Despite operating element-wise, this is required so we know how many parameters to store.
split_dim (int) – Zero-indexed dimension \(d\) upon which to perform input/output split for transformation.
hypernet (callable) – a neural network whose forward call returns a tuple of spline parameters (see ConditionalSpline).
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.

References:

Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. NeurIPS 2019.

Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie. Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

domain = IndependentConstraint(Real(), 1)¶

log_abs_det_jacobian(x, y)[source]¶: Calculates the elementwise logarithm of the absolute determinant of the Jacobian

Sylvester¶

class Sylvester(input_dim, count_transforms=1)[source]¶

Bases: pyro.distributions.transforms.householder.Householder

An implementation of the Sylvester bijective transform of the Householder variety (van den Berg et al., 2018),

\(\mathbf{y} = \mathbf{x} + QR\tanh(SQ^T\mathbf{x}+\mathbf{b})\)

where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, \(R,S\sim D\times D\) are upper triangular matrices for input dimension \(D\), \(Q\sim D\times D\) is an orthogonal matrix, and \(\mathbf{b}\sim D\) is a learnable bias term.

The Sylvester transform is a generalization of Planar. In the Householder type of the Sylvester transform, the orthogonality of \(Q\) is enforced by representing it as the product of Householder transformations.

Together with TransformedDistribution it provides a way to create richer variational approximations.

Example usage:

>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = Sylvester(10, count_transforms=4)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP
    tensor([-0.4071, -0.5030,  0.7924, -0.2366, -0.2387, -0.1417,  0.0868,
            0.1389, -0.4629,  0.0986])

The inverse of this transform does not possess an analytical solution and is left unimplemented. However, the inverse is cached when the forward operation is called during sampling, and so samples drawn using the Sylvester transform can be scored.

References:

[1] Rianne van den Berg, Leonard Hasenclever, Jakub M. Tomczak, Max Welling. Sylvester Normalizing Flows for Variational Inference. UAI 2018.

Q(x)[source]¶

R()[source]¶

S()[source]¶

bijective = True¶

codomain = IndependentConstraint(Real(), 1)¶

domain = IndependentConstraint(Real(), 1)¶

dtanh_dx(x)[source]¶

log_abs_det_jacobian(x, y)[source]¶: Calculates the elementwise logarithm of the absolute determinant of the Jacobian

reset_parameters2()[source]¶

TransformModule¶

class TransformModule(*args, **kwargs)[source]¶

Bases: torch.distributions.transforms.Transform, torch.nn.modules.module.Module

Transforms with learnable parameters such as normalizing flows should inherit from this class rather than Transform so they are also a subclass of nn.Module and inherit all the useful methods of that class.

ComposeTransformModule¶

class ComposeTransformModule(parts)[source]¶

Bases: torch.distributions.transforms.ComposeTransform, torch.nn.modules.container.ModuleList

This allows us to use a list of TransformModule in the same way as ComposeTransform. This is needed so that transform parameters are automatically registered by Pyro’s param store when used in PyroModule instances.

Transform Factories¶

Each Transform and TransformModule includes a corresponding helper function in lower case that takes, at minimum, the input dimensions of the transform, and possibly additional arguments to customize the transform in an intuitive way. The purpose of these helper functions is to hide from the user whether or not the transform requires the construction of a hypernet, and if so, the input and output dimensions of that hypernet.

iterated¶

iterated(repeats, base_fn, *args, **kwargs)[source]¶

Helper function to compose a sequence of bijective transforms with potentially learnable parameters using ComposeTransformModule.

Parameters:

repeats – number of repeated transforms.
base_fn – function to construct the bijective transform.
args – arguments taken by base_fn.
kwargs – keyword arguments taken by base_fn.

Returns:	instance of TransformModule.
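Conceptually, this helper calls base_fn once per repeat, so that each repeat is a separate transform with its own parameters, and then composes the results. The plain-Python sketch below mimics that pattern with a toy bijection. The names Shift, iterated_sketch and apply_all are hypothetical, and a list is used in place of ComposeTransformModule, so this is an illustration of the idea rather than Pyro's code.

```python
class Shift:
    """Toy bijection y = x + offset, standing in for a TransformModule."""

    def __init__(self, offset):
        self.offset = offset

    def __call__(self, x):
        return x + self.offset

def iterated_sketch(repeats, base_fn, *args, **kwargs):
    """Call base_fn `repeats` times so every transform is a fresh instance."""
    return [base_fn(*args, **kwargs) for _ in range(repeats)]

def apply_all(transforms, x):
    """Apply the transforms in sequence, as a composed transform would."""
    for transform in transforms:
        x = transform(x)
    return x

flows = iterated_sketch(3, Shift, offset=0.5)
result = apply_all(flows, 1.0)
```

Constructing a new object per repeat matters for learnable transforms: reusing one instance would tie the parameters of every layer together.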

affine_autoregressive¶

affine_autoregressive(input_dim, hidden_dims=None, **kwargs)[source]¶

A helper function to create an AffineAutoregressive object that takes care of constructing an autoregressive network with the correct input/output dimensions.

Parameters:

input_dim (int) – Dimension of input variable
hidden_dims (list[int]) – The desired hidden dimensions of the autoregressive network. Defaults to using [3*input_dim + 1]
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the autoregressive NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the autoregressive NN
sigmoid_bias (float) – A term to add to the logit of the input when using the stable transform.
stable (bool) – When true, uses the alternative “stable” version of the transform (see above).

affine_coupling¶

affine_coupling(input_dim, hidden_dims=None, split_dim=None, dim=-1, **kwargs)[source]¶

A helper function to create an AffineCoupling object that takes care of constructing a dense network with the correct input/output dimensions.

Parameters:

input_dim (int) – Dimension(s) of input variable to permute. Note that when dim < -1 this must be a tuple corresponding to the event shape.
hidden_dims (list[int]) – The desired hidden dimensions of the dense network. Defaults to using [10*input_dim]
split_dim (int) – The dimension to split the input on for the coupling transform. Defaults to using input_dim // 2
dim (int) – the tensor dimension on which to split. This value must be negative and defines the event dim as abs(dim).
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the dense NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the dense NN

batchnorm¶

batchnorm(input_dim, **kwargs)[source]¶

A helper function to create a BatchNorm object for consistency with other helpers.

Parameters:

input_dim (int) – Dimension of input variable
momentum (float) – momentum parameter for updating moving averages
epsilon (float) – small number to add to variances to ensure numerical stability

block_autoregressive¶

block_autoregressive(input_dim, **kwargs)[source]¶

A helper function to create a BlockAutoregressive object for consistency with other helpers.

Parameters:

input_dim (int) – Dimension of input variable
hidden_factors (list) – Hidden layer i has hidden_factors[i] hidden units per input dimension. This corresponds to both \(a\) and \(b\) in De Cao et al. (2019). The elements of hidden_factors must be integers.
activation (string) – Activation function to use. One of ‘ELU’, ‘LeakyReLU’, ‘sigmoid’, or ‘tanh’.
residual (string) – Type of residual connections to use. Choices are “None”, “normal” for \(\mathbf{y}+f(\mathbf{y})\), and “gated” for \(\alpha\mathbf{y} + (1 - \alpha)f(\mathbf{y})\) for learnable parameter \(\alpha\).
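The two residual options can be sketched in a few lines of plain Python. The sketch assumes that the gated form is the convex combination \(\alpha\mathbf{y} + (1-\alpha)f(\mathbf{y})\) with \(\alpha\in(0,1)\), and it uses an elementwise tanh as a stand-in for the block autoregressive network \(f\). Both are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def f(y):
    """Stand-in for the block autoregressive network (assumed for illustration)."""
    return [math.tanh(v) for v in y]

def residual_normal(y):
    """'normal' residual: y + f(y)."""
    return [yi + fi for yi, fi in zip(y, f(y))]

def residual_gated(y, alpha):
    """'gated' residual: alpha * y + (1 - alpha) * f(y)."""
    return [alpha * yi + (1.0 - alpha) * fi for yi, fi in zip(y, f(y))]

y = [0.5, -1.0, 2.0]
normal_out = residual_normal(y)
gated_out = residual_gated(y, alpha=0.25)
```

At the extremes the gate interpolates between the two branches: a gate of 1 returns the input unchanged, and a gate of 0 returns only the network output.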

conditional_affine_autoregressive¶

conditional_affine_autoregressive(input_dim, context_dim, hidden_dims=None, **kwargs)[source]¶

A helper function to create a ConditionalAffineAutoregressive object that takes care of constructing a dense network with the correct input/output dimensions.

Parameters:

input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the dense network. Defaults to using [10*input_dim]
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the autoregressive NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the autoregressive NN
sigmoid_bias (float) – A term to add to the logit of the input when using the stable transform.
stable (bool) – When true, uses the alternative “stable” version of the transform (see above).

conditional_affine_coupling¶

conditional_affine_coupling(input_dim, context_dim, hidden_dims=None, split_dim=None, dim=-1, **kwargs)[source]¶

A helper function to create a ConditionalAffineCoupling object that takes care of constructing a dense network with the correct input/output dimensions.

Parameters:

input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the dense network. Defaults to using [10*input_dim]
split_dim (int) – The dimension to split the input on for the coupling transform. Defaults to using input_dim // 2
dim (int) – the tensor dimension on which to split. This value must be negative and defines the event dim as abs(dim).
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the dense NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the dense NN

conditional_generalized_channel_permute¶

conditional_generalized_channel_permute(context_dim, channels=3, hidden_dims=None)[source]¶

A helper function to create a ConditionalGeneralizedChannelPermute object for consistency with other helpers.

Parameters:	channels (int) – Number of channel dimensions in the input.

conditional_householder¶

conditional_householder(input_dim, context_dim, hidden_dims=None, count_transforms=1)[source]¶

A helper function to create a ConditionalHouseholder object that takes care of constructing a dense network with the correct input/output dimensions.

Parameters:

input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the dense network. Defaults to using [input_dim * 10, input_dim * 10]

conditional_matrix_exponential¶

conditional_matrix_exponential(input_dim, context_dim, hidden_dims=None, iterations=8, normalization='none', bound=None)[source]¶

A helper function to create a ConditionalMatrixExponential object for consistency with other helpers.

Parameters:

input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the dense network. Defaults to using [input_dim * 10, input_dim * 10]
iterations (int) – the number of terms to use in the truncated power series that approximates matrix exponentiation.
normalization (string) – One of [‘none’, ‘weight’, ‘spectral’] normalization that selects what type of normalization to apply to the weight matrix. weight corresponds to weight normalization (Salimans and Kingma, 2016) and spectral to spectral normalization (Miyato et al, 2018).
bound (float) – a bound on either the weight or spectral norm, when either of those two types of regularization are chosen by the normalization argument. A lower value for this results in fewer required terms of the truncated power series to closely approximate the exact value of the matrix exponential.

conditional_neural_autoregressive¶

conditional_neural_autoregressive(input_dim, context_dim, hidden_dims=None, activation='sigmoid', width=16)[source]¶

A helper function to create a ConditionalNeuralAutoregressive object that takes care of constructing an autoregressive network with the correct input/output dimensions.

Parameters:

input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the autoregressive network. Defaults to using [3*input_dim + 1]
activation (string) – Activation function to use. One of ‘ELU’, ‘LeakyReLU’, ‘sigmoid’, or ‘tanh’.
width (int) – The width of the “multilayer perceptron” in the transform (see paper). Defaults to 16

conditional_planar¶

conditional_planar(input_dim, context_dim, hidden_dims=None)[source]¶

A helper function to create a ConditionalPlanar object that takes care of constructing a dense network with the correct input/output dimensions.

Parameters:

input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the dense network. Defaults to using [input_dim * 10, input_dim * 10]

conditional_radial¶

conditional_radial(input_dim, context_dim, hidden_dims=None)[source]¶

A helper function to create a ConditionalRadial object that takes care of constructing a dense network with the correct input/output dimensions.

Parameters:

input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the dense network. Defaults to using [input_dim * 10, input_dim * 10]

conditional_spline¶

conditional_spline(input_dim, context_dim, hidden_dims=None, count_bins=8, bound=3.0, order='linear')[source]¶

A helper function to create a ConditionalSpline object that takes care of constructing a dense network with the correct input/output dimensions.

Parameters:

input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the dense network. Defaults to using [input_dim * 10, input_dim * 10]
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.

conditional_spline_autoregressive¶

conditional_spline_autoregressive(input_dim, context_dim, hidden_dims=None, count_bins=8, bound=3.0, order='linear')[source]¶

A helper function to create a ConditionalSplineAutoregressive object that takes care of constructing an autoregressive network with the correct input/output dimensions.

Parameters:

input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the autoregressive network. Defaults to using [input_dim * 10, input_dim * 10]
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.

elu¶

elu()[source]¶: A helper function to create an ELUTransform object for consistency with other helpers.

generalized_channel_permute¶

generalized_channel_permute(**kwargs)[source]¶

A helper function to create a GeneralizedChannelPermute object for consistency with other helpers.

Parameters:	channels (int) – Number of channel dimensions in the input.

householder¶

householder(input_dim, count_transforms=None)[source]¶

A helper function to create a Householder object for consistency with other helpers.

Parameters:

input_dim (int) – Dimension of input variable
count_transforms (int) – Number of Householder transformations to apply.
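As a rough illustration of what applying several Householder transformations means, the sketch below uses the standard reflection x - 2 v (v·x) / (v·v). It is written in plain Python and assumes that count_transforms corresponds to the number of reflection vectors; how Pyro parameterizes those vectors is not described here, and the function names are hypothetical.

```python
import math


def householder_reflect(x, v):
    """Reflect x across the hyperplane orthogonal to v."""
    scale = 2.0 * sum(vi * xi for vi, xi in zip(v, x)) / sum(vi * vi for vi in v)
    return [xi - scale * vi for xi, vi in zip(x, v)]


def apply_reflections(x, vectors):
    """Apply one reflection per vector, in order."""
    for v in vectors:
        x = householder_reflect(x, v)
    return x


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


x = [1.0, 2.0, 3.0]
# Two reflections, playing the role of count_transforms=2 (an assumption).
vectors = [[1.0, 0.0, 1.0], [0.5, -1.0, 2.0]]
y = apply_reflections(x, vectors)

# Reflections are orthogonal maps, so the Euclidean norm is unchanged.
print(norm(x), norm(y))
```

Each reflection is its own inverse, so the composition can be undone by applying the same reflections in reverse order.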

leaky_relu¶

leaky_relu()[source]¶: A helper function to create a LeakyReLUTransform object for consistency with other helpers.

matrix_exponential¶

matrix_exponential(input_dim, iterations=8, normalization='none', bound=None)[source]¶

A helper function to create a MatrixExponential object for consistency with other helpers.

Parameters:

input_dim (int) – Dimension of input variable
iterations (int) – The number of terms to use in the truncated power series that approximates matrix exponentiation.
normalization (string) – One of [‘none’, ‘weight’, ‘spectral’], selecting the type of normalization to apply to the weight matrix. weight corresponds to weight normalization (Salimans and Kingma, 2016) and spectral to spectral normalization (Miyato et al., 2018).
bound (float) – A bound on either the weight or spectral norm, used when one of those two types of normalization is selected by the normalization argument. A lower value means fewer terms of the truncated power series are needed to closely approximate the exact value of the matrix exponential.

neural_autoregressive¶

neural_autoregressive(input_dim, hidden_dims=None, activation='sigmoid', width=16)[source]¶

A helper function to create a NeuralAutoregressive object that takes care of constructing an autoregressive network with the correct input/output dimensions.

Parameters:

input_dim (int) – Dimension of input variable
hidden_dims (list[int]) – The desired hidden dimensions of the autoregressive network. Defaults to using [3*input_dim + 1]
activation (string) – Activation function to use. One of ‘ELU’, ‘LeakyReLU’, ‘sigmoid’, or ‘tanh’.
width (int) – The width of the “multilayer perceptron” in the transform (see paper). Defaults to 16

permute¶

permute(input_dim, permutation=None, dim=-1)[source]¶

A helper function to create a Permute object for consistency with other helpers.

Parameters:

input_dim (int) – Dimension(s) of input variable to permute. Note that when dim < -1 this must be a tuple corresponding to the event shape.
permutation (torch.LongTensor) – Torch tensor of integer indices representing permutation. Defaults to a random permutation.
dim (int) – the tensor dimension to permute. This value must be negative and defines the event dim as abs(dim).
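The following plain-Python sketch shows what an index permutation and its inverse look like for a single event vector. It assumes the convention output[i] = x[permutation[i]]; that convention and the helper names are assumptions for illustration, not taken from the text above.

```python
def apply_permutation(x, permutation):
    """Reorder x so that output[i] = x[permutation[i]]."""
    return [x[p] for p in permutation]


def invert_permutation(permutation):
    """Return the index list that undoes `permutation`."""
    inverse = [0] * len(permutation)
    for position, source in enumerate(permutation):
        inverse[source] = position
    return inverse


x = ["a", "b", "c", "d"]
permutation = [2, 0, 3, 1]

y = apply_permutation(x, permutation)  # ['c', 'a', 'd', 'b']
restored = apply_permutation(y, invert_permutation(permutation))
print(y, restored)
```

Because a permutation only reorders elements, the transform is volume preserving and its inverse is another permutation.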

planar¶

planar(input_dim)[source]¶

A helper function to create a Planar object for consistency with other helpers.

Parameters:	input_dim (int) – Dimension of input variable

polynomial¶

polynomial(input_dim, hidden_dims=None)[source]¶

A helper function to create a Polynomial object that takes care of constructing an autoregressive network with the correct input/output dimensions.

Parameters:

input_dim (int) – Dimension of input variable
hidden_dims (list[int]) – The desired hidden dimensions of the autoregressive network. Defaults to using [input_dim * 10]

radial¶

radial(input_dim)[source]¶

A helper function to create a Radial object for consistency with other helpers.

Parameters:	input_dim (int) – Dimension of input variable

spline¶

spline(input_dim, **kwargs)[source]¶

A helper function to create a Spline object for consistency with other helpers.

Parameters:	input_dim (int) – Dimension of input variable

spline_autoregressive¶

spline_autoregressive(input_dim, hidden_dims=None, count_bins=8, bound=3.0, order='linear')[source]¶

A helper function to create a SplineAutoregressive object that takes care of constructing an autoregressive network with the correct input/output dimensions.

Parameters:

input_dim (int) – Dimension of input variable
hidden_dims (list[int]) – The desired hidden dimensions of the autoregressive network. Defaults to using [3*input_dim + 1]
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.

spline_coupling¶

spline_coupling(input_dim, split_dim=None, hidden_dims=None, count_bins=8, bound=3.0)[source]¶

A helper function to create a SplineCoupling object for consistency with other helpers.

Parameters:	input_dim (int) – Dimension of input variable

sylvester¶

sylvester(input_dim, count_transforms=None)[source]¶

A helper function to create a Sylvester object for consistency with other helpers.

Parameters:

input_dim (int) – Dimension of input variable
count_transforms (int) – Number of Sylvester operations to apply. Defaults to input_dim // 2 + 1.

Constraints¶

Pyro’s constraints library extends torch.distributions.constraints.

Constraint¶

alias of torch.distributions.constraints.Constraint

boolean¶

alias of torch.distributions.constraints.boolean

cat¶

alias of torch.distributions.constraints.cat

corr_cholesky¶

alias of torch.distributions.constraints.corr_cholesky

corr_cholesky_constraint¶

alias of torch.distributions.constraints.corr_cholesky_constraint

corr_matrix¶

class _CorrMatrix[source]¶: Constrains to a correlation matrix.
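To make the constraint concrete, here is a standalone membership check in plain Python. It assumes a correlation matrix means a symmetric, positive-definite matrix with ones on the diagonal; the tolerance and the function name are hypothetical, and Pyro's own check may differ in detail.

```python
import math


def is_correlation_matrix(m, tol=1e-8):
    """Check symmetry, unit diagonal, and positive definiteness (via Cholesky)."""
    n = len(m)
    for i in range(n):
        if abs(m[i][i] - 1.0) > tol:
            return False
        for j in range(i):
            if abs(m[i][j] - m[j][i]) > tol:
                return False
    # A Cholesky factorization exists only for positive-definite matrices.
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


print(is_correlation_matrix([[1.0, 0.5], [0.5, 1.0]]))
print(is_correlation_matrix([[1.0, 2.0], [2.0, 1.0]]))
```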

dependent¶

alias of torch.distributions.constraints.dependent

dependent_property¶

alias of torch.distributions.constraints.dependent_property

greater_than¶

alias of torch.distributions.constraints.greater_than

greater_than_eq¶

alias of torch.distributions.constraints.greater_than_eq

half_open_interval¶

alias of torch.distributions.constraints.half_open_interval

independent¶

alias of torch.distributions.constraints.independent

integer¶

class _Integer[source]¶: Constrain to integers.

integer_interval¶

alias of torch.distributions.constraints.integer_interval

interval¶

alias of torch.distributions.constraints.interval

is_dependent¶

alias of torch.distributions.constraints.is_dependent

less_than¶

alias of torch.distributions.constraints.less_than

lower_cholesky¶

alias of torch.distributions.constraints.lower_cholesky

lower_triangular¶

alias of torch.distributions.constraints.lower_triangular

multinomial¶

alias of torch.distributions.constraints.multinomial

nonnegative_integer¶

alias of torch.distributions.constraints.nonnegative_integer

ordered_vector¶

class _OrderedVector[source]¶: Constrains to a real-valued tensor where the elements are monotonically increasing along the event_shape dimension.

positive¶

alias of torch.distributions.constraints.positive

positive_definite¶

alias of torch.distributions.constraints.positive_definite

positive_integer¶

alias of torch.distributions.constraints.positive_integer

positive_ordered_vector¶

class _PositiveOrderedVector[source]¶: Constrains to a positive real-valued tensor where the elements are monotonically increasing along the event_shape dimension.
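The two ordered-vector constraints can be pictured with a small plain-Python check over a single event vector. The sketch assumes "monotonically increasing" means strictly increasing; the function names are hypothetical.

```python
def is_ordered_vector(values):
    """True when each element is strictly greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def is_positive_ordered_vector(values):
    """Ordered as above, with every element also strictly positive."""
    return all(v > 0 for v in values) and is_ordered_vector(values)


print(is_ordered_vector([-1.0, 0.5, 2.0]))
print(is_positive_ordered_vector([-1.0, 0.5, 2.0]))
```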

real¶

alias of torch.distributions.constraints.real

real_vector¶

alias of torch.distributions.constraints.real_vector

simplex¶

alias of torch.distributions.constraints.simplex

softplus_lower_cholesky¶

class _SoftplusLowerCholesky[source]¶

softplus_positive¶

class _SoftplusPositive[source]¶

sphere¶

class _Sphere[source]¶: Constrain to the Euclidean sphere of any dimension.
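A minimal plain-Python sketch of this constraint follows, assuming "Euclidean sphere" means vectors of unit Euclidean norm. The tolerance and both helper names are hypothetical, since an exact equality test is rarely useful in floating point.

```python
import math


def on_unit_sphere(x, tol=1e-6):
    """True when the Euclidean norm of x is within `tol` of 1."""
    return abs(math.sqrt(sum(v * v for v in x)) - 1.0) < tol


def project_to_sphere(x):
    """Rescale a nonzero vector to unit Euclidean norm."""
    length = math.sqrt(sum(v * v for v in x))
    return [v / length for v in x]


print(on_unit_sphere([1.0, 1.0]))
print(on_unit_sphere(project_to_sphere([3.0, 4.0])))
```

The check works for a vector of any length, matching the "any dimension" wording above.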

stack¶

alias of torch.distributions.constraints.stack

unit_interval¶

alias of torch.distributions.constraints.unit_interval