# Distributions¶

## PyTorch Distributions¶

Most distributions in Pyro are thin wrappers around PyTorch distributions. For details on the PyTorch distribution interface, see torch.distributions.distribution.Distribution. For differences between the Pyro and PyTorch interfaces, see TorchDistributionMixin.

### Bernoulli¶

class Bernoulli(probs=None, logits=None, validate_args=None)

### Beta¶

class Beta(concentration1, concentration0, validate_args=None)

### Binomial¶

class Binomial(total_count=1, probs=None, logits=None, validate_args=None)

### Categorical¶

class Categorical(probs=None, logits=None, validate_args=None)

### Cauchy¶

class Cauchy(loc, scale, validate_args=None)

### Chi2¶

class Chi2(df, validate_args=None)

### Dirichlet¶

class Dirichlet(concentration, validate_args=None)

### Exponential¶

class Exponential(rate, validate_args=None)

### ExponentialFamily¶

class ExponentialFamily(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)

### FisherSnedecor¶

class FisherSnedecor(df1, df2, validate_args=None)

### Gamma¶

class Gamma(concentration, rate, validate_args=None)

### Geometric¶

class Geometric(probs=None, logits=None, validate_args=None)

### Gumbel¶

class Gumbel(loc, scale, validate_args=None)

### HalfCauchy¶

class HalfCauchy(scale, validate_args=None)

### HalfNormal¶

class HalfNormal(scale, validate_args=None)

### Independent¶

class Independent(base_distribution, reinterpreted_batch_ndims, validate_args=None)[source]

### Laplace¶

class Laplace(loc, scale, validate_args=None)

### LogNormal¶

class LogNormal(loc, scale, validate_args=None)

### LogisticNormal¶

class LogisticNormal(loc, scale, validate_args=None)

Wraps torch.distributions.logistic_normal.LogisticNormal with TorchDistributionMixin.

### LowRankMultivariateNormal¶

class LowRankMultivariateNormal(loc, cov_factor, cov_diag, validate_args=None)

### Multinomial¶

class Multinomial(total_count=1, probs=None, logits=None, validate_args=None)

### MultivariateNormal¶

class MultivariateNormal(loc, covariance_matrix=None, precision_matrix=None, scale_tril=None, validate_args=None)[source]

### NegativeBinomial¶

class NegativeBinomial(total_count, probs=None, logits=None, validate_args=None)

### Normal¶

class Normal(loc, scale, validate_args=None)

### OneHotCategorical¶

class OneHotCategorical(probs=None, logits=None, validate_args=None)

### Pareto¶

class Pareto(scale, alpha, validate_args=None)

### Poisson¶

class Poisson(rate, validate_args=None)

### RelaxedBernoulli¶

class RelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)

### RelaxedOneHotCategorical¶

class RelaxedOneHotCategorical(temperature, probs=None, logits=None, validate_args=None)

### StudentT¶

class StudentT(df, loc=0.0, scale=1.0, validate_args=None)

### TransformedDistribution¶

class TransformedDistribution(base_distribution, transforms, validate_args=None)

### Uniform¶

class Uniform(low, high, validate_args=None)

### Weibull¶

class Weibull(scale, concentration, validate_args=None)

## Pyro Distributions¶

### Abstract Distribution¶

class Distribution[source]

Bases: object

Base class for parameterized probability distributions.

Distributions in Pyro are stochastic function objects with sample() and log_prob() methods. Distribution are stochastic functions with fixed parameters:

d = dist.Bernoulli(param)
x = d()                                # Draws a random sample.
p = d.log_prob(x)                      # Evaluates log probability of x.


Implementing New Distributions:

Derived classes must implement the methods: sample(), log_prob().

Examples:

Take a look at the examples to see how they interact with inference algorithms.

__call__(*args, **kwargs)[source]

Samples a random value (just an alias for .sample(*args, **kwargs)).

For tensor distributions, the returned tensor should have the same .shape as the parameters.

Returns: A random value. torch.Tensor
enumerate_support(expand=True)[source]

Returns a representation of the parametrized distribution’s support, along the first dimension. This is implemented only by discrete distributions.

Note that this returns support values of all the batched RVs in lock-step, rather than the full cartesian product.

Parameters: expand (bool) – whether to expand the result to a tensor of shape (n,) + batch_shape + event_shape. If false, the return value has unexpanded shape (n,) + (1,)*len(batch_shape) + event_shape which can be broadcasted to the full shape. An iterator over the distribution’s discrete support. iterator
has_enumerate_support = False
has_rsample = False
log_prob(x, *args, **kwargs)[source]

Evaluates log probability densities for each of a batch of samples.

Parameters: x (torch.Tensor) – A single value or a batch of values batched along axis 0. log probability densities as a one-dimensional Tensor with same batch size as value and params. The shape of the result should be self.batch_size. torch.Tensor
sample(*args, **kwargs)[source]

Samples a random value.

For tensor distributions, the returned tensor should have the same .shape as the parameters, unless otherwise noted.

Parameters: sample_shape (torch.Size) – the size of the iid batch to be drawn from the distribution. A random value or batch of random values (if parameters are batched). The shape of the result should be self.shape(). torch.Tensor
score_parts(x, *args, **kwargs)[source]

Computes ingredients for stochastic gradient estimators of ELBO.

The default implementation is correct both for non-reparameterized and for fully reparameterized distributions. Partially reparameterized distributions should override this method to compute correct .score_function and .entropy_term parts.

Parameters: x (torch.Tensor) – A single value or batch of values. A ScoreParts object containing parts of the ELBO estimator. ScoreParts

### TorchDistributionMixin¶

class TorchDistributionMixin[source]

Bases: pyro.distributions.distribution.Distribution

Mixin to provide Pyro compatibility for PyTorch distributions.

You should instead use TorchDistribution for new distribution classes.

This is mainly useful for wrapping existing PyTorch distributions for use in Pyro. Derived classes must first inherit from torch.distributions.distribution.Distribution and then inherit from TorchDistributionMixin.

__call__(sample_shape=torch.Size([]))[source]

Samples a random value.

This is reparameterized whenever possible, calling rsample() for reparameterized distributions and sample() for non-reparameterized distributions.

Parameters: sample_shape (torch.Size) – the size of the iid batch to be drawn from the distribution. A random value or batch of random values (if parameters are batched). The shape of the result should be self.shape(). torch.Tensor
event_dim
Returns: Number of dimensions of individual events. int
shape(sample_shape=torch.Size([]))[source]

The tensor shape of samples from this distribution.

Samples are of shape:

d.shape(sample_shape) == sample_shape + d.batch_shape + d.event_shape

Parameters: sample_shape (torch.Size) – the size of the iid batch to be drawn from the distribution. Tensor shape of samples. torch.Size
expand_by(sample_shape)[source]

Expands a distribution by adding sample_shape to the left side of its batch_shape.

To expand internal dims of self.batch_shape from 1 to something larger, use expand() instead.

Parameters: sample_shape (torch.Size) – The size of the iid batch to be drawn from the distribution. An expanded version of this distribution. ReshapedDistribution
reshape(sample_shape=None, extra_event_dims=None)[source]
to_event(reinterpreted_batch_ndims=None)[source]

Reinterprets the n rightmost dimensions of this distributions batch_shape as event dims, adding them to the left side of event_shape.

Example:

>>> [d1.batch_shape, d1.event_shape]
[torch.Size([2, 3]), torch.Size([4, 5])]
>>> d2 = d1.to_event(1)
>>> [d2.batch_shape, d2.event_shape]
[torch.Size([2]), torch.Size([3, 4, 5])]
>>> d3 = d1.to_event(2)
>>> [d3.batch_shape, d3.event_shape]
[torch.Size([]), torch.Size([2, 3, 4, 5])]

Parameters: reinterpreted_batch_ndims (int) – The number of batch dimensions to reinterpret as event dimensions. A reshaped version of this distribution. pyro.distributions.torch.Independent
independent(reinterpreted_batch_ndims=None)[source]
mask(mask)[source]

Masks a distribution by a zero-one tensor that is broadcastable to the distributions batch_shape.

Parameters: mask (torch.Tensor) – A zero-one valued float tensor. A masked copy of this distribution. MaskedDistribution

### TorchDistribution¶

class TorchDistribution(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)[source]

Base class for PyTorch-compatible distributions with Pyro support.

This should be the base class for almost all new Pyro distributions.

Note

Parameters and data should be of type Tensor and all methods return type Tensor unless otherwise noted.

Tensor Shapes:

TorchDistributions provide a method .shape() for the tensor shape of samples:

x = d.sample(sample_shape)
assert x.shape == d.shape(sample_shape)


Pyro follows the same distribution shape semantics as PyTorch. It distinguishes between three different roles for tensor shapes of samples:

• sample shape corresponds to the shape of the iid samples drawn from the distribution. This is taken as an argument by the distribution’s sample method.
• batch shape corresponds to non-identical (independent) parameterizations of the distribution, inferred from the distribution’s parameter shapes. This is fixed for a distribution instance.
• event shape corresponds to the event dimensions of the distribution, which is fixed for a distribution class. These are collapsed when we try to score a sample from the distribution via d.log_prob(x).

These shapes are related by the equation:

assert d.shape(sample_shape) == sample_shape + d.batch_shape + d.event_shape


Distributions provide a vectorized log_prob() method that evaluates the log probability density of each event in a batch independently, returning a tensor of shape sample_shape + d.batch_shape:

x = d.sample(sample_shape)
assert x.shape == d.shape(sample_shape)
log_p = d.log_prob(x)
assert log_p.shape == sample_shape + d.batch_shape


Implementing New Distributions:

Derived classes must implement the methods sample() (or rsample() if .has_rsample == True) and log_prob(), and must implement the properties batch_shape, and event_shape. Discrete classes may also implement the enumerate_support() method to improve gradient estimates and set .has_enumerate_support = True.

### AVFMultivariateNormal¶

class AVFMultivariateNormal(loc, scale_tril, control_var)[source]

Bases: pyro.distributions.torch.MultivariateNormal

Multivariate normal (Gaussian) distribution with transport equation inspired control variates (adaptive velocity fields).

A distribution over vectors in which all the elements have a joint Gaussian density.

Parameters: loc (torch.Tensor) – D-dimensional mean vector. scale_tril (torch.Tensor) – Cholesky of Covariance matrix; D x D matrix. control_var (torch.Tensor) – 2 x L x D tensor that parameterizes the control variate; L is an arbitrary positive integer. This parameter needs to be learned (i.e. adapted) to achieve lower variance gradients. In a typical use case this parameter will be adapted concurrently with the loc and scale_tril that define the distribution.

Example usage:

control_var = torch.tensor(0.1 * torch.ones(2, 1, D), requires_grad=True)
opt_cv = torch.optim.Adam([control_var], lr=0.1, betas=(0.5, 0.999))

for _ in range(1000):
d = AVFMultivariateNormal(loc, scale_tril, control_var)
z = d.rsample()
cost = torch.pow(z, 2.0).sum()
cost.backward()
opt_cv.step()

arg_constraints = {'control_var': Real(), 'loc': Real(), 'scale_tril': LowerTriangular()}
rsample(sample_shape=torch.Size([]))[source]

### Delta¶

class Delta(v, log_density=0.0, event_dim=0, validate_args=None)[source]

Bases: pyro.distributions.torch_distribution.TorchDistribution

Degenerate discrete distribution (a single point).

Discrete distribution that assigns probability one to the single element in its support. Delta distribution parameterized by a random choice should not be used with MCMC based inference, as doing so produces incorrect results.

Parameters: v (torch.Tensor) – The single support element. log_density (torch.Tensor) – An optional density for this Delta. This is useful to keep the class of Delta distributions closed under differentiable transformation. event_dim (int) – Optional event dimension, defaults to zero.
arg_constraints = {'log_density': Real(), 'v': Real()}
expand(batch_shape, _instance=None)[source]
has_rsample = True
log_prob(x)[source]
mean
rsample(sample_shape=torch.Size([]))[source]
support = Real()
variance

### EmpiricalDistribution¶

class Empirical(samples, log_weights, validate_args=None)[source]

Bases: pyro.distributions.torch_distribution.TorchDistribution

Empirical distribution associated with the sampled data.

Parameters: samples (torch.Tensor) – samples from the empirical distribution. log_weights (torch.Tensor) – log weights (optional) corresponding to the samples. The leftmost shape of log_weights must match that of samples
arg_constraints = {}
enumerate_support(expand=True)[source]

See pyro.distributions.torch_distribution.TorchDistribution.enumerate_support()

event_shape

See pyro.distributions.torch_distribution.TorchDistribution.event_shape()

has_enumerate_support = True
log_prob(value)[source]

Returns the log of the probability mass function evaluated at value. Note that this currently only supports scoring values with empty sample_shape.

Parameters: value (torch.Tensor) – scalar or tensor value to be scored.
log_weights
mean

See pyro.distributions.torch_distribution.TorchDistribution.mean()

sample(sample_shape=torch.Size([]))[source]

See pyro.distributions.torch_distribution.TorchDistribution.sample()

sample_size

Number of samples that constitute the empirical distribution.

Return int: number of samples collected.
support = Real()
variance

See pyro.distributions.torch_distribution.TorchDistribution.variance()

### GaussianScaleMixture¶

class GaussianScaleMixture(coord_scale, component_logits, component_scale)[source]

Bases: pyro.distributions.torch_distribution.TorchDistribution

Mixture of Normal distributions with zero mean and diagonal covariance matrices.

That is, this distribution is a mixture with K components, where each component distribution is a D-dimensional Normal distribution with zero mean and a D-dimensional diagonal covariance matrix. The K different covariance matrices are controlled by the parameters coord_scale and component_scale. That is, the covariance matrix of the k’th component is given by

Sigma_ii = (component_scale_k * coord_scale_i) ** 2 (i = 1, …, D)

where component_scale_k is a positive scale factor and coord_scale_i are positive scale parameters shared between all K components. The mixture weights are controlled by a K-dimensional vector of softmax logits, component_logits. This distribution implements pathwise derivatives for samples from the distribution. This distribution does not currently support batched parameters.

See reference [1] for details on the implementations of the pathwise derivative. Please consider citing this reference if you use the pathwise derivative in your research.

[1] Pathwise Derivatives for Multivariate Distributions, Martin Jankowiak & Theofanis Karaletsos. arXiv:1806.01856

Note that this distribution supports both even and odd dimensions, but the former should be more a bit higher precision, since it doesn’t use any erfs in the backward call. Also note that this distribution does not support D = 1.

Parameters: coord_scale (torch.tensor) – D-dimensional vector of scales component_logits (torch.tensor) – K-dimensional vector of logits component_scale (torch.tensor) – K-dimensional vector of scale multipliers
arg_constraints = {'component_logits': Real(), 'component_scale': GreaterThan(lower_bound=0.0), 'coord_scale': GreaterThan(lower_bound=0.0)}
has_rsample = True
log_prob(value)[source]
rsample(sample_shape=torch.Size([]))[source]

class MaskedMixture(mask, component0, component1, validate_args=None)[source]

Bases: pyro.distributions.torch_distribution.TorchDistribution

A masked deterministic mixture of two distributions.

This is useful when the mask is sampled from another distribution, possibly correlated across the batch. Often the mask can be marginalized out via enumeration.

Example:

change_point = pyro.sample("change_point",
dist.Categorical(torch.ones(len(data) + 1)),
infer={'enumerate': 'parallel'})
mask = torch.arange(len(data), dtype=torch.long) >= changepoint
with pyro.plate("data", len(data)):

Parameters: mask (torch.Tensor) – A byte tensor toggling between component0 and component1. component0 (pyro.distributions.TorchDistribution) – a distribution for batch elements mask == 0. component1 (pyro.distributions.TorchDistribution) – a distribution for batch elements mask == 1.
arg_constraints = {}
expand(batch_shape)[source]
has_rsample
log_prob(value)[source]
mean[source]
rsample(sample_shape=torch.Size([]))[source]
sample(sample_shape=torch.Size([]))[source]
support
variance[source]

### MixtureOfDiagNormalsSharedCovariance¶

class MixtureOfDiagNormalsSharedCovariance(locs, coord_scale, component_logits)[source]

Bases: pyro.distributions.torch_distribution.TorchDistribution

Mixture of Normal distributions with diagonal covariance matrices.

That is, this distribution is a mixture with K components, where each component distribution is a D-dimensional Normal distribution with a D-dimensional mean parameter loc and a D-dimensional diagonal covariance matrix specified by a scale parameter coord_scale. The K different component means are gathered into the parameter locs and the scale parameter is shared between all K components. The mixture weights are controlled by a K-dimensional vector of softmax logits, component_logits. This distribution implements pathwise derivatives for samples from the distribution.

See reference [1] for details on the implementations of the pathwise derivative. Please consider citing this reference if you use the pathwise derivative in your research. Note that this distribution does not support dimension D = 1.

[1] Pathwise Derivatives for Multivariate Distributions, Martin Jankowiak & Theofanis Karaletsos. arXiv:1806.01856

Parameters: locs (torch.Tensor) – K x D mean matrix coord_scale (torch.Tensor) – shared D-dimensional scale vector component_logits (torch.Tensor) – K-dimensional vector of softmax logits
arg_constraints = {'component_logits': Real(), 'coord_scale': GreaterThan(lower_bound=0.0), 'locs': Real()}
expand(batch_shape, _instance=None)[source]
has_rsample = True
log_prob(value)[source]
rsample(sample_shape=torch.Size([]))[source]

### OMTMultivariateNormal¶

class OMTMultivariateNormal(loc, scale_tril)[source]

Bases: pyro.distributions.torch.MultivariateNormal

Multivariate normal (Gaussian) distribution with OMT gradients w.r.t. both parameters. Note the gradient computation w.r.t. the Cholesky factor has cost O(D^3), although the resulting gradient variance is generally expected to be lower.

A distribution over vectors in which all the elements have a joint Gaussian density.

Parameters: loc (torch.Tensor) – Mean. scale_tril (torch.Tensor) – Cholesky of Covariance matrix.
arg_constraints = {'loc': Real(), 'scale_tril': LowerTriangular()}
rsample(sample_shape=torch.Size([]))[source]

### RelaxedBernoulliStraightThrough¶

class RelaxedBernoulliStraightThrough(temperature, probs=None, logits=None, validate_args=None)[source]

Bases: pyro.distributions.torch.RelaxedBernoulli

An implementation of RelaxedBernoulli with a straight-through gradient estimator.

This distribution has the following properties:

• The samples returned by the rsample() method are discrete/quantized.
• The log_prob() method returns the log probability of the relaxed/unquantized sample using the GumbelSoftmax distribution.
• In the backward pass the gradient of the sample with respect to the parameters of the distribution uses the relaxed/unquantized sample.

References:

[1] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables,
Chris J. Maddison, Andriy Mnih, Yee Whye Teh
[2] Categorical Reparameterization with Gumbel-Softmax,
Eric Jang, Shixiang Gu, Ben Poole
log_prob(value)[source]

See pyro.distributions.torch.RelaxedBernoulli.log_prob()

rsample(sample_shape=torch.Size([]))[source]

See pyro.distributions.torch.RelaxedBernoulli.rsample()

### RelaxedOneHotCategoricalStraightThrough¶

class RelaxedOneHotCategoricalStraightThrough(temperature, probs=None, logits=None, validate_args=None)[source]

Bases: pyro.distributions.torch.RelaxedOneHotCategorical

An implementation of RelaxedOneHotCategorical with a straight-through gradient estimator.

This distribution has the following properties:

• The samples returned by the rsample() method are discrete/quantized.
• The log_prob() method returns the log probability of the relaxed/unquantized sample using the GumbelSoftmax distribution.
• In the backward pass the gradient of the sample with respect to the parameters of the distribution uses the relaxed/unquantized sample.

References:

[1] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables,
Chris J. Maddison, Andriy Mnih, Yee Whye Teh
[2] Categorical Reparameterization with Gumbel-Softmax,
Eric Jang, Shixiang Gu, Ben Poole
log_prob(value)[source]

See pyro.distributions.torch.RelaxedOneHotCategorical.log_prob()

rsample(sample_shape=torch.Size([]))[source]

See pyro.distributions.torch.RelaxedOneHotCategorical.rsample()

### Rejector¶

class Rejector(propose, log_prob_accept, log_scale)[source]

Bases: pyro.distributions.torch_distribution.TorchDistribution

Rejection sampled distribution given an acceptance rate function.

Parameters: propose (Distribution) – A proposal distribution that samples batched proposals via propose(). rsample() supports a sample_shape arg only if propose() supports a sample_shape arg. log_prob_accept (callable) – A callable that inputs a batch of proposals and returns a batch of log acceptance probabilities. log_scale – Total log probability of acceptance.
has_rsample = True
log_prob(x)[source]
rsample(sample_shape=torch.Size([]))[source]
score_parts(x)[source]

### VonMises¶

class VonMises(loc, concentration, validate_args=None)[source]

Bases: pyro.distributions.torch_distribution.TorchDistribution

A circular von Mises distribution.

This implementation uses polar coordinates. The loc and value args can be any real number (to facilitate unconstrained optimization), but are interpreted as angles modulo 2 pi.

See VonMises3D for a 3D cartesian coordinate cousin of this distribution.

Currently only log_prob() is implemented.

Parameters: loc (torch.Tensor) – an angle in radians. concentration (torch.Tensor) – concentration parameter
arg_constraints = {'concentration': GreaterThan(lower_bound=0.0), 'loc': Real()}
expand(batch_shape)[source]
log_prob(value)[source]
support = Real()

### VonMises3D¶

class VonMises3D(concentration, validate_args=None)[source]

Bases: pyro.distributions.torch_distribution.TorchDistribution

Spherical von Mises distribution.

This implementation combines the direction parameter and concentration parameter into a single combined parameter that contains both direction and magnitude. The value arg is represented in cartesian coordinates: it must be a normalized 3-vector that lies on the 2-sphere.

See VonMises for a 2D polar coordinate cousin of this distribution.

Currently only log_prob() is implemented.

Parameters: concentration (torch.Tensor) – A combined location-and-concentration vector. The direction of this vector is the location, and its magnitude is the concentration.
arg_constraints = {'concentration': Real()}
expand(batch_shape)[source]
log_prob(value)[source]
support = Real()

## Transformed Distributions¶

### InverseAutoRegressiveFlow¶

class InverseAutoregressiveFlow(autoregressive_nn, log_scale_min_clip=-5.0, log_scale_max_clip=3.0)[source]

Bases: pyro.distributions.torch_transform.TransformModule

An implementation of Inverse Autoregressive Flow, using Eq (10) from Kingma Et Al., 2016,

$$\mathbf{y} = \mu_t + \sigma_t\odot\mathbf{x}$$

where $$\mathbf{x}$$ are the inputs, $$\mathbf{y}$$ are the outputs, $$\mu_t,\sigma_t$$ are calculated from an autoregressive network on $$\mathbf{x}$$, and $$\sigma_t>0$$.

Together with TransformedDistribution this provides a way to create richer variational approximations.

Example usage:

>>> from pyro.nn import AutoRegressiveNN
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> iaf = InverseAutoregressiveFlow(AutoRegressiveNN(10, [40]))
>>> iaf_module = pyro.module("my_iaf", iaf)
>>> iaf_dist = dist.TransformedDistribution(base_dist, [iaf])
>>> iaf_dist.sample()
tensor([-0.4071, -0.5030,  0.7924, -0.2366, -0.2387, -0.1417,  0.0868,
0.1389, -0.4629,  0.0986])


The inverse of the Bijector is required when, e.g., scoring the log density of a sample with TransformedDistribution. This implementation caches the inverse of the Bijector when its forward operation is called, e.g., when sampling from TransformedDistribution. However, if the cached value isn’t available, either because it was already popped from the cache, or an arbitary value is being scored, it will calculate it manually. Note that this is an operation that scales as O(D) where D is the input dimension, and so should be avoided for large dimensional uses. So in general, it is cheap to sample from IAF and score a value that was sampled by IAF, but expensive to score an arbitrary value.

Parameters: autoregressive_nn (nn.Module) – an autoregressive neural network whose forward call returns a real-valued mean and logit-scale as a tuple log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the autoregressive NN log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the autoregressive NN

References:

1. Improving Variational Inference with Inverse Autoregressive Flow [arXiv:1606.04934] Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling

2. Variational Inference with Normalizing Flows [arXiv:1505.05770] Danilo Jimenez Rezende, Shakir Mohamed

3. MADE: Masked Autoencoder for Distribution Estimation [arXiv:1502.03509] Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle

codomain = Real()
log_abs_det_jacobian(x, y)[source]

Calculates the elementwise determinant of the log jacobian

### InverseAutoRegressiveFlowStable¶

class InverseAutoregressiveFlowStable(autoregressive_nn, sigmoid_bias=2.0)[source]

Bases: pyro.distributions.torch_transform.TransformModule

An implementation of an Inverse Autoregressive Flow, using Eqs (13)/(14) from Kingma Et Al., 2016,

$$\mathbf{y} = \sigma_t\odot\mathbf{x} + (1-\sigma_t)\odot\mu_t$$

where $$\mathbf{x}$$ are the inputs, $$\mathbf{y}$$ are the outputs, $$\mu_t,\sigma_t$$ are calculated from an autoregressive network on $$\mathbf{x}$$, and $$\sigma_t$$ is restricted to $$[0,1]$$.

This variant of IAF is claimed by the authors to be more numerically stable than one using Eq (10), although in practice it leads to a restriction on the distributions that can be represented, presumably since the input is restricted to rescaling by a number on $$[0,1]$$.

Example usage:

>>> from pyro.nn import AutoRegressiveNN
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> iaf = InverseAutoregressiveFlowStable(AutoRegressiveNN(10, [40]))
>>> iaf_module = pyro.module("my_iaf", iaf)
>>> iaf_dist = dist.TransformedDistribution(base_dist, [iaf])
>>> iaf_dist.sample()
tensor([-0.4071, -0.5030,  0.7924, -0.2366, -0.2387, -0.1417,  0.0868,
0.1389, -0.4629,  0.0986])


See InverseAutoregressiveFlow docs for a discussion of the running cost.

Parameters: autoregressive_nn (nn.Module) – an autoregressive neural network whose forward call returns a real-valued mean and logit-scale as a tuple sigmoid_bias (float) – bias on the hidden units fed into the sigmoid; default=2.0

References:

1. Improving Variational Inference with Inverse Autoregressive Flow [arXiv:1606.04934] Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling

2. Variational Inference with Normalizing Flows [arXiv:1505.05770] Danilo Jimenez Rezende, Shakir Mohamed

3. MADE: Masked Autoencoder for Distribution Estimation [arXiv:1502.03509] Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle

codomain = Real()
log_abs_det_jacobian(x, y)[source]

Calculates the elementwise determinant of the log jacobian

### PermuteTransform¶

class PermuteTransform(permutation)[source]

A bijection that reorders the input dimensions, that is, multiplies the input by a permutation matrix. This is useful in between InverseAutoregressiveFlow transforms to increase the flexibility of the resulting distribution and stabilize learning. Whilst not being an autoregressive transform, the log absolute determinate of the Jacobian is easily calculable as 0. Note that reordering the input dimension between two layers of InverseAutoregressiveFlow is not equivalent to reordering the dimension inside the MADE networks that those IAFs use; using a PermuteTransform results in a distribution with more flexibility.

Example usage:

>>> from pyro.nn import AutoRegressiveNN
>>> from pyro.distributions import InverseAutoregressiveFlow, PermuteTransform
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> iaf1 = InverseAutoregressiveFlow(AutoRegressiveNN(10, [40]))
>>> ff = PermuteTransform(torch.randperm(10, dtype=torch.long))
>>> iaf2 = InverseAutoregressiveFlow(AutoRegressiveNN(10, [40]))
>>> iaf_dist = dist.TransformedDistribution(base_dist, [iaf1, ff, iaf2])
>>> iaf_dist.sample()
tensor([-0.4071, -0.5030,  0.7924, -0.2366, -0.2387, -0.1417,  0.0868,
0.1389, -0.4629,  0.0986])

Parameters: permutation (torch.LongTensor) – a permutation ordering that is applied to the inputs.
bijective = True
codomain = Real()
inv_permutation[source]
log_abs_det_jacobian(x, y)[source]

Calculates the elementwise determinant of the log Jacobian, i.e. log(abs([dy_0/dx_0, …, dy_{N-1}/dx_{N-1}])). Note that this type of transform is not autoregressive, so the log Jacobian is not the sum of the previous expression. However, it turns out it’s always 0 (since the determinant is -1 or +1), and so returning a vector of zeros works.

### PlanarFlow¶

class PlanarFlow(input_dim)[source]

Bases: pyro.distributions.torch_transform.TransformModule

A ‘planar’ normalizing flow that uses the transformation

$$\mathbf{y} = \mathbf{x} + \mathbf{u}\tanh(\mathbf{w}^T\mathbf{z}+b)$$

where $$\mathbf{x}$$ are the inputs, $$\mathbf{y}$$ are the outputs, and the learnable parameters are $$b\in\mathbb{R}$$, $$\mathbf{u}\in\mathbb{R}^D$$, $$\mathbf{w}\in\mathbb{R}^D$$ for input dimension $$D$$. For this to be an invertible transformation, the condition $$\mathbf{w}^T\mathbf{u}>-1$$ is enforced.

Together with TransformedDistribution this provides a way to create richer variational approximations.

Example usage:

>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> plf = PlanarFlow(10)
>>> plf_module = pyro.module("my_plf", plf)
>>> plf_dist = dist.TransformedDistribution(base_dist, [plf])
>>> plf_dist.sample()
tensor([-0.4071, -0.5030,  0.7924, -0.2366, -0.2387, -0.1417,  0.0868,
0.1389, -0.4629,  0.0986])


The inverse of this transform does not possess an analytical solution and is left unimplemented. However, the inverse is cached when the forward operation is called during sampling, and so samples drawn using planar flow can be scored.

Parameters: input_dim – the dimension of the input (and output) variable.

References:

Variational Inference with Normalizing Flows [arXiv:1505.05770] Danilo Jimenez Rezende, Shakir Mohamed

codomain = Real()
log_abs_det_jacobian(x, y)[source]

Calculates the elementwise determinant of the log jacobian

reset_parameters()[source]
u_hat()[source]

### TransformModule¶

class TransformModule[source]

Bases: torch.distributions.transforms.Transform, torch.nn.modules.module.Module

Transforms with learnable parameters such as normalizing flows should inherit from this class rather than Transform so they are also a subclass of nn.Module and inherit all the useful methods of that class.