Distributions

PyTorch Distributions

Most distributions in Pyro are thin wrappers around PyTorch distributions. For details on the PyTorch distribution interface, see torch.distributions.distribution.Distribution. For differences between the Pyro and PyTorch interfaces, see TorchDistributionMixin.
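For example, any of the wrapped distributions below can be used directly inside a Pyro model. A minimal sketch (the site names are arbitrary):

import torch
import pyro
import pyro.distributions as dist

def model():
    # dist.Normal is torch.distributions.normal.Normal plus TorchDistributionMixin
    loc = pyro.sample("loc", dist.Normal(torch.tensor(0.), torch.tensor(1.)))
    return pyro.sample("obs", dist.Normal(loc, torch.tensor(1.)), obs=torch.tensor(0.5))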

Bernoulli

class Bernoulli(probs=None, logits=None, validate_args=None)

Wraps torch.distributions.bernoulli.Bernoulli with TorchDistributionMixin.

Beta

class Beta(concentration1, concentration0, validate_args=None)

Wraps torch.distributions.beta.Beta with TorchDistributionMixin.

Categorical

class Categorical(probs=None, logits=None, validate_args=None)

Wraps torch.distributions.categorical.Categorical with TorchDistributionMixin.

Cauchy

class Cauchy(loc, scale, validate_args=None)

Wraps torch.distributions.cauchy.Cauchy with TorchDistributionMixin.

Chi2

class Chi2(df, validate_args=None)

Wraps torch.distributions.chi2.Chi2 with TorchDistributionMixin.

Dirichlet

class Dirichlet(concentration, validate_args=None)

Wraps torch.distributions.dirichlet.Dirichlet with TorchDistributionMixin.

Exponential

class Exponential(rate, validate_args=None)

Wraps torch.distributions.exponential.Exponential with TorchDistributionMixin.

ExponentialFamily

class ExponentialFamily(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)

Wraps torch.distributions.exp_family.ExponentialFamily with TorchDistributionMixin.

FisherSnedecor

class FisherSnedecor(df1, df2, validate_args=None)

Wraps torch.distributions.fishersnedecor.FisherSnedecor with TorchDistributionMixin.

Gamma

class Gamma(concentration, rate, validate_args=None)

Wraps torch.distributions.gamma.Gamma with TorchDistributionMixin.

Geometric

class Geometric(probs=None, logits=None, validate_args=None)

Wraps torch.distributions.geometric.Geometric with TorchDistributionMixin.

Gumbel

class Gumbel(loc, scale, validate_args=None)

Wraps torch.distributions.gumbel.Gumbel with TorchDistributionMixin.

Independent

class Independent(base_distribution, reinterpreted_batch_ndims, validate_args=None)

Wraps torch.distributions.independent.Independent with TorchDistributionMixin.

Laplace

class Laplace(loc, scale, validate_args=None)

Wraps torch.distributions.laplace.Laplace with TorchDistributionMixin.

LogNormal

class LogNormal(loc, scale, validate_args=None)

Wraps torch.distributions.log_normal.LogNormal with TorchDistributionMixin.

LogisticNormal

class LogisticNormal(loc, scale, validate_args=None)

Wraps torch.distributions.logistic_normal.LogisticNormal with TorchDistributionMixin.

Multinomial

class Multinomial(total_count=1, probs=None, logits=None, validate_args=None)

Wraps torch.distributions.multinomial.Multinomial with TorchDistributionMixin.

MultivariateNormal

class MultivariateNormal(loc, covariance_matrix=None, precision_matrix=None, scale_tril=None, validate_args=None)

Wraps torch.distributions.multivariate_normal.MultivariateNormal with TorchDistributionMixin.

Normal

class Normal(loc, scale, validate_args=None)

Wraps torch.distributions.normal.Normal with TorchDistributionMixin.

OneHotCategorical

class OneHotCategorical(probs=None, logits=None, validate_args=None)

Wraps torch.distributions.one_hot_categorical.OneHotCategorical with TorchDistributionMixin.

Pareto

class Pareto(scale, alpha, validate_args=None)

Wraps torch.distributions.pareto.Pareto with TorchDistributionMixin.

Poisson

class Poisson(rate, validate_args=None)

Wraps torch.distributions.poisson.Poisson with TorchDistributionMixin.

RelaxedBernoulli

class RelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)

Wraps torch.distributions.relaxed_bernoulli.RelaxedBernoulli with TorchDistributionMixin.

RelaxedOneHotCategorical

class RelaxedOneHotCategorical(temperature, probs=None, logits=None, validate_args=None)

Wraps torch.distributions.relaxed_categorical.RelaxedOneHotCategorical with TorchDistributionMixin.

StudentT

class StudentT(df, loc=0.0, scale=1.0, validate_args=None)

Wraps torch.distributions.studentT.StudentT with TorchDistributionMixin.

TransformedDistribution

class TransformedDistribution(base_distribution, transforms, validate_args=None)

Wraps torch.distributions.transformed_distribution.TransformedDistribution with TorchDistributionMixin.

Uniform

class Uniform(low, high, validate_args=None)

Wraps torch.distributions.uniform.Uniform with TorchDistributionMixin.

Pyro Distributions

Abstract Distribution

class Distribution[source]

Bases: object

Base class for parameterized probability distributions.

Distributions in Pyro are stochastic function objects with sample() and log_prob() methods. A distribution instance is a stochastic function with fixed parameters:

import torch
import pyro.distributions as dist

d = dist.Bernoulli(torch.tensor(0.5))
x = d()                                # Draws a random sample.
p = d.log_prob(x)                      # Evaluates log probability of x.

Implementing New Distributions:

Derived classes must implement the methods: sample(), log_prob().

Examples:

Take a look at the examples to see how they interact with inference algorithms.

__call__(*args, **kwargs)[source]

Samples a random value (just an alias for .sample(*args, **kwargs)).

For tensor distributions, the returned tensor should have the same .shape as the parameters.

Returns:A random value.
Return type:torch.Tensor
enumerate_support()[source]

Returns a representation of the parametrized distribution’s support, along the first dimension. This is implemented only by discrete distributions.

Note that this returns support values of all the batched RVs in lock-step, rather than the full cartesian product.

Returns:An iterator over the distribution’s discrete support.
Return type:iterator
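For example, a batched Bernoulli enumerates its two support values along a new leading dimension; a sketch, assuming a batch of three probabilities:

import torch
import pyro.distributions as dist

d = dist.Bernoulli(torch.tensor([0.1, 0.5, 0.9]))  # batch_shape == (3,)
support = d.enumerate_support()                    # shape (2, 3)
# [[0., 0., 0.],
#  [1., 1., 1.]]  -- values in lock-step, not the 2**3 Cartesian product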
has_enumerate_support = False
has_rsample = False
log_prob(x, *args, **kwargs)[source]

Evaluates log probability densities for each of a batch of samples.

Parameters:x (torch.Tensor) – A single value or a batch of values batched along axis 0.
Returns:log probability densities as a one-dimensional Tensor with the same batch size as the value and parameters. The shape of the result should be self.batch_shape.
Return type:torch.Tensor
sample(*args, **kwargs)[source]

Samples a random value.

For tensor distributions, the returned tensor should have the same .shape as the parameters, unless otherwise noted.

Parameters:sample_shape (torch.Size) – the size of the iid batch to be drawn from the distribution.
Returns:A random value or batch of random values (if parameters are batched). The shape of the result should be self.shape().
Return type:torch.Tensor
score_parts(x, *args, **kwargs)[source]

Computes ingredients for stochastic gradient estimators of ELBO.

The default implementation is correct both for non-reparameterized and for fully reparameterized distributions. Partially reparameterized distributions should override this method to compute correct .score_function and .entropy_term parts.

Parameters:x (torch.Tensor) – A single value or batch of values.
Returns:A ScoreParts object containing parts of the ELBO estimator.
Return type:ScoreParts
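A sketch of what this returns for a fully reparameterized distribution, following the default implementation described above:

import torch
import pyro.distributions as dist

d = dist.Normal(torch.zeros(3), torch.ones(3))
x = d.rsample()
parts = d.score_parts(x)
# For a fully reparameterized distribution the score_function part is zero
# and the entropy_term equals log_prob; for a non-reparameterized one it is
# the other way around.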

TorchDistribution

class TorchDistribution(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)[source]

Bases: torch.distributions.distribution.Distribution, pyro.distributions.torch_distribution.TorchDistributionMixin

Base class for PyTorch-compatible distributions with Pyro support.

This should be the base class for almost all new Pyro distributions.

Note

Parameters and data should be of type Tensor and all methods return type Tensor unless otherwise noted.

Tensor Shapes:

TorchDistributions provide a method .shape() for the tensor shape of samples:

x = d.sample(sample_shape)
assert x.shape == d.shape(sample_shape)

Pyro follows the same distribution shape semantics as PyTorch. It distinguishes between three different roles for tensor shapes of samples:

  • sample shape corresponds to the shape of the iid samples drawn from the distribution. This is taken as an argument by the distribution’s sample method.
  • batch shape corresponds to non-identical (independent) parameterizations of the distribution, inferred from the distribution’s parameter shapes. This is fixed for a distribution instance.
  • event shape corresponds to the event dimensions of the distribution, which is fixed for a distribution class. These are collapsed when we try to score a sample from the distribution via d.log_prob(x).

These shapes are related by the equation:

assert d.shape(sample_shape) == sample_shape + d.batch_shape + d.event_shape

Distributions provide a vectorized log_prob() method that evaluates the log probability density of each event in a batch independently, returning a tensor of shape sample_shape + d.batch_shape:

x = d.sample(sample_shape)
assert x.shape == d.shape(sample_shape)
log_p = d.log_prob(x)
assert log_p.shape == sample_shape + d.batch_shape
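A concrete sketch of these shape relationships, using a batched MultivariateNormal:

import torch
import pyro.distributions as dist

d = dist.MultivariateNormal(torch.zeros(3, 2), torch.eye(2))
assert d.batch_shape == torch.Size([3])   # three independent parameterizations
assert d.event_shape == torch.Size([2])   # each event is a 2-vector

x = d.sample(torch.Size([5]))
assert x.shape == torch.Size([5, 3, 2])           # sample + batch + event
assert d.log_prob(x).shape == torch.Size([5, 3])  # event dims are summed out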

Implementing New Distributions:

Derived classes must implement the methods sample() (or rsample() if .has_rsample == True) and log_prob(), and must implement the properties batch_shape and event_shape. Discrete classes may also implement the enumerate_support() method to improve gradient estimates, and should set .has_enumerate_support = True. A sketch of this recipe follows.
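A minimal sketch following this recipe; the distribution and its names are hypothetical (a toy uniform on [0, scale) with reparameterized sampling):

import torch
from torch.distributions import constraints
from pyro.distributions.torch_distribution import TorchDistribution

class ScaledUniform(TorchDistribution):
    arg_constraints = {'scale': constraints.positive}
    support = constraints.positive  # a dependent interval would be more precise
    has_rsample = True

    def __init__(self, scale, validate_args=None):
        self.scale = scale
        super().__init__(batch_shape=scale.shape, validate_args=validate_args)

    def rsample(self, sample_shape=torch.Size()):
        shape = sample_shape + self.batch_shape
        return torch.rand(shape) * self.scale  # differentiable w.r.t. scale

    def log_prob(self, value):
        # density is 1/scale on [0, scale); broadcast to the shape of value
        return -self.scale.log().expand(value.shape)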

Binomial

class Binomial(total_count=1, probs=None, logits=None, validate_args=None)[source]

Bases: torch.distributions.distribution.Distribution, pyro.distributions.torch_distribution.TorchDistributionMixin

Creates a Binomial distribution parameterized by total_count and either probs or logits (but not both). total_count must be broadcastable with probs/logits.

This is adapted from torch.distributions.binomial.Binomial, with the important difference that total_count is not limited to being a single int, but can be a torch.Tensor.

Example:

>>> m = Binomial(100, torch.Tensor([0., 0.2, 0.8, 1.]))
>>> x = m.sample()
   0
  22
  71
 100
[torch.FloatTensor of size 4]

>>> m = Binomial(torch.Tensor([[5.], [10.]]), torch.Tensor([0.5, 0.8]))
>>> x = m.sample()
 4  5
 7  6
[torch.FloatTensor of size 2x2]
Parameters:
  • total_count (Tensor) – number of Bernoulli trials
  • probs (Tensor) – event probabilities
  • logits (Tensor) – event log-odds
arg_constraints = {'probs': <torch.distributions.constraints._Interval object>, 'total_count': <torch.distributions.constraints._IntegerGreaterThan object>}
enumerate_support()[source]

Returns tensor containing all values supported by a discrete distribution. The result will enumerate over dimension 0, so the shape of the result will be (cardinality,) + batch_shape + event_shape (where event_shape = () for univariate distributions).

Note that this enumerates over all batched tensors in lock-step [[0, 0], [1, 1], …]. To iterate over the full Cartesian product use itertools.product(m.enumerate_support()).

Returns:
Tensor iterating over dimension 0.
has_enumerate_support = True
log_prob(value)[source]

Returns the log of the probability density/mass function evaluated at value.

Parameters:value (torch.Tensor) – the value to be scored.
logits[source]
mean
param_shape
probs[source]
sample(sample_shape=torch.Size([]))[source]

Generates a sample_shape shaped sample or sample_shape shaped batch of samples if the distribution parameters are batched.

support
variance

Delta

class Delta(v, log_density=0.0, event_dim=0, validate_args=None)[source]

Bases: pyro.distributions.torch_distribution.TorchDistribution

Degenerate discrete distribution (a single point).

A discrete distribution that assigns probability one to the single element in its support. A Delta distribution parameterized by a random choice should not be used with MCMC-based inference, as doing so produces incorrect results.

Parameters:
  • v (torch.Tensor) – The single support element.
  • log_density (torch.Tensor) – An optional density for this Delta. This is useful to keep the class of Delta distributions closed under differentiable transformation.
  • event_dim (int) – Optional event dimension, defaults to zero.
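Typical usage, as a sketch:

import torch
import pyro.distributions as dist

d = dist.Delta(torch.tensor(2.5))
x = d.sample()                     # always 2.5
assert d.log_prob(x).item() == 0.  # log(1) at the support point; -inf elsewhere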
arg_constraints = {'log_density': <torch.distributions.constraints._Real object>, 'v': <torch.distributions.constraints._Real object>}
has_rsample = True
log_prob(x)[source]

Returns the log of the probability density/mass function evaluated at value.

Parameters:x (torch.Tensor) – the value to be scored.
mean
rsample(sample_shape=torch.Size([]))[source]

Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched.

support = <torch.distributions.constraints._Real object>
variance

Empirical

class Empirical(validate_args=None)[source]

Bases: pyro.distributions.torch_distribution.TorchDistribution

Empirical distribution associated with the sampled data.

add(value, weight=None, log_weight=None)[source]

Adds a new data point to the sample. Values in successive calls to add must have the same tensor shape. Optionally, an importance weight can be specified via log_weight or weight (a default weight of 1 is used if neither is specified).

Parameters:
  • value (torch.Tensor) – tensor to add to the sample.
  • weight (torch.Tensor) – weight (optional) corresponding to the sample.
  • log_weight (torch.Tensor) – log weight (optional) corresponding to the sample.
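A sketch of building up an empirical distribution and querying it:

import torch
from pyro.distributions import Empirical

d = Empirical()
for _ in range(100):
    d.add(torch.randn(2))   # every value must have the same shape

m = d.mean                   # weighted mean of the collected values
x = d.sample()               # draws one of the stored values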
arg_constraints = {}
enumerate_support()[source]

See pyro.distributions.torch_distribution.TorchDistribution.enumerate_support()

event_shape

See pyro.distributions.torch_distribution.TorchDistribution.event_shape()

get_samples_and_weights()[source]
has_enumerate_support = True
log_prob(value)[source]

Returns the log of the probability mass function evaluated at value. Note that this currently only supports scoring values with empty sample_shape, i.e. an arbitrary batched sample is not allowed.

Parameters:value (torch.Tensor) – scalar or tensor value to be scored.
mean

See pyro.distributions.torch_distribution.TorchDistribution.mean()

sample(sample_shape=torch.Size([]))[source]

See pyro.distributions.torch_distribution.TorchDistribution.sample()

sample_size

Number of samples that constitute the empirical distribution.

Returns:number of samples collected.
Return type:int
support = <torch.distributions.constraints._Real object>
variance

See pyro.distributions.torch_distribution.TorchDistribution.variance()

HalfCauchy

class HalfCauchy(loc, scale)[source]

Bases: pyro.distributions.torch.TransformedDistribution

Half-Cauchy distribution.

This is a continuous distribution with lower-bounded domain (x > loc). See also the Cauchy distribution.
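A sketch of constructing and scoring a HalfCauchy, assuming the two-parameter (loc, scale) form documented here:

import torch
from pyro.distributions import HalfCauchy

d = HalfCauchy(torch.tensor(0.), torch.tensor(1.))  # support is x > 0
x = d.sample()
lp = d.log_prob(x)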

Parameters:
  • loc (torch.Tensor) – lower bound of the distribution's support.
  • scale (torch.Tensor) – scale parameter of the distribution.
arg_constraints = {'loc': <torch.distributions.constraints._Real object>, 'scale': <torch.distributions.constraints._GreaterThan object>}
entropy()[source]

Returns entropy of distribution, batched over batch_shape.

Returns:
Tensor of shape batch_shape.
loc
log_prob(value)[source]

Scores the sample by inverting the transform(s) and computing the score using the score of the base distribution and the log abs det jacobian.

scale
support

LowRankMultivariateNormal

class LowRankMultivariateNormal(loc, W_term, D_term, trace_term=None)[source]

Bases: pyro.distributions.torch_distribution.TorchDistribution

Low Rank Multivariate Normal distribution.

Implements fast computation of the log probability of a multivariate normal distribution whose covariance matrix has the form:

covariance_matrix = W.T @ W + D.

Here D is a diagonal matrix (stored as a vector of its diagonal entries) and W is an M x N matrix. The computation is beneficial when M << N.
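For small sizes the fast computation can be checked against the dense form; a sketch:

import torch
from pyro.distributions import LowRankMultivariateNormal, MultivariateNormal

M, N = 2, 5
loc = torch.zeros(N)
W = torch.randn(M, N)
D = torch.rand(N) + 0.1     # positive diagonal entries

low_rank = LowRankMultivariateNormal(loc, W, D)
dense = MultivariateNormal(loc, W.t().mm(W) + torch.diag(D))

x = torch.randn(N)
# the two log probabilities should agree up to numerical error
assert torch.allclose(low_rank.log_prob(x), dense.log_prob(x), atol=1e-4)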

Parameters:
  • loc (torch.Tensor) – mean; must be a 1D or 2D tensor with the last dimension of size N.
  • W_term (torch.Tensor) – the W term of the covariance matrix; must be a 2D tensor of size M x N.
  • D_term (torch.Tensor) – the D term of the covariance matrix; must be a 1D tensor of size N.
  • trace_term (float) – an optional term to be added to the Mahalanobis term, according to p(y) = N(y|loc, cov) * exp(-1/2 * trace_term).
arg_constraints = {'covariance_matrix_D_term': <torch.distributions.constraints._GreaterThan object>, 'loc': <torch.distributions.constraints._Real object>, 'scale_tril': <torch.distributions.constraints._LowerTriangular object>}
has_rsample = True
log_prob(value)[source]

Returns the log of the probability density/mass function evaluated at value.

Parameters:value (torch.Tensor) – the value to be scored.
mean
rsample(sample_shape=torch.Size([]))[source]

Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched.

scale_tril[source]
support = <torch.distributions.constraints._Real object>
variance

OMTMultivariateNormal

class OMTMultivariateNormal(loc, scale_tril)[source]

Bases: pyro.distributions.torch.MultivariateNormal

Multivariate normal (Gaussian) distribution with OMT gradients w.r.t. both parameters. Note the gradient computation w.r.t. the Cholesky factor has cost O(D^3), although the resulting gradient variance is generally expected to be lower.

A distribution over vectors in which all the elements have a joint Gaussian density.
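A sketch showing OMT gradients flowing through rsample() to both parameters:

import torch
from pyro.distributions import OMTMultivariateNormal

loc = torch.zeros(2, requires_grad=True)
scale_tril = torch.eye(2, requires_grad=True)
d = OMTMultivariateNormal(loc, scale_tril)
z = d.rsample()
z.sum().backward()   # gradients accumulate in loc.grad and scale_tril.grad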

Parameters:
  • loc (torch.Tensor) – D-dimensional mean vector.
  • scale_tril (torch.Tensor) – lower-triangular Cholesky factor of the covariance matrix (D x D).
arg_constraints = {'loc': <torch.distributions.constraints._Real object>, 'scale_tril': <torch.distributions.constraints._LowerTriangular object>}
rsample(sample_shape=torch.Size([]))[source]

Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched.

Rejector

class Rejector(propose, log_prob_accept, log_scale)[source]

Bases: pyro.distributions.torch_distribution.TorchDistribution

Rejection sampled distribution given an acceptance rate function.

Parameters:
  • propose (Distribution) – A proposal distribution that samples batched proposals via propose(). rsample() supports a sample_shape arg only if propose() supports a sample_shape arg.
  • log_prob_accept (callable) – A callable that inputs a batch of proposals and returns a batch of log acceptance probabilities.
  • log_scale – Total log probability of acceptance.
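A minimal sketch: proposing from Exponential(1) and accepting with probability exp(-x) yields Exponential(2), with total acceptance probability 1/2:

import torch
import pyro.distributions as dist
from pyro.distributions import Rejector

propose = dist.Exponential(torch.tensor(1.0))

def log_prob_accept(x):
    return -x                         # accept with probability exp(-x)

log_scale = torch.tensor(0.5).log()   # total acceptance probability is 1/2
d = Rejector(propose, log_prob_accept, log_scale)
x = d.rsample()                       # distributed as Exponential(rate=2)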
has_rsample = True
log_prob(x)[source]

Returns the log of the probability density/mass function evaluated at value.

Parameters:x (torch.Tensor) – the value to be scored.
rsample(sample_shape=torch.Size([]))[source]

Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched.

score_parts(x)[source]

Computes ingredients for stochastic gradient estimators of ELBO.

The default implementation is correct both for non-reparameterized and for fully reparameterized distributions. Partially reparameterized distributions should override this method to compute correct .score_function and .entropy_term parts.

Parameters:x (torch.Tensor) – A single value or batch of values.
Returns:A ScoreParts object containing parts of the ELBO estimator.
Return type:ScoreParts

VonMises

class VonMises(loc, concentration, validate_args=None)[source]

Bases: pyro.distributions.torch_distribution.TorchDistribution

A circular von Mises distribution.

Currently only log_prob() is implemented.
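Scoring an angle, as a sketch:

import math
import torch
from pyro.distributions import VonMises

d = VonMises(torch.tensor(0.0), torch.tensor(1.0))  # loc, concentration
lp = d.log_prob(torch.tensor(math.pi / 4))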

Parameters:
  • loc (torch.Tensor) – an angle in radians.
  • concentration (torch.Tensor) – concentration parameter.
arg_constraints = {'concentration': <torch.distributions.constraints._GreaterThan object>, 'loc': <torch.distributions.constraints._Real object>}
log_prob(value)[source]

Returns the log of the probability density/mass function evaluated at value.

Parameters:value (torch.Tensor) – the value to be scored.
support = <torch.distributions.constraints._Real object>

Transformed Distributions

InverseAutoregressiveFlow

class InverseAutoregressiveFlow(input_dim, hidden_dim, sigmoid_bias=2.0, permutation=None)[source]

Bases: torch.distributions.transforms.Transform

An implementation of an Inverse Autoregressive Flow. Together with the TransformedDistribution this provides a way to create richer variational approximations.

Example usage:

>>> base_dist = Normal(...)
>>> iaf = InverseAutoregressiveFlow(...)
>>> pyro.module("my_iaf", iaf.module)
>>> iaf_dist = TransformedDistribution(base_dist, [iaf])

Note that this implementation is only meant to be used in settings where the inverse of the bijector is never explicitly computed (rather, the result is cached from the forward call). In the context of variational inference, this means that InverseAutoregressiveFlow should only be used in the guide, i.e. in the variational distribution. In other contexts the inverse could in principle be computed, but this would be a potentially costly computation that scales with the dimension of the input (and in any case support for it is not included in this implementation).
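A slightly more concrete version of the example above (a sketch; input_dim=10 and hidden_dim=40 are arbitrary choices):

import torch
import pyro
import pyro.distributions as dist
from pyro.distributions import InverseAutoregressiveFlow, TransformedDistribution

base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
iaf = InverseAutoregressiveFlow(input_dim=10, hidden_dim=40)
pyro.module("my_iaf", iaf.module)     # register the flow's parameters with Pyro
iaf_dist = TransformedDistribution(base_dist, [iaf])
z = iaf_dist.rsample()                # sample through the flow
log_q = iaf_dist.log_prob(z)          # uses the result cached by the forward call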

Parameters:
  • input_dim (int) – dimension of input
  • hidden_dim (int) – hidden dimension (number of hidden units)
  • sigmoid_bias (float) – bias on the hidden units fed into the sigmoid; defaults to 2.0
  • permutation (bool) – whether the order of the inputs should be permuted (by default the conditional dependence structure of the autoregression follows the sequential order)

References:

1. Improving Variational Inference with Inverse Autoregressive Flow [arXiv:1606.04934] Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling

2. Variational Inference with Normalizing Flows [arXiv:1505.05770] Danilo Jimenez Rezende, Shakir Mohamed

3. MADE: Masked Autoencoder for Distribution Estimation [arXiv:1502.03509] Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle

arn

Return the AutoRegressiveNN associated with the InverseAutoregressiveFlow.

Return type:pyro.nn.AutoRegressiveNN

log_abs_det_jacobian(x, y)[source]

Calculates the element-wise log absolute determinant of the Jacobian.