# Primitive Distributions¶

class Distribution(reparameterized=None)[source]

Bases: object

Base class for parameterized probability distributions.

Distributions in Pyro are stochastic function objects with .sample() and .log_pdf() methods. Pyro provides two versions of each stochastic function:

(i) lowercase versions that take parameters:

x = dist.bernoulli(param)             # Returns a sample of size size(param).
p = dist.bernoulli.log_pdf(x, param)  # Evaluates log probability of x.


and (ii) UpperCase distribution classes that can construct stochastic functions with fixed parameters:

d = dist.Bernoulli(param)
x = d()                               # Returns a sample of size size(param).
p = d.log_pdf(x)                      # Evaluates log probability of x.


Under the hood the lowercase versions are aliases for the UpperCase versions.

Note

Parameters and data should be of type torch.autograd.Variable and all methods return type torch.autograd.Variable unless otherwise noted.

Tensor Shapes:

Distributions provide a method .shape() for the tensor shape of samples:

x = d.sample(*args, **kwargs)
assert x.shape == d.shape(*args, **kwargs)


Pyro distinguishes two different roles for tensor shapes of samples:

• The leftmost dimension corresponds to iid batching, which can be treated specially during inference via the .batch_log_pdf() method.
• The rightmost dimensions correspond to event shape.

These shapes are related by the equation:

assert d.shape(*args, **kwargs) == (d.batch_shape(*args, **kwargs) +
                                    d.event_shape(*args, **kwargs))


There are exceptions, for instance the Categorical distribution without one-hot encoding.

Distributions provide a vectorized .batch_log_pdf() method that evaluates the log probability density of each event in a batch independently, returning a tensor of shape d.batch_shape(x) + (1,):

x = d.sample(*args, **kwargs)
assert x.shape == d.shape(*args, **kwargs)
log_p = d.batch_log_pdf(x, *args, **kwargs)
assert log_p.shape == d.batch_shape(*args, **kwargs) + (1,)


Distributions may also support broadcasting of the .log_pdf() and .batch_log_pdf() methods, which may each be evaluated with a sample tensor x that is larger than (but broadcastable from) the parameters. In this case, d.batch_shape(x) will return the shape of the broadcasted batch shape using the data tensor x:

x = d.sample()
xx = torch.stack([x, x])
d.batch_log_pdf(xx).size() == d.batch_shape(xx) + (1,)  # returns True
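
The shape contract above can be checked without Pyro. The following is a minimal NumPy sketch of a diagonal normal with batch_shape (2,) and event_shape (3,); the function names mirror the text, but the implementation is illustrative and is not Pyro's:

```python
import numpy as np

# Diagonal normal with batch_shape (2,) and event_shape (3,):
# two independent 3-dimensional events per sample.
mu = np.zeros((2, 3))
sigma = np.ones((2, 3))

batch_shape = mu.shape[:1]   # leftmost dims: iid batching
event_shape = mu.shape[1:]   # rightmost dims: one event

def sample(rng):
    return mu + sigma * rng.standard_normal(mu.shape)

def batch_log_pdf(x):
    # Per-element normal log density, summed over the event dims only,
    # keeping a trailing singleton dim as the text describes.
    log_p = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    return log_p.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = sample(rng)
assert x.shape == batch_shape + event_shape
assert batch_log_pdf(x).shape == batch_shape + (1,)
```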


Implementing New Distributions:

Derived classes must implement the following methods: .sample(), .batch_log_pdf(), .batch_shape(), and .event_shape(). Discrete classes may also implement the .enumerate_support() method to improve gradient estimates and set .enumerable = True.
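
As an illustration of this contract, here is a minimal sketch of a point-mass distribution implementing the four required methods, in plain Python with NumPy standing in for torch Variables (illustrative only, not Pyro's Delta implementation):

```python
import numpy as np

class PointMass:
    """Toy point-mass distribution implementing the four required methods."""

    enumerable = True  # discrete, so we can also enumerate the support

    def __init__(self, v):
        self.v = np.asarray(v, dtype=float)

    def sample(self):
        return self.v.copy()

    def batch_log_pdf(self, x):
        # log(1) = 0 where x matches v, -inf elsewhere; one value per batch.
        match = np.all(np.isclose(x, self.v), axis=-1, keepdims=True)
        return np.where(match, 0.0, -np.inf)

    def batch_shape(self, x=None):
        return (x if x is not None else self.v).shape[:-1]

    def event_shape(self):
        return self.v.shape[-1:]

    def enumerate_support(self):
        yield self.v.copy()

d = PointMass([[1.0, 2.0], [3.0, 4.0]])
x = d.sample()
assert x.shape == d.batch_shape() + d.event_shape()
assert d.batch_log_pdf(x).shape == d.batch_shape() + (1,)
```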

Examples:

Take a look at the examples to see how they interact with inference algorithms.

analytic_mean(*args, **kwargs)[source]

Analytic mean of the distribution, to be implemented by derived classes.

Note that this is optional, and currently only used for testing distributions.

Returns: Analytic mean.
Return type: torch.autograd.Variable
Raises: NotImplementedError if the mean cannot be analytically computed.
analytic_var(*args, **kwargs)[source]

Analytic variance of the distribution, to be implemented by derived classes.

Note that this is optional, and currently only used for testing distributions.

Returns: Analytic variance.
Return type: torch.autograd.Variable
Raises: NotImplementedError if the variance cannot be analytically computed.
batch_log_pdf(x, *args, **kwargs)[source]

Evaluates log probability densities for each of a batch of samples.

Parameters: x (torch.autograd.Variable) – A single value or a batch of values batched along axis 0.
Returns: Log probability densities as a one-dimensional torch.autograd.Variable with the same batch size as the value and parameters. The shape of the result is self.batch_shape(x) + (1,).
Return type: torch.autograd.Variable
batch_shape(x=None, *args, **kwargs)[source]

The left-hand tensor shape of samples, used for batching.

Samples are of shape d.shape(x) == d.batch_shape(x) + d.event_shape().

Parameters: x – Data used to determine the batch shape. Optional; if not specified, the distribution parameters determine the shape of batches returned from sample().
Returns: Tensor shape used for batching.
Return type: torch.Size
Raises: ValueError if the parameters are not broadcastable to the data shape.
enumerable = False
enumerate_support(*args, **kwargs)[source]

Returns a representation of the parametrized distribution’s support.

This is implemented only by discrete distributions.

Returns: An iterator over the distribution’s discrete support.
Return type: iterator
event_dim(*args, **kwargs)[source]
Returns: Number of dimensions of individual events.
Return type: int
event_shape(x=None, *args, **kwargs)[source]

The right-hand tensor shape of samples, used for individual events. The event dimension(s) designate random variables that may depend on each other, for instance in the Dirichlet or Categorical distributions, but may also be used simply for logical grouping, for example in a normal distribution with a diagonal covariance matrix.

Samples are of shape d.shape(x) == d.batch_shape(x) + d.event_shape().

Returns: Tensor shape used for individual events.
Return type: torch.Size
log_pdf(x, *args, **kwargs)[source]

Evaluates total log probability density of a batch of samples.

Parameters: x (torch.autograd.Variable) – A value.
Returns: Total log probability density, as a one-dimensional torch.autograd.Variable of size 1.
Return type: torch.autograd.Variable
reparameterized = False
sample(*args, **kwargs)[source]

Samples a random value.

For tensor distributions, the returned Variable should have the same .size() as the parameters, unless otherwise noted.

Returns: A random value or batch of random values (if parameters are batched). The shape of the result should be self.shape().
Return type: torch.autograd.Variable
shape(x=None, *args, **kwargs)[source]

The tensor shape of samples from this distribution.

Samples are of shape d.shape(x) == d.batch_shape(x) + d.event_shape().

Returns: Tensor shape of samples.
Return type: torch.Size

## Bernoulli¶

class Bernoulli(ps=None, logits=None, batch_size=None, log_pdf_mask=None, *args, **kwargs)[source]

Bernoulli distribution.

Distribution over a vector of independent Bernoulli variables. Each element of the vector takes on a value in {0, 1}.

This is often used in conjunction with torch.nn.Sigmoid to ensure the ps parameters are in the interval [0, 1].

Parameters:

• ps (torch.autograd.Variable) – Probabilities. Should lie in the interval [0, 1].
• logits – Log odds, i.e. $$\log(\frac{p}{1 - p})$$. Either ps or logits should be specified, but not both.
• batch_size – The number of elements in the batch used to generate a sample. The batch dimension will be the leftmost dimension of the generated sample.
• log_pdf_mask – Tensor applied to the batch log pdf values as a multiplier. The most common use case is supplying a boolean mask to exclude certain batch sites from the log pdf computation.
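
The relation between ps and logits stated above can be verified directly; the sketch below uses NumPy, with a plain sigmoid playing the role of torch.nn.Sigmoid (illustrative only):

```python
import numpy as np

def sigmoid(t):
    # Inverse of the log-odds transform.
    return 1.0 / (1.0 + np.exp(-t))

ps = np.array([0.1, 0.5, 0.9])
logits = np.log(ps / (1.0 - ps))   # log odds, as defined above

# sigmoid inverts the log-odds transform, recovering ps.
assert np.allclose(sigmoid(logits), ps)
```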
analytic_mean()[source]
analytic_var()[source]
batch_log_pdf(x)[source]
batch_shape(x=None)[source]
enumerable = True
enumerate_support()[source]

Returns the Bernoulli distribution’s support, as a tensor along the first dimension.

Note that this returns support values of all the batched RVs in lock-step, rather than the full cartesian product. To iterate over the cartesian product, you must construct univariate Bernoullis and use itertools.product() over all univariate variables (may be expensive).

Returns: Torch variable enumerating the support of the Bernoulli distribution. Each item in the return value, when enumerated along the first dimension, yields a value from the distribution’s support which has the same dimension as would be returned by sample.
Return type: torch.autograd.Variable
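
The lock-step versus cartesian-product distinction can be made concrete with a small NumPy sketch (three batched Bernoulli sites; illustrative, not Pyro code):

```python
import itertools
import numpy as np

# Three batched Bernoulli RVs.  Lock-step enumeration returns 2 rows,
# one per support value, each covering all batch sites at once.
lock_step = np.array([[0, 0, 0],
                      [1, 1, 1]])

# The full cartesian product has 2**3 = 8 joint assignments and must be
# built from univariate supports, e.g. with itertools.product().
cartesian = np.array(list(itertools.product([0, 1], repeat=3)))

assert lock_step.shape == (2, 3)
assert cartesian.shape == (8, 3)
```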
event_shape()[source]
sample()[source]

## Beta¶

class Beta(alpha, beta, batch_size=None, *args, **kwargs)[source]

Univariate beta distribution parameterized by alpha and beta.

This is often used in conjunction with torch.nn.Softplus to ensure alpha and beta parameters are positive.

Parameters:

• alpha (torch.autograd.Variable) – Lower shape parameter. Should be positive.
• beta (torch.autograd.Variable) – Upper shape parameter. Should be positive.
analytic_mean()[source]
analytic_var()[source]
batch_log_pdf(x)[source]
batch_shape(x=None)[source]
event_shape()[source]
sample()[source]

Ref: pyro.distributions.distribution.Distribution.sample()

## Categorical¶

class Categorical(ps=None, vs=None, logits=None, one_hot=True, batch_size=None, log_pdf_mask=None, *args, **kwargs)[source]

Categorical (discrete) distribution.

Discrete distribution over elements of vs with $$P(vs[i]) \propto ps[i]$$. If one_hot=True, .sample() returns a one-hot vector; otherwise .sample() returns the category index.

Parameters:

• ps (torch.autograd.Variable) – Probabilities. These should be non-negative and normalized along the rightmost axis.
• logits (torch.autograd.Variable) – Log probability values. When exponentiated, these should sum to 1 along the last axis. Either ps or logits should be specified, but not both.
• vs (list or numpy.ndarray or torch.autograd.Variable) – Optional list of values in the support.
• one_hot – Whether sample() returns a one-hot sample. Defaults to False if vs is specified, or True if vs is not specified.
• batch_size (int) – Optional number of elements in the batch used to generate a sample. The batch dimension will be the leftmost dimension of the generated sample.
batch_log_pdf(x)[source]

Evaluates log probability densities for one or a batch of samples and parameters. The last dimension for ps encodes the event probabilities, and the remaining dimensions are considered batch dimensions.

ps and vs are first broadcast to the size of the data x. The data tensor is used to create a mask over vs, where a 1 in the mask indicates that the corresponding value in vs was selected. Since ps and vs have the same size, applying this mask to ps gives the probabilities of the selected events. The method returns the logarithm of these probabilities.

Returns: Tensor with log probabilities for each of the batches.
Return type: torch.autograd.Variable
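
The mask-over-vs computation described above can be sketched in NumPy (illustrative only; Pyro operates on torch Variables):

```python
import numpy as np

ps = np.array([[0.2, 0.5, 0.3],
               [0.6, 0.1, 0.3]])          # batch of 2, 3 events each
vs = np.array([["a", "b", "c"],
               ["a", "b", "c"]])
x = np.array([["b"], ["c"]])              # one selected value per batch

# Broadcast x against vs; a True entry marks the selected value.
mask = (vs == x)
selected_ps = ps[mask].reshape(-1, 1)     # probabilities of the selections
log_p = np.log(selected_ps)
assert log_p.shape == (2, 1)
```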
batch_shape(x=None)[source]
enumerable = True
enumerate_support()[source]

Returns the categorical distribution’s support, as a tensor along the first dimension.

Note that this returns support values of all the batched RVs in lock-step, rather than the full cartesian product. To iterate over the cartesian product, you must construct univariate Categoricals and use itertools.product() over all univariate variables (but this is very expensive).

Parameters:

• ps (torch.autograd.Variable) – Tensor where the last dimension denotes the event probabilities, p_k, which must sum to 1. The remaining dimensions are considered batch dimensions.
• vs (list or numpy.ndarray or torch.autograd.Variable) – Optional parameter enumerating the items in the support. This can have a numeric or string type, and should have the same dimension as ps.
• one_hot (boolean) – Denotes whether one-hot encoding is enabled. This is True by default. When set to False and no explicit vs is provided, values are enumerated as category indices rather than one-hot vectors.

Returns: Torch variable or numpy array enumerating the support of the categorical distribution. Each item in the return value, when enumerated along the first dimension, yields a value from the distribution’s support which has the same dimension as would be returned by sample. If one_hot=True, the last dimension is used for the one-hot encoding.
Return type: torch.autograd.Variable or numpy.ndarray
event_shape()[source]
sample()[source]

Returns a sample which has the same shape as ps (or vs), except that if one_hot=True (and no vs is specified), the last dimension will have the same size as the number of events. The type of the sample is numpy.ndarray if vs is a list or a numpy array, else a tensor is returned.

Returns: Sample from the Categorical distribution.
Return type: numpy.ndarray or torch.LongTensor
shape(x=None)[source]

## Cauchy¶

class Cauchy(mu, gamma, batch_size=None, *args, **kwargs)[source]

Cauchy (a.k.a. Lorentz) distribution.

This is a continuous distribution that arises, roughly, as the ratio of two Gaussians when the denominator Gaussian has zero mean. The distribution is over tensors that have the same shape as the parameters mu and gamma, which in turn must have the same shape as each other.

This is often used in conjunction with torch.nn.Softplus to ensure the gamma parameter is positive.

Parameters:

• mu (torch.autograd.Variable) – Location parameter.
• gamma (torch.autograd.Variable) – Scale parameter. Should be positive.
analytic_mean()[source]
analytic_var()[source]
batch_log_pdf(x)[source]
batch_shape(x=None)[source]
event_shape()[source]
sample()[source]

## Delta¶

class Delta(v, batch_size=None, *args, **kwargs)[source]

Degenerate discrete distribution (a single point).

Discrete distribution that assigns probability one to the single element in its support. Delta distribution parameterized by a random choice should not be used with MCMC based inference, as doing so produces incorrect results.

Parameters: v (torch.autograd.Variable) – The single support element.
batch_log_pdf(x)[source]
batch_shape(x=None)[source]
enumerable = True
enumerate_support(v=None)[source]

Returns the delta distribution’s support, as a tensor along the first dimension.

Parameters: v – Torch variable where each element of the tensor represents the point at which the delta distribution is concentrated.
Returns: Torch variable enumerating the support of the delta distribution.
Return type: torch.autograd.Variable
event_shape()[source]
sample()[source]

## Normal¶

class Normal(mu, sigma, batch_size=None, log_pdf_mask=None, *args, **kwargs)[source]

Univariate normal (Gaussian) distribution.

A distribution over tensors in which each element is independent and Gaussian distributed, with its own mean and standard deviation. The distribution is over tensors that have the same shape as the parameters mu and sigma, which in turn must have the same shape as each other.

This is often used in conjunction with torch.nn.Softplus to ensure the sigma parameters are positive.

Parameters:

• mu (torch.autograd.Variable) – Means.
• sigma (torch.autograd.Variable) – Standard deviations. Should be positive and the same shape as mu.
analytic_mean()[source]
analytic_var()[source]
batch_log_pdf(x)[source]

Diagonal Normal log-likelihood

batch_shape(x=None)[source]
event_shape()[source]
reparameterized = True
sample()[source]

Reparameterized Normal sampler.
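
The reparameterization trick behind this sampler can be sketched as follows; in Pyro the arithmetic is done on torch Variables so that gradients flow back to mu and sigma, but the idea is the same (NumPy illustration, not Pyro's implementation):

```python
import numpy as np

def reparameterized_normal_sample(mu, sigma, rng):
    # Draw parameter-free noise, then shift and scale it.  The sample is a
    # deterministic function of (mu, sigma), so derivatives of downstream
    # losses can propagate back to the parameters.
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps

mu = np.array([0.0, 2.0])
sigma = np.array([1.0, 0.5])
x = reparameterized_normal_sample(mu, sigma, np.random.default_rng(0))
assert x.shape == mu.shape
```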

## Exponential¶

class Exponential(lam, batch_size=None, *args, **kwargs)[source]

Exponential distribution parameterized by rate lam.

This is often used in conjunction with torch.nn.Softplus to ensure the lam parameter is positive.

Parameters: lam (torch.autograd.Variable) – Rate parameter (a.k.a. lambda). Should be positive.
analytic_mean()[source]
analytic_var()[source]
batch_log_pdf(x)[source]
batch_shape(x=None)[source]
event_shape()[source]
reparameterized = True
sample()[source]

Reparameterized sampler.

## Gamma¶

class Gamma(alpha, beta, batch_size=None, *args, **kwargs)[source]

Gamma distribution parameterized by alpha and beta.

This is often used in conjunction with torch.nn.Softplus to ensure alpha and beta parameters are positive.

Parameters:

• alpha (torch.autograd.Variable) – Shape parameter. Should be positive.
• beta (torch.autograd.Variable) – Rate parameter. Should be positive and the same shape as alpha.
analytic_mean()[source]
analytic_var()[source]
batch_log_pdf(x)[source]
batch_shape(x=None)[source]
event_shape()[source]
sample()[source]

## HalfCauchy¶

class HalfCauchy(mu, gamma, batch_size=None, *args, **kwargs)[source]

Half-Cauchy distribution.

This is a continuous distribution with lower-bounded domain (x > mu). See also the Cauchy distribution.

This is often used in conjunction with torch.nn.Softplus to ensure the gamma parameter is positive.

Parameters:

• mu (torch.autograd.Variable) – Location parameter.
• gamma (torch.autograd.Variable) – Scale parameter. Should be positive.
analytic_mean()[source]
analytic_var()[source]
batch_log_pdf(x)[source]
batch_shape(x=None)[source]
event_shape()[source]
sample()[source]

## LogNormal¶

class LogNormal(mu, sigma, batch_size=None, *args, **kwargs)[source]

Log-normal distribution.

A distribution over positive tensors obtained by exp-transforming a random variable drawn from Normal({mu: mu, sigma: sigma}).

This is often used in conjunction with torch.nn.Softplus to ensure the sigma parameters are positive.

Parameters:

• mu (torch.autograd.Variable) – Log mean parameter.
• sigma (torch.autograd.Variable) – Log standard deviations. Should be positive.
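
The exp-transform construction above determines both a sampler and the analytic mean exp(mu + sigma**2 / 2); a NumPy sketch (illustrative only, not Pyro's implementation):

```python
import numpy as np

mu, sigma = 0.5, 0.25
rng = np.random.default_rng(0)

# Sample by exponentiating a normal draw, as described above.
z = mu + sigma * rng.standard_normal(100000)
x = np.exp(z)

# Analytic mean of LogNormal(mu, sigma): exp(mu + sigma**2 / 2).
analytic_mean = np.exp(mu + sigma ** 2 / 2)
assert np.allclose(x.mean(), analytic_mean, rtol=0.01)
```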
analytic_mean()[source]
analytic_var()[source]
batch_log_pdf(x)[source]
batch_shape(x=None)[source]
event_shape()[source]
reparameterized = True
sample()[source]

Reparameterized log-normal sampler. Ref: pyro.distributions.distribution.Distribution.sample()

## Multinomial¶

class Multinomial(ps, n, batch_size=None, *args, **kwargs)[source]

Multinomial distribution.

Distribution over counts for n independent Categorical(ps) trials.

This is often used in conjunction with torch.nn.Softmax to ensure the probabilities ps are normalized.

Parameters:

• ps (torch.autograd.Variable) – Probabilities. Should be positive and normalized along the rightmost axis.
• n (int) – Number of trials. Should be positive.
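
The "counts for n independent Categorical(ps) trials" description can be sketched by tallying draws (NumPy illustration, not Pyro's sampler):

```python
import numpy as np

ps = np.array([0.2, 0.3, 0.5])
n = 1000
rng = np.random.default_rng(0)

# n independent Categorical(ps) trials...
trials = rng.choice(len(ps), size=n, p=ps)

# ...tallied into a vector of counts, one per category.
counts = np.bincount(trials, minlength=len(ps))

assert counts.sum() == n
assert counts.shape == ps.shape
```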
analytic_mean()[source]
analytic_var()[source]
batch_log_pdf(x)[source]
batch_shape(x=None)[source]
event_shape()[source]
expanded_sample()[source]
sample()[source]

## Poisson¶

class Poisson(lam, batch_size=None, *args, **kwargs)[source]

Poisson distribution over nonnegative integers parameterized by rate lam.

This is often used in conjunction with torch.nn.Softplus to ensure the lam parameter is positive.

Parameters: lam (torch.autograd.Variable) – Mean parameter (a.k.a. lambda). Should be positive.
analytic_mean()[source]
analytic_var()[source]
batch_log_pdf(x)[source]
batch_shape(x=None)[source]
event_shape()[source]
sample()[source]

## Uniform¶

class Uniform(a, b, batch_size=None, *args, **kwargs)[source]

Uniform distribution over the continuous interval [a, b].

Parameters:

• a (torch.autograd.Variable) – Lower bound (real).
• b (torch.autograd.Variable) – Upper bound (real). Should be greater than a.
analytic_mean()[source]
analytic_var()[source]
batch_log_pdf(x)[source]
batch_shape(x=None)[source]
event_shape()[source]
reparameterized = False
sample()[source]
shape(x=None)[source]