Pyro Documentation
Getting Started
 Install Pyro.
 Learn the basic concepts of Pyro: models and inference.
 Dive in to other tutorials and examples.
Primitives

get_param_store()
Returns the global ParamStoreDict.

clear_param_store()
Clears the global ParamStoreDict.
This is especially useful if you’re working in a REPL. We recommend calling this before each training loop (to avoid leaking parameters from past models), and before each unit test (to avoid leaking parameters across tests).

param(name, init_tensor=None, constraint=Real(), event_dim=None)
Saves the variable as a parameter in the param store. To interact with the param store or write to disk, see Parameters.
Parameters:
 name (str) – name of parameter
 init_tensor (torch.Tensor or callable) – initial tensor or lazy callable that returns a tensor. For large tensors, it may be cheaper to write e.g. lambda: torch.randn(100000), which will only be evaluated on the initial statement.
 constraint (torch.distributions.constraints.Constraint) – torch constraint, defaults to constraints.real.
 event_dim (int) – (optional) number of rightmost dimensions unrelated to batching. Dimensions to the left of this will be considered batch dimensions; if the param statement is inside a subsampled plate, then the corresponding batch dimensions of the parameter will be subsampled accordingly. If unspecified, all dimensions will be considered event dims and no subsampling will be performed.
Returns: A constrained parameter. The underlying unconstrained parameter is accessible via pyro.param(...).unconstrained(), where .unconstrained is a weakref attribute.
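The lazy-initializer behavior described for init_tensor can be illustrated with a toy stand-in. The sketch below is not Pyro's ParamStoreDict: toy_param, _store, and expensive_init are hypothetical names, and a plain list stands in for a tensor. It only shows that a callable initializer runs once, on the first statement that names the parameter.

```python
# Toy stand-in for a parameter store; this is NOT Pyro's ParamStoreDict.
_store = {}


def toy_param(name, init=None):
    """Return the stored value for `name`, initializing it on first use."""
    if name not in _store:
        if init is None:
            raise KeyError(f"parameter {name!r} has no stored value and no initializer")
        # A callable initializer is invoked only the first time the name is seen.
        _store[name] = init() if callable(init) else init
    return _store[name]


calls = []


def expensive_init():
    calls.append(1)          # record every evaluation of the initializer
    return [0.0] * 100000    # stands in for torch.randn(100000)


first = toy_param("weights", expensive_init)
second = toy_param("weights", expensive_init)  # initializer is not run again
```

Passing an already-built value instead of a callable would allocate it on every call, even though only the first one is stored; that is the cost the lambda form avoids.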

sample(name, fn, *args, **kwargs)
Calls the stochastic function fn with additional side-effects depending on name and the enclosing context (e.g. an inference algorithm). See Intro I and Intro II for a discussion.
Parameters:
 name – name of sample
 fn – distribution class or function
 obs – observed datum (optional; should only be used in context of inference), optionally specified in kwargs
 obs_mask (bool or Tensor) – Optional boolean tensor mask of shape broadcastable with fn.batch_shape. If provided, events with mask=True will be conditioned on obs and remaining events will be imputed by sampling. This introduces a latent sample site named name + "_unobserved" which should be used by guides.
 infer (dict) – Optional dictionary of inference parameters specified in kwargs. See inference documentation for details.
Returns: sample

factor(name, log_factor)
Factor statement to add an arbitrary log probability factor to a probabilistic model.
Warning
Beware using factor statements in guides. Factor statements assume log_factor is computed from non-reparametrized statements such as observation statements pyro.sample(..., obs=...). If instead log_factor is computed from e.g. the Jacobian determinant of a transformation of a reparametrized variable, factor statements in the guide will give incorrect results.
Parameters:
 name (str) – Name of the trivial sample
 log_factor (torch.Tensor) – A possibly batched log probability factor.

deterministic(name, value, event_dim=None)
Deterministic statement to add a Delta site with name name and value value to the trace. This is useful when we want to record values which are completely determined by their parents. For example:
    x = pyro.sample("x", dist.Normal(0, 1))
    x2 = pyro.deterministic("x2", x ** 2)
Note
The site does not affect the model density. This currently converts to a sample() statement, but may change in the future.
Parameters:
 name (str) – Name of the site.
 value (torch.Tensor) – Value of the site.
 event_dim (int) – Optional event dimension, defaults to value.ndim.

subsample(data, event_dim)
Subsampling statement to subsample data tensors based on enclosing plates.
This is typically called on arguments to model() when subsampling is performed automatically by plates by passing either the subsample or subsample_size kwarg. For example the following are equivalent:
    # Version 1. using indexing
    def model(data):
        with pyro.plate("data", len(data), subsample_size=10, dim=-data.dim()) as ind:
            data = data[ind]
            # ...

    # Version 2. using pyro.subsample()
    def model(data):
        with pyro.plate("data", len(data), subsample_size=10, dim=-data.dim()):
            data = pyro.subsample(data, event_dim=0)
            # ...
Returns: A subsampled version of data

class plate(name, size=None, subsample_size=None, subsample=None, dim=None, use_cuda=None, device=None)
Bases: pyro.poutine.plate_messenger.PlateMessenger
Construct for conditionally independent sequences of variables.
plate can be used either sequentially as a generator or in parallel as a context manager (formerly irange and iarange, respectively).
Sequential plate is similar to range() in that it generates a sequence of values.
Vectorized plate is similar to torch.arange() in that it yields an array of indices by which other tensors can be indexed. plate differs from torch.arange() in that it also informs inference algorithms that the variables being indexed are conditionally independent. To do this, plate is provided as a context manager rather than a function, and users must guarantee that all computation within a plate context is conditionally independent:
    with pyro.plate("name", size) as ind:
        # ...do conditionally independent stuff with ind...
Additionally, plate can take advantage of the conditional independence assumptions by subsampling the indices and informing inference algorithms to scale various computed values. This is typically used to subsample minibatches of data:
    with pyro.plate("data", len(data), subsample_size=100) as ind:
        batch = data[ind]
        assert len(batch) == 100
By default subsample_size=False and this simply yields a torch.arange(0, size). If 0 < subsample_size <= size this yields a single random batch of indices of size subsample_size and scales all log likelihood terms by size/batch_size, within this context.
Warning
This is only correct if all computation is conditionally independent within the context.
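To see why scaling by size/batch_size is the natural correction, the following plain-Python sketch averages the scaled minibatch sum over every possible subsample of a tiny data set and recovers the full-data sum. It does not use Pyro; scaled_minibatch_sum and the numbers are made up for illustration, and it assumes subsamples are drawn uniformly without replacement and that the per-datum log likelihood terms are conditionally independent.

```python
from itertools import combinations


def scaled_minibatch_sum(log_liks, indices):
    """Sum the selected log likelihood terms and rescale to the full data size."""
    size = len(log_liks)
    batch_size = len(indices)
    scale = size / batch_size
    return scale * sum(log_liks[i] for i in indices)


# Hypothetical per-datum log likelihood terms for a data set of size 4.
log_liks = [-0.5, -1.25, -2.0, -0.75]
full_sum = sum(log_liks)

# Enumerate every subsample of size 2 and average the scaled estimates.
estimates = [scaled_minibatch_sum(log_liks, ind)
             for ind in combinations(range(len(log_liks)), 2)]
mean_estimate = sum(estimates) / len(estimates)
```

Each individual estimate differs from full_sum, but their average matches it, which is the sense in which the scaled minibatch term is an unbiased stand-in for the full sum.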
Parameters:
 name (str) – A unique name to help inference algorithms match plate sites between models and guides.
 size (int) – Optional size of the collection being subsampled (like stop in builtin range).
 subsample_size (int) – Size of minibatches used in subsampling. Defaults to size.
 subsample (Anything supporting len().) – Optional custom subsample for user-defined subsampling schemes. If specified, then subsample_size will be set to len(subsample).
 dim (int) – An optional dimension to use for this independence index. If specified, dim should be negative, i.e. should index from the right. If not specified, dim is set to the rightmost dim that is left of all enclosing plate contexts.
 use_cuda (bool) – DEPRECATED, use the device arg instead. Optional bool specifying whether to use cuda tensors for subsample and log_prob. Defaults to torch.Tensor.is_cuda.
 device (str) – Optional keyword specifying which device to place the results of subsample and log_prob on. By default, results are placed on the same device as the default tensor.
Returns: A reusable context manager yielding a single 1-dimensional torch.Tensor of indices.
Examples:
>>> # This version declares sequential independence and subsamples data:
>>> for i in pyro.plate('data', 100, subsample_size=10):
...     if z[i]:  # Control flow in this example prevents vectorization.
...         obs = pyro.sample(f'obs_{i}', dist.Normal(loc, scale),
...                           obs=data[i])

>>> # This version declares vectorized independence:
>>> with pyro.plate('data'):
...     obs = pyro.sample('obs', dist.Normal(loc, scale), obs=data)

>>> # This version subsamples data in vectorized way:
>>> with pyro.plate('data', 100, subsample_size=10) as ind:
...     obs = pyro.sample('obs', dist.Normal(loc, scale), obs=data[ind])

>>> # This wraps a user-defined subsampling method for use in pyro:
>>> ind = torch.randint(0, 100, (10,)).long()  # custom subsample
>>> with pyro.plate('data', 100, subsample=ind):
...     obs = pyro.sample('obs', dist.Normal(loc, scale), obs=data[ind])

>>> # This reuses two different independence contexts.
>>> x_axis = pyro.plate('outer', 320, dim=-1)
>>> y_axis = pyro.plate('inner', 200, dim=-2)
>>> with x_axis:
...     x_noise = pyro.sample("x_noise", dist.Normal(loc, scale))
...     assert x_noise.shape == (320,)
>>> with y_axis:
...     y_noise = pyro.sample("y_noise", dist.Normal(loc, scale))
...     assert y_noise.shape == (200, 1)
>>> with x_axis, y_axis:
...     xy_noise = pyro.sample("xy_noise", dist.Normal(loc, scale))
...     assert xy_noise.shape == (200, 320)
See SVI Part II for an extended discussion.

plate_stack(prefix, sizes, rightmost_dim=-1)
Create a contiguous stack of plates with dimensions:
    rightmost_dim - len(sizes), ..., rightmost_dim

module(name, nn_module, update_module_params=False)
Registers all parameters of a torch.nn.Module with Pyro’s param_store. In conjunction with the ParamStoreDict save() and load() functionality, this allows the user to save and load modules.
Note
Consider instead using PyroModule, a newer alternative to pyro.module() that has better support for: jitting, serving in C++, and converting parameters to random variables. For details see the Modules Tutorial.
Parameters:
 name (str) – name of module
 nn_module (torch.nn.Module) – the module to be registered with Pyro
 update_module_params – determines whether Parameters in the PyTorch module get overridden with the values found in the ParamStore (if any). Defaults to False
Returns: torch.nn.Module

random_module(name, nn_module, prior, *args, **kwargs)
Warning
The random_module primitive is deprecated, and will be removed in a future release. Use PyroModule instead to create Bayesian modules from torch.nn.Module instances. See the Bayesian Regression tutorial for an example.
DEPRECATED Places a prior over the parameters of the module nn_module. Returns a distribution (callable) over nn.Modules, which upon calling returns a sampled nn.Module.
Parameters:
 name (str) – name of pyro module
 nn_module (torch.nn.Module) – the module to be registered with pyro
 prior – pyro distribution, stochastic function, or python dict with parameter names as keys and respective distributions/stochastic functions as values.
Returns: a callable which returns a sampled module

barrier(data)
EXPERIMENTAL Ensures all values in data are ground, rather than lazy funsor values. This is useful in combination with pyro.poutine.collapse().

enable_validation(is_validate=True)
Enable or disable validation checks in Pyro. Validation checks provide useful warnings and errors, e.g. NaN checks, validating distribution arguments and support values, detecting incorrect use of ELBO and MCMC. Since some of these checks may be expensive, you may want to disable validation of mature models to speed up inference.
The default behavior mimics Python’s assert statement: validation is on by default, but is disabled if Python is run in optimized mode (via python -O). Equivalently, the default behavior depends on Python’s global __debug__ value via pyro.enable_validation(__debug__).
Validation is temporarily disabled during jit compilation, for all inference algorithms that support the PyTorch jit. We recommend developing models with non-jitted inference algorithms to ease debugging, then optionally moving to jitted inference once a model is correct.
Parameters: is_validate (bool) – (optional; defaults to True) whether to enable validation checks.

validation_enabled(is_validate=True)
Context manager that is useful when temporarily enabling/disabling validation checks.
Parameters: is_validate (bool) – (optional; defaults to True) temporary validation check override.

trace(fn=None, ignore_warnings=False, jit_options=None)
Lazy replacement for torch.jit.trace() that works with Pyro functions that call pyro.param().
The actual compilation artifact is stored in the compiled attribute of the output. Call diagnostic methods on this attribute.
Example:
    def model(x):
        scale = pyro.param("scale", torch.tensor(0.5), constraint=constraints.positive)
        return pyro.sample("y", dist.Normal(x, scale))

    @pyro.ops.jit.trace
    def model_log_prob_fn(x, y):
        cond_model = pyro.condition(model, data={"y": y})
        tr = pyro.poutine.trace(cond_model).get_trace(x)
        return tr.log_prob_sum()
Parameters:
 fn (callable) – The function to be traced.
 ignore_warnings (bool) – Whether to ignore jit warnings.
 jit_options (dict) – Optional dict of options to pass to torch.jit.trace(), e.g. {"optimize": False}.
Inference
In the context of probabilistic modeling, learning is usually called inference. In the particular case of Bayesian inference, this often involves computing (approximate) posterior distributions. In the case of parameterized models, this usually involves some sort of optimization. Pyro supports multiple inference algorithms, with support for stochastic variational inference (SVI) being the most extensive. Look here for more inference algorithms in future versions of Pyro.
See Intro II for a discussion of inference in Pyro.
SVI

class SVI(model, guide, optim, loss, loss_and_grads=None, num_samples=0, num_steps=0, **kwargs)
Bases: pyro.infer.abstract_infer.TracePosterior
Parameters:
 model – the model (callable containing Pyro primitives)
 guide – the guide (callable containing Pyro primitives)
 optim (PyroOptim) – a wrapper for a PyTorch optimizer
 loss (pyro.infer.elbo.ELBO) – an instance of a subclass of ELBO. Pyro provides three built-in losses: Trace_ELBO, TraceGraph_ELBO, and TraceEnum_ELBO. See the ELBO docs to learn how to implement a custom loss.
 num_samples – (DEPRECATED) the number of samples for Monte Carlo posterior approximation
 num_steps – (DEPRECATED) the number of optimization steps to take in run()
A unified interface for stochastic variational inference in Pyro. The most commonly used loss is loss=Trace_ELBO(). See the tutorial SVI Part I for a discussion.
evaluate_loss(*args, **kwargs)
Returns: estimate of the loss
Return type: float
Evaluate the loss function. Any args or kwargs are passed to the model and guide.

run(*args, **kwargs)
Warning
This method is deprecated, and will be removed in a future release. For inference, use step() directly, and for predictions, use the Predictive class.
ELBO

class ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: object
ELBO is the top-level interface for stochastic variational inference via optimization of the evidence lower bound.
Most users will not interact with this base class ELBO directly; instead they will create instances of derived classes: Trace_ELBO, TraceGraph_ELBO, or TraceEnum_ELBO.
Parameters:
 num_particles – The number of particles/samples used to form the ELBO (gradient) estimators.
 max_plate_nesting (int) – Optional bound on max number of nested pyro.plate() contexts. This is only required when enumerating over sample sites in parallel, e.g. if a site sets infer={"enumerate": "parallel"}. If omitted, ELBO may guess a valid value by running the (model, guide) pair once; however, this guess may be incorrect if model or guide structure is dynamic.
 vectorize_particles (bool) – Whether to vectorize the ELBO computation over num_particles. Defaults to False. This requires static structure in model and guide.
 strict_enumeration_warning (bool) – Whether to warn about possible misuse of enumeration, i.e. that pyro.infer.traceenum_elbo.TraceEnum_ELBO is used iff there are enumerated sample sites.
 ignore_jit_warnings (bool) – Flag to ignore warnings from the JIT tracer. When this is True, all torch.jit.TracerWarning will be ignored. Defaults to False.
 jit_options (dict) – Optional dict of options to pass to torch.jit.trace(), e.g. {"check_trace": True}.
 retain_graph (bool) – Whether to retain autograd graph during an SVI step. Defaults to None (False).
 tail_adaptive_beta (float) – Exponent beta with -1.0 <= beta < 0.0 for use with TraceTailAdaptive_ELBO.
References
 [1] Automated Variational Inference in Probabilistic Programming, David Wingate, Theo Weber
 [2] Black Box Variational Inference, Rajesh Ranganath, Sean Gerrish, David M. Blei

class Trace_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.elbo.ELBO
A trace implementation of ELBO-based SVI. The estimator is constructed along the lines of references [1] and [2]. There are no restrictions on the dependency structure of the model or the guide. The gradient estimator includes partial Rao-Blackwellization for reducing the variance of the estimator when non-reparameterizable random variables are present. The Rao-Blackwellization is partial in that it only uses conditional independence information that is marked by plate contexts. For more fine-grained Rao-Blackwellization, see TraceGraph_ELBO.
References
 [1] Automated Variational Inference in Probabilistic Programming, David Wingate, Theo Weber
 [2] Black Box Variational Inference, Rajesh Ranganath, Sean Gerrish, David M. Blei

loss(model, guide, *args, **kwargs)
Returns: an estimate of the ELBO
Return type: float
Evaluates the ELBO with an estimator that uses num_particles many samples/particles.

class JitTrace_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.trace_elbo.Trace_ELBO
Like Trace_ELBO but uses pyro.ops.jit.compile() to compile loss_and_grads().
This works only for a limited set of models:
 Models must have static structure.
 Models must not depend on any global data (except the param store).
 All model inputs that are tensors must be passed in via *args.
 All model inputs that are not tensors must be passed in via **kwargs, and compilation will be triggered once per unique **kwargs.

class TraceGraph_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.elbo.ELBO
A TraceGraph implementation of ELBO-based SVI. The gradient estimator is constructed along the lines of reference [1] specialized to the case of the ELBO. It supports arbitrary dependency structure for the model and guide as well as baselines for non-reparameterizable random variables. Where possible, conditional dependency information as recorded in the Trace is used to reduce the variance of the gradient estimator. In particular two kinds of conditional dependency information are used to reduce variance:
 the sequential order of samples (z is sampled after y => y does not depend on z)
 plate generators
References
 [1] Gradient Estimation Using Stochastic Computation Graphs, John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel
 [2] Neural Variational Inference and Learning in Belief Networks, Andriy Mnih, Karol Gregor

loss(model, guide, *args, **kwargs)
Returns: an estimate of the ELBO
Return type: float
Evaluates the ELBO with an estimator that uses num_particles many samples/particles.

loss_and_grads(model, guide, *args, **kwargs)
Returns: an estimate of the ELBO
Return type: float
Computes the ELBO as well as the surrogate ELBO that is used to form the gradient estimator. Performs backward on the latter. num_particles many samples are used to form the estimators. If baselines are present, a baseline loss is also constructed and differentiated.

class JitTraceGraph_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.tracegraph_elbo.TraceGraph_ELBO
Like TraceGraph_ELBO but uses torch.jit.trace() to compile loss_and_grads().
This works only for a limited set of models:
 Models must have static structure.
 Models must not depend on any global data (except the param store).
 All model inputs that are tensors must be passed in via *args.
 All model inputs that are not tensors must be passed in via **kwargs, and compilation will be triggered once per unique **kwargs.

class BackwardSampleMessenger(enum_trace, guide_trace)
Bases: pyro.poutine.messenger.Messenger
Implements forward filtering / backward sampling for sampling from the joint posterior distribution.

class TraceEnum_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.elbo.ELBO
A trace implementation of ELBO-based SVI that supports
 exhaustive enumeration over discrete sample sites, and
 local parallel sampling over any sample site in the guide.
To enumerate over a sample site in the guide, mark the site with either infer={'enumerate': 'sequential'} or infer={'enumerate': 'parallel'}. To configure all guide sites at once, use config_enumerate(). To enumerate over a sample site in the model, mark the site infer={'enumerate': 'parallel'} and ensure the site does not appear in the guide.
This assumes restricted dependency structure on the model and guide: variables outside of a plate can never depend on variables inside that plate.
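As a rough illustration of what exhaustive enumeration over a discrete site computes, the plain-Python sketch below marginalizes a two-valued latent variable exactly instead of sampling it. It is not Pyro code: the prior, the likelihood function, and log_marginal_by_enumeration are hypothetical names chosen for this example.

```python
import math


def log_marginal_by_enumeration(prior, likelihood, x):
    """Compute log p(x) = log sum_z p(z) p(x | z) over every value z in the prior."""
    terms = [math.log(p_z) + math.log(likelihood(x, z)) for z, p_z in prior.items()]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Hypothetical model: z picks one of two coins, x is the outcome of one flip.
prior = {0: 0.25, 1: 0.75}


def likelihood(x, z):
    p_heads = 0.5 if z == 0 else 0.9
    return p_heads if x == 1 else 1.0 - p_heads


log_px = log_marginal_by_enumeration(prior, likelihood, 1)
```

Because every value of z contributes exactly, the result carries no sampling noise from the discrete site; the cost is one likelihood evaluation per enumerated value.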
loss(model, guide, *args, **kwargs)
Returns: an estimate of the ELBO
Return type: float
Estimates the ELBO using num_particles many samples (particles).

differentiable_loss(model, guide, *args, **kwargs)
Returns: a differentiable estimate of the ELBO
Return type: torch.Tensor
Raises: ValueError – if the ELBO is not differentiable (e.g. is identically zero)
Estimates a differentiable ELBO using num_particles many samples (particles). The result should be infinitely differentiable (as long as underlying derivatives have been implemented).

loss_and_grads(model, guide, *args, **kwargs)
Returns: an estimate of the ELBO
Return type: float
Estimates the ELBO using num_particles many samples (particles). Performs backward on the ELBO of each particle.


class JitTraceEnum_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.traceenum_elbo.TraceEnum_ELBO
Like TraceEnum_ELBO but uses pyro.ops.jit.compile() to compile loss_and_grads().
This works only for a limited set of models:
 Models must have static structure.
 Models must not depend on any global data (except the param store).
 All model inputs that are tensors must be passed in via *args.
 All model inputs that are not tensors must be passed in via **kwargs, and compilation will be triggered once per unique **kwargs.

class TraceMeanField_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.trace_elbo.Trace_ELBO
A trace implementation of ELBO-based SVI. This is currently the only ELBO estimator in Pyro that uses analytic KL divergences when those are available.
In contrast to, e.g., TraceGraph_ELBO and Trace_ELBO this estimator places restrictions on the dependency structure of the model and guide. In particular it assumes that the guide has a mean-field structure, i.e. that it factorizes across the different latent variables present in the guide. It also assumes that all of the latent variables in the guide are reparameterized. This latter condition is satisfied for, e.g., the Normal distribution but is not satisfied for, e.g., the Categorical distribution.
Warning
This estimator may give incorrect results if the mean-field condition is not satisfied.
Note for advanced users:
The mean-field condition is a sufficient but not necessary condition for this estimator to be correct. The precise condition is that for every latent variable z in the guide, its parents in the model must not include any latent variables that are descendants of z in the guide. Here ‘parents in the model’ and ‘descendants in the guide’ are with respect to the corresponding (statistical) dependency structure. For example, this condition is always satisfied if the model and guide have identical dependency structures.
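To make "analytic KL divergence" concrete, here is the standard closed form for the KL divergence between two univariate normal distributions, written as a standalone sketch. kl_normal_normal is a hypothetical helper defined for this example, not a Pyro function; it only illustrates the kind of term that can be computed exactly rather than estimated from samples.

```python
import math


def kl_normal_normal(loc_q, scale_q, loc_p, scale_p):
    """KL(N(loc_q, scale_q^2) || N(loc_p, scale_p^2)) in closed form."""
    var_ratio = (scale_q / scale_p) ** 2
    mean_term = ((loc_q - loc_p) / scale_p) ** 2
    return 0.5 * (var_ratio + mean_term - 1.0 - math.log(var_ratio))


# Guide q = N(0, 1) against a prior p = N(1, 1): only the means differ.
kl_shifted = kl_normal_normal(0.0, 1.0, 1.0, 1.0)

# Identical distributions have zero divergence.
kl_same = kl_normal_normal(0.3, 2.0, 0.3, 2.0)
```

An exact term like this contributes no Monte Carlo variance, which is the usual motivation for preferring analytic KL terms when the guide and model distributions admit them.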

class JitTraceMeanField_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.trace_mean_field_elbo.TraceMeanField_ELBO
Like TraceMeanField_ELBO but uses pyro.ops.jit.trace() to compile loss_and_grads().
This works only for a limited set of models:
 Models must have static structure.
 Models must not depend on any global data (except the param store).
 All model inputs that are tensors must be passed in via *args.
 All model inputs that are not tensors must be passed in via **kwargs, and compilation will be triggered once per unique **kwargs.

class TraceTailAdaptive_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.trace_elbo.Trace_ELBO
Interface for Stochastic Variational Inference with an adaptive f-divergence as described in ref. [1]. Users should specify num_particles > 1 and vectorize_particles==True. The argument tail_adaptive_beta can be specified to modify how the adaptive f-divergence is constructed. See reference for details.
Note that this interface does not support computing the variational objective itself; rather it only supports computing gradients of the variational objective. Consequently, one might want to use another SVI interface (e.g. RenyiELBO) in order to monitor convergence.
Note that this interface only supports models in which all the latent variables are fully reparameterized. It also does not support data subsampling.
References
 [1] “Variational Inference with Tail-adaptive f-Divergence”, Dilin Wang, Hao Liu, Qiang Liu, NeurIPS 2018 https://papers.nips.cc/paper/7816-variational-inference-with-tail-adaptive-f-divergence

class RenyiELBO(alpha=0, num_particles=2, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True)
Bases: pyro.infer.elbo.ELBO
An implementation of Renyi’s \(\alpha\)-divergence variational inference following reference [1].
In order for the objective to be a strict lower bound, we require \(\alpha \ge 0\). Note, however, that according to reference [1], depending on the dataset \(\alpha < 0\) might give better results. In the special case \(\alpha = 0\), the objective function is that of the importance weighted autoencoder derived in reference [2].
Note
Setting \(\alpha < 1\) gives a better bound than the usual ELBO. For \(\alpha = 1\), it is better to use the Trace_ELBO class because it helps reduce the variance of gradient estimates.
Parameters:
 alpha (float) – The order of \(\alpha\)-divergence. Here \(\alpha \neq 1\). Default is 0.
 num_particles – The number of particles/samples used to form the objective (gradient) estimator. Default is 2.
 max_plate_nesting (int) – Bound on max number of nested pyro.plate() contexts. Default is infinity.
 strict_enumeration_warning (bool) – Whether to warn about possible misuse of enumeration, i.e. that TraceEnum_ELBO is used iff there are enumerated sample sites.
References:
 [1] Renyi Divergence Variational Inference, Yingzhen Li, Richard E. Turner
 [2] Importance Weighted Autoencoders, Yuri Burda, Roger Grosse, Ruslan Salakhutdinov
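The role of \(\alpha\) can be sketched with a small stand-alone computation. The code below assumes the usual Monte Carlo form of the Renyi bound, \(\frac{1}{1-\alpha} \log \frac{1}{K} \sum_k w_k^{1-\alpha}\) with \(w_k = p(x, z_k) / q(z_k)\); the function names and the log weights are hypothetical and this is not Pyro's implementation.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def renyi_bound(log_weights, alpha):
    """Monte Carlo Renyi bound from log importance weights, for alpha != 1."""
    if alpha == 1:
        raise ValueError("alpha must differ from 1; that limit is the ordinary ELBO")
    scaled = [(1.0 - alpha) * lw for lw in log_weights]
    return log_mean_exp(scaled) / (1.0 - alpha)


# Hypothetical log importance weights, log p(x, z_k) - log q(z_k), for 4 particles.
log_weights = [-1.0, -2.0, -0.5, -3.0]

iwae_bound = renyi_bound(log_weights, alpha=0.0)     # importance weighted case
half_bound = renyi_bound(log_weights, alpha=0.5)
elbo_estimate = sum(log_weights) / len(log_weights)  # the alpha -> 1 limit
```

On the same weights, the estimate decreases as alpha grows toward 1, which matches the note that \(\alpha < 1\) gives a better bound than the usual ELBO.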

class TraceTMC_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.elbo.ELBO
A trace-based implementation of Tensor Monte Carlo [1] by way of Tensor Variable Elimination [2] that supports:
 local parallel sampling over any sample site in the model or guide
 exhaustive enumeration over any sample site in the model or guide
To take multiple samples, mark the site with infer={'enumerate': 'parallel', 'num_samples': N}. To configure all sites in a model or guide at once, use config_enumerate(). To enumerate or sample a sample site in the model, mark the site and ensure the site does not appear in the guide.
This assumes restricted dependency structure on the model and guide: variables outside of a plate can never depend on variables inside that plate.
References
 [1] Tensor Monte Carlo: Particle Methods for the GPU Era, Laurence Aitchison (2018)
 [2] Tensor Variable Elimination for Plated Factor Graphs, Fritz Obermeyer, Eli Bingham, Martin Jankowiak, Justin Chiu, Neeraj Pradhan, Alexander Rush, Noah Goodman (2019)

differentiable_loss(model, guide, *args, **kwargs)
Returns: a differentiable estimate of the marginal log-likelihood
Return type: torch.Tensor
Raises: ValueError – if the ELBO is not differentiable (e.g. is identically zero)
Computes a differentiable TMC estimate using num_particles many samples (particles). The result should be infinitely differentiable (as long as underlying derivatives have been implemented).
Importance

class Importance(model, guide=None, num_samples=None)
Bases: pyro.infer.abstract_infer.TracePosterior
Parameters:
 model – probabilistic model defined as a function
 guide – guide used for sampling defined as a function
 num_samples – number of samples to draw from the guide (default 10)
This method performs posterior inference by importance sampling using the guide as the proposal distribution. If no guide is provided, it defaults to proposing from the model’s prior.

psis_diagnostic
(model, guide, *args, **kwargs)[source]¶ Computes the Pareto tail index k for a model/guide pair using the technique described in [1], which builds on previous work in [2]. If \(0 < k < 0.5\) the guide is a good approximation to the model posterior, in the sense described in [1]. If \(0.5 \le k \le 0.7\), the guide provides a suboptimal approximation to the posterior, but may still be useful in practice. If \(k > 0.7\) the guide program provides a poor approximation to the full posterior, and caution should be used when using the guide. Note, however, that a guide may be a poor fit to the full posterior while still yielding reasonable model predictions. If \(k < 0.0\) the importance weights corresponding to the model and guide appear to be bounded from above; this would be a bizarre outcome for a guide trained via ELBO maximization. Please see [1] for a more complete discussion of how the tail index k should be interpreted.
Please be advised that a large number of samples may be required for an accurate estimate of k.
Note that we assume that the model and guide are both vectorized and have static structure. As is canonical in Pyro, the args and kwargs are passed to the model and guide.
References [1] ‘Yes, but Did It Work?: Evaluating Variational Inference.’ Yuling Yao, Aki Vehtari, Daniel Simpson, Andrew Gelman [2] ‘Pareto Smoothed Importance Sampling.’ Aki Vehtari, Andrew Gelman, Jonah Gabry
Parameters:  model (callable) – the model program.
 guide (callable) – the guide program.
 num_particles (int) – the total number of times we run the model and guide in order to compute the diagnostic. defaults to 1000.
 max_simultaneous_particles – the maximum number of simultaneous samples drawn from the model and guide. Defaults to num_particles. num_particles must be divisible by max_simultaneous_particles.
 max_plate_nesting (int) – optional bound on max number of nested
pyro.plate()
contexts in the model/guide. defaults to 7.
Returns float: the PSIS diagnostic k
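The thresholds on k described above can be written down as a small lookup. The following is a minimal sketch, not part of Pyro: the function name interpret_psis_k and the returned strings are hypothetical, and treating k = 0 as "good" is an assumption, since the text only covers k < 0 and 0 < k < 0.5.

```python
def interpret_psis_k(k: float) -> str:
    """Map a Pareto tail index k to the qualitative reading given in the text.

    Hypothetical helper for illustration; it is not a Pyro function.
    """
    if k < 0.0:
        return "weights appear bounded from above"
    if k < 0.5:  # k == 0 is placed here by assumption
        return "good approximation"
    if k <= 0.7:
        return "suboptimal but possibly useful"
    return "poor approximation"


for k in (-0.1, 0.3, 0.6, 0.9):
    print(k, "->", interpret_psis_k(k))
```

In practice the value of k would come from psis_diagnostic(model, guide, *args, **kwargs).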

vectorized_importance_weights
(model, guide, *args, **kwargs)[source]¶ Parameters:  model – probabilistic model defined as a function
 guide – guide used for sampling defined as a function
 num_samples – number of samples to draw from the guide (default 1)
 max_plate_nesting (int) – Bound on max number of nested
pyro.plate()
contexts.
 normalized (bool) – set to True to return self-normalized importance weights
Returns: a (num_samples,)-shaped tensor of importance weights and the model and guide traces that produced them
Vectorized computation of importance weights for models with static structure:
log_weights, model_trace, guide_trace = \
    vectorized_importance_weights(model, guide, *args,
                                  num_samples=1000,
                                  max_plate_nesting=4,
                                  normalized=False)
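To make the meaning of self-normalized weights concrete, here is a pure-Python sketch that turns unnormalized log weights into weights summing to one and uses them for a posterior expectation. It only illustrates the arithmetic; the helper names self_normalize and importance_estimate are assumptions and do not exist in Pyro.

```python
import math


def self_normalize(log_weights):
    """Return weights proportional to exp(log_weights) that sum to 1."""
    m = max(log_weights)  # subtract the max for numerical stability
    shifted = [math.exp(lw - m) for lw in log_weights]
    total = sum(shifted)
    return [s / total for s in shifted]


def importance_estimate(values, log_weights):
    """Self-normalized importance-sampling estimate of E[value]."""
    weights = self_normalize(log_weights)
    return sum(w * v for w, v in zip(weights, values))


# Very negative log weights would underflow without the max-shift.
log_w = [-1000.0, -1001.0, -1002.0]
weights = self_normalize(log_w)
estimate = importance_estimate([1.0, 2.0, 3.0], log_w)
print(weights, estimate)
```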
Reweighted Wake-Sleep¶

class
ReweightedWakeSleep
(num_particles=2, insomnia=1.0, model_has_params=True, num_sleep_particles=None, vectorize_particles=True, max_plate_nesting=inf, strict_enumeration_warning=True)[source]¶ Bases:
pyro.infer.elbo.ELBO
An implementation of Reweighted Wake Sleep following reference [1].
Note
Sampling and log_prob evaluation asymptotic complexity:
 1) Using wake-theta and/or wake-phi
 O(num_particles) samples from guide, O(num_particles) log_prob evaluations of model and guide
 2) Using sleep-phi
 O(num_sleep_particles) samples from model, O(num_sleep_particles) log_prob evaluations of guide
 3) If 1) and 2) are combined,
 O(num_particles) samples from the guide, O(num_sleep_particles) from the model, O(num_particles + num_sleep_particles) log_prob evaluations of the guide, and O(num_particles) evaluations of the model
Note
This is particularly useful for models with stochastic branching, as described in [2].
Note
This returns two losses, one each for (a) the model parameters (theta), computed using the iwae objective, and (b) the guide parameters (phi), computed using (a combination of) the csis objective and a self-normalized importance-sampled version of the csis objective.
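For orientation, the iwae objective mentioned above is the log of the average importance weight over the particles. The sketch below computes that quantity from a list of log weights (log p(x, z) minus log q(z | x) per particle). It is a generic illustration under that assumption, not Pyro's internal code, and the name iwae_bound is hypothetical.

```python
import math


def iwae_bound(log_weights):
    """log( (1/K) * sum_k exp(log_w_k) ), computed stably."""
    k = len(log_weights)
    m = max(log_weights)
    log_sum = m + math.log(sum(math.exp(lw - m) for lw in log_weights))
    return log_sum - math.log(k)


log_w = [-1.0, -3.0]
bound = iwae_bound(log_w)
elbo_like = sum(log_w) / len(log_w)  # plain average of log weights
print(bound, elbo_like)
```

By Jensen's inequality the bound is never smaller than the plain average of the log weights, which is why it tightens as num_particles grows.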
Note
In order to enable computing the sleep-phi terms, the guide program must have its observations explicitly passed in through the keyword argument observations. Where the value of the observations is unknown during definition, such as for amortized variational inference, it may be given a default argument as observations=None, and the correct value supplied during learning through svi.step(observations=…).
Warning
Minibatch training is not supported yet.
Parameters:  num_particles (int) – The number of particles/samples used to form the objective (gradient) estimator. Default is 2.
 insomnia – The scaling between the wake-phi and sleep-phi terms. Default is 1.0 [wake-phi]
 model_has_params (bool) – Indicate if model has learnable params. Useful in avoiding extra computation when running in pure sleep mode [csis]. Default is True.
 num_sleep_particles (int) – The number of particles used to form the sleep-phi estimator. Matches num_particles by default.
 vectorize_particles (bool) – Whether the traces should be vectorised across num_particles. Default is True.
 max_plate_nesting (int) – Bound on max number of nested
pyro.plate()
contexts. Default is infinity.
 strict_enumeration_warning (bool) – Whether to warn about possible misuse of enumeration, i.e. that TraceEnum_ELBO is used iff there are enumerated sample sites.
References:
 [1] Reweighted Wake-Sleep,
 Jörg Bornschein, Yoshua Bengio
 [2] Revisiting Reweighted Wake-Sleep for Models with Stochastic Control Flow,
 Tuan Anh Le, Adam R. Kosiorek, N. Siddharth, Yee Whye Teh, Frank Wood
Sequential Monte Carlo¶

exception
SMCFailed
[source]¶ Bases:
ValueError
Exception raised when
SMCFilter
fails to find any hypothesis with nonzero probability.

class
SMCFilter
(model, guide, num_particles, max_plate_nesting, *, ess_threshold=0.5)[source]¶ Bases:
object
SMCFilter is the top-level interface for filtering via sequential Monte Carlo.
The model and guide should be objects with two methods: .init(state, ...) and .step(state, ...), intended to be called first with init(), then with step() repeatedly. These two methods should have the same signature as the init() and step() methods of SMCFilter, but with an extra first argument state that should be used to store all tensors that depend on sampled variables. The state will be a dict-like object, SMCState, with arbitrary keys and torch.Tensor values. Models can read and write state but guides can only read from it.
Inference complexity is O(len(state) * num_time_steps), so to avoid quadratic complexity in Markov models, ensure that state has fixed size.
Parameters:  model (object) – probabilistic model with init and step methods
 guide (object) – guide used for sampling, with init and step methods
 num_particles (int) – The number of particles used to form the distribution.
 max_plate_nesting (int) – Bound on max number of nested pyro.plate() contexts.
 ess_threshold (float) – Effective sample size threshold for deciding when to importance resample: resampling occurs when ess < ess_threshold * num_particles.
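The resampling rule ess < ess_threshold * num_particles can be illustrated without Pyro. The sketch below uses the common estimate ESS = (sum w)^2 / sum w^2 and multinomial resampling. Both choices are assumptions made for illustration (the text does not say which estimator or resampling scheme SMCFilter uses), and all function names are hypothetical.

```python
import math
import random


def effective_sample_size(log_weights):
    """ESS = (sum w)^2 / sum w^2, computed from log weights."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(w) ** 2 / sum(x * x for x in w)


def should_resample(log_weights, ess_threshold=0.5):
    """Apply the rule: resample when ess < ess_threshold * num_particles."""
    num_particles = len(log_weights)
    return effective_sample_size(log_weights) < ess_threshold * num_particles


def multinomial_resample(particles, log_weights, rng):
    """Draw len(particles) particles with probability proportional to weight."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    return rng.choices(particles, weights=w, k=len(particles))


balanced = [0.0, 0.0, 0.0, 0.0]          # ESS is 4: no resampling needed
degenerate = [0.0, -50.0, -50.0, -50.0]  # ESS is about 1: resample
print(should_resample(balanced), should_resample(degenerate))
print(multinomial_resample(["a", "b"], [0.0, -1000.0], random.Random(0)))
```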

get_empirical
()[source]¶ Returns: a marginal distribution over all state tensors. Return type: a dictionary whose keys are latent variables and whose values are Empirical objects.

class
SMCState
(num_particles)[source]¶ Bases:
dict
Dictionary-like object to hold a vectorized collection of tensors to represent all state during inference with SMCFilter. During inference, the SMCFilter resamples these tensors.
Keys may have arbitrary hashable type. Values must be torch.Tensors.
Parameters: num_particles (int) –
Stein Methods¶

class
IMQSteinKernel
(alpha=0.5, beta=-0.5, bandwidth_factor=None)[source]¶ Bases:
pyro.infer.svgd.SteinKernel
An IMQ (inverse multi-quadratic) kernel for use in the SVGD inference algorithm [1]. The bandwidth of the kernel is chosen from the particles using a simple heuristic as in reference [2]. The kernel takes the form
\(K(x, y) = (\alpha + \|x-y\|^2/h)^{\beta}\)
where \(\alpha\) and \(\beta\) are user-specified parameters and \(h\) is the bandwidth.
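As a quick numerical illustration, the kernel value is alpha plus the squared Euclidean distance between the two points divided by h, all raised to the power beta. The sketch below is plain Python rather than Pyro's implementation; it assumes a negative exponent (beta = -0.5) so that the kernel decays with distance, and it takes the bandwidth h as a fixed argument instead of deriving it from the particles.

```python
def imq_kernel(x, y, alpha=0.5, beta=-0.5, h=1.0):
    """Evaluate (alpha + ||x - y||^2 / h) ** beta for two points.

    Illustrative helper; the fixed bandwidth h is an assumption.
    """
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return (alpha + sq_dist / h) ** beta


same = imq_kernel([0.0, 0.0], [0.0, 0.0])
near = imq_kernel([0.0, 0.0], [1.0, 0.0])
far = imq_kernel([0.0, 0.0], [3.0, 4.0])
print(same, near, far)
```

The kernel is largest when the two points coincide and decays polynomially, rather than exponentially, as they move apart.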
Parameters: Variables: bandwidth_factor (float) – Property that controls the factor by which to scale the bandwidth at each iteration.
References
[1] “Stein Points,” Wilson Ye Chen, Lester Mackey, Jackson Gorham, Francois-Xavier Briol, Chris J. Oates.
[2] “Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm,” Qiang Liu, Dilin Wang

bandwidth_factor
¶


class
RBFSteinKernel
(bandwidth_factor=None)[source]¶ Bases:
pyro.infer.svgd.SteinKernel
An RBF kernel for use in the SVGD inference algorithm. The bandwidth of the kernel is chosen from the particles using a simple heuristic as in reference [1].
Parameters: bandwidth_factor (float) – Optional factor by which to scale the bandwidth, defaults to 1.0.
Variables: bandwidth_factor (float) – Property that controls the factor by which to scale the bandwidth at each iteration.
References
 [1] “Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm,”
 Qiang Liu, Dilin Wang

bandwidth_factor
¶

class
SVGD
(model, kernel, optim, num_particles, max_plate_nesting, mode='univariate')[source]¶ Bases:
object
A basic implementation of Stein Variational Gradient Descent as described in reference [1].
Parameters:  model – The model (callable containing Pyro primitives). Model must be fully vectorized and may only contain continuous latent variables.
 kernel – a SVGD compatible kernel like
RBFSteinKernel
.  optim (pyro.optim.PyroOptim) – A wrapper for a PyTorch optimizer.
 num_particles (int) – The number of particles used in SVGD.
 max_plate_nesting (int) – The max number of nested
pyro.plate()
contexts in the model.  mode (str) – Whether to use a Kernelized Stein Discrepancy that makes use of multivariate test functions (as in [1]) or univariate test functions (as in [2]). Defaults to univariate.
Example usage:
from pyro.infer import SVGD, RBFSteinKernel
from pyro.optim import Adam

kernel = RBFSteinKernel()
adam = Adam({"lr": 0.1})
svgd = SVGD(model, kernel, adam, num_particles=50, max_plate_nesting=0)

for step in range(500):
    svgd.step(model_arg1, model_arg2)

final_particles = svgd.get_named_particles()
References
 [1] “Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm,”
 Qiang Liu, Dilin Wang
 [2] “Kernelized Complete Conditional Stein Discrepancy,”
 Raghav Singhal, Saad Lahlou, Rajesh Ranganath

class
SteinKernel
[source]¶ Bases:
object
Abstract class for kernels used in the
SVGD
inference algorithm.
log_kernel_and_grad
(particles)[source]¶ Compute the component kernels and their gradients.
Parameters: particles – a tensor with shape (N, D)
Returns: A pair (log_kernel, kernel_grad) where log_kernel is a (N, N, D)-shaped tensor equal to the logarithm of the kernel and kernel_grad is a (N, N, D)-shaped tensor where the entry (n, m, d) represents the derivative of log_kernel w.r.t. x_{m,d}, where x_{m,d} is the d-th dimension of particle m.

Likelihood free methods¶

class
EnergyDistance
(beta=1.0, prior_scale=0.0, num_particles=2, max_plate_nesting=inf)[source]¶ Bases:
object
Posterior predictive energy distance [1,2] with optional Bayesian regularization by the prior.
Let p(x,z)=p(z) p(x|z) be the model, q(z|x) be the guide. Then given data x and drawing an iid pair of samples \((Z,X)\) and \((Z',X')\) (where Z is latent and X is the posterior predictive),
\[\begin{split}& Z \sim q(z|x); \quad X \sim p(x|Z) \\ & Z' \sim q(z|x); \quad X' \sim p(x|Z') \\ & loss = \mathbb E_X \|X-x\|^\beta - \frac 1 2 \mathbb E_{X,X'}\|X-X'\|^\beta - \lambda \mathbb E_Z \log p(Z)\end{split}\]
This is a likelihood-free inference algorithm, and can be used for likelihoods without tractable density functions. The \(\beta\) energy distance is a robust loss function, and is well defined for any distribution with finite fractional moment \(\mathbb E[\|X\|^\beta]\).
This requires static model structure, a fully reparametrized guide, and reparametrized likelihood distributions in the model. Model latent distributions may be non-reparametrized.
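A sample-based estimate of the first two terms of this loss can be sketched in a few lines for scalar data. The code below is an illustration only: it assumes the prior term is switched off (prior_scale = 0), estimates the second expectation over all ordered pairs of distinct particles, and uses the hypothetical name energy_loss. It is not Pyro's estimator.

```python
def energy_loss(samples, x, beta=1.0):
    """Estimate E|X - x|^beta - 0.5 * E|X - X'|^beta from predictive samples.

    samples: posterior predictive draws X_1..X_n (scalars), n >= 2.
    x: the observed datum.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two particles are needed for the pair term")
    fit_term = sum(abs(s - x) ** beta for s in samples) / n
    pair_sum = sum(
        abs(samples[i] - samples[j]) ** beta
        for i in range(n)
        for j in range(n)
        if i != j
    )
    spread_term = pair_sum / (n * (n - 1))
    return fit_term - 0.5 * spread_term


centered = energy_loss([0.0, 2.0], x=1.0)  # samples straddle the datum
off_target = energy_loss([0.0, 2.0], x=5.0)  # samples far from the datum
print(centered, off_target)
```

The pair term is why at least two particles are required: with a single draw there is no second sample X' to compare against.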
References
 [1] Gabor J. Szekely, Maria L. Rizzo (2003)
 Energy Statistics: A Class of Statistics Based on Distances.
 [2] Tilmann Gneiting, Adrian E. Raftery (2007)
 Strictly Proper Scoring Rules, Prediction, and Estimation. https://www.stat.washington.edu/raftery/Research/PDF/Gneiting2007jasa.pdf
Parameters:  beta (float) – Exponent \(\beta\) from [1,2]. The loss function is strictly proper for distributions with finite \(\beta\)-absolute moment \(E[\|X\|^\beta]\). Thus for heavy tailed distributions beta should be small, e.g. for Cauchy distributions, \(\beta<1\) is strictly proper. Defaults to 1. Must be in the open interval (0,2).
 prior_scale (float) – Nonnegative scale for prior regularization. Model parameters are trained only if this is positive. If zero (default), then model log densities will not be computed (guide log densities are never computed).
 num_particles (int) – The number of particles/samples used to form the gradient estimators. Must be at least 2.
 max_plate_nesting (int) – Optional bound on max number of nested
pyro.plate()
contexts. If omitted, this will guess a valid value by running the (model,guide) pair once.
Discrete Inference¶

infer_discrete
(fn=None, first_available_dim=None, temperature=1, *, strict_enumeration_warning=True)[source]¶ A poutine that samples discrete sites marked with
site["infer"]["enumerate"] = "parallel"
from the posterior, conditioned on observations.
Example:
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import config_enumerate, infer_discrete

@infer_discrete(first_available_dim=-1, temperature=0)
@config_enumerate
def viterbi_decoder(data, hidden_dim=10):
    transition = 0.3 / hidden_dim + 0.7 * torch.eye(hidden_dim)
    means = torch.arange(float(hidden_dim))
    states = [0]
    for t in pyro.markov(range(len(data))):
        # Each hidden state depends on the previous one, states[-1].
        states.append(pyro.sample("states_{}".format(t),
                                  dist.Categorical(transition[states[-1]])))
        pyro.sample("obs_{}".format(t),
                    dist.Normal(means[states[-1]], 1.),
                    obs=data[t])
    return states  # returns maximum likelihood states
Parameters:  fn – a stochastic function (callable containing Pyro primitive calls)
 first_available_dim (int) – The first tensor dimension (counting from the right) that is available for parallel enumeration. This dimension and all dimensions left may be used internally by Pyro. This should be a negative integer.
 temperature (int) – Either 1 (sample via forward-filter backward-sample) or 0 (optimize via Viterbi-like MAP inference). Defaults to 1 (sample).
 strict_enumeration_warning (bool) – Whether to warn in case no enumerated sample sites are found. Defaults to True.
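To show what Viterbi-like MAP inference (temperature 0) computes, here is a textbook Viterbi decoder for a small hidden Markov model in plain Python. It is a conceptual sketch, not how infer_discrete is implemented; the function name and the table layout (nested lists of log probabilities) are assumptions.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Return the most probable hidden state path.

    log_init[s]: log prior of the first state.
    log_trans[r][s]: log probability of moving from state r to state s.
    log_emit[t][s]: log likelihood of observation t under state s.
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[t][s])
        backpointers.append(pointers)
    # Trace the best final state back to the start.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


log = math.log
log_init = [log(0.5), log(0.5)]
log_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
log_emit = [[log(0.9), log(0.1)]] * 2 + [[log(0.1), log(0.9)]] * 2
path = viterbi(log_init, log_trans, log_emit)
print(path)
```

With temperature 1 the maximization over previous states would be replaced by sampling, which yields a posterior sample instead of the single most probable path.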

class
TraceEnumSample_ELBO
(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=1.0)[source]¶ Bases:
pyro.infer.traceenum_elbo.TraceEnum_ELBO
This extends
TraceEnum_ELBO
to make it cheaper to sample from discrete latent states during SVI.
The following are equivalent but the first is cheaper, sharing work between the computations of loss and z:
# Version 1.
elbo = TraceEnumSample_ELBO(max_plate_nesting=1)
loss = elbo.loss(*args, **kwargs)
z = elbo.sample_saved()

# Version 2.
elbo = TraceEnum_ELBO(max_plate_nesting=1)
loss = elbo.loss(*args, **kwargs)
guide_trace = poutine.trace(guide).get_trace(*args, **kwargs)
z = infer_discrete(poutine.replay(model, guide_trace),
                   first_available_dim=-2)(*args, **kwargs)
Inference Utilities¶

class
Predictive
(model, posterior_samples=None, guide=None, num_samples=None, return_sites=(), parallel=False)[source]¶ Bases:
torch.nn.modules.module.Module
EXPERIMENTAL class used to construct predictive distribution. The predictive distribution is obtained by running the model conditioned on latent samples from posterior_samples. If a guide is provided, then posterior samples from all the latent sites are also returned.
Warning
The interface for the Predictive class is experimental, and might change in the future.
Parameters:  model – Python callable containing Pyro primitives.
 posterior_samples (dict) – dictionary of samples from the posterior.
 guide (callable) – optional guide to get posterior samples of sites not present in posterior_samples.
 num_samples (int) – number of samples to draw from the predictive distribution. This argument has no effect if posterior_samples is non-empty, in which case, the leading dimension size of samples in posterior_samples is used.
 return_sites (list, tuple, or set) – sites to return; by default only sample sites not present in posterior_samples are returned.
 parallel (bool) – predict in parallel by wrapping the existing model
in an outermost plate messenger. Note that this requires that the model has
all batch dims correctly annotated via
plate
. Default is False.

call
(*args, **kwargs)[source]¶ Method that calls forward() and returns parameter values of the guide as a tuple instead of a dict, which is a requirement for JIT tracing. Unlike forward(), this method can be traced by torch.jit.trace_module().
Warning
This method may be removed once PyTorch JIT tracer starts accepting dict as valid return types. See issue.

forward
(*args, **kwargs)[source]¶ Returns dict of samples from the predictive distribution. By default, only sample sites not contained in posterior_samples are returned. This can be modified by changing the return_sites keyword argument of this Predictive instance.
Note
This method is used internally by Module. Users should instead use __call__() as in Predictive(model)(*args, **kwargs).
Parameters:  args – model arguments.
 kwargs – model keyword arguments.

class
EmpiricalMarginal
(trace_posterior, sites=None, validate_args=None)[source]¶ Bases:
pyro.distributions.empirical.Empirical
Marginal distribution over a single site (or multiple, provided they have the same shape) from the TracePosterior’s model.
Note
If multiple sites are specified, they must have the same tensor shape. Samples from each site will be stacked and stored within a single tensor. See Empirical. To hold the marginal distribution of sites having different shapes, use Marginals instead.
Parameters:  trace_posterior (TracePosterior) – a TracePosterior instance representing a Monte Carlo posterior.
 sites (list) – optional list of sites for which we need to generate the marginal distribution.

class
Marginals
(trace_posterior, sites=None, validate_args=None)[source]¶ Bases:
object
Holds the marginal distribution over one or more sites from the TracePosterior’s model. This is a convenience container class, which can be extended by TracePosterior subclasses, e.g. for implementing diagnostics.
Parameters:  trace_posterior (TracePosterior) – a TracePosterior instance representing a Monte Carlo posterior.
 sites (list) – optional list of sites for which we need to generate the marginal distribution.

empirical
¶ A dictionary of sites’ names and their corresponding EmpiricalMarginal distribution.
Type: OrderedDict

support
(flatten=False)[source]¶ Gets support of this marginal distribution.
Parameters: flatten (bool) – A flag to decide if we want to flatten batch_shape when the marginal distribution is collected from the posterior with num_chains > 1. Defaults to False.
Returns: a dict whose keys are sites’ names and whose values are sites’ supports.
Return type: OrderedDict

class
TracePosterior
(num_chains=1)[source]¶ Bases:
object
Abstract TracePosterior object from which posterior inference algorithms inherit. When run, collects a bag of execution traces from the approximate posterior. This is designed to be used by other utility classes like EmpiricalMarginal, that need access to the collected execution traces.

information_criterion
(pointwise=False)[source]¶ Computes information criterion of the model. Currently, returns only “Widely Applicable/Watanabe-Akaike Information Criterion” (WAIC) and the corresponding effective number of parameters.
Reference:
[1] Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC, Aki Vehtari, Andrew Gelman, and Jonah Gabry
Parameters: pointwise (bool) – a flag to decide if we want to get a vectorized WAIC or not. When pointwise=False, returns the sum.
Returns: a dictionary containing values of WAIC and its effective number of parameters.
Return type: OrderedDict
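For readers unfamiliar with WAIC, the following plain-Python sketch computes it from a table of pointwise log-likelihoods using the definitions in reference [1]: the log pointwise predictive density minus the per-observation variance of the log-likelihood, scaled by -2. The function name, the dictionary keys, and the use of the sample variance (dividing by the number of draws minus one) are assumptions for illustration and may differ from what information_criterion returns.

```python
import math


def waic(log_lik, pointwise=False):
    """Compute WAIC from log_lik[s][i], the log-likelihood of
    observation i under posterior draw s (at least two draws)."""
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    waic_i, p_i = [], []
    for i in range(n_obs):
        col = [log_lik[s][i] for s in range(n_draws)]
        m = max(col)
        # log of the average likelihood over posterior draws
        lppd = m + math.log(sum(math.exp(v - m) for v in col) / n_draws)
        mean = sum(col) / n_draws
        var = sum((v - mean) ** 2 for v in col) / (n_draws - 1)
        waic_i.append(-2.0 * (lppd - var))
        p_i.append(var)
    if pointwise:
        return {"waic": waic_i, "p_waic": p_i}
    return {"waic": sum(waic_i), "p_waic": sum(p_i)}


# Two identical posterior draws: the variance term vanishes.
table = [[-1.0, -2.0], [-1.0, -2.0]]
total = waic(table)
per_point = waic(table, pointwise=True)
print(total, per_point)
```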


class
TracePredictive
(model, posterior, num_samples, keep_sites=None)[source]¶ Bases:
pyro.infer.abstract_infer.TracePosterior
Warning
This class is deprecated and will be removed in a future release. Use the Predictive class instead.
Generates and holds traces from the posterior predictive distribution, given model execution traces from the approximate posterior. This is achieved by constraining latent sites to randomly sampled parameter values from the model execution traces and running the model forward to generate traces with new response (“_RETURN”) sites.
Parameters:  model – arbitrary Python callable containing Pyro primitives.
 posterior (TracePosterior) – trace posterior instance holding samples from the model’s approximate posterior.
 num_samples (int) – number of samples to generate.
 keep_sites – The sites which should be sampled from posterior distribution (default: all)
MCMC¶
MCMC¶

class
MCMC
(kernel, num_samples, warmup_steps=None, initial_params=None, num_chains=1, hook_fn=None, mp_context=None, disable_progbar=False, disable_validation=True, transforms=None, save_params=None)[source]¶ Bases:
pyro.infer.mcmc.api.AbstractMCMC
Wrapper class for Markov Chain Monte Carlo algorithms. Specific MCMC algorithms are TraceKernel instances and need to be supplied as a kernel argument to the constructor.
Note
The case of num_chains > 1 uses python multiprocessing to run parallel chains in multiple processes. This goes with the usual caveats around multiprocessing in python, e.g. the model used to initialize the kernel must be serializable via pickle, and the performance / constraints will be platform dependent (e.g. only the “spawn” context is available in Windows). This has also not been extensively tested on the Windows platform.
Parameters:  kernel – An instance of the TraceKernel class, which when given an execution trace returns another sample trace from the target (posterior) distribution.
 num_samples (int) – The number of samples that need to be generated, excluding the samples discarded during the warmup phase.
 warmup_steps (int) – Number of warmup iterations. The samples generated during the warmup phase are discarded. If not provided, default is the same as num_samples.
 num_chains (int) – Number of MCMC chains to run in parallel. Depending on whether num_chains is 1 or more than 1, this class internally dispatches to either _UnarySampler or _MultiSampler.
 initial_params (dict) – dict containing initial tensors in unconstrained space to initiate the markov chain. The leading dimension’s size must match that of num_chains. If not specified, parameter values will be sampled from the prior.
 hook_fn – Python callable that takes in (kernel, samples, stage, i) as arguments. stage is either sample or warmup and i refers to the i’th sample for the given stage. This can be used to implement additional logging, or more generally, run arbitrary code per generated sample.
 mp_context (str) – Multiprocessing context to use when num_chains > 1. Only applicable for Python 3.5 and above. Use mp_context="spawn" for CUDA.
 disable_progbar (bool) – Disable progress bar and diagnostics update.
 disable_validation (bool) – Disables distribution validation check.
Defaults to True, disabling validation, since divergent transitions will lead to exceptions. Switch to False to enable validation, or to None to preserve existing global values.
 transforms (dict) – dictionary that specifies a transform for a sample site with constrained support to unconstrained space.
 save_params (List[str]) – Optional list of a subset of parameter names to save during sampling and diagnostics. This is useful in models with large nuisance variables. Defaults to None, saving all params.
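The hook_fn argument described above is an ordinary callable taking (kernel, samples, stage, i). The sketch below builds one that records every n-th iteration, then exercises it with a stand-in loop so that it runs without Pyro. The factory name make_logging_hook, the stage strings, and the driver loop are hypothetical; only the four-argument signature comes from the parameter description.

```python
def make_logging_hook(log, every=100):
    """Return a hook that appends (stage, i) to `log` every `every` steps."""

    def hook_fn(kernel, samples, stage, i):
        # `kernel` and `samples` are available for richer logging if needed.
        if i % every == 0:
            log.append((stage, i))

    return hook_fn


log = []
hook = make_logging_hook(log, every=2)

# Stand-in for the sampler: the real call site lives inside MCMC.run().
for stage, n_iterations in (("warmup", 3), ("sample", 4)):
    for i in range(n_iterations):
        hook(None, {}, stage, i)

print(log)
```

Such a hook would then be passed to the constructor through the hook_fn keyword, for example MCMC(kernel, num_samples=500, hook_fn=hook).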

diagnostics
()[source]¶ Gets some diagnostics statistics such as effective sample size, split Gelman-Rubin, or divergent transitions from the sampler.

get_samples
(num_samples=None, group_by_chain=False)[source]¶ Get samples from the MCMC run, potentially resampling with replacement.
For parameter details see:
select_samples
.

run
[source]¶ Run MCMC to generate samples and populate self._samples.
Example usage:
def model(data):
    ...

nuts_kernel = NUTS(model)
mcmc = MCMC(nuts_kernel, num_samples=500)
mcmc.run(data)
samples = mcmc.get_samples()
Parameters:  args – optional arguments taken by MCMCKernel.setup.
 kwargs – optional keywords arguments taken by MCMCKernel.setup.

summary
(prob=0.9)[source]¶ Prints a summary table displaying diagnostics of samples obtained from posterior. The diagnostics displayed are mean, standard deviation, median, the 90% Credibility Interval, effective_sample_size(), split_gelman_rubin().
Parameters: prob (float) – the probability mass of samples within the credibility interval.
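As background for the split Gelman-Rubin column, the statistic compares between-chain and within-chain variance after cutting every chain in half. The sketch below implements the standard formula for scalar draws in plain Python. It is an illustration under that textbook definition, not the code behind split_gelman_rubin(), and it reuses the name only for readability.

```python
def split_gelman_rubin(chains):
    """Split R-hat for a list of equal-length chains of scalar draws."""
    half = len(chains[0]) // 2
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    m, n = len(halves), half
    means = [sum(h) / n for h in halves]
    variances = [
        sum((v - mu) ** 2 for v in h) / (n - 1) for h, mu in zip(halves, means)
    ]
    grand_mean = sum(means) / m
    between = n * sum((mu - grand_mean) ** 2 for mu in means) / (m - 1)
    within = sum(variances) / m
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


mixed = split_gelman_rubin([[0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0, 1, 0]])
stuck = split_gelman_rubin([[0, 1, 0, 1], [10, 11, 10, 11]])
print(mixed, stuck)
```

Values close to 1 suggest the chains agree; values well above 1, as in the second call, indicate chains exploring different regions.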
StreamingMCMC¶

class
StreamingMCMC
(kernel, num_samples, warmup_steps=None, initial_params=None, statistics=None, num_chains=1, hook_fn=None, disable_progbar=False, disable_validation=True, transforms=None, save_params=None)[source]¶ Bases:
pyro.infer.mcmc.api.AbstractMCMC
MCMC that computes required statistics in a streaming fashion. For this class no samples are retained, only aggregated statistics. This is useful for running memory-expensive models where we care only about specific statistics (especially useful in memory-constrained environments like GPU).
For available streaming ops please see
streaming
.
diagnostics
()[source]¶ Gets diagnostics. Currently only split Gelman-Rubin is supported, and it requires ‘mean’ and ‘variance’ streaming statistics to be present.

MCMCKernel¶

class
MCMCKernel
[source]¶ Bases:
object

initial_params
¶ Returns a dict of initial params (by default, from the prior) to initiate the MCMC run.
Returns: dict of parameter values keyed by their name.

logging
()[source]¶ Relevant logging information to be printed at regular intervals of the MCMC run. Returns None by default.
Returns: String containing the diagnostic summary. e.g. acceptance rate Return type: string

HMC¶

class
HMC
(model=None, potential_fn=None, step_size=1, trajectory_length=None, num_steps=None, adapt_step_size=True, adapt_mass_matrix=True, full_mass=False, transforms=None, max_plate_nesting=None, jit_compile=False, jit_options=None, ignore_jit_warnings=False, target_accept_prob=0.8, init_strategy=<function init_to_uniform>)[source]¶ Bases:
pyro.infer.mcmc.mcmc_kernel.MCMCKernel
Simple Hamiltonian Monte Carlo kernel, where step_size and num_steps need to be explicitly specified by the user.
References
[1] MCMC Using Hamiltonian Dynamics, Radford M. Neal
Parameters:  model – Python callable containing Pyro primitives.
 potential_fn – Python callable calculating potential energy, whose input is a dict of real support parameters.
 step_size (float) – Determines the size of a single step taken by the verlet integrator while computing the trajectory using Hamiltonian dynamics. If not specified, it will be set to 1.
 trajectory_length (float) – Length of a MCMC trajectory. If not specified, it will be set to step_size x num_steps. In case num_steps is not specified, it will be set to \(2\pi\).
 num_steps (int) – The number of discrete steps over which to simulate Hamiltonian dynamics. The state at the end of the trajectory is returned as the proposal. This value is always equal to int(trajectory_length / step_size).
 adapt_step_size (bool) – A flag to decide if we want to adapt step_size during warmup phase using Dual Averaging scheme.
 adapt_mass_matrix (bool) – A flag to decide if we want to adapt mass matrix during warmup phase using Welford scheme.
 full_mass (bool) – A flag to decide if mass matrix is dense or diagonal.
 transforms (dict) – Optional dictionary that specifies a transform
for a sample site with constrained support to unconstrained space. The
transform should be invertible, and implement log_abs_det_jacobian.
If not specified and the model has sites with constrained support,
automatic transformations will be applied, as specified in
torch.distributions.constraint_registry
.  max_plate_nesting (int) – Optional bound on max number of nested
pyro.plate()
contexts. This is required if model contains discrete sample sites that can be enumerated over in parallel.  jit_compile (bool) – Optional parameter denoting whether to use the PyTorch JIT to trace the log density computation, and use this optimized executable trace in the integrator.
 jit_options (dict) – A dictionary contains optional arguments for
torch.jit.trace()
function.  ignore_jit_warnings (bool) – Flag to ignore warnings from the JIT
tracer when
jit_compile=True
. Default is False.
 target_accept_prob (float) – Increasing this value will lead to a smaller step size, so the sampling will be slower but more robust. Defaults to 0.8.
 init_strategy (callable) – A per-site initialization function. See Initialization section for available functions.
Note
Internally, the mass matrix will be ordered according to the order of the names of latent variables, not the order of their appearance in the model.
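The relationship between step_size, trajectory_length and num_steps, and what one simulated trajectory looks like, can be sketched with a one-dimensional leapfrog (velocity Verlet) integrator. This is a simplified illustration, not Pyro's integrator: it assumes a unit mass matrix, a standard normal target with potential U(q) = q^2 / 2, and it omits the accept/reject step. All names are hypothetical.

```python
import math


def leapfrog(q, p, grad_potential, step_size, num_steps):
    """Simulate Hamiltonian dynamics for num_steps leapfrog steps (num_steps >= 1)."""
    p = p - 0.5 * step_size * grad_potential(q)  # initial half step for momentum
    for _ in range(num_steps - 1):
        q = q + step_size * p
        p = p - step_size * grad_potential(q)
    q = q + step_size * p
    p = p - 0.5 * step_size * grad_potential(q)  # final half step for momentum
    return q, p


def hamiltonian(q, p):
    """Total energy for U(q) = q^2 / 2 with unit mass."""
    return 0.5 * q * q + 0.5 * p * p


step_size = 0.1
trajectory_length = 2 * math.pi
num_steps = int(trajectory_length / step_size)

q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, lambda q: q, step_size, num_steps)
energy_error = abs(hamiltonian(q1, p1) - hamiltonian(q0, p0))
print(num_steps, q1, p1, energy_error)
```

The small energy error is what makes the end point of the trajectory a good proposal; a larger step_size shortens the computation but increases that error.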
Example:
>>> true_coefs = torch.tensor([1., 2., 3.])
>>> data = torch.randn(2000, 3)
>>> dim = 3
>>> labels = dist.Bernoulli(logits=(true_coefs * data).sum(1)).sample()
>>>
>>> def model(data):
...     coefs_mean = torch.zeros(dim)
...     coefs = pyro.sample('beta', dist.Normal(coefs_mean, torch.ones(3)))
...     y = pyro.sample('y', dist.Bernoulli(logits=(coefs * data).sum(1)), obs=labels)
...     return y
>>>
>>> hmc_kernel = HMC(model, step_size=0.0855, num_steps=4)
>>> mcmc = MCMC(hmc_kernel, num_samples=500, warmup_steps=100)
>>> mcmc.run(data)
>>> mcmc.get_samples()['beta'].mean(0)  # doctest: +SKIP
tensor([ 0.9819,  1.9258,  2.9737])

initial_params
¶

inverse_mass_matrix
¶

mass_matrix_adapter
¶

num_steps
¶

step_size
¶
NUTS¶

class
NUTS
(model=None, potential_fn=None, step_size=1, adapt_step_size=True, adapt_mass_matrix=True, full_mass=False, use_multinomial_sampling=True, transforms=None, max_plate_nesting=None, jit_compile=False, jit_options=None, ignore_jit_warnings=False, target_accept_prob=0.8, max_tree_depth=10, init_strategy=<function init_to_uniform>)[source]¶ Bases:
pyro.infer.mcmc.hmc.HMC
No-U-Turn Sampler kernel, which provides an efficient and convenient way to run Hamiltonian Monte Carlo. The number of steps taken by the integrator is dynamically adjusted on each call to sample to ensure an optimal length for the Hamiltonian trajectory [1]. As such, the samples generated will typically have lower autocorrelation than those generated by the HMC kernel. Optionally, the NUTS kernel also provides the ability to adapt step size during the warmup phase.
Refer to the baseball example to see how to do Bayesian inference in Pyro using NUTS.
References
 [1] The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo,
 Matthew D. Hoffman, and Andrew Gelman.
 [2] A Conceptual Introduction to Hamiltonian Monte Carlo,
 Michael Betancourt
 [3] Slice Sampling,
 Radford M. Neal
Parameters:  model – Python callable containing Pyro primitives.
 potential_fn – Python callable calculating potential energy, whose input is a dict of real support parameters.
 step_size (float) – Determines the size of a single step taken by the verlet integrator while computing the trajectory using Hamiltonian dynamics. If not specified, it will be set to 1.
 adapt_step_size (bool) – A flag to decide if we want to adapt step_size during warmup phase using Dual Averaging scheme.
 adapt_mass_matrix (bool) – A flag to decide if we want to adapt mass matrix during warmup phase using Welford scheme.
 full_mass (bool) – A flag to decide if mass matrix is dense or diagonal.
 use_multinomial_sampling (bool) – A flag to decide if we want to sample candidates along its trajectory using “multinomial sampling” or using “slice sampling”. Slice sampling is used in the original NUTS paper [1], while multinomial sampling is suggested in [2]. By default, this flag is set to True. If it is set to False, NUTS uses slice sampling.
 transforms (dict) – Optional dictionary that specifies a transform
for a sample site with constrained support to unconstrained space. The
transform should be invertible, and implement log_abs_det_jacobian.
If not specified and the model has sites with constrained support,
automatic transformations will be applied, as specified in
torch.distributions.constraint_registry
.  max_plate_nesting (int) – Optional bound on max number of nested
pyro.plate()
contexts. This is required if model contains discrete sample sites that can be enumerated over in parallel.  jit_compile (bool) – Optional parameter denoting whether to use the PyTorch JIT to trace the log density computation, and use this optimized executable trace in the integrator.
 jit_options (dict) – A dictionary contains optional arguments for
torch.jit.trace()
function.  ignore_jit_warnings (bool) – Flag to ignore warnings from the JIT
tracer when
jit_compile=True
. Default is False.
 target_accept_prob (float) – Target acceptance probability of step size adaptation scheme. Increasing this value will lead to a smaller step size, so the sampling will be slower but more robust. Defaults to 0.8.
 max_tree_depth (int) – Max depth of the binary tree created during the doubling scheme of NUTS sampler. Defaults to 10.
 init_strategy (callable) – A per-site initialization function. See Initialization section for available functions.
Example:
>>> true_coefs = torch.tensor([1., 2., 3.])
>>> data = torch.randn(2000, 3)
>>> dim = 3
>>> labels = dist.Bernoulli(logits=(true_coefs * data).sum(1)).sample()
>>>
>>> def model(data):
...     coefs_mean = torch.zeros(dim)
...     coefs = pyro.sample('beta', dist.Normal(coefs_mean, torch.ones(3)))
...     y = pyro.sample('y', dist.Bernoulli(logits=(coefs * data).sum(1)), obs=labels)
...     return y
>>>
>>> nuts_kernel = NUTS(model, adapt_step_size=True)
>>> mcmc = MCMC(nuts_kernel, num_samples=500, warmup_steps=300)
>>> mcmc.run(data)
>>> mcmc.get_samples()['beta'].mean(0)  # doctest: +SKIP
tensor([ 0.9221,  1.9464,  2.9228])
BlockMassMatrix¶

class
BlockMassMatrix
(init_scale=1.0)[source]¶ Bases:
object
EXPERIMENTAL This class is used to adapt the (inverse) mass matrix and provides useful methods to calculate algebraic terms that involve the mass matrix.
The mass matrix will have block structure, which can be specified by using the method configure() with the corresponding structured mass_matrix_shape arg.
Parameters: init_scale (float) – initial scale to construct the initial mass matrix.
configure
(mass_matrix_shape, adapt_mass_matrix=True, options={})[source]¶ Sets up an initial mass matrix.
Parameters:  mass_matrix_shape (dict) – a dict that maps tuples of site names to the shape of the corresponding mass matrix. Each tuple of site names corresponds to a block.
 adapt_mass_matrix (bool) – a flag to decide whether an adaptation scheme will be used.
 options (dict) – tensor options to construct the initial mass matrix.
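The adaptation scheme is not spelled out here; the NUTS parameter list names a Welford scheme for mass matrix adaptation. As a minimal sketch of that idea only (not Pyro's implementation), the pure-Python class below keeps a running mean and variance of scalar samples, which is the diagonal case of estimating a covariance during warmup. The class name `WelfordVariance` is hypothetical.

```python
class WelfordVariance:
    """Running mean/variance via Welford's online update (illustrative only)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Unbiased sample variance; needs at least two samples.
        return self.m2 / (self.count - 1)


estimator = WelfordVariance()
for sample in [1.0, 2.0, 3.0, 4.0]:
    estimator.update(sample)
```

Each update touches only three numbers, so the estimate can be refreshed after every warmup draw without storing the draws themselves.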

inverse_mass_matrix
¶

kinetic_grad
(r)[source]¶ Computes the gradient of kinetic energy w.r.t. the momentum r. This is equivalent to computing the velocity given the momentum r.
Parameters: r (dict) – a dictionary that maps site names to a tensor momentum. Returns: a dictionary that maps site names to the corresponding gradient

mass_matrix_size
¶ A dict that maps site names to the size of the corresponding mass matrix.

scale
(r_unscaled, r_prototype)[source]¶ Computes M^{1/2} @ r_unscaled.
Note that r is generated from a Gaussian with scale mass_matrix_sqrt. This method will scale it.
Returns: a dictionary that maps site names to the corresponding tensor
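For intuition, with kinetic energy K(r) = 0.5 rᵀ M⁻¹ r the gradient with respect to r is M⁻¹ r, and scaling a standard normal draw by M^{1/2} gives a momentum with covariance M. The sketch below works this out for a diagonal mass matrix using plain Python lists keyed by site name. The helper names `kinetic_grad_diag` and `scale_diag` are hypothetical, and the block structure handled by BlockMassMatrix is not modelled.

```python
import math


def kinetic_grad_diag(inverse_mass, r):
    """Velocity M^{-1} r for a diagonal mass matrix, site by site."""
    return {
        site: [inv_m * r_i for inv_m, r_i in zip(inverse_mass[site], r[site])]
        for site in r
    }


def scale_diag(inverse_mass, r_unscaled):
    """M^{1/2} r_unscaled for a diagonal mass matrix (M = 1 / inverse_mass)."""
    return {
        site: [r_i / math.sqrt(inv_m)
               for inv_m, r_i in zip(inverse_mass[site], r_unscaled[site])]
        for site in r_unscaled
    }


inverse_mass = {"beta": [4.0, 0.25]}  # diagonal of M^{-1} for one site
r = scale_diag(inverse_mass, {"beta": [1.0, 1.0]})
velocity = kinetic_grad_diag(inverse_mass, r)
```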

Utilities¶

initialize_model
(model, model_args=(), model_kwargs={}, transforms=None, max_plate_nesting=None, jit_compile=False, jit_options=None, skip_jit_warnings=False, num_chains=1, init_strategy=<function init_to_uniform>, initial_params=None)[source]¶ Given a Python callable with Pyro primitives, generates the following model-specific properties needed for inference using HMC/NUTS kernels:
 initial parameters to be sampled using an HMC kernel,
 a potential function whose input is a dict of parameters in unconstrained space,
 transforms to transform latent sites of model to unconstrained space,
 a prototype trace to be used in MCMC to consume traces from sampled parameters.
Parameters:  model – a Pyro model which contains Pyro primitives.
 model_args (tuple) – optional args taken by model.
 model_kwargs (dict) – optional kwargs taken by model.
 transforms (dict) – Optional dictionary that specifies a transform for a sample site with constrained support to unconstrained space. The transform should be invertible, and implement log_abs_det_jacobian. If not specified and the model has sites with constrained support, automatic transformations will be applied, as specified in torch.distributions.constraint_registry.
 max_plate_nesting (int) – Optional bound on max number of nested pyro.plate() contexts. This is required if model contains discrete sample sites that can be enumerated over in parallel.
 jit_compile (bool) – Optional parameter denoting whether to use the PyTorch JIT to trace the log density computation, and use this optimized executable trace in the integrator.
 jit_options (dict) – A dictionary containing optional arguments for the torch.jit.trace() function.
 skip_jit_warnings (bool) – Flag to skip warnings from the JIT tracer when jit_compile=True. Default is False.
 num_chains (int) – Number of parallel chains. If num_chains > 1, the returned initial_params will be a list with num_chains elements.
 init_strategy (callable) – A per-site initialization function. See Initialization section for available functions.
 initial_params (dict) – dict containing initial tensors in unconstrained space to initiate the Markov chain.
Returns: a tuple of (initial_params, potential_fn, transforms, prototype_trace)

diagnostics
(samples, group_by_chain=True)[source]¶ Gets diagnostic statistics such as effective sample size and split Gelman-Rubin using the samples drawn from the posterior distribution.
Returns: dictionary of diagnostic stats for each sample site.
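To make the split Gelman-Rubin statistic concrete, here is a minimal pure-Python sketch for scalar draws. It follows the textbook definition (split each chain in half, then compare within-chain and between-chain variance) and is not taken from Pyro's source. The function name `split_gelman_rubin` is hypothetical.

```python
import math


def split_gelman_rubin(chains):
    """Split R-hat for a list of equal-length chains of scalar draws."""
    half = len(chains[0]) // 2
    # Split every chain in two so within-chain drift shows up as disagreement.
    parts = [c[:half] for c in chains] + [c[half:2 * half] for c in chains]
    n, m = half, len(parts)
    means = [sum(p) / n for p in parts]
    variances = [sum((x - mu) ** 2 for x in p) / (n - 1)
                 for p, mu in zip(parts, means)]
    grand_mean = sum(means) / m
    within = sum(variances) / m
    between = n * sum((mu - grand_mean) ** 2 for mu in means) / (m - 1)
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


# Two chains that both drift upward, and two that agree in distribution.
drifting = split_gelman_rubin([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
mixed = split_gelman_rubin([[1.0, 2.0, 1.0, 2.0], [2.0, 1.0, 2.0, 1.0]])
```

The drifting chains yield a value well above 1 even though the two chains are identical, which is the reason for splitting before comparing.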

select_samples
(samples, num_samples=None, group_by_chain=False)[source]¶ Performs selection from given MCMC samples.
Parameters:  samples (dictionary) – Samples object to sample from.
 num_samples (int) – Number of samples to return. If None, all the samples from an MCMC chain are returned in their original ordering.
 group_by_chain (bool) – Whether to preserve the chain dimension. If True, all samples will have num_chains as the size of their leading dimension.
Returns: dictionary of samples keyed by site name.
Automatic Guide Generation¶
AutoGuide¶

class
AutoGuide
(model, *, create_plates=None)[source]¶ Bases:
pyro.nn.module.PyroModule
Base class for automatic guides.
Derived classes must implement the forward() method, with the same *args, **kwargs as the base model.
Auto guides can be used individually or combined in an AutoGuideList object.
Parameters:  model (callable) – A Pyro model.
 create_plates (callable) – An optional function taking the same *args, **kwargs as model() and returning a pyro.plate or iterable of plates. Plates not returned will be created automatically as usual. This is useful for data subsampling.

call
(*args, **kwargs)[source]¶ Method that calls forward() and returns parameter values of the guide as a tuple instead of a dict, which is a requirement for JIT tracing. Unlike forward(), this method can be traced by torch.jit.trace_module().
Warning
This method may be removed once the PyTorch JIT tracer starts accepting dict as a valid return type. See https://github.com/pytorch/pytorch/issues/27743.

median
(*args, **kwargs)[source]¶ Returns the posterior median value of each latent variable.
Returns: A dict mapping sample site name to median tensor. Return type: dict

model
¶
AutoGuideList¶

class
AutoGuideList
(model, *, create_plates=None)[source]¶ Bases:
pyro.infer.autoguide.guides.AutoGuide
,torch.nn.modules.container.ModuleList
Container class to combine multiple automatic guides.
Example usage:
guide = AutoGuideList(my_model)
guide.append(AutoDiagonalNormal(poutine.block(model, hide=["assignment"])))
guide.append(AutoDiscreteParallel(poutine.block(model, expose=["assignment"])))
svi = SVI(model, guide, optim, Trace_ELBO())
Parameters: model (callable) – a Pyro model 
append
(part)[source]¶ Add an automatic guide for part of the model. The guide should have been created by blocking the model to restrict to a subset of sample sites. No two parts should operate on any one sample site.
Parameters: part (AutoGuide or callable) – a partial guide to add

forward
(*args, **kwargs)[source]¶ A composite guide with the same *args, **kwargs as the base model.
Note
This method is used internally by Module. Users should instead use __call__().
Returns: A dict mapping sample site name to sampled value. Return type: dict

AutoCallable¶

class
AutoCallable
(model, guide, median=<function AutoCallable.<lambda>>)[source]¶ Bases:
pyro.infer.autoguide.guides.AutoGuide
AutoGuide wrapper for simple callable guides.
This is used internally for composing autoguides with custom user-defined guides that are simple callables, e.g.:
def my_local_guide(*args, **kwargs):
    ...

guide = AutoGuideList(model)
guide.append(AutoDelta(poutine.block(model, expose=['my_global_param'])))
guide.append(my_local_guide)  # automatically wrapped in an AutoCallable
To specify a median callable, you can instead:
def my_local_median(*args, **kwargs):
    ...

guide.append(AutoCallable(model, my_local_guide, my_local_median))
For more complex guides that need e.g. access to plates, users should instead subclass AutoGuide.
Parameters:  model (callable) – a Pyro model
 guide (callable) – a Pyro guide (typically over only part of the model)
 median (callable) – an optional callable returning a dict mapping sample site name to computed median tensor.
AutoNormal¶

class
AutoNormal
(model, *, init_loc_fn=<function init_to_feasible>, init_scale=0.1, create_plates=None)[source]¶ Bases:
pyro.infer.autoguide.guides.AutoGuide
This implementation of AutoGuide uses a Normal distribution with a diagonal covariance matrix to construct a guide over the entire latent space. The guide does not depend on the model’s *args, **kwargs.
It should be equivalent to AutoDiagonalNormal, but with more convenient site names and with better support for TraceMeanField_ELBO.
In AutoDiagonalNormal, if your model has N named parameters with dimensions k_i and sum k_i = D, you get a single vector of length D for your mean, and a single vector of length D for sigmas. This guide gives you N distinct normals that you can call by name.
Usage:
guide = AutoNormal(model)
svi = SVI(model, guide, ...)
Parameters:  model (callable) – A Pyro model.
 init_loc_fn (callable) – A per-site initialization function. See Initialization section for available functions.
 init_scale (float) – Initial scale for the standard deviation of each (unconstrained transformed) latent variable.
 create_plates (callable) – An optional function taking the same *args, **kwargs as model() and returning a pyro.plate or iterable of plates. Plates not returned will be created automatically as usual. This is useful for data subsampling.

forward
(*args, **kwargs)[source]¶ An automatic guide with the same *args, **kwargs as the base model.
Note
This method is used internally by Module. Users should instead use __call__().
Returns: A dict mapping sample site name to sampled value. Return type: dict

median
(*args, **kwargs)[source]¶ Returns the posterior median value of each latent variable.
Returns: A dict mapping sample site name to median tensor. Return type: dict

quantiles
(quantiles, *args, **kwargs)[source]¶ Returns posterior quantiles for each latent variable. Example:
print(guide.quantiles([0.05, 0.5, 0.95]))
Parameters: quantiles (torch.Tensor or list) – A list of requested quantiles between 0 and 1. Returns: A dict mapping sample site name to a tensor of quantile values. Return type: dict
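As a rough illustration of what such quantiles mean for a Normal guide defined in unconstrained space, the sketch below computes them for a single positive-valued site. It assumes the constraining transform is `exp`, which is an assumption of this example rather than something stated above, and the function name `positive_site_quantiles` is hypothetical. Because `exp` is monotone, a quantile of the unconstrained Normal maps directly to a quantile of the constrained variable.

```python
import math
from statistics import NormalDist


def positive_site_quantiles(loc, scale, probs):
    """Quantiles of exp(z) where z ~ Normal(loc, scale) in unconstrained space."""
    standard = NormalDist()
    # exp is monotone increasing, so quantiles pass straight through the transform.
    return [math.exp(loc + scale * standard.inv_cdf(p)) for p in probs]


q05, q50, q95 = positive_site_quantiles(loc=0.3, scale=0.1, probs=[0.05, 0.5, 0.95])
```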

scale_constraint
= SoftplusPositive(lower_bound=0.0)¶
AutoDelta¶

class
AutoDelta
(model, init_loc_fn=<function init_to_median>, *, create_plates=None)[source]¶ Bases:
pyro.infer.autoguide.guides.AutoGuide
This implementation of AutoGuide uses Delta distributions to construct a MAP guide over the entire latent space. The guide does not depend on the model’s *args, **kwargs.
Note
This class does MAP inference in constrained space.
Usage:
guide = AutoDelta(model)
svi = SVI(model, guide, ...)
Latent variables are initialized using init_loc_fn(). To change the default behavior, create a custom init_loc_fn() as described in Initialization, for example:
def my_init_fn(site):
    if site["name"] == "level":
        return torch.tensor([1., 0., 1.])
    if site["name"] == "concentration":
        return torch.ones(k)
    return init_to_sample(site)
Parameters:  model (callable) – A Pyro model.
 init_loc_fn (callable) – A per-site initialization function. See Initialization section for available functions.
 create_plates (callable) – An optional function taking the same *args, **kwargs as model() and returning a pyro.plate or iterable of plates. Plates not returned will be created automatically as usual. This is useful for data subsampling.
AutoContinuous¶

class
AutoContinuous
(model, init_loc_fn=<function init_to_median>)[source]¶ Bases:
pyro.infer.autoguide.guides.AutoGuide
Base class for implementations of continuous-valued Automatic Differentiation Variational Inference [1].
This uses torch.distributions.transforms to transform each constrained latent variable to an unconstrained space, then concatenates all variables into a single unconstrained latent variable. Each derived class implements a get_posterior() method returning a distribution over this single unconstrained latent variable.
Assumes model structure and latent dimension are fixed, and all latent variables are continuous.
Reference:
 [1] Automatic Differentiation Variational Inference, Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, David M. Blei
Parameters:  model (callable) – A Pyro model.
 init_loc_fn (callable) – A per-site initialization function. See Initialization section for available functions.

forward
(*args, **kwargs)[source]¶ An automatic guide with the same *args, **kwargs as the base model.
Note
This method is used internally by Module. Users should instead use __call__().
Returns: A dict mapping sample site name to sampled value. Return type: dict

get_base_dist
()[source]¶ Returns the base distribution of the posterior when reparameterized as a TransformedDistribution. This should not depend on the model’s *args, **kwargs.
posterior = TransformedDistribution(self.get_base_dist(), self.get_transform(*args, **kwargs))
Returns: TorchDistribution instance representing the base distribution.

get_transform
(*args, **kwargs)[source]¶ Returns the transform applied to the base distribution when the posterior is reparameterized as a TransformedDistribution. This may depend on the model’s *args, **kwargs.
posterior = TransformedDistribution(self.get_base_dist(), self.get_transform(*args, **kwargs))
Returns: a Transform instance.

median
(*args, **kwargs)[source]¶ Returns the posterior median value of each latent variable.
Returns: A dict mapping sample site name to median tensor. Return type: dict

quantiles
(quantiles, *args, **kwargs)[source]¶ Returns posterior quantiles for each latent variable. Example:
print(guide.quantiles([0.05, 0.5, 0.95]))
Parameters: quantiles (torch.Tensor or list) – A list of requested quantiles between 0 and 1. Returns: A dict mapping sample site name to a tensor of quantile values. Return type: dict
AutoMultivariateNormal¶

class
AutoMultivariateNormal
(model, init_loc_fn=<function init_to_median>, init_scale=0.1)[source]¶ Bases:
pyro.infer.autoguide.guides.AutoContinuous
This implementation of AutoContinuous uses a Cholesky factorization of a Multivariate Normal distribution to construct a guide over the entire latent space. The guide does not depend on the model’s *args, **kwargs.
Usage:
guide = AutoMultivariateNormal(model)
svi = SVI(model, guide, ...)
By default the mean vector is initialized by init_loc_fn() and the Cholesky factor is initialized to the identity times a small factor.
Parameters:  model (callable) – A generative model.
 init_loc_fn (callable) – A per-site initialization function. See Initialization section for available functions.
 init_scale (float) – Initial scale for the standard deviation of each (unconstrained transformed) latent variable.

scale_tril_constraint
= SoftplusLowerCholesky()¶
AutoDiagonalNormal¶

class
AutoDiagonalNormal
(model, init_loc_fn=<function init_to_median>, init_scale=0.1)[source]¶ Bases:
pyro.infer.autoguide.guides.AutoContinuous
This implementation of AutoContinuous uses a Normal distribution with a diagonal covariance matrix to construct a guide over the entire latent space. The guide does not depend on the model’s *args, **kwargs.
Usage:
guide = AutoDiagonalNormal(model)
svi = SVI(model, guide, ...)
By default the mean vector is initialized by init_loc_fn() and the scale vector is initialized to init_scale.
Parameters:  model (callable) – A generative model.
 init_loc_fn (callable) – A per-site initialization function. See Initialization section for available functions.
 init_scale (float) – Initial scale for the standard deviation of each (unconstrained transformed) latent variable.

scale_constraint
= SoftplusPositive(lower_bound=0.0)¶
AutoLowRankMultivariateNormal¶

class
AutoLowRankMultivariateNormal
(model, init_loc_fn=<function init_to_median>, init_scale=0.1, rank=None)[source]¶ Bases:
pyro.infer.autoguide.guides.AutoContinuous
This implementation of AutoContinuous uses a low-rank plus diagonal Multivariate Normal distribution to construct a guide over the entire latent space. The guide does not depend on the model’s *args, **kwargs.
Usage:
guide = AutoLowRankMultivariateNormal(model, rank=10)
svi = SVI(model, guide, ...)
By default the cov_diag is initialized to a small constant and the cov_factor is initialized randomly such that on average cov_factor.matmul(cov_factor.t()) has the same scale as cov_diag.
Parameters:  model (callable) – A generative model.
 rank (int or None) – The rank of the low-rank part of the covariance matrix. Defaults to approximately sqrt(latent dim).
 init_loc_fn (callable) – A per-site initialization function. See Initialization section for available functions.
 init_scale (float) – Approximate initial scale for the standard deviation of each (unconstrained transformed) latent variable.
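The low-rank plus diagonal structure can be written as cov = cov_factor @ cov_factor.T + diag(cov_diag). The sketch below assembles that dense matrix from plain Python lists, purely to show the shapes involved. The function name `low_rank_covariance` is hypothetical, and a real implementation would avoid forming the dense matrix.

```python
def low_rank_covariance(cov_factor, cov_diag):
    """Dense covariance cov_factor @ cov_factor.T + diag(cov_diag) from plain lists."""
    dim = len(cov_diag)
    cov = [
        [sum(a * b for a, b in zip(cov_factor[i], cov_factor[j])) for j in range(dim)]
        for i in range(dim)
    ]
    for i in range(dim):
        cov[i][i] += cov_diag[i]
    return cov


cov_factor = [[1.0], [2.0], [0.0]]  # shape (latent_dim=3, rank=1)
cov_diag = [0.5, 0.5, 0.5]
cov = low_rank_covariance(cov_factor, cov_diag)
```

With latent dimension D and rank k this parameterization stores D * k + D numbers instead of the D * (D + 1) / 2 needed for a full Cholesky factor.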

scale_constraint
= SoftplusPositive(lower_bound=0.0)¶
AutoNormalizingFlow¶

class
AutoNormalizingFlow
(model, init_transform_fn)[source]¶ Bases:
pyro.infer.autoguide.guides.AutoContinuous
This implementation of AutoContinuous uses a Diagonal Normal distribution transformed via a sequence of bijective transforms (e.g. various TransformModule subclasses) to construct a guide over the entire latent space. The guide does not depend on the model’s *args, **kwargs.
Usage:
transform_init = partial(iterated, block_autoregressive, repeats=2)
guide = AutoNormalizingFlow(model, transform_init)
svi = SVI(model, guide, ...)
Parameters:  model (callable) – a generative model
 init_transform_fn – a callable which when provided with the latent dimension returns an instance of Transform, or TransformModule if the transform has trainable params.
AutoIAFNormal¶

class
AutoIAFNormal
(model, hidden_dim=None, init_loc_fn=None, num_transforms=1, **init_transform_kwargs)[source]¶ Bases:
pyro.infer.autoguide.guides.AutoNormalizingFlow
This implementation of AutoContinuous uses a Diagonal Normal distribution transformed via an AffineAutoregressive to construct a guide over the entire latent space. The guide does not depend on the model’s *args, **kwargs.
Usage:
guide = AutoIAFNormal(model, hidden_dim=latent_dim)
svi = SVI(model, guide, ...)
Parameters:  model (callable) – a generative model
 hidden_dim (list[int]) – number of hidden dimensions in the IAF
 init_loc_fn (callable) – A per-site initialization function. See Initialization section for available functions.
Warning
This argument is only to preserve backwards compatibility and has no effect in practice.
 num_transforms (int) – number of AffineAutoregressive transforms to use in sequence.
 init_transform_kwargs – other keyword arguments taken by affine_autoregressive().
AutoLaplaceApproximation¶

class
AutoLaplaceApproximation
(model, init_loc_fn=<function init_to_median>)[source]¶ Bases:
pyro.infer.autoguide.guides.AutoContinuous
Laplace approximation (quadratic approximation) approximates the posterior \(\log p(z | x)\) by a multivariate normal distribution in the unconstrained space. Under the hood, it uses Delta distributions to construct a MAP guide over the entire (unconstrained) latent space. Its covariance is given by the inverse of the Hessian of \(-\log p(x, z)\) at the MAP point of z.
Usage:
delta_guide = AutoLaplaceApproximation(model)
svi = SVI(model, delta_guide, ...)
# ...then train the delta_guide...
guide = delta_guide.laplace_approximation()
By default the mean vector is initialized to an empirical prior median.
Parameters:  model (callable) – a generative model
 init_loc_fn (callable) – A per-site initialization function. See Initialization section for available functions.

laplace_approximation
(*args, **kwargs)[source]¶ Returns an AutoMultivariateNormal instance whose posterior’s loc and scale_tril are given by Laplace approximation.
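A one-dimensional worked case may help. The sketch below is an illustration under stated assumptions, not Pyro's code: it takes Poisson counts with a Gamma(a, b) prior on the rate, works in the unconstrained variable z = log(rate), finds the MAP point with Newton's method, and uses the inverse second derivative of the negative log joint as the variance. The function name `laplace_approximation_1d` and the example numbers are hypothetical.

```python
import math


def laplace_approximation_1d(neg_log_joint_grad, neg_log_joint_hess, z0=0.0, steps=50):
    """Newton's method to the MAP point, then variance = 1 / Hessian of -log p(x, z)."""
    z = z0
    for _ in range(steps):
        z -= neg_log_joint_grad(z) / neg_log_joint_hess(z)
    return z, 1.0 / neg_log_joint_hess(z)


# Up to a constant, and including the Jacobian of rate = exp(z):
#   -log p(x, z) = (b + n) * exp(z) - (a + sum(x)) * z
a, b, counts = 2.0, 1.0, [3, 5, 4]
A, B = a + sum(counts), b + len(counts)
loc, variance = laplace_approximation_1d(
    lambda z: B * math.exp(z) - A,  # first derivative of -log p(x, z)
    lambda z: B * math.exp(z),      # second derivative of -log p(x, z)
)
```

For this model the MAP point is log(A / B) and the Hessian there equals A, so the approximation is a Normal with mean log(A / B) and variance 1 / A in unconstrained space.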
AutoDiscreteParallel¶
AutoStructured¶

class
AutoStructured
(model, *, conditionals: Dict[str, Union[str, Callable]] = 'normal', dependencies: Dict[str, Dict[str, Union[str, Callable]]] = 'linear', init_loc_fn=<function init_to_feasible>, init_scale=0.1, create_plates=None)[source]¶ Bases:
pyro.infer.autoguide.guides.AutoGuide
Structured guide whose conditional distributions are Delta, Normal, MultivariateNormal, or specified by a callable, and whose latent variables can depend on each other either linearly (in unconstrained space) or via shearing by a callable.
Usage:
def model(data):
    x = pyro.sample("x", dist.LogNormal(0, 1))
    with pyro.plate("plate", len(data)):
        y = pyro.sample("y", dist.Normal(0, 1))
        pyro.sample("z", dist.Normal(y, x), obs=data)

guide = AutoStructured(
    model=model,
    conditionals={"x": "normal", "y": "normal"},
    dependencies={"x": {"y": "linear"}},
)
Once trained, this guide can be used with StructuredReparam to precondition a model for use in HMC and NUTS inference.
Note
If you declare a dependency of a high-dimensional downstream variable on a low-dimensional upstream variable, you may want to use a lower learning rate for that weight, e.g.:
def optim_config(param_name):
    config = {"lr": 0.01}
    if "deps.my_downstream.my_upstream" in param_name:
        config["lr"] *= 0.1
    return config

adam = pyro.optim.Adam(optim_config)
Parameters:  model (callable) – A Pyro model.
 conditionals – Family of distribution with which to model each latent variable’s conditional posterior. This should be a dict mapping each latent variable name to either a string in (“delta”, “normal”, or “mvn”) or to a callable that returns a sample from a zero mean (or approximately centered) noise distribution (such callables typically call pyro.param() and pyro.sample() internally).
 dependencies – Dict mapping each site name to a dict of its upstream dependencies; each inner dict maps upstream site name to either the string “linear” or a callable that maps a flattened upstream perturbation to flattened downstream perturbation. The string “linear” is equivalent to nn.Linear(upstream.numel(), downstream.numel(), bias=False). Dependencies must not contain cycles or self-loops.
 init_loc_fn (callable) – A per-site initialization function. See Initialization section for available functions.
 init_scale (float) – Initial scale for the standard deviation of each (unconstrained transformed) latent variable.
 create_plates (callable) – An optional function taking the same *args, **kwargs as model() and returning a pyro.plate or iterable of plates. Plates not returned will be created automatically as usual. This is useful for data subsampling.

get_deltas
¶

scale_constraint
= SoftplusPositive(lower_bound=0.0)¶

scale_tril_constraint
= SoftplusLowerCholesky()¶
Initialization¶
The pyro.infer.autoguide.initialization module contains initialization functions for automatic guides.
The standard interface for initialization is a function that inputs a Pyro trace site dict and returns an appropriately sized value to serve as an initial constrained value for a guide estimate.

init_to_feasible
(site=None)[source]¶ Initialize to an arbitrary feasible point, ignoring distribution parameters.

init_to_median
(site=None, num_samples=15, *, fallback: Optional[Callable] = <function init_to_feasible>)[source]¶ Initialize to the prior median; fall back to fallback (defaults to init_to_feasible()) if the median is undefined.
Parameters: fallback (callable) – Fallback init strategy, for sites not specified in values.
Raises: ValueError – If fallback=None and no value for a site is given in values.

init_to_mean
(site=None, *, fallback: Optional[Callable] = <function init_to_median>)[source]¶ Initialize to the prior mean; fall back to fallback (defaults to init_to_median()) if mean is undefined.
Parameters: fallback (callable) – Fallback init strategy, for sites not specified in values.
Raises: ValueError – If fallback=None and no value for a site is given in values.

init_to_uniform
(site: Optional[dict] = None, radius: float = 2.0)[source]¶ Initialize to a random point in the area (-radius, radius) of unconstrained domain.
Parameters: radius (float) – specifies the range to draw an initial point in the unconstrained domain.

init_to_value
(site: Optional[dict] = None, values: dict = {}, *, fallback: Optional[Callable] = <function init_to_uniform>)[source]¶ Initialize to the value specified in values. Fall back to the fallback strategy (defaults to init_to_uniform()) for sites not appearing in values.
Parameters:  values (dict) – dictionary of initial values keyed by site name.
 fallback (callable) – Fallback init strategy, for sites not specified in values.
Raises: ValueError – If fallback=None and no value for a site is given in values.

init_to_generated
(site=None, generate=<function <lambda>>)[source]¶ Initialize to another initialization strategy returned by the callback generate, which is called once per model execution.
This is like init_to_value() but can produce different (e.g. random) values once per model execution. For example, to generate values and return init_to_value you could define:
def generate():
    values = {"x": torch.randn(100), "y": torch.rand(5)}
    return init_to_value(values=values)

my_init_fn = init_to_generated(generate=generate)
Parameters: generate (callable) – A callable returning another initialization function, e.g. returning an init_to_value(values={...}) populated with a dictionary of random samples.

class
InitMessenger
(init_fn)[source]¶ Bases:
pyro.poutine.messenger.Messenger
Initializes a site by replacing .sample() calls with values drawn from an initialization strategy. This is mainly for internal use by autoguide classes.
Parameters: init_fn (callable) – An initialization function.
Reparameterizers¶
The pyro.infer.reparam module contains reparameterization strategies for the pyro.poutine.handlers.reparam() effect. These are useful for altering geometry of a poorly-conditioned parameter space to make the posterior better shaped. These can be used with a variety of inference algorithms, e.g. Auto*Normal guides and MCMC.

class
Reparam
[source]¶ Abstract base class for reparameterizers.
Derived classes should implement apply().
apply
(msg: dict) → dict[source]¶ Abstract method to apply reparameterizer.
Parameters: msg (dict) – A simplified Pyro message with fields:
 name: str the sample site’s name
 fn: Callable a distribution
 value: Optional[torch.Tensor] an observed or initial value
 is_observed: bool whether value is an observation
Returns: A simplified Pyro message with fields fn, value, and is_observed. Return type: dict

Conjugate Updating¶

class
ConjugateReparam
(guide)[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
EXPERIMENTAL Reparameterize to a conjugate updated distribution.
This updates a prior distribution fn using the conjugate_update() method. The guide may be either a distribution object or a callable inputting model *args, **kwargs and returning a distribution object. The guide may be approximate or learned.
For example consider the model and naive variational guide:
total = torch.tensor(10.)
count = torch.tensor(2.)

def model():
    prob = pyro.sample("prob", dist.Beta(0.5, 1.5))
    pyro.sample("count", dist.Binomial(total, prob), obs=count)

guide = AutoDiagonalNormal(model)  # learns the posterior over prob
Instead of using this learned guide, we can hand-compute the conjugate posterior distribution over “prob”, and then use a simpler guide during inference, in this case an empty guide:
reparam_model = poutine.reparam(model, {
    "prob": ConjugateReparam(dist.Beta(1 + count, 1 + total - count))
})

def reparam_guide():
    pass  # nothing remains to be modeled!
Parameters: guide (Distribution or callable) – A likelihood distribution or a callable returning a guide distribution. Only a few distributions are supported, depending on the prior distribution’s conjugate_update() implementation.
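The arithmetic behind the Beta example can be checked by hand. Viewed as a function of prob, the Binomial likelihood with `count` successes out of `total` is proportional to a Beta(1 + count, 1 + total - count) density, and multiplying two Beta-shaped factors adds their exponents. The sketch below does only that bookkeeping in plain Python; the function name `beta_conjugate_update` is hypothetical and this is not the library's conjugate_update() method.

```python
def beta_conjugate_update(prior, likelihood):
    """Beta(a1, b1) * Beta(a2, b2) is proportional to Beta(a1 + a2 - 1, b1 + b2 - 1)."""
    (a1, b1), (a2, b2) = prior, likelihood
    return a1 + a2 - 1.0, b1 + b2 - 1.0


total, count = 10.0, 2.0
prior = (0.5, 1.5)
likelihood = (1.0 + count, 1.0 + total - count)  # Binomial likelihood as a Beta in prob
posterior = beta_conjugate_update(prior, likelihood)
```

The result matches the familiar Beta-Binomial rule: add the successes to the first concentration and the failures to the second.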
LocScale Decentering¶

class
LocScaleReparam
(centered=None, shape_params=None)[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
Generic decentering reparameterizer [1] for latent variables parameterized by loc and scale (and possibly additional shape_params).
This reparameterization works only for latent variables, not likelihoods.
 [1] Maria I. Gorinova, Dave Moore, Matthew D. Hoffman (2019) “Automatic Reparameterisation of Probabilistic Programs” https://arxiv.org/pdf/1906.03028.pdf
Parameters:  centered (float) – optional centered parameter. If None (default) learn a per-site per-element centering parameter in [0,1]. If 0, fully decenter the distribution; if 1, preserve the centered distribution unchanged.
 shape_params (tuple or list) – Optional list of additional parameter names to copy unchanged from the centered to decentered distribution. If absent, all params in a distribution’s .arg_constraints will be copied.
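One way to read the centered parameter is through the partially decentered form described in [1]: sample an auxiliary value from Normal(centered * loc, scale ** centered), then map it back with loc + scale ** (1 - centered) * (aux - centered * loc). The scalar sketch below illustrates that mapping under the assumption that this is the form in use; the function name `decentered_to_centered` is hypothetical.

```python
def decentered_to_centered(loc, scale, centered, decentered_value):
    """Map a draw from Normal(centered * loc, scale ** centered) back to the original site."""
    delta = decentered_value - centered * loc
    return loc + scale ** (1.0 - centered) * delta


loc, scale = 3.0, 2.0
# centered=0: the auxiliary draw is standard normal noise, shifted and scaled.
fully_decentered = decentered_to_centered(loc, scale, centered=0.0, decentered_value=0.5)
# centered=1: the auxiliary draw already is the site value.
fully_centered = decentered_to_centered(loc, scale, centered=1.0, decentered_value=0.5)
```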
GumbelSoftmax¶

class
GumbelSoftmaxReparam
[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
Reparameterizer for RelaxedOneHotCategorical latent variables.
This is useful for transforming multimodal posteriors to unimodal posteriors. Note this increases the latent dimension by 1 per event.
This reparameterization works only for latent variables, not likelihoods.
Transformed Distributions¶

class
TransformReparam
[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
Reparameterizer for pyro.distributions.torch.TransformedDistribution latent variables.
This is useful for transformed distributions with complex, geometry-changing transforms, where the posterior has simple shape in the space of base_dist.
This reparameterization works only for latent variables, not likelihoods.
Discrete Cosine Transform¶

class
DiscreteCosineReparam
(dim=-1, smooth=0.0, *, experimental_allow_batch=False)[source]¶ Bases:
pyro.infer.reparam.unit_jacobian.UnitJacobianReparam
Discrete Cosine reparameterizer, using a DiscreteCosineTransform.
This is useful for sequential models where coupling along a time-like axis (e.g. a banded precision matrix) introduces long-range correlation. This reparameterizes to a frequency-domain representation where posterior covariance should be closer to diagonal, thereby improving the accuracy of diagonal guides in SVI and improving the effectiveness of a diagonal mass matrix in HMC.
When reparameterizing variables that are approximately continuous along the time dimension, set smooth=1. For variables that are approximately continuously differentiable along the time axis, set smooth=2.
This reparameterization works only for latent variables, not likelihoods.
Parameters:  dim (int) – Dimension along which to transform. Must be negative. This is an absolute dim counting from the right.
 smooth (float) – Smoothing parameter. When 0, this transforms white noise to white noise; when 1 this transforms Brownian noise to white noise; when -1 this transforms violet noise to white noise; etc. Any real number is allowed. https://en.wikipedia.org/wiki/Colors_of_noise.
 experimental_allow_batch (bool) – EXPERIMENTAL allow coupling across a batch dimension. The targeted batch dimension and all batch dimensions to the right will be converted to event dimensions. Defaults to False.
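To show what a frequency-domain representation looks like, the sketch below implements an orthonormal DCT-II in pure Python. It corresponds to the smooth=0 case only in spirit (an orthogonal change of basis) and is not the DiscreteCosineTransform implementation; the function name `dct_orthonormal` and the example series are hypothetical.

```python
import math


def dct_orthonormal(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        total = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        norm = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(norm * total)
    return out


random_walk = [0.0, 1.0, 2.0, 1.0, 2.0, 3.0, 4.0, 3.0]
coefficients = dct_orthonormal(random_walk)
energy_in = sum(v * v for v in random_walk)
energy_out = sum(c * c for c in coefficients)
```

Because the basis is orthonormal, the transform preserves the sum of squares, which is consistent with a Jacobian determinant of magnitude one.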
Haar Transform¶

class
HaarReparam
(dim=1, flip=False, *, experimental_allow_batch=False)[source]¶ Bases:
pyro.infer.reparam.unit_jacobian.UnitJacobianReparam
Haar wavelet reparameterizer, using a
HaarTransform
.This is useful for sequential models where coupling along a timelike axis (e.g. a banded precision matrix) introduces longrange correlation. This reparameterizes to a frequencydomain representation where posterior covariance should be closer to diagonal, thereby improving the accuracy of diagonal guides in SVI and improving the effectiveness of a diagonal mass matrix in HMC.
This reparameterization works only for latent variables, not likelihoods.
Parameters:  dim (int) – Dimension along which to transform. Must be negative. This is an absolute dim counting from the right.
 flip (bool) – Whether to flip the time axis before applying the Haar transform. Defaults to false.
 experimental_allow_batch (bool) – EXPERIMENTAL allow coupling across a batch dimension. The targeted batch dimension and all batch dimensions to the right will be converted to event dimensions. Defaults to False.
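As a rough illustration of what a Haar wavelet representation looks like, here is a plain-Python sketch of an orthonormal Haar transform. It assumes the input length is a power of two, and the helper name `haar_ortho` is hypothetical; the ordering and sign conventions of Pyro's `HaarTransform` may differ.

```python
import math

def haar_ortho(x):
    """Orthonormal Haar transform; assumes len(x) is a power of two."""
    out = list(x)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + diff  # coarse averages first, then detail coefficients
        n = half
    return out

flat = haar_ortho([1.0, 1.0, 1.0, 1.0])  # constant signal
step = haar_ortho([0.0, 0.0, 1.0, 1.0])  # single step in the middle
```

A constant signal collapses into the single coarsest coefficient, and a step needs only one additional detail coefficient, which is the kind of decorrelation the reparameterizer relies on.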
Unit Jacobian Transforms¶

class
UnitJacobianReparam
(transform, suffix='transformed', *, experimental_allow_batch=False)[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
Reparameterizer for
Transform
objects whose Jacobian determinant is one.
Parameters:  transform (Transform) – A transform whose Jacobian has determinant 1.
 suffix (str) – A suffix to append to the transformed site.
 experimental_allow_batch (bool) – EXPERIMENTAL allow coupling across a batch dimension. The targeted batch dimension and all batch dimensions to the right will be converted to event dimensions. Defaults to False.
StudentT Distributions¶

class
StudentTReparam
[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
Auxiliary variable reparameterizer for
StudentT
random variables.
This is useful in combination with
LinearHMMReparam
because it allows StudentT processes to be treated as conditionally Gaussian processes, permitting cheap inference via
GaussianHMM
.
This reparameterizes a
StudentT
by introducing an auxiliary
Gamma
variable conditioned on which the result is
Normal
.
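The Gamma-Normal decomposition can be checked numerically with the standard library alone. This is a sketch under the usual scale-mixture identity (an auxiliary Gamma(df/2, rate=df/2) precision multiplier), not a copy of the reparameterizer; `sample_student_t` is a hypothetical helper name.

```python
import math
import random

def sample_student_t(df, loc=0.0, scale=1.0, rng=random):
    """Draw StudentT(df, loc, scale) via an auxiliary Gamma variable."""
    # Auxiliary precision multiplier: Gamma(shape=df/2, rate=df/2).
    # random.gammavariate takes (shape, scale), so the scale is 2/df.
    tau = rng.gammavariate(df / 2.0, 2.0 / df)
    # Conditioned on tau, the draw is Normal(loc, scale / sqrt(tau)).
    return rng.gauss(loc, scale / math.sqrt(tau))

rng = random.Random(0)
draws = [sample_student_t(10.0, rng=rng) for _ in range(100_000)]
# Theoretical variance of StudentT(df=10) is df / (df - 2) = 1.25.
sample_var = sum(d * d for d in draws) / len(draws)
```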
Stable Distributions¶

class
LatentStableReparam
[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
Auxiliary variable reparameterizer for
Stable
latent variables.
This is useful in inference of latent
Stable
variables because the
log_prob()
is not implemented.
This uses the Chambers-Mallows-Stuck method [1], creating a pair of parameter-free auxiliary distributions (
Uniform(-pi/2,pi/2)
and
Exponential(1)
) with well-defined
.log_prob()
methods, thereby permitting use of reparameterized stable distributions in likelihood-based inference algorithms like SVI and MCMC.
This reparameterization works only for latent variables, not likelihoods. For likelihood-compatible reparameterization see
SymmetricStableReparam
or
StableReparam
.
[1] J.P. Nolan (2017). Stable Distributions: Models for Heavy Tailed Data. http://fs2.american.edu/jpnolan/www/stable/chap1.pdf
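To make the role of the two auxiliary variables concrete, the sketch below implements the Chambers-Mallows-Stuck construction for the symmetric case only (zero skew, unit scale, zero location). The skewed case and the S0 parameterization are deliberately left out, and `sample_symmetric_stable` is a hypothetical helper, not part of Pyro.

```python
import math
import random

def sample_symmetric_stable(alpha, rng=random):
    """Chambers-Mallows-Stuck draw of a standard symmetric stable variable."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)  # Uniform(-pi/2, pi/2)
    w = rng.expovariate(1.0)                    # Exponential(1)
    if alpha == 1.0:
        return math.tan(u)                      # Cauchy special case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha))

rng = random.Random(0)
# With alpha = 2 the stable law reduces to a Normal with variance 2.
gauss_like = [sample_symmetric_stable(2.0, rng) for _ in range(100_000)]
var_alpha2 = sum(x * x for x in gauss_like) / len(gauss_like)
cauchy_draw = sample_symmetric_stable(1.0, random.Random(1))
```

The sample is a deterministic function of `u` and `w`, both of which have simple densities. That is what lets an inference algorithm score the auxiliary variables instead of the stable variable itself.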

class
SymmetricStableReparam
[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
Auxiliary variable reparameterizer for symmetric
Stable
random variables (i.e. those for which
skew=0
).
This is useful in inference of symmetric
Stable
variables because the
log_prob()
is not implemented.
This reparameterizes a symmetric
Stable
random variable as a totally-skewed (
skew=1
)
Stable
scale mixture of
Normal
random variables. See Proposition 3 of [1] (but note we differ since
Stable
uses Nolan’s continuous S0 parameterization).
[1] Alvaro Cartea and Sam Howison (2009) “Option Pricing with Levy-Stable Processes” https://pdfs.semanticscholar.org/4d66/c91b136b2a38117dd16c2693679f5341c616.pdf

class
StableReparam
[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
Auxiliary variable reparameterizer for arbitrary
Stable
random variables.
This is useful in inference of non-symmetric
Stable
variables because the
log_prob()
is not implemented.
This reparameterizes a
Stable
random variable as a sum of two other stable random variables, one symmetric and the other totally skewed (applying Property 2.3.a of [1]). The totally skewed variable is sampled as in
LatentStableReparam
, and the symmetric variable is decomposed as in
SymmetricStableReparam
.
[1] V. M. Zolotarev (1986) “One-dimensional stable distributions”
Projected Normal Distributions¶

class
ProjectedNormalReparam
[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
Reparametrizer for
ProjectedNormal
latent variables.
This reparameterization works only for latent variables, not likelihoods.
Hidden Markov Models¶

class
LinearHMMReparam
(init=None, trans=None, obs=None)[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
Auxiliary variable reparameterizer for
LinearHMM
random variables.
This defers to component reparameterizers to create auxiliary random variables conditioned on which the process becomes a
GaussianHMM
. If the
observation_dist
is a
TransformedDistribution
this reorders those transforms so that the result is a
TransformedDistribution
of
GaussianHMM
.
This is useful for training the parameters of a
LinearHMM
distribution, whose
log_prob()
method is undefined. To perform inference in the presence of non-Gaussian factors such as
Stable()
,
StudentT()
or
LogNormal()
, configure with
StudentTReparam
,
StableReparam
,
SymmetricStableReparam
, etc. component reparameterizers for
init
,
trans
, and
obs
. For example:

    hmm = LinearHMM(
        init_dist=Stable(1,0,1,0).expand([2]).to_event(1),
        trans_matrix=torch.eye(2),
        trans_dist=MultivariateNormal(torch.zeros(2), torch.eye(2)),
        obs_matrix=torch.eye(2),
        obs_dist=TransformedDistribution(
            Stable(1.5,0.5,1.0).expand([2]).to_event(1),
            ExpTransform()))

    rep = LinearHMMReparam(init=SymmetricStableReparam(),
                           obs=StableReparam())
    with poutine.reparam(config={"hmm": rep}):
        pyro.sample("hmm", hmm, obs=data)

Parameters:
Site Splitting¶

class
SplitReparam
(sections, dim)[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
Reparameterizer to split a random variable along a dimension, similar to
torch.split()
.
This is useful for treating different parts of a tensor with different reparameterizers or inference methods. For example when performing HMC inference on a time series, you can first apply
DiscreteCosineReparam
or
HaarReparam
, then apply
SplitReparam
to split into low-frequency and high-frequency components, and finally add the low-frequency components to the
full_mass
matrix together with globals.
Parameters:  sections – Size of a single chunk or list of sizes for each chunk.
 dim (int) – Dimension along which to split. Defaults to -1.
Type:
Neural Transport¶

class
NeuTraReparam
(guide)[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
Neural Transport reparameterizer [1] of multiple latent variables.
This uses a trained
AutoContinuous
guide to alter the geometry of a model, typically for use e.g. in MCMC. Example usage:

    # Step 1. Train a guide
    guide = AutoIAFNormal(model)
    svi = SVI(model, guide, ...)
    # ...train the guide...

    # Step 2. Use trained guide in NeuTra MCMC
    neutra = NeuTraReparam(guide)
    model = poutine.reparam(model, config=lambda _: neutra)
    nuts = NUTS(model)
    # ...now use the model in HMC or NUTS...

This reparameterization works only for latent variables, not likelihoods. Note that all sites must share a single common
NeuTraReparam
instance, and that the model must have static structure.
[1] Hoffman, M. et al. (2019) “NeuTralizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport” https://arxiv.org/abs/1903.03704
Parameters: guide (AutoContinuous) – A trained guide. 
transform_sample
(latent)[source]¶ Given latent samples from the warped posterior (with a possible batch dimension), return a dict of samples from the latent sites in the model.
Parameters: latent – sample from the warped posterior (possibly batched). Note that the batch dimension must not collide with plate dimensions in the model, i.e. any batch dims d < -max_plate_nesting.
Returns: a dict of samples keyed by latent sites in the model.
Return type: dict
Structured Preconditioning¶

class
StructuredReparam
(guide: pyro.infer.autoguide.guides.AutoStructured)[source]¶ Bases:
pyro.infer.reparam.reparam.Reparam
Preconditioning reparameterizer of multiple latent variables.
This uses a trained
AutoStructured
guide to alter the geometry of a model, typically for use e.g. in MCMC. Example usage:

    # Step 1. Train a guide
    guide = AutoStructured(model, ...)
    svi = SVI(model, guide, ...)
    # ...train the guide...

    # Step 2. Use trained guide in preconditioned MCMC
    model = StructuredReparam(guide).reparam(model)
    nuts = NUTS(model)
    # ...now use the model in HMC or NUTS...

This reparameterization works only for latent variables, not likelihoods. Note that all sites must share a single common
StructuredReparam
instance, and that the model must have static structure.
Note
This can be seen as a restricted structured version of
NeuTraReparam
[1] combined with
poutine.condition
on MAP-estimated sites (the NeuTra transform is an exact reparameterizer, but the conditioning to point estimates introduces model approximation).
[1] Hoffman, M. et al. (2019) “NeuTralizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport” https://arxiv.org/abs/1903.03704
Parameters: guide (AutoStructured) – A trained guide. 
transform_samples
(aux_samples, save_params=None)[source]¶ Given latent samples from the warped posterior (with a possible batch dimension), return a dict of samples from the latent sites in the model.
Parameters:  aux_samples (dict) – Dict site name to tensor value for each latent auxiliary site (or if
save_params
is specified, then for only those latent auxiliary sites needed to compute requested params).
 save_params (list) – An optional list of site names to save. This is useful in models with large nuisance variables. Defaults to None, saving all params.
Returns: a dict of samples keyed by latent sites in the model.
Return type: dict
Distributions¶
PyTorch Distributions¶
Most distributions in Pyro are thin wrappers around PyTorch distributions.
For details on the PyTorch distribution interface, see
torch.distributions.distribution.Distribution
.
For differences between the Pyro and PyTorch interfaces, see
TorchDistributionMixin
.
Bernoulli¶

class
Bernoulli
(probs=None, logits=None, validate_args=None)¶ Wraps
torch.distributions.bernoulli.Bernoulli
with
TorchDistributionMixin
.
Beta¶

class
Beta
(concentration1, concentration0, validate_args=None)[source]¶ Wraps
torch.distributions.beta.Beta
with
TorchDistributionMixin
.
Binomial¶

class
Binomial
(total_count=1, probs=None, logits=None, validate_args=None)[source]¶ Wraps
torch.distributions.binomial.Binomial
with
TorchDistributionMixin
.
Categorical¶

class
Categorical
(probs=None, logits=None, validate_args=None)[source]¶ Wraps
torch.distributions.categorical.Categorical
with
TorchDistributionMixin
.
Cauchy¶

class
Cauchy
(loc, scale, validate_args=None)¶ Wraps
torch.distributions.cauchy.Cauchy
with
TorchDistributionMixin
.
Chi2¶

class
Chi2
(df, validate_args=None)¶ Wraps
torch.distributions.chi2.Chi2
with
TorchDistributionMixin
.
ContinuousBernoulli¶

class
ContinuousBernoulli
(probs=None, logits=None, lims=(0.499, 0.501), validate_args=None)¶ Wraps
torch.distributions.continuous_bernoulli.ContinuousBernoulli
with
TorchDistributionMixin
.
Dirichlet¶

class
Dirichlet
(concentration, validate_args=None)[source]¶ Wraps
torch.distributions.dirichlet.Dirichlet
with
TorchDistributionMixin
.
Exponential¶

class
Exponential
(rate, validate_args=None)¶ Wraps
torch.distributions.exponential.Exponential
with
TorchDistributionMixin
.
ExponentialFamily¶

class
ExponentialFamily
(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)¶ Wraps
torch.distributions.exp_family.ExponentialFamily
with
TorchDistributionMixin
.
FisherSnedecor¶

class
FisherSnedecor
(df1, df2, validate_args=None)¶ Wraps
torch.distributions.fishersnedecor.FisherSnedecor
with
TorchDistributionMixin
.
Gamma¶

class
Gamma
(concentration, rate, validate_args=None)[source]¶ Wraps
torch.distributions.gamma.Gamma
with
TorchDistributionMixin
.
Geometric¶

class
Geometric
(probs=None, logits=None, validate_args=None)[source]¶ Wraps
torch.distributions.geometric.Geometric
with
TorchDistributionMixin
.
Gumbel¶

class
Gumbel
(loc, scale, validate_args=None)¶ Wraps
torch.distributions.gumbel.Gumbel
with
TorchDistributionMixin
.
HalfCauchy¶

class
HalfCauchy
(scale, validate_args=None)¶ Wraps
torch.distributions.half_cauchy.HalfCauchy
with
TorchDistributionMixin
.
HalfNormal¶

class
HalfNormal
(scale, validate_args=None)¶ Wraps
torch.distributions.half_normal.HalfNormal
with
TorchDistributionMixin
.
Independent¶

class
Independent
(base_distribution, reinterpreted_batch_ndims, validate_args=None)[source]¶ Wraps
torch.distributions.independent.Independent
with
TorchDistributionMixin
.
Kumaraswamy¶

class
Kumaraswamy
(concentration1, concentration0, validate_args=None)¶ Wraps
torch.distributions.kumaraswamy.Kumaraswamy
with
TorchDistributionMixin
.
LKJCholesky¶

class
LKJCholesky
(dim, concentration=1.0, validate_args=None)¶ Wraps
torch.distributions.lkj_cholesky.LKJCholesky
with
TorchDistributionMixin
.
Laplace¶

class
Laplace
(loc, scale, validate_args=None)¶ Wraps
torch.distributions.laplace.Laplace
with
TorchDistributionMixin
.
LogNormal¶

class
LogNormal
(loc, scale, validate_args=None)[source]¶ Wraps
torch.distributions.log_normal.LogNormal
with
TorchDistributionMixin
.
LogisticNormal¶

class
LogisticNormal
(loc, scale, validate_args=None)¶ Wraps
torch.distributions.logistic_normal.LogisticNormal
with
TorchDistributionMixin
.
LowRankMultivariateNormal¶

class
LowRankMultivariateNormal
(loc, cov_factor, cov_diag, validate_args=None)[source]¶ Wraps
torch.distributions.lowrank_multivariate_normal.LowRankMultivariateNormal
with
TorchDistributionMixin
.
MixtureSameFamily¶

class
MixtureSameFamily
(mixture_distribution, component_distribution, validate_args=None)¶ Wraps
torch.distributions.mixture_same_family.MixtureSameFamily
with
TorchDistributionMixin
.
Multinomial¶

class
Multinomial
(total_count=1, probs=None, logits=None, validate_args=None)[source]¶ Wraps
torch.distributions.multinomial.Multinomial
with
TorchDistributionMixin
.
MultivariateNormal¶

class
MultivariateNormal
(loc, covariance_matrix=None, precision_matrix=None, scale_tril=None, validate_args=None)[source]¶ Wraps
torch.distributions.multivariate_normal.MultivariateNormal
with
TorchDistributionMixin
.
NegativeBinomial¶

class
NegativeBinomial
(total_count, probs=None, logits=None, validate_args=None)¶ Wraps
torch.distributions.negative_binomial.NegativeBinomial
with
TorchDistributionMixin
.
Normal¶

class
Normal
(loc, scale, validate_args=None)[source]¶ Wraps
torch.distributions.normal.Normal
with
TorchDistributionMixin
.
OneHotCategorical¶

class
OneHotCategorical
(probs=None, logits=None, validate_args=None)[source]¶ Wraps
torch.distributions.one_hot_categorical.OneHotCategorical
with
TorchDistributionMixin
.
OneHotCategoricalStraightThrough¶

class
OneHotCategoricalStraightThrough
(probs=None, logits=None, validate_args=None)¶ Wraps
torch.distributions.one_hot_categorical.OneHotCategoricalStraightThrough
with
TorchDistributionMixin
.
Pareto¶

class
Pareto
(scale, alpha, validate_args=None)¶ Wraps
torch.distributions.pareto.Pareto
with
TorchDistributionMixin
.
Poisson¶

class
Poisson
(rate, *, is_sparse=False, validate_args=None)[source]¶ Wraps
torch.distributions.poisson.Poisson
with
TorchDistributionMixin
.
RelaxedBernoulli¶

class
RelaxedBernoulli
(temperature, probs=None, logits=None, validate_args=None)¶ Wraps
torch.distributions.relaxed_bernoulli.RelaxedBernoulli
with
TorchDistributionMixin
.
RelaxedOneHotCategorical¶

class
RelaxedOneHotCategorical
(temperature, probs=None, logits=None, validate_args=None)¶ Wraps
torch.distributions.relaxed_categorical.RelaxedOneHotCategorical
with
TorchDistributionMixin
.
StudentT¶

class
StudentT
(df, loc=0.0, scale=1.0, validate_args=None)¶ Wraps
torch.distributions.studentT.StudentT
with
TorchDistributionMixin
.
TransformedDistribution¶

class
TransformedDistribution
(base_distribution, transforms, validate_args=None)¶ Wraps
torch.distributions.transformed_distribution.TransformedDistribution
with
TorchDistributionMixin
.
Uniform¶

class
Uniform
(low, high, validate_args=None)[source]¶ Wraps
torch.distributions.uniform.Uniform
with
TorchDistributionMixin
.
VonMises¶

class
VonMises
(loc, concentration, validate_args=None)¶ Wraps
torch.distributions.von_mises.VonMises
with
TorchDistributionMixin
.
Weibull¶

class
Weibull
(scale, concentration, validate_args=None)¶ Wraps
torch.distributions.weibull.Weibull
with
TorchDistributionMixin
.
Pyro Distributions¶
Abstract Distribution¶

class
Distribution
[source]¶ Bases:
object
Base class for parameterized probability distributions.
Distributions in Pyro are stochastic function objects with
sample()
and
log_prob()
methods. Distributions are stochastic functions with fixed parameters:

    d = dist.Bernoulli(param)
    x = d()                    # Draws a random sample.
    p = d.log_prob(x)          # Evaluates log probability of x.

Implementing New Distributions:
Derived classes must implement the methods:
sample()
,
log_prob()
.
Examples:
Take a look at the examples to see how they interact with inference algorithms.

has_rsample
= False¶

has_enumerate_support
= False¶

__call__
(*args, **kwargs)[source]¶ Samples a random value (just an alias for
.sample(*args, **kwargs)
).
For tensor distributions, the returned tensor should have the same
.shape
as the parameters.
Returns: A random value.
Return type: torch.Tensor

sample
(*args, **kwargs)[source]¶ Samples a random value.
For tensor distributions, the returned tensor should have the same
.shape
as the parameters, unless otherwise noted.
Parameters: sample_shape (torch.Size) – the size of the iid batch to be drawn from the distribution.
Returns: A random value or batch of random values (if parameters are batched). The shape of the result should be
self.shape()
.
Return type: torch.Tensor

log_prob
(x, *args, **kwargs)[source]¶ Evaluates log probability densities for each of a batch of samples.
Parameters: x (torch.Tensor) – A single value or a batch of values batched along axis 0.
Returns: log probability densities as a one-dimensional
Tensor
with same batch size as value and params. The shape of the result should be
self.batch_size
.
Return type: torch.Tensor

score_parts
(x, *args, **kwargs)[source]¶ Computes ingredients for stochastic gradient estimators of ELBO.
The default implementation is correct both for non-reparameterized and for fully reparameterized distributions. Partially reparameterized distributions should override this method to compute correct .score_function and .entropy_term parts.
Setting
.has_rsample
on a distribution instance will determine whether inference engines like
SVI
use reparameterized samplers or the score function estimator.
Parameters: x (torch.Tensor) – A single value or batch of values.
Returns: A ScoreParts object containing parts of the ELBO estimator.
Return type: ScoreParts

enumerate_support
(expand=True)[source]¶ Returns a representation of the parametrized distribution’s support, along the first dimension. This is implemented only by discrete distributions.
Note that this returns support values of all the batched RVs in lockstep, rather than the full cartesian product.
Parameters: expand (bool) – whether to expand the result to a tensor of shape
(n,) + batch_shape + event_shape
. If false, the return value has unexpanded shape
(n,) + (1,)*len(batch_shape) + event_shape
which can be broadcasted to the full shape.
Returns: An iterator over the distribution’s discrete support.
Return type: iterator

conjugate_update
(other)[source]¶ EXPERIMENTAL Creates an updated distribution fusing information from another compatible distribution. This is supported by only a few conjugate distributions.
This should satisfy the equation:
    fg, log_normalizer = f.conjugate_update(g)
    assert f.log_prob(x) + g.log_prob(x) == fg.log_prob(x) + log_normalizer

Note this is equivalent to
funsor.ops.add
on
Funsor
distributions, but we return a lazy sum
(updated, log_normalizer)
because PyTorch distributions must be normalized. Thus
conjugate_update()
should commute with
dist_to_funsor()
and
tensor_to_funsor()
:

    dist_to_funsor(f) + dist_to_funsor(g) == dist_to_funsor(fg) + tensor_to_funsor(log_normalizer)

Parameters: other – A distribution representing
p(data|latent)
but normalized over
latent
rather than
data
. Here
latent
is a candidate sample from
self
and
data
is a ground observation of unrelated type.
Returns: a pair
(updated,log_normalizer)
where
updated
is an updated distribution of type
type(self)
, and
log_normalizer
is a
Tensor
representing the normalization factor.
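The defining equation of `conjugate_update()` can be checked by hand for a pair of univariate Normal factors. The sketch below uses plain Python with `(loc, scale)` tuples standing in for distributions; `normal_log_prob` and `normal_conjugate_update` are hypothetical helpers, and which Pyro distributions actually support this method is not assumed here.

```python
import math

def normal_log_prob(x, loc, scale):
    return (-0.5 * ((x - loc) / scale) ** 2
            - math.log(scale) - 0.5 * math.log(2 * math.pi))

def normal_conjugate_update(f, g):
    """Fuse two (loc, scale) Normal factors over the same variable."""
    (m0, s0), (m1, s1) = f, g
    precision = 1 / s0 ** 2 + 1 / s1 ** 2
    loc = (m0 / s0 ** 2 + m1 / s1 ** 2) / precision
    scale = precision ** -0.5
    # Log normalizer of the product of the two Gaussian densities.
    log_normalizer = normal_log_prob(m0, m1, math.sqrt(s0 ** 2 + s1 ** 2))
    return (loc, scale), log_normalizer

f, g = (0.0, 1.0), (2.0, 0.5)
fg, log_normalizer = normal_conjugate_update(f, g)
x = 0.7
lhs = normal_log_prob(x, *f) + normal_log_prob(x, *g)
rhs = normal_log_prob(x, *fg) + log_normalizer
```

Here `lhs` and `rhs` agree up to floating-point error for any `x`, which is the identity the method is documented to satisfy.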

has_rsample_
(value)[source]¶ Force reparameterized or detached sampling on a single distribution instance. This sets the
.has_rsample
attribute in-place.
This is useful to instruct inference algorithms to avoid reparameterized gradients for variables that discontinuously determine downstream control flow.
Parameters: value (bool) – Whether samples will be pathwise differentiable.
Returns: self
Return type: Distribution

rv
¶ EXPERIMENTAL Switch to the Random Variable DSL for applying transformations to random variables. Supports either chaining operations or arithmetic operator overloading.
Example usage:
    # This should be equivalent to an Exponential distribution.
    Uniform(0, 1).rv.log().neg().dist

    # These two distributions Y1, Y2 should be the same
    X = Uniform(0, 1).rv
    Y1 = X.mul(4).pow(0.5).sub(1).abs().neg().dist
    Y2 = (-abs((4*X)**(0.5) - 1)).dist

Returns: A
pyro.contrib.randomvariable.random_variable.RandomVariable
object wrapping this distribution.
Return type: RandomVariable

TorchDistributionMixin¶

class
TorchDistributionMixin
[source]¶ Bases:
pyro.distributions.distribution.Distribution
Mixin to provide Pyro compatibility for PyTorch distributions.
You should instead use TorchDistribution for new distribution classes.
This is mainly useful for wrapping existing PyTorch distributions for use in Pyro. Derived classes must first inherit from
torch.distributions.distribution.Distribution
and then inherit from
TorchDistributionMixin
.
__call__
(sample_shape=torch.Size([]))[source]¶ Samples a random value.
This is reparameterized whenever possible, calling
rsample()
for reparameterized distributions and
sample()
for non-reparameterized distributions.
Parameters: sample_shape (torch.Size) – the size of the iid batch to be drawn from the distribution.
Returns: A random value or batch of random values (if parameters are batched). The shape of the result should be self.shape().
Return type: torch.Tensor

shape
(sample_shape=torch.Size([]))[source]¶ The tensor shape of samples from this distribution.
Samples are of shape:
d.shape(sample_shape) == sample_shape + d.batch_shape + d.event_shape
Parameters: sample_shape (torch.Size) – the size of the iid batch to be drawn from the distribution.
Returns: Tensor shape of samples.
Return type: torch.Size

classmethod
infer_shapes
(**arg_shapes)[source]¶ Infers
batch_shape
and
event_shape
given shapes of args to
__init__()
.
Note
This assumes distribution shape depends only on the shapes of tensor inputs, not on the data contained in those inputs.
Parameters: **arg_shapes – Keywords mapping name of input arg to
torch.Size
or tuple representing the sizes of each tensor input.
Returns: A pair
(batch_shape, event_shape)
of the shapes of a distribution that would be created with input args of the given shapes.
Return type: tuple

expand
(batch_shape, _instance=None)[source]¶ Returns a new
ExpandedDistribution
instance with batch dimensions expanded to batch_shape.
Parameters:  batch_shape (tuple) – batch shape to expand to.
 _instance – unused argument for compatibility with
torch.distributions.Distribution.expand()
Returns: an instance of ExpandedDistribution.
Return type: ExpandedDistribution

expand_by
(sample_shape)[source]¶ Expands a distribution by adding
sample_shape
to the left side of its
batch_shape
.
To expand internal dims of
self.batch_shape
from 1 to something larger, use
expand()
instead.
Parameters: sample_shape (torch.Size) – The size of the iid batch to be drawn from the distribution.
Returns: An expanded version of this distribution.
Return type: ExpandedDistribution

to_event
(reinterpreted_batch_ndims=None)[source]¶ Reinterprets the
n
rightmost dimensions of this distribution’s
batch_shape
as event dims, adding them to the left side of
event_shape
.
Example:

    >>> [d1.batch_shape, d1.event_shape]
    [torch.Size([2, 3]), torch.Size([4, 5])]
    >>> d2 = d1.to_event(1)
    >>> [d2.batch_shape, d2.event_shape]
    [torch.Size([2]), torch.Size([3, 4, 5])]
    >>> d3 = d1.to_event(2)
    >>> [d3.batch_shape, d3.event_shape]
    [torch.Size([]), torch.Size([2, 3, 4, 5])]

Parameters: reinterpreted_batch_ndims (int) – The number of batch dimensions to reinterpret as event dimensions. May be negative to remove dimensions from a
pyro.distributions.torch.Independent
. If None, convert all dimensions to event dimensions.
Returns: A reshaped version of this distribution.
Return type: pyro.distributions.torch.Independent
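The shape bookkeeping performed by `to_event` can be mimicked with plain tuples. The following sketch covers only non-negative `n` (the negative case mentioned above is omitted), and `to_event_shapes` is a hypothetical helper that operates on shapes rather than on distribution objects.

```python
def to_event_shapes(batch_shape, event_shape, n=None):
    """Return (batch_shape, event_shape) after reinterpreting n batch dims."""
    batch_shape, event_shape = tuple(batch_shape), tuple(event_shape)
    if n is None:
        n = len(batch_shape)  # convert all batch dims to event dims
    if not 0 <= n <= len(batch_shape):
        raise ValueError("n must be between 0 and len(batch_shape)")
    split = len(batch_shape) - n
    # The n rightmost batch dims move to the left side of event_shape.
    return batch_shape[:split], batch_shape[split:] + event_shape

d1 = ((2, 3), (4, 5))
d2 = to_event_shapes(*d1, n=1)
d3 = to_event_shapes(*d1, n=2)
```

With the shapes from the example above, `d2` and `d3` reproduce the documented batch and event shapes.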

mask
(mask)[source]¶ Masks a distribution by a boolean or boolean-valued tensor that is broadcastable to the distribution’s
batch_shape
.
Parameters: mask (bool or torch.Tensor) – A boolean or boolean-valued tensor.
Returns: A masked copy of this distribution.
Return type: MaskedDistribution

TorchDistribution¶

class
TorchDistribution
(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)[source]¶ Bases:
torch.distributions.distribution.Distribution
, pyro.distributions.torch_distribution.TorchDistributionMixin
Base class for PyTorch-compatible distributions with Pyro support.
This should be the base class for almost all new Pyro distributions.
Note
Parameters and data should be of type
Tensor
and all methods return type
Tensor
unless otherwise noted.
Tensor Shapes:
TorchDistributions provide a method
.shape()
for the tensor shape of samples:

    x = d.sample(sample_shape)
    assert x.shape == d.shape(sample_shape)

Pyro follows the same distribution shape semantics as PyTorch. It distinguishes between three different roles for tensor shapes of samples:
 sample shape corresponds to the shape of the iid samples drawn from the distribution. This is taken as an argument by the distribution’s sample method.
 batch shape corresponds to non-identical (independent) parameterizations of the distribution, inferred from the distribution’s parameter shapes. This is fixed for a distribution instance.
 event shape corresponds to the event dimensions of the distribution, which is fixed for a distribution class. These are collapsed when we try to score a sample from the distribution via d.log_prob(x).
These shapes are related by the equation:
assert d.shape(sample_shape) == sample_shape + d.batch_shape + d.event_shape
Distributions provide a vectorized
log_prob()
method that evaluates the log probability density of each event in a batch independently, returning a tensor of shape
sample_shape + d.batch_shape
:

    x = d.sample(sample_shape)
    assert x.shape == d.shape(sample_shape)
    log_p = d.log_prob(x)
    assert log_p.shape == sample_shape + d.batch_shape

Implementing New Distributions:
Derived classes must implement the methods
sample()
(or
rsample()
if
.has_rsample == True
) and
log_prob()
, and must implement the properties
batch_shape
and
event_shape
. Discrete classes may also implement the
enumerate_support()
method to improve gradient estimates and set
.has_enumerate_support = True
.
expand
(batch_shape, _instance=None)¶ Returns a new
ExpandedDistribution
instance with batch dimensions expanded to batch_shape.
Parameters:  batch_shape (tuple) – batch shape to expand to.
 _instance – unused argument for compatibility with
torch.distributions.Distribution.expand()
Returns: an instance of ExpandedDistribution.
Return type: ExpandedDistribution
AffineBeta¶

class
AffineBeta
(concentration1, concentration0, loc, scale, validate_args=None)[source]¶ Bases:
pyro.distributions.torch.TransformedDistribution
Beta distribution scaled by
scale
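and shifted by
loc
:

    X ~ Beta(concentration1, concentration0)
    f(X) = loc + scale * X
    Y = f(X) ~ AffineBeta(concentration1, concentration0, loc, scale)

Parameters:  concentration1 (float or torch.Tensor) – 1st concentration parameter (alpha) for the Beta distribution.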
Parameters:  concentration1 (float or torch.Tensor) – 1st concentration parameter (alpha) for the Beta distribution.
 concentration0 (float or torch.Tensor) – 2nd concentration parameter (beta) for the Beta distribution.
 loc (float or torch.Tensor) – location parameter.
 scale (float or torch.Tensor) – scale parameter.
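The affine relationship above is easy to check numerically with the standard library. This sketch draws from a Beta and applies the shift and scale directly; it omits the clamping performed by the class's sampling methods, and `sample_affine_beta` is a hypothetical helper name.

```python
import random

def sample_affine_beta(concentration1, concentration0, loc, scale, rng=random):
    """Draw Y = loc + scale * X with X ~ Beta(concentration1, concentration0)."""
    x = rng.betavariate(concentration1, concentration0)
    return loc + scale * x

rng = random.Random(0)
ys = [sample_affine_beta(2.0, 3.0, loc=-1.0, scale=4.0, rng=rng)
      for _ in range(50_000)]
# Theoretical mean: loc + scale * a / (a + b) = -1 + 4 * 0.4 = 0.6.
mean_y = sum(ys) / len(ys)
```

All draws fall in the interval from `loc` to `loc + scale`, matching the `low` and `high` attributes listed below.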

arg_constraints
= {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}¶

concentration0
¶

concentration1
¶

high
¶

loc
¶

low
¶

mean
¶

rsample
(sample_shape=torch.Size([]))[source]¶ Generates a sample from Beta distribution and applies AffineTransform. Additionally clamps the output in order to avoid NaN and Inf values in the gradients.

sample
(sample_shape=torch.Size([]))[source]¶ Generates a sample from Beta distribution and applies AffineTransform. Additionally clamps the output in order to avoid NaN and Inf values in the gradients.

sample_size
¶

scale
¶

support
¶

variance
¶
AsymmetricLaplace¶

class
AsymmetricLaplace
(loc, scale, asymmetry, *, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Asymmetric version of the
Laplace
distribution.
To the left of
loc
this acts like an
Exponential(1/(asymmetry*scale))
; to the right of
loc
this acts like an
Exponential(asymmetry/scale)
. The density is continuous so the left and right densities at
loc
agree.
Parameters:  loc – Location parameter, i.e. the mode.
 scale – Scale parameter = geometric mean of left and right scales.
 asymmetry – Square root of ratio of left to right scales.
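The two exponential tails described above imply a left decay length of `scale * asymmetry` and a right decay length of `scale / asymmetry`. The sketch below writes out the resulting log density in plain Python and checks numerically that it integrates to one; `asymmetric_laplace_log_prob` is a hypothetical helper derived from that description, not a copy of the class's `log_prob`.

```python
import math

def asymmetric_laplace_log_prob(x, loc, scale, asymmetry):
    left_scale = scale * asymmetry    # decay length to the left of loc
    right_scale = scale / asymmetry   # decay length to the right of loc
    z = x - loc
    decay = left_scale if z < 0 else right_scale
    # The shared normalizer makes the density continuous at loc.
    return -abs(z) / decay - math.log(left_scale + right_scale)

loc, scale, asymmetry = 0.5, 1.0, 2.0
step = 0.001
# Midpoint-rule integral of the density over [loc - 40, loc + 40].
total = sum(
    math.exp(asymmetric_laplace_log_prob(loc + (i + 0.5) * step,
                                         loc, scale, asymmetry)) * step
    for i in range(-40_000, 40_000))
```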

arg_constraints
= {'asymmetry': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}¶

has_rsample
= True¶

mean
¶

support
= Real()¶

variance
¶
AVFMultivariateNormal¶

class
AVFMultivariateNormal
(loc, scale_tril, control_var)[source]¶ Bases:
pyro.distributions.torch.MultivariateNormal
Multivariate normal (Gaussian) distribution with transport equation inspired control variates (adaptive velocity fields).
A distribution over vectors in which all the elements have a joint Gaussian density.
Parameters:  loc (torch.Tensor) – D-dimensional mean vector.
 scale_tril (torch.Tensor) – Cholesky of Covariance matrix; D x D matrix.
 control_var (torch.Tensor) – 2 x L x D tensor that parameterizes the control variate; L is an arbitrary positive integer. This parameter needs to be learned (i.e. adapted) to achieve lower variance gradients. In a typical use case this parameter will be adapted concurrently with the loc and scale_tril that define the distribution.
Example usage:
    control_var = torch.tensor(0.1 * torch.ones(2, 1, D), requires_grad=True)
    opt_cv = torch.optim.Adam([control_var], lr=0.1, betas=(0.5, 0.999))

    for _ in range(1000):
        d = AVFMultivariateNormal(loc, scale_tril, control_var)
        z = d.rsample()
        cost = torch.pow(z, 2.0).sum()
        cost.backward()
        opt_cv.step()
        opt_cv.zero_grad()


arg_constraints
= {'control_var': Real(), 'loc': Real(), 'scale_tril': LowerTriangular()}¶
BetaBinomial¶

class
BetaBinomial
(concentration1, concentration0, total_count=1, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Compound distribution comprising a beta-binomial pair. The probability of success (
probs
for the
Binomial
distribution) is unknown and randomly drawn from a
Beta
distribution prior to a certain number of Bernoulli trials given by
total_count
.
Parameters:  concentration1 (float or torch.Tensor) – 1st concentration parameter (alpha) for the Beta distribution.
 concentration0 (float or torch.Tensor) – 2nd concentration parameter (beta) for the Beta distribution.
 total_count (float or torch.Tensor) – Number of Bernoulli trials.
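Marginalizing the Beta-distributed success probability out of a Binomial gives a closed-form mass function in terms of Beta functions. The sketch below evaluates it with `math.lgamma`; the helper names are hypothetical and this is the textbook formula, not necessarily how the class computes `log_prob`.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_log_prob(k, concentration1, concentration0, total_count):
    """log P(k successes) with probs ~ Beta marginalized out analytically."""
    log_binom = (math.lgamma(total_count + 1) - math.lgamma(k + 1)
                 - math.lgamma(total_count - k + 1))
    return (log_binom
            + log_beta(k + concentration1, total_count - k + concentration0)
            - log_beta(concentration1, concentration0))

n, a, b = 10, 2.0, 3.0
pmf = [math.exp(beta_binomial_log_prob(k, a, b, n)) for k in range(n + 1)]
# Theoretical mean: n * a / (a + b) = 4.0.
mean = sum(k * p for k, p in enumerate(pmf))
```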

approx_log_prob_tol
= 0.0¶

arg_constraints
= {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0), 'total_count': IntegerGreaterThan(lower_bound=0)}¶

concentration0
¶

concentration1
¶

has_enumerate_support
= True¶

mean
¶

support
¶

variance
¶
CoalescentTimes¶

class
CoalescentTimes
(leaf_times, rate=1.0, *, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Distribution over sorted coalescent times given irregular sampled
leaf_times
and constant population size.
Sample values will be sorted sets of binary coalescent times. Each sample
value
will have cardinality
value.size(-1) = leaf_times.size(-1) - 1
, so that phylogenies are complete binary trees. This distribution can thus be batched over multiple samples of phylogenies given fixed (number of) leaf times, e.g. over phylogeny samples from BEAST or MrBayes.
References
 [1] J.F.C. Kingman (1982)
 “On the Genealogy of Large Populations” Journal of Applied Probability
 [2] J.F.C. Kingman (1982)
 “The Coalescent” Stochastic Processes and their Applications
Parameters:  leaf_times (torch.Tensor) – Vector of times of sampling events, i.e. leaf nodes in the phylogeny. These can be arbitrary real numbers with arbitrary order and duplicates.
 rate (torch.Tensor) – Base coalescent rate (pairwise rate of coalescence) under a constant population size model. Defaults to 1.
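The relationship between leaf times and coalescent times can be sketched with a small standard-library simulation of Kingman's coalescent under a constant pairwise rate. The function name simulate_coalescent_times and its seed argument are hypothetical, and the sketch only illustrates the process; it is not Pyro's sampler.

```python
import random


def simulate_coalescent_times(leaf_times, rate=1.0, seed=0):
    """Simulate sorted binary coalescent times, moving backward in time."""
    rng = random.Random(seed)
    pending = sorted(leaf_times, reverse=True)  # leaves not yet reached, latest first
    t = pending.pop(0)  # start at the most recent leaf
    lineages = 1
    coal_times = []
    while pending or lineages > 1:
        # With k lineages there are k * (k - 1) / 2 pairs that may coalesce.
        pairs = lineages * (lineages - 1) / 2
        wait = rng.expovariate(rate * pairs) if pairs > 0 else float("inf")
        if pending and t - wait <= pending[0]:
            # The next leaf is reached before any coalescence: add a lineage.
            t = pending.pop(0)
            lineages += 1
        else:
            # Two lineages merge at time t - wait.
            t -= wait
            lineages -= 1
            coal_times.append(t)
    return sorted(coal_times)


times = simulate_coalescent_times([0.0, 2.0, 1.0, 2.0, -0.5], rate=1.0, seed=1)
```

Five leaves yield four coalescent times, matching the cardinality leaf_times.size(-1) - 1, and every coalescent time lies at or before the latest leaf time.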

arg_constraints
= {'leaf_times': Real(), 'rate': GreaterThan(lower_bound=0.0)}¶

support
¶
CoalescentTimesWithRate¶

class
CoalescentTimesWithRate
(leaf_times, rate_grid, *, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Distribution over coalescent times given irregularly sampled leaf_times and piecewise constant coalescent rates defined on a regular time grid.
This assumes a piecewise constant base coalescent rate specified on time intervals (-inf,1], [1,2], …, [T-1,inf), where T = rate_grid.size(-1). Leaves may be sampled at arbitrary real times, but are commonly sampled in the interval [0, T].
Sample values will be sorted sets of binary coalescent times. Each sample value will have cardinality value.size(-1) = leaf_times.size(-1) - 1, so that phylogenies are complete binary trees. This distribution can thus be batched over multiple samples of phylogenies given fixed (number of) leaf times, e.g. over phylogeny samples from BEAST or MrBayes.
This distribution implements log_prob() but not .sample().
See also CoalescentRateLikelihood.
References
 [1] J.F.C. Kingman (1982)
 “On the Genealogy of Large Populations” Journal of Applied Probability
 [2] J.F.C. Kingman (1982)
 “The Coalescent” Stochastic Processes and their Applications
 [3] A. Popinga, T. Vaughan, T. Statler, A.J. Drummond (2014)
 “Inferring epidemiological dynamics with Bayesian coalescent inference: The merits of deterministic and stochastic models” https://arxiv.org/pdf/1407.1792.pdf
Parameters:  leaf_times (torch.Tensor) – Tensor of times of sampling events, i.e. leaf nodes in the phylogeny. These can be arbitrary real numbers with arbitrary order and duplicates.
 rate_grid (torch.Tensor) – Tensor of base coalescent rates (pairwise rate of coalescence). For example in a simple SIR model this might be beta S / I. The rightmost dimension is time, and this tensor represents a (batch of) rates that are piecewise constant in time.
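As a reading aid for the time grid, the sketch below looks up the rate that applies at a real time t on the intervals (-inf,1], [1,2], …, [T-1,inf). The helper rate_at is hypothetical, and assigning a boundary point to the later interval is an arbitrary choice made here, not a documented convention.

```python
import math


def rate_at(t, rate_grid):
    """Return the piecewise constant rate in effect at real time t."""
    T = len(rate_grid)
    # Times before 1 use the first cell; times after T - 1 use the last cell.
    index = min(max(math.floor(t), 0), T - 1)
    return rate_grid[index]


grid = [0.1, 0.2, 0.3]  # hypothetical rates, so T = 3
rates = [rate_at(t, grid) for t in (-3.0, 0.5, 1.5, 2.5, 100.0)]
```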

arg_constraints
= {'leaf_times': Real(), 'rate_grid': GreaterThan(lower_bound=0.0)}¶

duration
¶

log_prob
(value)[source]¶ Computes likelihood as in equations 7-8 of [3].
This has time complexity O(T + S N log(N)) where T is the number of time steps, N is the number of leaves, and S = sample_shape.numel() is the number of samples of value.
Parameters: value (torch.Tensor) – A tensor of coalescent times. These denote sets of size leaf_times.size(-1) - 1 along the trailing dimension and should be sorted along that dimension.
Returns: Likelihood p(coal_times | leaf_times, rate_grid)
Return type: torch.Tensor

support
¶
ConditionalDistribution¶
ConditionalTransformedDistribution¶
Delta¶

class
Delta
(v, log_density=0.0, event_dim=0, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Degenerate discrete distribution (a single point).
Discrete distribution that assigns probability one to the single element in its support. Delta distribution parameterized by a random choice should not be used with MCMC based inference, as doing so produces incorrect results.
Parameters:  v (torch.Tensor) – The single support element.
 log_density (torch.Tensor) – An optional density for this Delta. This is useful to keep the class of Delta distributions closed under differentiable transformation.
 event_dim (int) – Optional event dimension, defaults to zero.

arg_constraints
= {'log_density': Real(), 'v': Dependent()}¶

has_rsample
= True¶

mean
¶

support
¶

variance
¶
DirichletMultinomial¶

class
DirichletMultinomial
(concentration, total_count=1, is_sparse=False, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Compound distribution comprising a dirichlet-multinomial pair. The probability of classes (probs for the Multinomial distribution) is unknown and randomly drawn from a Dirichlet distribution prior to a certain number of Categorical trials given by total_count.
Parameters:  concentration (float or torch.Tensor) – concentration parameter (alpha) for the Dirichlet distribution.
 total_count (int or torch.Tensor) – number of Categorical trials.
 is_sparse (bool) – Whether to assume value is mostly zero when computing log_prob(), which can speed up computation when data is sparse.
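The compound probability mass function can be written out with the standard library as a sanity check on the definition. The helper dirichlet_multinomial_log_pmf and the example values are hypothetical; this illustrates the formula and is not Pyro's implementation.

```python
from math import exp, lgamma


def dirichlet_multinomial_log_pmf(counts, concentration):
    """Log-probability of a count vector after integrating out probs ~ Dirichlet."""
    n = sum(counts)
    alpha0 = sum(concentration)
    result = lgamma(n + 1) + lgamma(alpha0) - lgamma(n + alpha0)
    for x, a in zip(counts, concentration):
        result += lgamma(x + a) - lgamma(a) - lgamma(x + 1)
    return result


# Hypothetical example: three classes, four trials.
alpha = [0.5, 1.5, 2.0]
log_p = dirichlet_multinomial_log_pmf([1, 0, 3], alpha)
```

Summing exp(log_pmf) over every count vector with the same total gives one.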

arg_constraints
= {'concentration': IndependentConstraint(GreaterThan(lower_bound=0.0), 1), 'total_count': IntegerGreaterThan(lower_bound=0)}¶

concentration
¶

mean
¶

support
¶

variance
¶
DiscreteHMM¶

class
DiscreteHMM
(initial_logits, transition_logits, observation_dist, validate_args=None, duration=None)[source]¶ Bases:
pyro.distributions.hmm.HiddenMarkovModel
Hidden Markov Model with discrete latent state and arbitrary observation distribution. This uses [1] to parallelize over time, achieving O(log(time)) parallel complexity.
The event_shape of this distribution includes time on the left:
event_shape = (num_steps,) + observation_dist.event_shape
This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_logits and observation_dist. However, because time is included in this distribution’s event_shape, the homogeneous+homogeneous case will have a broadcastable event_shape with num_steps = 1, allowing log_prob() to work with arbitrary length data:
# homogeneous + homogeneous case:
event_shape = (1,) + observation_dist.event_shape
References:
 [1] Simo Sarkka, Angel F. Garcia-Fernandez (2019)
 “Temporal Parallelization of Bayesian Filters and Smoothers” https://arxiv.org/pdf/1905.13002.pdf
Parameters:  initial_logits (Tensor) – A logits tensor for an initial categorical distribution over latent states. Should have rightmost size state_dim and be broadcastable to batch_shape + (state_dim,).
 transition_logits (Tensor) – A logits tensor for transition conditional distributions between latent states. Should have rightmost shape (state_dim, state_dim) (old, new), and be broadcastable to batch_shape + (num_steps, state_dim, state_dim).
 observation_dist (Distribution) – A conditional distribution of observed data conditioned on latent state. The .batch_shape should have rightmost size state_dim and be broadcastable to batch_shape + (num_steps, state_dim). The .event_shape may be arbitrary.
 duration (int) – Optional size of the time axis event_shape[0]. This is required when sampling from homogeneous HMMs whose parameters are not expanded along the time axis.
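For intuition about what log_prob() and filter() compute, here is a sequential forward-algorithm sketch in plain Python for a homogeneous model. It is not the parallel-scan method of [1], the helper names are hypothetical, and it assumes one transition is applied before each observation.

```python
import math


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_softmax(logits):
    z = logsumexp(logits)
    return [x - z for x in logits]


def hmm_forward(initial_logits, transition_logits, obs_logp):
    """Return (log_prob, final_state_logits) for a homogeneous discrete HMM.

    obs_logp[t][s] is log p(x_t | state s); transition_logits[i][j] scores
    moving from old state i to new state j.
    """
    num_states = len(initial_logits)
    trans = [log_softmax(row) for row in transition_logits]
    alpha = log_softmax(initial_logits)
    for logp_t in obs_logp:
        alpha = [
            logsumexp([alpha[i] + trans[i][j] for i in range(num_states)]) + logp_t[j]
            for j in range(num_states)
        ]
    return logsumexp(alpha), log_softmax(alpha)


# Hypothetical two-state model with three observations.
init = [0.2, -0.3]
trans = [[0.5, -0.5], [0.1, 0.9]]
obs = [[-0.1, -2.0], [-1.5, -0.2], [-0.7, -0.7]]
log_prob, final_logits = hmm_forward(init, trans, obs)
```

The second return value plays the role of the logits of the filtering posterior over the final state.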

arg_constraints
= {'initial_logits': Real(), 'transition_logits': Real()}¶

filter
(value)[source]¶ Compute posterior over final state given a sequence of observations.
Parameters: value (Tensor) – A sequence of observations.
Returns: A posterior distribution over latent states at the final time step. result.logits can then be used as initial_logits in a sequential Pyro model for prediction.
Return type: Categorical

support
¶
EmpiricalDistribution¶

class
Empirical
(samples, log_weights, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Empirical distribution associated with the sampled data. Note that the shape requirement for log_weights is that its shape must match the leftmost shape of samples. Samples are aggregated along the aggregation_dim, which is the rightmost dim of log_weights.
Example:
>>> emp_dist = Empirical(torch.randn(2, 3, 10), torch.ones(2, 3))
>>> emp_dist.batch_shape
torch.Size([2])
>>> emp_dist.event_shape
torch.Size([10])

>>> single_sample = emp_dist.sample()
>>> single_sample.shape
torch.Size([2, 10])
>>> batch_sample = emp_dist.sample((100,))
>>> batch_sample.shape
torch.Size([100, 2, 10])

>>> emp_dist.log_prob(single_sample).shape
torch.Size([2])
>>> # Vectorized samples cannot be scored by log_prob.
>>> with pyro.validation_enabled():
...     emp_dist.log_prob(batch_sample).shape
Traceback (most recent call last):
...
ValueError: ``value.shape`` must be torch.Size([2, 10])
Parameters:  samples (torch.Tensor) – samples from the empirical distribution.
 log_weights (torch.Tensor) – log weights (optional) corresponding to the samples.

arg_constraints
= {}¶

enumerate_support
(expand=True)[source]¶ See
pyro.distributions.torch_distribution.TorchDistribution.enumerate_support()

event_shape
¶ See
pyro.distributions.torch_distribution.TorchDistribution.event_shape()

has_enumerate_support
= True¶

log_prob
(value)[source]¶ Returns the log of the probability mass function evaluated at value. Note that this currently only supports scoring values with empty sample_shape.
Parameters: value (torch.Tensor) – scalar or tensor value to be scored.

log_weights
¶

mean
¶ See
pyro.distributions.torch_distribution.TorchDistribution.mean()

sample
(sample_shape=torch.Size([]))[source]¶ See
pyro.distributions.torch_distribution.TorchDistribution.sample()

sample_size
¶ Number of samples that constitute the empirical distribution.
Return int: number of samples collected.

support
= Real()¶

variance
¶ See
pyro.distributions.torch_distribution.TorchDistribution.variance()
ExtendedBetaBinomial¶

class
ExtendedBetaBinomial
(concentration1, concentration0, total_count=1, validate_args=None)[source]¶ Bases:
pyro.distributions.conjugate.BetaBinomial
EXPERIMENTAL BetaBinomial distribution extended to have logical support the entire integers and to allow arbitrary integer total_count. Numerical support is still the integer interval [0, total_count].
arg_constraints
= {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0), 'total_count': Integer}¶

support
= Integer¶

ExtendedBinomial¶

class
ExtendedBinomial
(total_count=1, probs=None, logits=None, validate_args=None)[source]¶ Bases:
pyro.distributions.torch.Binomial
EXPERIMENTAL Binomial distribution extended to have logical support the entire integers and to allow arbitrary integer total_count. Numerical support is still the integer interval [0, total_count].
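One way to read "logical support the entire integers" is that any integer may be scored, with values outside [0, total_count] receiving log-probability -inf. The standard-library sketch below illustrates that reading; the helper name is hypothetical, it assumes 0 < probs < 1, and it is not Pyro's implementation.

```python
import math


def extended_binomial_log_pmf(k, total_count, probs):
    """Binomial log-pmf that returns -inf outside the numerical support."""
    if k < 0 or k > total_count:
        return -math.inf
    log_binom = (math.lgamma(total_count + 1) - math.lgamma(k + 1)
                 - math.lgamma(total_count - k + 1))
    return log_binom + k * math.log(probs) + (total_count - k) * math.log1p(-probs)


# Scoring a window wider than the numerical support [0, 5].
window = [math.exp(extended_binomial_log_pmf(k, 5, 0.3)) for k in range(-3, 9)]
```

Under this reading a negative total_count leaves the numerical support empty, so every value scores -inf.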
arg_constraints
= {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0), 'total_count': Integer}¶

support
= Integer¶

FoldedDistribution¶

class
FoldedDistribution
(base_dist, validate_args=None)[source]¶ Bases:
pyro.distributions.torch.TransformedDistribution
Equivalent to TransformedDistribution(base_dist, AbsTransform()), but additionally supports log_prob().
Parameters: base_dist (Distribution) – The distribution to reflect.
support
= GreaterThan(lower_bound=0.0)¶
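The density of a folded variable Y = |X| adds the base density at y and at -y. The sketch below works this out for a Normal base using only the standard library; the helper names are hypothetical and the Normal base is just an example.

```python
import math


def normal_log_pdf(x, loc, scale):
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2 * math.pi)


def folded_log_pdf(y, loc, scale):
    """Log-density of |X| for X ~ Normal(loc, scale): log(p(y) + p(-y))."""
    if y < 0:
        return -math.inf
    a = normal_log_pdf(y, loc, scale)
    b = normal_log_pdf(-y, loc, scale)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


value = folded_log_pdf(0.8, 0.0, 1.0)
```

For a base distribution that is symmetric about zero, this reduces to log(2) plus the base log-density.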

GammaGaussianHMM¶

class
GammaGaussianHMM
(scale_dist, initial_dist, transition_matrix, transition_dist, observation_matrix, observation_dist, validate_args=None, duration=None)[source]¶ Bases:
pyro.distributions.hmm.HiddenMarkovModel
Hidden Markov Model whose joint distribution of initial state, hidden state, and observed state is a MultivariateStudentT distribution, along the lines of references [2] and [3]. This adapts [1] to parallelize over time to achieve O(log(time)) parallel complexity.
This GammaGaussianHMM class corresponds to the generative model:
s = Gamma(df/2, df/2).sample()
z = scale(initial_dist, s).sample()
x = []
for t in range(num_events):
    z = z @ transition_matrix + scale(transition_dist, s).sample()
    x.append(z @ observation_matrix + scale(observation_dist, s).sample())
where scale(mvn(loc, precision), s) := mvn(loc, s * precision).
The event_shape of this distribution includes time on the left:
event_shape = (num_steps,) + observation_dist.event_shape
This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_dist and observation_dist. However, because time is included in this distribution’s event_shape, the homogeneous+homogeneous case will have a broadcastable event_shape with num_steps = 1, allowing log_prob() to work with arbitrary length data:
event_shape = (1, obs_dim)  # homogeneous + homogeneous case
References:
 [1] Simo Sarkka, Angel F. Garcia-Fernandez (2019)
 “Temporal Parallelization of Bayesian Filters and Smoothers” https://arxiv.org/pdf/1905.13002.pdf
 [2] F. J. Giron and J. C. Rojano (1994)
 “Bayesian Kalman filtering with elliptically contoured errors”
 [3] Filip Tronarp, Toni Karvonen, and Simo Sarkka (2019)
 “Student’s t-filters for noise scale estimation” https://users.aalto.fi/~ssarkka/pub/SPL2019.pdf
Variables: Parameters:  scale_dist (Gamma) – Prior of the mixing distribution.
 initial_dist (MultivariateNormal) – A distribution with unit scale mixing over initial states. This should have batch_shape broadcastable to self.batch_shape. This should have event_shape (hidden_dim,).
 transition_matrix (Tensor) – A linear transformation of hidden state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, hidden_dim) where the rightmost dims are ordered (old, new).
 transition_dist (MultivariateNormal) – A process noise distribution with unit scale mixing. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim,).
 observation_matrix (Tensor) – A linear transformation from hidden to observed state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, obs_dim).
 observation_dist (MultivariateNormal) – An observation noise distribution with unit scale mixing. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (obs_dim,).
 duration (int) – Optional size of the time axis event_shape[0]. This is required when sampling from homogeneous HMMs whose parameters are not expanded along the time axis.
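The reason a shared Gamma(df/2, df/2) multiplier on the precision yields Student's t marginals can be checked numerically in one dimension. The sketch below integrates out the mixing variable with a midpoint rule and can be compared with the closed-form Student's t density; all helper names, the integration bounds, and df = 5 are assumptions made for illustration.

```python
import math


def gamma_pdf(s, concentration, rate):
    return math.exp(concentration * math.log(rate) - math.lgamma(concentration)
                    + (concentration - 1) * math.log(s) - rate * s)


def scaled_normal_pdf(x, s):
    """Density of a zero-mean normal whose precision is s (variance 1/s)."""
    return math.sqrt(s / (2 * math.pi)) * math.exp(-0.5 * s * x * x)


def student_t_pdf(x, df):
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def marginal_pdf(x, df, upper=40.0, steps=40000):
    """Midpoint-rule integral over the mixing variable s ~ Gamma(df/2, df/2)."""
    h = upper / steps
    return h * sum(
        gamma_pdf((i + 0.5) * h, df / 2, df / 2) * scaled_normal_pdf(x, (i + 0.5) * h)
        for i in range(steps)
    )


mixture_value = marginal_pdf(1.3, 5.0)
closed_form = student_t_pdf(1.3, 5.0)
```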

arg_constraints
= {}¶

filter
(value)[source]¶ Compute posteriors over the multiplier and the final state given a sequence of observations. The posterior is a pair of Gamma and MultivariateNormal distributions (i.e. a GammaGaussian instance).
Parameters: value (Tensor) – A sequence of observations.
Returns: A pair of posterior distributions over the mixing and the latent state at the final time step.
Return type: a tuple of pyro.distributions.Gamma and pyro.distributions.MultivariateNormal

support
= IndependentConstraint(Real(), 2)¶
GammaPoisson¶

class
GammaPoisson
(concentration, rate, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Compound distribution comprising a gamma-poisson pair, also referred to as a gamma-poisson mixture. The rate parameter for the Poisson distribution is unknown and randomly drawn from a Gamma distribution.
Note
This can be treated as an alternate parametrization of the NegativeBinomial (total_count, probs) distribution, with concentration = total_count and rate = (1 - probs) / probs.
Parameters: 
arg_constraints
= {'concentration': GreaterThan(lower_bound=0.0), 'rate': GreaterThan(lower_bound=0.0)}¶

concentration
¶

mean
¶

rate
¶

support
= IntegerGreaterThan(lower_bound=0)¶

variance
¶
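The correspondence between GammaPoisson and NegativeBinomial (concentration = total_count, rate = (1 - probs) / probs) can be checked with the standard library. Both helper functions are hypothetical, the negative binomial is written with probs multiplying the count as in torch.distributions, and the numeric values are arbitrary.

```python
from math import exp, lgamma, log, log1p


def gamma_poisson_log_pmf(k, concentration, rate):
    """Poisson log-pmf with its rate integrated out against Gamma(concentration, rate)."""
    a, b = concentration, rate
    return (lgamma(a + k) - lgamma(a) - lgamma(k + 1)
            + a * log(b / (1 + b)) - k * log1p(b))


def negative_binomial_log_pmf(k, total_count, probs):
    r, p = total_count, probs
    return (lgamma(r + k) - lgamma(r) - lgamma(k + 1)
            + r * log1p(-p) + k * log(p))


total_count, probs = 2.5, 0.3
rate = (1 - probs) / probs
gaps = [abs(gamma_poisson_log_pmf(k, total_count, rate)
            - negative_binomial_log_pmf(k, total_count, probs))
        for k in range(20)]
```

The two log-probabilities agree to floating-point precision, and the mean of the mixture is concentration / rate.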

GaussianHMM¶

class
GaussianHMM
(initial_dist, transition_matrix, transition_dist, observation_matrix, observation_dist, validate_args=None, duration=None)[source]¶ Bases:
pyro.distributions.hmm.HiddenMarkovModel
Hidden Markov Model with Gaussians for initial, transition, and observation distributions. This adapts [1] to parallelize over time to achieve O(log(time)) parallel complexity; however, it differs in that it tracks the log normalizer to ensure log_prob() is differentiable.
This corresponds to the generative model:
z = initial_distribution.sample()
x = []
for t in range(num_events):
    z = z @ transition_matrix + transition_dist.sample()
    x.append(z @ observation_matrix + observation_dist.sample())
The event_shape of this distribution includes time on the left:
event_shape = (num_steps,) + observation_dist.event_shape
This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_dist and observation_dist. However, because time is included in this distribution’s event_shape, the homogeneous+homogeneous case will have a broadcastable event_shape with num_steps = 1, allowing log_prob() to work with arbitrary length data:
event_shape = (1, obs_dim)  # homogeneous + homogeneous case
References:
 [1] Simo Sarkka, Angel F. Garcia-Fernandez (2019)
 “Temporal Parallelization of Bayesian Filters and Smoothers” https://arxiv.org/pdf/1905.13002.pdf
Variables: Parameters:  initial_dist (MultivariateNormal) – A distribution over initial states. This should have batch_shape broadcastable to self.batch_shape. This should have event_shape (hidden_dim,).
 transition_matrix (Tensor) – A linear transformation of hidden state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, hidden_dim) where the rightmost dims are ordered (old, new).
 transition_dist (MultivariateNormal) – A process noise distribution. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim,).
 observation_matrix (Tensor) – A linear transformation from hidden to observed state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, obs_dim).
 observation_dist (MultivariateNormal or Normal) – An observation noise distribution. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (obs_dim,).
 duration (int) – Optional size of the time axis event_shape[0]. This is required when sampling from homogeneous HMMs whose parameters are not expanded along the time axis.
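For the scalar case, the posterior over the final state can be computed with a textbook sequential Kalman filter that follows the generative model given for this class (one transition, then one observation, per step). This is a sketch for intuition only; it is not the parallel-scan algorithm of [1], and the function and argument names are hypothetical.

```python
def kalman_filter_1d(observations, m0, P0, a, Q, h, R):
    """Mean and variance of the final hidden state given all observations.

    Model: z_0 ~ N(m0, P0); z_t = a * z_{t-1} + N(0, Q); x_t = h * z_t + N(0, R).
    """
    m, P = m0, P0
    for x in observations:
        # Predict: push the state through the linear dynamics.
        m, P = a * m, a * a * P + Q
        # Update: condition on the new observation.
        S = h * h * P + R
        K = P * h / S
        m, P = m + K * (x - h * m), (1 - K * h) * P
    return m, P


mean, var = kalman_filter_1d([2.0], m0=0.0, P0=1.0, a=1.0, Q=0.0, h=1.0, R=1.0)
```

With a standard normal prior, static dynamics, and unit observation noise, a single observation of 2.0 moves the posterior mean halfway to the observation and halves the variance.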

arg_constraints
= {}¶

conjugate_update
(other)[source]¶ EXPERIMENTAL Creates an updated GaussianHMM fusing information from another compatible distribution.
This should satisfy:
fg, log_normalizer = f.conjugate_update(g)
assert f.log_prob(x) + g.log_prob(x) == fg.log_prob(x) + log_normalizer
Parameters: other (MultivariateNormal or Normal) – A distribution representing p(data|self.probs) but normalized over self.probs rather than data.
Returns: a pair (updated, log_normalizer) where updated is an updated GaussianHMM, and log_normalizer is a Tensor representing the normalization factor.

filter
(value)[source]¶ Compute posterior over final state given a sequence of observations.
Parameters: value (Tensor) – A sequence of observations.
Returns: A posterior distribution over latent states at the final time step. result can then be used as initial_dist in a sequential Pyro model for prediction.
Return type: MultivariateNormal

has_rsample
= True¶

prefix_condition
(data)[source]¶ EXPERIMENTAL Given self has event_shape == (t+f, d) and data x of shape batch_shape + (t, d), compute a conditional distribution of event_shape (f, d). Typically t is the number of training time steps, f is the number of forecast time steps, and d is the data dimension.
Parameters: data (Tensor) – data of dimension at least 2.

rsample_posterior
(value, sample_shape=torch.Size([]))[source]¶ EXPERIMENTAL Sample from the latent state conditioned on observation.

support
= IndependentConstraint(Real(), 2)¶
GaussianMRF¶

class
GaussianMRF
(initial_dist, transition_dist, observation_dist, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Temporal Markov Random Field with Gaussian factors for initial, transition, and observation distributions. This adapts [1] to parallelize over time to achieve O(log(time)) parallel complexity; however, it differs in that it tracks the log normalizer to ensure log_prob() is differentiable.
The event_shape of this distribution includes time on the left:
event_shape = (num_steps,) + observation_dist.event_shape
This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_dist and observation_dist. However, because time is included in this distribution’s event_shape, the homogeneous+homogeneous case will have a broadcastable event_shape with num_steps = 1, allowing log_prob() to work with arbitrary length data:
event_shape = (1, obs_dim)  # homogeneous + homogeneous case
References:
 [1] Simo Sarkka, Angel F. Garcia-Fernandez (2019)
 “Temporal Parallelization of Bayesian Filters and Smoothers” https://arxiv.org/pdf/1905.13002.pdf
Variables: Parameters:  initial_dist (MultivariateNormal) – A distribution over initial states. This should have batch_shape broadcastable to self.batch_shape. This should have event_shape (hidden_dim,).
 transition_dist (MultivariateNormal) – A joint distribution factor over a pair of successive time steps. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim + hidden_dim,) (old+new).
 observation_dist (MultivariateNormal) – A joint distribution factor over a hidden and an observed state. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim + obs_dim,).

arg_constraints
= {}¶

support
¶
GaussianScaleMixture¶

class
GaussianScaleMixture
(coord_scale, component_logits, component_scale)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Mixture of Normal distributions with zero mean and diagonal covariance matrices.
That is, this distribution is a mixture with K components, where each component distribution is a D-dimensional Normal distribution with zero mean and a D-dimensional diagonal covariance matrix. The K different covariance matrices are controlled by the parameters coord_scale and component_scale. That is, the covariance matrix of the k’th component is given by
Sigma_ii = (component_scale_k * coord_scale_i) ** 2 (i = 1, …, D)
where component_scale_k is a positive scale factor and coord_scale_i are positive scale parameters shared between all K components. The mixture weights are controlled by a K-dimensional vector of softmax logits, component_logits. This distribution implements pathwise derivatives for samples from the distribution. This distribution does not currently support batched parameters.
See reference [1] for details on the implementations of the pathwise derivative. Please consider citing this reference if you use the pathwise derivative in your research.
[1] Pathwise Derivatives for Multivariate Distributions, Martin Jankowiak & Theofanis Karaletsos. arXiv:1806.01856
Note that this distribution supports both even and odd dimensions, but the former should give slightly higher precision, since it doesn’t use any erfs in the backward call. Also note that this distribution does not support D = 1.
Parameters:  coord_scale (torch.tensor) – D-dimensional vector of scales
 component_logits (torch.tensor) – K-dimensional vector of logits
 component_scale (torch.tensor) – K-dimensional vector of scale multipliers
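The covariance rule Sigma_ii = (component_scale_k * coord_scale_i) ** 2 and the softmax mixture weights can be spelled out in a few lines of plain Python. The helper names and numeric values below are hypothetical and serve only to illustrate the parameterization.

```python
import math


def softmax(logits):
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def component_variances(coord_scale, component_scale):
    """Row k holds the diagonal of the k'th component covariance."""
    return [[(ck * ci) ** 2 for ci in coord_scale] for ck in component_scale]


def marginal_variances(coord_scale, component_logits, component_scale):
    """Per-coordinate variance of the zero-mean mixture."""
    weights = softmax(component_logits)
    variances = component_variances(coord_scale, component_scale)
    return [sum(w * row[i] for w, row in zip(weights, variances))
            for i in range(len(coord_scale))]


# D = 2 coordinates, K = 2 equally weighted components.
diag = component_variances([1.0, 2.0], [1.0, 3.0])
marginal = marginal_variances([1.0, 2.0], [0.0, 0.0], [1.0, 3.0])
```

Because every component has zero mean, the marginal variance of each coordinate is the weighted average of the component variances.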

arg_constraints
= {'component_logits': Real(), 'component_scale': GreaterThan(lower_bound=0.0), 'coord_scale': GreaterThan(lower_bound=0.0)}¶

has_rsample
= True¶
ImproperUniform¶

class
ImproperUniform
(support, batch_shape, event_shape)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Improper distribution with zero log_prob() and undefined sample().
This is useful for transforming a model from generative dag form to factor graph form for use in HMC. For example the following are equal in distribution:
# Version 1. a generative dag
x = pyro.sample("x", Normal(0, 1))
y = pyro.sample("y", Normal(x, 1))
z = pyro.sample("z", Normal(y, 1))

# Version 2. a factor graph
xyz = pyro.sample("xyz", ImproperUniform(constraints.real, (), (3,)))
x, y, z = xyz.unbind(-1)
pyro.sample("x", Normal(0, 1), obs=x)
pyro.sample("y", Normal(x, 1), obs=y)
pyro.sample("z", Normal(y, 1), obs=z)
Note this distribution errors when sample() is called. To create a similar distribution that instead samples from a specified distribution consider using .mask(False) as in:
xyz = dist.Normal(0, 1).expand([3]).to_event(1).mask(False)
Parameters:  support (Constraint) – The support of the distribution.
 batch_shape (torch.Size) – The batch shape.
 event_shape (torch.Size) – The event shape.

arg_constraints
= {}¶

support
¶
IndependentHMM¶

class
IndependentHMM
(base_dist)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Wrapper class to treat a batch of independent univariate HMMs as a single multivariate distribution. This converts distribution shapes as follows:
            .batch_shape          .event_shape
base_dist   shape + (obs_dim,)    (duration, 1)
result      shape                 (duration, obs_dim)
Parameters: base_dist (HiddenMarkovModel) – A base hidden Markov model instance.
arg_constraints
= {}¶

duration
¶

has_rsample
¶

support
¶

InverseGamma¶

class
InverseGamma
(concentration, rate, validate_args=None)[source]¶ Bases:
pyro.distributions.torch.TransformedDistribution
Creates an inverse-gamma distribution parameterized by concentration and rate.
X ~ Gamma(concentration, rate)
Y = 1/X ~ InverseGamma(concentration, rate)
Parameters:  concentration (torch.Tensor) – the concentration parameter (i.e. alpha).
 rate (torch.Tensor) – the rate parameter (i.e. beta).
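The density of Y = 1/X follows from the Gamma density by a change of variables, which the standard-library sketch below makes explicit. The helper names and numeric values are hypothetical; this is a worked formula rather than Pyro's code.

```python
from math import lgamma, log


def gamma_log_pdf(x, concentration, rate):
    a, b = concentration, rate
    return a * log(b) - lgamma(a) + (a - 1) * log(x) - b * x


def inverse_gamma_log_pdf(y, concentration, rate):
    # Change of variables x = 1/y: p_Y(y) = p_X(1/y) * |dx/dy| = p_X(1/y) / y**2
    return gamma_log_pdf(1.0 / y, concentration, rate) - 2 * log(y)


value = inverse_gamma_log_pdf(0.7, 3.0, 2.0)
```

This matches the usual closed form alpha * log(beta) - lgamma(alpha) - (alpha + 1) * log(y) - beta / y.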

arg_constraints
= {'concentration': GreaterThan(lower_bound=0.0), 'rate': GreaterThan(lower_bound=0.0)}¶

concentration
¶

has_rsample
= True¶

rate
¶

support
= GreaterThan(lower_bound=0.0)¶
LinearHMM¶

class
LinearHMM
(initial_dist, transition_matrix, transition_dist, observation_matrix, observation_dist, validate_args=None, duration=None)[source]¶ Bases:
pyro.distributions.hmm.HiddenMarkovModel
Hidden Markov Model with linear dynamics and observations and arbitrary noise for initial, transition, and observation distributions. Each of those distributions can be e.g. MultivariateNormal or Independent of Normal, StudentT, or Stable. Additionally the observation distribution may be constrained, e.g. LogNormal.
This corresponds to the generative model:
z = initial_distribution.sample()
x = []
for t in range(num_events):
    z = z @ transition_matrix + transition_dist.sample()
    y = z @ observation_matrix + obs_base_dist.sample()
    x.append(obs_transform(y))
where observation_dist is split into obs_base_dist and an optional obs_transform (defaulting to the identity).
This implements a reparameterized rsample() method but does not implement a log_prob() method. Derived classes may implement log_prob().
Inference without log_prob() can be performed using either reparameterization with LinearHMMReparam or likelihood-free algorithms such as EnergyDistance. Note that while stable processes generally require a common shared stability parameter \(\alpha\), this distribution and the above inference algorithms allow heterogeneous stability parameters.
The event_shape of this distribution includes time on the left:
event_shape = (num_steps,) + observation_dist.event_shape
This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_dist and observation_dist. However at least one of the distributions or matrices must be expanded to contain the time dimension.
Variables: Parameters:  initial_dist – A distribution over initial states. This should have batch_shape broadcastable to self.batch_shape. This should have event_shape (hidden_dim,).
 transition_matrix (Tensor) – A linear transformation of hidden state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, hidden_dim) where the rightmost dims are ordered (old, new).
 transition_dist – A distribution over process noise. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim,).
 observation_matrix (Tensor) – A linear transformation from hidden to observed state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, obs_dim).
 observation_dist – An observation noise distribution. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (obs_dim,).
 duration (int) – Optional size of the time axis event_shape[0]. This is required when sampling from homogeneous HMMs whose parameters are not expanded along the time axis.
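A scalar version of the generative model for this class, including the split of the observation distribution into base noise and a transform, can be sketched in plain Python. The function sample_linear_hmm_1d and its arguments are hypothetical; the noise sources are passed as zero-argument callables so that any noise family can be plugged in.

```python
import math
import random


def sample_linear_hmm_1d(num_steps, a, h, init_noise, trans_noise, obs_noise,
                         obs_transform=lambda y: y):
    """Draw one scalar trajectory; the noise arguments are zero-argument samplers."""
    z = init_noise()
    x = []
    for _ in range(num_steps):
        z = z * a + trans_noise()
        y = z * h + obs_noise()
        x.append(obs_transform(y))
    return x


rng = random.Random(0)
# Gaussian base noise with an exp transform gives LogNormal-style positive observations.
trajectory = sample_linear_hmm_1d(
    num_steps=5, a=0.9, h=1.0,
    init_noise=lambda: rng.gauss(0.0, 1.0),
    trans_noise=lambda: rng.gauss(0.0, 0.1),
    obs_noise=lambda: rng.gauss(0.0, 0.2),
    obs_transform=math.exp,
)
```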

arg_constraints
= {}¶

has_rsample
= True¶

support
¶
LKJ¶

class
LKJ
(dim, concentration=1.0, validate_args=None)[source]¶ Bases:
pyro.distributions.torch.TransformedDistribution
LKJ distribution for correlation matrices. The distribution is controlled by the concentration parameter \(\eta\) to make the probability of the correlation matrix \(M\) proportional to \(\det(M)^{\eta - 1}\). Because of that, when concentration == 1, we have a uniform distribution over correlation matrices.
When concentration > 1, the distribution favors samples with large determinant. This is useful when we know a priori that the underlying variables are not correlated. When concentration < 1, the distribution favors samples with small determinant. This is useful when we know a priori that some underlying variables are correlated.
Parameters:  dim (int) – dimension of the matrices
 concentration (ndarray) – concentration/shape parameter of the distribution (often referred to as eta)
References
[1] Generating random correlation matrices based on vines and extended onion method, Daniel Lewandowski, Dorota Kurowicka, Harry Joe
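For 2 x 2 correlation matrices M = [[1, r], [r, 1]] the determinant is 1 - r ** 2, so the effect of the concentration parameter can be seen directly. The sketch below evaluates the unnormalized density \(\det(M)^{\eta - 1}\); the helper name is hypothetical and the normalizing constant is deliberately omitted.

```python
def lkj_unnormalized_density_2x2(r, concentration):
    """det(M) ** (eta - 1) for the correlation matrix [[1, r], [r, 1]]."""
    det = 1.0 - r * r
    return det ** (concentration - 1.0)


uniform = [lkj_unnormalized_density_2x2(r, 1.0) for r in (0.0, 0.5, 0.9)]
peaked_at_zero = [lkj_unnormalized_density_2x2(r, 2.0) for r in (0.0, 0.5, 0.9)]
favors_correlation = [lkj_unnormalized_density_2x2(r, 0.5) for r in (0.0, 0.5, 0.9)]
```

With concentration 1 the density is flat in r, with concentration 2 it decreases as |r| grows, and with concentration 0.5 it increases as |r| grows.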

arg_constraints
= {'concentration': GreaterThan(lower_bound=0.0)}¶

mean
¶

support
= CorrMatrix()¶
LKJCorrCholesky¶
Logistic¶

class
Logistic
(loc, scale, *, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Logistic distribution.
This is a smooth distribution with symmetric asymptotically exponential tails and a concave log density. For standard loc=0, scale=1, the density is given by
\[p(x) = \frac {e^{-x}} {(1 + e^{-x})^2}\]
Like the Laplace density, this density has the heaviest possible tails (asymptotically) while still being log-concave. Unlike the Laplace distribution, this distribution is infinitely differentiable everywhere, and is thus suitable for constructing Laplace approximations.
Parameters:  loc – Location parameter.
 scale – Scale parameter.
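The density can be evaluated stably in log space by exploiting its symmetry about loc. The standard-library sketch below does so; the function name is hypothetical and it is not Pyro's implementation.

```python
import math


def logistic_log_pdf(x, loc=0.0, scale=1.0):
    """Log-density of the logistic distribution, stable for large |x|."""
    z = abs((x - loc) / scale)  # the density is symmetric about loc
    return -z - 2.0 * math.log1p(math.exp(-z)) - math.log(scale)


peak = math.exp(logistic_log_pdf(0.0))
```

At loc the standard density equals 1/4, and everywhere it equals s(x) * (1 - s(x)) for the sigmoid s, which is the derivative of the logistic CDF.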

arg_constraints
= {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}¶

has_rsample
= True¶

mean
¶

support
= Real()¶

variance
¶
MaskedDistribution¶

class
MaskedDistribution
(base_dist, mask)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Masks a distribution by a boolean tensor that is broadcastable to the distribution’s batch_shape.
In the special case mask is False, computation of log_prob(), score_parts(), and kl_divergence() is skipped, and constant zero values are returned instead.
Parameters: mask (torch.Tensor or bool) – A boolean or boolean-valued tensor.
arg_constraints
= {}¶

has_enumerate_support
¶

has_rsample
¶

mean
¶

support
¶

variance
¶

MaskedMixture¶

class
MaskedMixture
(mask, component0, component1, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
A masked deterministic mixture of two distributions.
This is useful when the mask is sampled from another distribution, possibly correlated across the batch. Often the mask can be marginalized out via enumeration.
Example:
change_point = pyro.sample("change_point",
                           dist.Categorical(torch.ones(len(data) + 1)),
                           infer={'enumerate': 'parallel'})
mask = torch.arange(len(data), dtype=torch.long) >= change_point
with pyro.plate("data", len(data)):
    pyro.sample("obs", MaskedMixture(mask, dist1, dist2), obs=data)
Parameters:  mask (torch.Tensor) – A boolean tensor toggling between component0 and component1.
 component0 (pyro.distributions.TorchDistribution) – a distribution for batch elements mask == False.
 component1 (pyro.distributions.TorchDistribution) – a distribution for batch elements mask == True.

arg_constraints
= {}¶

has_rsample
¶

support
¶
MixtureOfDiagNormals¶

class
MixtureOfDiagNormals
(locs, coord_scale, component_logits)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Mixture of Normal distributions with arbitrary means and arbitrary diagonal covariance matrices.
That is, this distribution is a mixture with K components, where each component distribution is a D-dimensional Normal distribution with a D-dimensional mean parameter and a D-dimensional diagonal covariance matrix. The K different component means are gathered into the K x D dimensional parameter locs and the K different scale parameters are gathered into the K x D dimensional parameter coord_scale. The mixture weights are controlled by a K-dimensional vector of softmax logits, component_logits. This distribution implements pathwise derivatives for samples from the distribution.
See reference [1] for details on the implementations of the pathwise derivative. Please consider citing this reference if you use the pathwise derivative in your research. Note that this distribution does not support dimension D = 1.
[1] Pathwise Derivatives for Multivariate Distributions, Martin Jankowiak & Theofanis Karaletsos. arXiv:1806.01856
Parameters:  locs (torch.Tensor) – K x D mean matrix
 coord_scale (torch.Tensor) – K x D scale matrix
 component_logits (torch.Tensor) – K-dimensional vector of softmax logits

arg_constraints
= {'component_logits': Real(), 'coord_scale': GreaterThan(lower_bound=0.0), 'locs': Real()}¶

has_rsample
= True¶
MultivariateStudentT¶

class
MultivariateStudentT
(df, loc, scale_tril, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Creates a multivariate Student’s t-distribution parameterized by degree of freedom
df
, mean loc
and scale scale_tril
. Parameters: 
arg_constraints
= {'df': GreaterThan(lower_bound=0.0), 'loc': IndependentConstraint(Real(), 1), 'scale_tril': LowerCholesky()}¶

has_rsample
= True¶

mean
¶

support
= IndependentConstraint(Real(), 1)¶

variance
¶

OMTMultivariateNormal¶

class
OMTMultivariateNormal
(loc, scale_tril)[source]¶ Bases:
pyro.distributions.torch.MultivariateNormal
Multivariate normal (Gaussian) distribution with OMT gradients w.r.t. both parameters. Note the gradient computation w.r.t. the Cholesky factor has cost O(D^3), although the resulting gradient variance is generally expected to be lower.
A distribution over vectors in which all the elements have a joint Gaussian density.
Parameters:  loc (torch.Tensor) – Mean.
 scale_tril (torch.Tensor) – Cholesky of Covariance matrix.

arg_constraints
= {'loc': Real(), 'scale_tril': LowerTriangular()}¶
OneOneMatching¶

class
OneOneMatching
(logits, *, bp_iters=None, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Random perfect matching from
N
sources to N
destinations where each source matches exactly one destination and each destination matches exactly one source. Samples are represented as long tensors of shape
(N,)
taking values in {0,...,N-1}
and satisfying the above one-one constraint. The log probability of a sample v
is the sum of edge logits, up to the log partition function log Z
: \[\log p(v) = \sum_s \text{logits}[s, v[s]] - \log Z\] Exact computations are expensive. To enable tractable approximations, set a number of belief propagation iterations via the
bp_iters
argument. The log_partition_function()
and log_prob()
methods use a Bethe approximation [1,2,3,4]. References:
 [1] Michael Chertkov, Lukas Kroc, Massimo Vergassola (2008)
 “Belief propagation and beyond for particle tracking” https://arxiv.org/pdf/0806.1199.pdf
 [2] Bert Huang, Tony Jebara (2009)
 “Approximating the Permanent with Belief Propagation” https://arxiv.org/pdf/0908.1769.pdf
 [3] Pascal O. Vontobel (2012)
 “The Bethe Permanent of a Non-Negative Matrix” https://arxiv.org/pdf/1107.4196.pdf
 [4] M Chertkov, AB Yedidia (2013)
 “Approximating the permanent with fractional belief propagation” http://www.jmlr.org/papers/volume14/chertkov13a/chertkov13a.pdf
Parameters:  logits (Tensor) – An
(N, N)
shaped tensor of edge logits.  bp_iters (int) – Optional number of belief propagation iterations. If
unspecified or
None
expensive exact algorithms will be used.

arg_constraints
= {'logits': Real()}¶

has_enumerate_support
= True¶

mode
()[source]¶ Computes a maximum probability matching.
Note
This requires the lap package and runs on CPU.

support
¶
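To make the role of log Z concrete, the exact log probability of a matching can be computed by brute force for small N by summing over all N! permutations. This is an illustrative sketch in plain Python, not the algorithm Pyro uses; the function names are hypothetical, and the factorial cost shows why a belief propagation approximation is attractive for larger N.

```python
import itertools
import math


def log_partition_function(logits):
    # logits is an N x N nested list of edge logits.
    n = len(logits)
    scores = [
        sum(logits[s][v[s]] for s in range(n))
        for v in itertools.permutations(range(n))
    ]
    # Log-sum-exp over all perfect matchings.
    m = max(scores)
    return m + math.log(sum(math.exp(score - m) for score in scores))


def matching_log_prob(logits, v):
    # log p(v) = sum_s logits[s][v[s]] - log Z
    score = sum(logits[s][v[s]] for s in range(len(logits)))
    return score - log_partition_function(logits)
```

With all logits equal, every one of the N! matchings is equally likely.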
OneTwoMatching¶

class
OneTwoMatching
(logits, *, bp_iters=None, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Random matching from
2*N
sources to N
destinations where each source matches exactly one destination and each destination matches exactly two sources. Samples are represented as long tensors of shape
(2*N,)
taking values in {0,...,N-1}
and satisfying the above one-two constraint. The log probability of a sample v
is the sum of edge logits, up to the log partition function log Z
: \[\log p(v) = \sum_s \text{logits}[s, v[s]] - \log Z\] Exact computations are expensive. To enable tractable approximations, set a number of belief propagation iterations via the
bp_iters
argument. The log_partition_function()
and log_prob()
methods use a Bethe approximation [1,2,3,4]. References:
 [1] Michael Chertkov, Lukas Kroc, Massimo Vergassola (2008)
 “Belief propagation and beyond for particle tracking” https://arxiv.org/pdf/0806.1199.pdf
 [2] Bert Huang, Tony Jebara (2009)
 “Approximating the Permanent with Belief Propagation” https://arxiv.org/pdf/0908.1769.pdf
 [3] Pascal O. Vontobel (2012)
 “The Bethe Permanent of a Non-Negative Matrix” https://arxiv.org/pdf/1107.4196.pdf
 [4] M Chertkov, AB Yedidia (2013)
 “Approximating the permanent with fractional belief propagation” http://www.jmlr.org/papers/volume14/chertkov13a/chertkov13a.pdf
Parameters:  logits (Tensor) – An
(2 * N, N)
shaped tensor of edge logits.  bp_iters (int) – Optional number of belief propagation iterations. If
unspecified or
None
expensive exact algorithms will be used.

arg_constraints
= {'logits': Real()}¶

has_enumerate_support
= True¶

mode
()[source]¶ Computes a maximum probability matching.
Note
This requires the lap package and runs on CPU.

support
¶
OrderedLogistic¶

class
OrderedLogistic
(predictor, cutpoints, validate_args=None)[source]¶ Bases:
pyro.distributions.torch.Categorical
Alternative parametrization of the distribution over a categorical variable.
Instead of the typical parametrization of a categorical variable in terms of the probability mass of the individual categories
p
, this provides an alternative that is useful in specifying ordered categorical models. This accepts a vector of cutpoints
which are an ordered vector of real numbers denoting baseline cumulative log-odds of the individual categories, and a model vector predictor
which modifies the baselines for each sample individually. These cumulative log-odds are then transformed into a discrete cumulative probability distribution, that is finally differenced to return the probability mass matrix
p
that specifies the categorical distribution. Parameters:  predictor (Tensor) – A tensor of predictor variables of arbitrary
shape. The output shape of non-batched samples from this distribution will
be the same shape as
predictor
.  cutpoints (Tensor) – A tensor of cutpoints that are used to determine the
cumulative probability of each entry in
predictor
belonging to a given category. The first cutpoints.ndim-1 dimensions must be broadcastable to predictor
, and the -1 dimension is monotonically increasing.

arg_constraints
= {'cutpoints': OrderedVector(), 'predictor': Real()}¶
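The transformation from cumulative log-odds to category probabilities described above can be sketched for a single scalar predictor as follows. This is plain Python for illustration only; the sign convention `cutpoint - predictor` and the function names are assumptions, not taken from the text above.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def ordered_logistic_probs(predictor, cutpoints):
    # Cumulative probabilities P(y <= k), assuming the log-odds are
    # cutpoint - predictor; the last category closes the total at 1.
    cumulative = [sigmoid(c - predictor) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        # Differencing the cumulative distribution gives the mass per category.
        probs.append(c - previous)
        previous = c
    return probs
```

With K cutpoints this yields K + 1 category probabilities; under the assumed sign convention, a larger predictor shifts mass toward higher categories.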
ProjectedNormal¶

class
ProjectedNormal
(concentration, *, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Projected isotropic normal distribution of arbitrary dimension.
This distribution over directional data is qualitatively similar to the von Mises and von Mises-Fisher distributions, but permits tractable variational inference via reparametrized gradients.
To use this distribution with autoguides, use
poutine.reparam
with a ProjectedNormalReparam
reparametrizer in the model, e.g.:
    @poutine.reparam(config={"direction": ProjectedNormalReparam()})
    def model():
        direction = pyro.sample("direction", ProjectedNormal(torch.zeros(3)))
        ...
Note
This implements
log_prob()
only for dimensions {2,3}.
 [1] D. Hernandez-Stumpfhauser, F.J. Breidt, M.J. van der Woerd (2017)
 “The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference” https://projecteuclid.org/euclid.ba/1453211962

arg_constraints
= {'concentration': IndependentConstraint(Real(), 1)}¶

has_rsample
= True¶

mean
¶ Note this is the mean in the sense of a centroid in the submanifold that minimizes expected squared geodesic distance.

mode
¶

support
= Sphere¶
RelaxedBernoulliStraightThrough¶

class
RelaxedBernoulliStraightThrough
(temperature, probs=None, logits=None, validate_args=None)[source]¶ Bases:
pyro.distributions.torch.RelaxedBernoulli
An implementation of
RelaxedBernoulli
with a straight-through gradient estimator. This distribution has the following properties:
 The samples returned by the
rsample()
method are discrete/quantized.  The
log_prob()
method returns the log probability of the relaxed/unquantized sample using the Gumbel-Softmax distribution.  In the backward pass the gradient of the sample with respect to the parameters of the distribution uses the relaxed/unquantized sample.
References:
 [1] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables,
 Chris J. Maddison, Andriy Mnih, Yee Whye Teh
 [2] Categorical Reparameterization with Gumbel-Softmax,
 Eric Jang, Shixiang Gu, Ben Poole
RelaxedOneHotCategoricalStraightThrough¶

class
RelaxedOneHotCategoricalStraightThrough
(temperature, probs=None, logits=None, validate_args=None)[source]¶ Bases:
pyro.distributions.torch.RelaxedOneHotCategorical
An implementation of
RelaxedOneHotCategorical
with a straight-through gradient estimator. This distribution has the following properties:
 The samples returned by the
rsample()
method are discrete/quantized.  The
log_prob()
method returns the log probability of the relaxed/unquantized sample using the Gumbel-Softmax distribution.  In the backward pass the gradient of the sample with respect to the parameters of the distribution uses the relaxed/unquantized sample.
References:
 [1] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables,
 Chris J. Maddison, Andriy Mnih, Yee Whye Teh
 [2] Categorical Reparameterization with Gumbel-Softmax,
 Eric Jang, Shixiang Gu, Ben Poole
Rejector¶

class
Rejector
(propose, log_prob_accept, log_scale, *, batch_shape=None, event_shape=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Rejection sampled distribution given an acceptance rate function.
Parameters:  propose (Distribution) – A proposal distribution that samples batched
proposals via
propose()
. rsample()
supports a sample_shape
arg only if propose()
supports a sample_shape
arg.  log_prob_accept (callable) – A callable that inputs a batch of proposals and returns a batch of log acceptance probabilities.
 log_scale – Total log probability of acceptance.

arg_constraints
= {}¶

has_rsample
= True¶
SineBivariateVonMises¶

class
SineBivariateVonMises
(phi_loc, psi_loc, phi_concentration, psi_concentration, correlation=None, weighted_correlation=None, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Unimodal distribution of two dependent angles on the 2-torus (S^1 ⨂ S^1) given by
\[C^{-1}\exp(\kappa_1\cos(x_1-\mu_1) + \kappa_2\cos(x_2-\mu_2) + \rho\sin(x_1-\mu_1)\sin(x_2-\mu_2))\]and
\[C = (2\pi)^2 \sum_{i=0}^{\infty} {2i \choose i} \left(\frac{\rho^2}{4\kappa_1\kappa_2}\right)^i I_i(\kappa_1)I_i(\kappa_2),\]where \(I_i(\cdot)\) is the modified Bessel function of the first kind, the \(\mu\)’s are the locations of the distribution, the \(\kappa\)’s are the concentrations, and \(\rho\) gives the correlation between the angles \(x_1\) and \(x_2\).
This distribution is a submodel of the Bivariate von Mises distribution, called the Sine Distribution [2] in directional statistics.
This distribution is helpful for modeling coupled angles such as torsion angles in peptide chains. To infer parameters, use
NUTS
or HMC
with priors that avoid parameterizations where the distribution becomes bimodal; see note below. Note
Sample efficiency drops as
\[\frac{\rho}{\kappa_1\kappa_2} \rightarrow 1\]because the distribution becomes increasingly bimodal.
Note
The correlation and weighted_correlation params are mutually exclusive.
Note
In the context of
SVI
, this distribution can be used as a likelihood but not for latent variables. ** References: **
 [1] Probabilistic model for two dependent circular variables, Singh, H., Hnizdo, V., and Demchuck, E. (2002)
 [2] Protein Bioinformatics and Mixtures of Bivariate von Mises Distributions for Angular Data, Mardia, K. V, Taylor, T. C., and Subramaniam, G. (2007)
Parameters:  phi_loc (torch.Tensor) – location of first angle
 psi_loc (torch.Tensor) – location of second angle
 phi_concentration (torch.Tensor) – concentration of first angle
 psi_concentration (torch.Tensor) – concentration of second angle
 correlation (torch.Tensor) – correlation between the two angles
 weighted_correlation (torch.Tensor) – set correlation to weighted_corr * sqrt(phi_conc*psi_conc) to avoid bimodality (see note).

arg_constraints
= {'correlation': Real(), 'phi_concentration': GreaterThan(lower_bound=0.0), 'phi_loc': Real(), 'psi_concentration': GreaterThan(lower_bound=0.0), 'psi_loc': Real()}¶

max_sample_iter
= 1000¶

mean
¶

sample
(sample_shape=torch.Size([]))[source]¶  ** References: **
 A New Unified Approach for the Simulation of a Wide Class of Directional Distributions, John T. Kent, Asaad M. Ganeiber & Kanti V. Mardia (2018)

support
= IndependentConstraint(Real(), 1)¶
SineSkewed¶

class
SineSkewed
(base_dist: pyro.distributions.torch_distribution.TorchDistribution, skewness, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Sine Skewing [1] is a procedure for producing a distribution that breaks pointwise symmetry on a torus distribution. The new distribution is called the Sine Skewed X distribution, where X is the name of the (symmetric) base distribution.
Torus distributions are distributions with support on products of circles (i.e., ⨂^d S^1 where S^1=[-pi,pi) ). So, a 0-torus is a point, the 1-torus is a circle, and the 2-torus is commonly associated with the donut shape.
The Sine Skewed X distribution is parameterized by a weight parameter for each dimension of the event of X. For example with a von Mises distribution over a circle (1-torus), the Sine Skewed von Mises Distribution has one skew parameter. The skewness parameters can be inferred using
HMC
or NUTS
. For example, the following will produce a uniform prior over skewness for the 2-torus:
    def model(obs):
        # Sine priors
        phi_loc = pyro.sample('phi_loc', VonMises(pi, 2.))
        psi_loc = pyro.sample('psi_loc', VonMises(pi / 2, 2.))
        phi_conc = pyro.sample('phi_conc', Beta(halpha_phi, beta_prec_phi - halpha_phi))
        psi_conc = pyro.sample('psi_conc', Beta(halpha_psi, beta_prec_psi - halpha_psi))
        corr_scale = pyro.sample('corr_scale', Beta(2., 5.))

        # SS prior
        skew_phi = pyro.sample('skew_phi', Uniform(-1., 1.))
        psi_bound = 1 - skew_phi.abs()
        skew_psi = pyro.sample('skew_psi', Uniform(-1., 1.))
        skewness = torch.stack((skew_phi, psi_bound * skew_psi), dim=1)
        assert skewness.shape == (num_mix_comp, 2)

        with pyro.plate('obs_plate'):
            sine = SineBivariateVonMises(phi_loc=phi_loc, psi_loc=psi_loc,
                                         phi_concentration=1000 * phi_conc,
                                         psi_concentration=1000 * psi_conc,
                                         weighted_correlation=corr_scale)
            return pyro.sample('phi_psi', SineSkewed(sine, skewness), obs=obs)
To ensure the skewing does not alter the normalization constant of the (Sine Bivariate von Mises) base distribution, the skewness parameters are constrained. The constraint requires the sum of the absolute values of skewness to be less than or equal to one. So for the above snippet it must hold that:
skew_phi.abs()+skew_psi.abs() <= 1
We handle this in the prior by computing psi_bound and use it to scale skew_psi. We do not use psi_bound as:
skew_psi = pyro.sample('skew_psi', Uniform(-psi_bound, psi_bound))
as it would make the support for the Uniform distribution dynamic.
In the context of
SVI
, this distribution can freely be used as a likelihood, but using it for latent variables will lead to slow inference for tori of dimension 2 and higher. This is because the base_dist cannot be reparameterized. Note
An event in the base distribution must be on a d-torus, so the event_shape must be (d,).
Note
For the skewness parameter, it must hold that the sum of the absolute value of its weights for an event must be less than or equal to one. See eq. 2.1 in [1].
 ** References: **
 [1] Sine-skewed toroidal distributions and their application in protein bioinformatics, Ameijeiras-Alonso, J., Ley, C. (2019)
Parameters:  base_dist (torch.distributions.Distribution) – base density on a d-dimensional torus. Supported base
distributions include: 1D
VonMises
, SineBivariateVonMises
, 1D ProjectedNormal
, and Uniform
(-pi, pi).  skewness (torch.tensor) – skewness of the distribution.

arg_constraints
= {'skewness': IndependentConstraint(Interval(lower_bound=-1.0, upper_bound=1.0), 1)}¶

support
= IndependentConstraint(Real(), 1)¶
SkewLogistic¶

class
SkewLogistic
(loc, scale, asymmetry=1.0, *, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Skewed generalization of the Logistic distribution (Type I in [1]).
This is a smooth distribution with asymptotically exponential tails and a concave log density. For standard
loc=0
, scale=1
, asymmetry=α
the density is given by \[p(x;\alpha) = \frac {\alpha e^{-x}} {(1 + e^{-x})^{\alpha+1}}\] Like the
AsymmetricLaplace
density, this density has the heaviest possible tails (asymptotically) while still being log-concave. Unlike the AsymmetricLaplace
distribution, this distribution is infinitely differentiable everywhere, and is thus suitable for constructing Laplace approximations. References
 [1] Generalized logistic distribution
 https://en.wikipedia.org/wiki/Generalized_logistic_distribution
Parameters:  loc – Location parameter.
 scale – Scale parameter.
 asymmetry – Asymmetry parameter (positive). The distribution skews
right when
asymmetry > 1
and left when asymmetry < 1
. Defaults to asymmetry = 1
corresponding to the standard Logistic distribution.

arg_constraints
= {'asymmetry': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}¶

has_rsample
= True¶

support
= Real()¶
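The density formula above has a closed-form cumulative distribution function, \((1 + e^{-z})^{-\alpha}\), which makes it easy to sanity-check. The following plain-Python sketch is illustrative only; the function names are hypothetical and the code is not Pyro's implementation.

```python
import math


def skew_logistic_log_prob(x, loc=0.0, scale=1.0, asymmetry=1.0):
    # log p = log(alpha) - z - (alpha + 1) * log(1 + exp(-z)) - log(scale)
    z = (x - loc) / scale
    softplus_neg_z = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return (math.log(asymmetry) - z
            - (asymmetry + 1.0) * softplus_neg_z - math.log(scale))


def skew_logistic_cdf(x, loc=0.0, scale=1.0, asymmetry=1.0):
    # Differentiating (1 + exp(-z)) ** (-alpha) recovers the density above.
    z = (x - loc) / scale
    return (1.0 + math.exp(-z)) ** (-asymmetry)
```

With `asymmetry=1` both functions reduce to the standard Logistic distribution.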
SoftAsymmetricLaplace¶

class
SoftAsymmetricLaplace
(loc, scale, asymmetry=1.0, softness=1.0, *, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Soft asymmetric version of the
Laplace
distribution. This has a smooth (infinitely differentiable) density with two asymmetric asymptotically exponential tails, one on the left and one on the right. In the limit of
softness → 0
, this converges in distribution to the AsymmetricLaplace
distribution. This is equivalent to the sum of three random variables
z - u + v
where:
    z ~ Normal(loc, scale * softness)
    u ~ Exponential(1 / (scale * asymmetry))
    v ~ Exponential(asymmetry / scale)
This is also equivalent to the sum of two random variables
z + a
where:
    z ~ Normal(loc, scale * softness)
    a ~ AsymmetricLaplace(0, scale, asymmetry)
Parameters:  loc – Location parameter, i.e. the mode.
 scale – Scale parameter = geometric mean of left and right scales.
 asymmetry – Square of ratio of left to right scales. Defaults to 1.
 softness – Scale parameter of the Gaussian smoother. Defaults to 1.

arg_constraints
= {'asymmetry': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0), 'softness': GreaterThan(lower_bound=0.0)}¶

has_rsample
= True¶

mean
¶

support
= Real()¶

variance
¶
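The three-variable representation above suggests a direct way to draw samples. The sketch below uses only the standard library `random` module and is illustrative rather than Pyro's sampler; the function name is hypothetical, and the Exponential arguments are interpreted as rates, as in torch.

```python
import random


def sample_soft_asymmetric_laplace(rng, loc=0.0, scale=1.0,
                                   asymmetry=1.0, softness=1.0):
    # Gaussian smoother centered at loc.
    z = rng.gauss(loc, scale * softness)
    # Left tail: rate 1 / (scale * asymmetry), i.e. mean scale * asymmetry.
    u = rng.expovariate(1.0 / (scale * asymmetry))
    # Right tail: rate asymmetry / scale, i.e. mean scale / asymmetry.
    v = rng.expovariate(asymmetry / scale)
    return z - u + v


rng = random.Random(0)
draws = [sample_soft_asymmetric_laplace(rng, asymmetry=2.0) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
```

Under this representation the mean is `loc - scale * asymmetry + scale / asymmetry`, so the sample mean here should be close to -1.5.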
SoftLaplace¶

class
SoftLaplace
(loc, scale, *, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Smooth distribution with Laplacelike tail behavior.
This distribution corresponds to the log-concave density:
    z = (value - loc) / scale
    log_prob = log(2 / pi) - log(scale) - logaddexp(z, -z)
Like the Laplace density, this density has the heaviest possible tails (asymptotically) while still being log-concave. Unlike the Laplace distribution, this distribution is infinitely differentiable everywhere, and is thus suitable for constructing Laplace approximations.
Parameters:  loc – Location parameter.
 scale – Scale parameter.

arg_constraints
= {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}¶

has_rsample
= True¶

mean
¶

support
= Real()¶

variance
¶
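The log density above can be evaluated stably by writing `logaddexp(z, -z)` as `|z| + log1p(exp(-2|z|))`. A minimal plain-Python sketch follows; the function name is hypothetical and this is not Pyro's implementation.

```python
import math


def soft_laplace_log_prob(value, loc=0.0, scale=1.0):
    z = (value - loc) / scale
    # logaddexp(z, -z) rewritten so the exponent is never positive.
    logaddexp = abs(z) + math.log1p(math.exp(-2.0 * abs(z)))
    return math.log(2.0 / math.pi) - math.log(scale) - logaddexp
```

The density at the mode of the standard distribution is 1/pi, and far from the mode the log density falls off linearly with unit slope, as for a Laplace distribution.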
SpanningTree¶

class
SpanningTree
(edge_logits, sampler_options=None, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Distribution over spanning trees on a fixed number
V
of vertices. A tree is represented as
torch.LongTensor
edges
of shape (V-1,2)
satisfying the following properties:
 The edges constitute a tree, i.e. are connected and cycle free.
 Each edge
(v1,v2) = edges[e]
is sorted, i.e. v1 < v2
.  The entire tensor is sorted in colexicographic order.
Use
validate_edges()
to verify edges are correctly formed. The
edge_logits
tensor has one entry for each of the V*(V-1)//2
edges in the complete graph on V
vertices, where edges are each sorted and the edge order is colexicographic:
    (0,1), (0,2), (1,2), (0,3), (1,3), (2,3), (0,4), (1,4), (2,4), ...
This ordering corresponds to the size-independent pairing function:
    k = v1 + v2 * (v2 - 1) // 2
where
k
is the rank of the edge (v1,v2)
in the complete graph. To convert a matrix of edge logits to the linear representation used here:
    assert my_matrix.shape == (V, V)
    i, j = make_complete_graph(V)
    edge_logits = my_matrix[i, j]
Parameters:  edge_logits (torch.Tensor) – A tensor of length
V*(V-1)//2
containing logits (aka negative energies) of all edges in the complete graph on V
vertices. See above comment for edge ordering.  sampler_options (dict) – An optional dict of sampler options including:
mcmc_steps
defaulting to a single MCMC step (which is pretty good); initial_edges
defaulting to a cheap approximate sample; backend
one of “python” or “cpp”, defaulting to “python”.

arg_constraints
= {'edge_logits': Real()}¶

edge_mean
¶ Computes marginal probabilities of each edge being active.
Note
This is similar to other distributions’
.mean()
method, but with a different shape because this distribution’s values are not encoded as binary matrices. Returns: A symmetric square (V,V)
shaped matrix with values in [0,1]
denoting the marginal probability of each edge being in a sampled value. Return type: Tensor

enumerate_support
(expand=True)[source]¶ This is implemented for trees with up to 6 vertices (and 5 edges).

has_enumerate_support
= True¶

mode
¶ Returns: The maximum weight spanning tree. Return type: Tensor

sample
(sample_shape=torch.Size([]))[source]¶ This sampler is implemented using MCMC run for a small number of steps after being initialized by a cheap approximate sampler. This sampler is approximate and cubic time. This is faster than the classic Aldous-Broder sampler [1,2], especially for graphs with large mixing time. Recent research [3,4] proposes samplers that run in sub-matrix-multiply time but are more complex to implement.
References
 [1] Generating random spanning trees
 Andrei Broder (1989)
 [2] The Random Walk Construction of Uniform Spanning Trees and Uniform Labelled Trees,
 David J. Aldous (1990)
 [3] Sampling Random Spanning Trees Faster than Matrix Multiplication,
 David Durfee, Rasmus Kyng, John Peebles, Anup B. Rao, Sushant Sachdeva (2017) https://arxiv.org/abs/1611.07451
 [4] An almost-linear time algorithm for uniform random spanning tree generation,
 Aaron Schild (2017) https://arxiv.org/abs/1711.06455

support
= IntegerGreaterThan(lower_bound=0)¶

validate_edges
(edges)[source]¶ Validates a batch of
edges
tensors, as returned by sample()
or enumerate_support()
or as input to log_prob()
.Parameters: edges (torch.LongTensor) – A batch of edges. Raises: ValueError Returns: None
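The colexicographic edge ordering used by edge_logits, and the pairing function that ranks an edge within it, can be checked with a few lines of plain Python. The helper names below are hypothetical and are not part of Pyro.

```python
def edge_rank(v1, v2):
    # Rank of the sorted edge (v1, v2), v1 < v2, in colexicographic order.
    assert 0 <= v1 < v2
    return v1 + v2 * (v2 - 1) // 2


def complete_graph_edges(V):
    # Edges ordered by their larger endpoint first, then the smaller one.
    return [(v1, v2) for v2 in range(V) for v1 in range(v2)]


edges = complete_graph_edges(5)
ranks = [edge_rank(v1, v2) for v1, v2 in edges]
```

Because the rank depends only on the edge and not on V, adding a vertex appends new edges without renumbering existing ones.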
Stable¶

class
Stable
(stability, skew, scale=1.0, loc=0.0, coords='S0', validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Levy \(\alpha\)-stable distribution. See [1] for a review.
This uses Nolan’s parametrization [2] of the
loc
parameter, which is required for continuity and differentiability. This corresponds to the notation \(S^0_\alpha(\beta,\sigma,\mu_0)\) of [1], where \(\alpha\) = stability, \(\beta\) = skew, \(\sigma\) = scale, and \(\mu_0\) = loc. To instead use the S parameterization as in scipy, pass coords="S"
, but BEWARE this is discontinuous at stability=1
and has poor geometry for inference. This implements a reparametrized sampler
rsample()
, but does not implement log_prob()
. Inference can be performed using either likelihood-free algorithms such as EnergyDistance
, or reparameterization via the reparam()
handler with one of the reparameterizers LatentStableReparam
, SymmetricStableReparam
, or StableReparam
e.g.:
    with poutine.reparam(config={"x": StableReparam()}):
        pyro.sample("x", Stable(stability, skew, scale, loc))
 [1] S. Borak, W. Hardle, R. Weron (2005).
 Stable distributions. https://edoc.hu-berlin.de/bitstream/handle/18452/4526/8.pdf
 [2] J.P. Nolan (1997).
 Numerical calculation of stable densities and distribution functions.
 [3] Rafal Weron (1996).
 On the Chambers-Mallows-Stuck Method for Simulating Skewed Stable Random Variables.
 [4] J.P. Nolan (2017).
 Stable Distributions: Models for Heavy Tailed Data. http://fs2.american.edu/jpnolan/www/stable/chap1.pdf
Parameters:  stability (Tensor) – Levy stability parameter \(\alpha\in(0,2]\) .
 skew (Tensor) – Skewness \(\beta\in[-1,1]\) .
 scale (Tensor) – Scale \(\sigma > 0\) . Defaults to 1.
 loc (Tensor) – Location \(\mu_0\) when using Nolan’s S0 parametrization [2], or \(\mu\) when using the S parameterization. Defaults to 0.
 coords (str) – Either “S0” (default) to use Nolan’s continuous S0 parametrization, or “S” to use the discontinuous parameterization.

arg_constraints
= {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0), 'skew': Interval(lower_bound=-1, upper_bound=1), 'stability': Interval(lower_bound=0, upper_bound=2)}¶

has_rsample
= True¶

mean
¶

support
= Real()¶

variance
¶
TruncatedPolyaGamma¶

class
TruncatedPolyaGamma
(prototype, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
This is a Polya-Gamma(1, 0) distribution truncated to have finite support in the interval (0, 2.5). See [1] for details. As a consequence of the truncation the log_prob method is only accurate to about six decimal places. In addition the provided sampler is a rough approximation that is only meant to be used in contexts where sample accuracy is not important (e.g. in initialization). Broadly, this implementation is only intended for usage in cases where good approximations of the log_prob are sufficient, as is the case e.g. in HMC.
Parameters: prototype (tensor) – A prototype tensor of arbitrary shape used to determine the dtype and device returned by sample and log_prob. References
 [1] ‘Bayesian inference for logistic models using Polya-Gamma latent variables’
 Nicholas G. Polson, James G. Scott, Jesse Windle.

arg_constraints
= {}¶

has_rsample
= False¶

num_gamma_variates
= 8¶

num_log_prob_terms
= 7¶

support
= Interval(lower_bound=0.0, upper_bound=2.5)¶

truncation_point
= 2.5¶
Unit¶

class
Unit
(log_factor, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Trivial non-normalized distribution representing the unit type.
The unit type has a single value with no data, i.e.
value.numel() == 0
. This is used for
pyro.factor()
statements.
arg_constraints
= {'log_factor': Real()}¶

support
= Real()¶

VonMises3D¶

class
VonMises3D
(concentration, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Spherical von Mises distribution.
This implementation combines the direction parameter and concentration parameter into a single combined parameter that contains both direction and magnitude. The
value
arg is represented in cartesian coordinates: it must be a normalized 3-vector that lies on the 2-sphere. See
VonMises
for a 2D polar coordinate cousin of this distribution. See projected_normal
for a qualitatively similar distribution but implementing more functionality. Currently only
log_prob()
is implemented. Parameters: concentration (torch.Tensor) – A combined location-and-concentration vector. The direction of this vector is the location, and its magnitude is the concentration. 
arg_constraints
= {'concentration': Real()}¶

support
= Sphere¶

ZeroInflatedDistribution¶

class
ZeroInflatedDistribution
(base_dist, *, gate=None, gate_logits=None, validate_args=None)[source]¶ Bases:
pyro.distributions.torch_distribution.TorchDistribution
Generic Zero Inflated distribution.
This can be used directly or can be used as a base class as e.g. for
ZeroInflatedPoisson
and ZeroInflatedNegativeBinomial
. Parameters:  base_dist (TorchDistribution) – the base distribution.
 gate (torch.Tensor) – probability of extra zeros given via a Bernoulli distribution.
 gate_logits (torch.Tensor) – logits of extra zeros given via a Bernoulli distribution.

arg_constraints
= {'gate': Interval(lower_bound=0.0, upper_bound=1.0), 'gate_logits': Real()}¶

support
¶
ZeroInflatedNegativeBinomial¶

class
ZeroInflatedNegativeBinomial
(total_count, *, probs=None, logits=None, gate=None, gate_logits=None, validate_args=None)[source]¶ Bases:
pyro.distributions.zero_inflated.ZeroInflatedDistribution
A Zero Inflated Negative Binomial distribution.
Parameters:  total_count (float or torch.Tensor) – non-negative number of negative Bernoulli trials.
 probs (torch.Tensor) – Event probabilities of success in the half-open interval [0, 1).
 logits (torch.Tensor) – Event log-odds for probabilities of success.
 gate (torch.Tensor) – probability of extra zeros.
 gate_logits (torch.Tensor) – logits of extra zeros.

arg_constraints
= {'gate': Interval(lower_bound=0.0, upper_bound=1.0), 'gate_logits': Real(), 'logits': Real(), 'probs': HalfOpenInterval(lower_bound=0.0, upper_bound=1.0), 'total_count': GreaterThanEq(lower_bound=0)}¶

logits
¶

probs
¶

support
= IntegerGreaterThan(lower_bound=0)¶

total_count
¶
ZeroInflatedPoisson¶

class
ZeroInflatedPoisson
(rate, *, gate=None, gate_logits=None, validate_args=None)[source]¶ Bases:
pyro.distributions.zero_inflated.ZeroInflatedDistribution
A Zero Inflated Poisson distribution.
Parameters:  rate (torch.Tensor) – rate of poisson distribution.
 gate (torch.Tensor) – probability of extra zeros.
 gate_logits (torch.Tensor) – logits of extra zeros.

arg_constraints
= {'gate': Interval(lower_bound=0.0, upper_bound=1.0), 'gate_logits': Real(), 'rate': GreaterThan(lower_bound=0.0)}¶

rate
¶

support
= IntegerGreaterThan(lower_bound=0)¶
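The effect of the gate on the probability mass function can be sketched in plain Python: zero gets the extra mass `gate`, and every count is otherwise weighted by `1 - gate`. The function name is hypothetical, the code assumes `0 <= gate < 1`, and it is an illustration rather than Pyro's implementation.

```python
import math


def zero_inflated_poisson_log_prob(k, rate, gate):
    # Log probability of count k under a plain Poisson(rate).
    poisson_log_prob = k * math.log(rate) - rate - math.lgamma(k + 1)
    if k == 0:
        # Zero arises either from the gate or from the Poisson itself.
        return math.log(gate + (1.0 - gate) * math.exp(poisson_log_prob))
    # Nonzero counts require the gate to be "off" (assumes gate < 1).
    return math.log1p(-gate) + poisson_log_prob
```

With `gate=0` this reduces to the ordinary Poisson log probability.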
Transforms¶
ConditionalTransform¶
CholeskyTransform¶
CorrLCholeskyTransform¶

class
CorrLCholeskyTransform
(cache_size=0)[source]¶ Bases:
torch.distributions.transforms.Transform
Transforms a vector into the cholesky factor of a correlation matrix.
The input should have shape [batch_shape] + [d * (d-1)/2]. The output will have shape [batch_shape] + [d, d].
References:
[1] Cholesky Factors of Correlation Matrices. Stan Reference Manual v2.18, Section 10.12.

bijective
= True¶

codomain
= CorrCholesky()¶

domain
= IndependentConstraint(Real(), 1)¶

CorrMatrixCholeskyTransform¶
DiscreteCosineTransform¶

class
DiscreteCosineTransform
(dim=-1, smooth=0.0, cache_size=0)[source]¶ Bases:
torch.distributions.transforms.Transform
Discrete Cosine Transform of type-II.
This uses
dct()
and idct()
to compute orthonormal DCT and inverse DCT transforms. The Jacobian is 1.
Parameters:  dim (int) – Dimension along which to transform. Must be negative. This is an absolute dim counting from the right.
 smooth (float) – Smoothing parameter. When 0, this transforms white noise to white noise; when 1 this transforms Brownian noise to white noise; when -1 this transforms violet noise to white noise; etc. Any real number is allowed. https://en.wikipedia.org/wiki/Colors_of_noise.
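The claim that the Jacobian is 1 follows from orthonormality: an orthonormal DCT is a rotation, so it preserves Euclidean length. The sketch below is a direct O(n^2) plain-Python DCT-II with orthonormal scaling, covering only the smooth=0 case. It is a hypothetical illustration, not the dct() and idct() helpers that Pyro uses.

```python
import math


def dct_ii_orthonormal(x):
    """Orthonormal type-II DCT computed directly from its definition."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct_ii_orthonormal(y):
    """Inverse transform: the transpose of the orthonormal DCT-II matrix."""
    n = len(y)
    out = []
    for t in range(n):
        s = math.sqrt(1.0 / n) * y[0]
        s += sum(
            math.sqrt(2.0 / n) * y[k] * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


x = [1.0, -2.0, 0.5, 3.0, 0.0]
y = dct_ii_orthonormal(x)
roundtrip = idct_ii_orthonormal(y)
energy_in = sum(v * v for v in x)
energy_out = sum(v * v for v in y)
```

Equal input and output energy is the practical consequence of a unit Jacobian: densities pushed through the transform need no volume correction.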

bijective
= True¶

codomain
¶

domain
¶
ELUTransform¶
HaarTransform¶

class
HaarTransform
(dim=-1, flip=False, cache_size=0)[source]¶ Bases:
torch.distributions.transforms.Transform
Discrete Haar transform.
This uses
haar_transform()
and inverse_haar_transform()
to compute (orthonormal) Haar and inverse Haar transforms. The Jacobian is 1. For sequences with length T not a power of two, this implementation is equivalent to a block-structured Haar transform in which block sizes decrease by factors of one half from left to right.
Parameters:
bijective
= True¶

codomain
¶

domain
¶

LeakyReLUTransform¶
LowerCholeskyAffine¶

class
LowerCholeskyAffine
(loc, scale_tril, cache_size=0)[source]¶ Bases:
torch.distributions.transforms.Transform
A bijection of the form,
\(\mathbf{y} = \mathbf{L} \mathbf{x} + \mathbf{r}\) where \(\mathbf{L}\) is a lower triangular matrix and \(\mathbf{r}\) is a vector.
Parameters:  loc (torch.tensor) – the fixed D-dimensional vector to shift the input by.
 scale_tril (torch.tensor) – the D x D lower triangular matrix used in the transformation.
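A small plain-Python sketch of the same affine map may help: the forward pass is a triangular matrix-vector product plus a shift, the inverse is forward substitution, and the log absolute determinant is the sum of the logs of the absolute diagonal entries. The function names are hypothetical and lists stand in for tensors.

```python
import math


def lower_cholesky_affine(x, loc, scale_tril):
    """y = L x + r, using only the lower triangle of L."""
    return [
        loc[i] + sum(scale_tril[i][j] * x[j] for j in range(i + 1))
        for i in range(len(x))
    ]


def lower_cholesky_affine_inverse(y, loc, scale_tril):
    """Solve L x = y - r by forward substitution."""
    x = []
    for i in range(len(y)):
        acc = y[i] - loc[i] - sum(scale_tril[i][j] * x[j] for j in range(i))
        x.append(acc / scale_tril[i][i])
    return x


def log_abs_det_jacobian(scale_tril):
    """For a triangular matrix the determinant is the product of the diagonal."""
    return sum(math.log(abs(scale_tril[i][i])) for i in range(len(scale_tril)))


scale_tril = [[2.0, 0.0], [1.0, 0.5]]
loc = [1.0, -1.0]
x = [3.0, 4.0]
y = lower_cholesky_affine(x, loc, scale_tril)
x_back = lower_cholesky_affine_inverse(y, loc, scale_tril)
log_det = log_abs_det_jacobian(scale_tril)
```

In this example the diagonal entries 2 and 0.5 multiply to 1, so the map happens to preserve volume even though volume_preserving is False in general.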

bijective
= True¶

codomain
= IndependentConstraint(Real(), 1)¶

domain
= IndependentConstraint(Real(), 1)¶

log_abs_det_jacobian
(x, y)[source]¶ Calculates the elementwise determinant of the log Jacobian, i.e. log(abs(dy/dx)).

volume_preserving
= False¶
Normalize¶

class
Normalize
(p=2, cache_size=0)[source]¶ Bases:
torch.distributions.transforms.Transform
Safely project a vector onto the sphere wrt the
p
norm. This avoids the singularity at zero by mapping to the vector [1, 0, 0, ..., 0]
.
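The behaviour described above can be sketched in a few lines of plain Python. The explicit branch on a zero norm is an assumption made for clarity; Pyro's own implementation may handle the singular case differently while producing the same result.

```python
def normalize(x, p=2.0):
    """Project x onto the unit sphere of the p-norm, sending 0 to [1, 0, ..., 0]."""
    norm = sum(abs(v) ** p for v in x) ** (1.0 / p)
    if norm == 0.0:
        # Division is undefined here, so pick a fixed point on the sphere.
        return [1.0] + [0.0] * (len(x) - 1)
    return [v / norm for v in x]


unit_l2 = normalize([3.0, 4.0])
unit_l1 = normalize([1.0, -3.0], p=1.0)
at_origin = normalize([0.0, 0.0, 0.0])
```

Because every positive multiple of a vector maps to the same point, the transform cannot be inverted, which is consistent with bijective being False.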
bijective
= False¶

codomain
= Sphere¶

domain
= IndependentConstraint(Real(), 1)¶

OrderedTransform¶

class
OrderedTransform
(cache_size=0)[source]¶ Bases:
torch.distributions.transforms.Transform
Transforms a real vector into an ordered vector.
Specifically, enforces monotonically increasing order on the last dimension of a given tensor via the transformation \(y_0 = x_0\), \(y_i = y_0 + \sum_{1 \le j \le i} \exp(x_j)\) for \(i \ge 1\).
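The following plain-Python sketch shows one reading of this transformation: the first entry passes through unchanged and each later output adds a strictly positive increment exp(x_i) to the previous output, which is what guarantees the ordering. The function names are hypothetical.

```python
import math


def ordered(x):
    """Cumulative construction: y[0] = x[0], y[i] = y[i-1] + exp(x[i])."""
    y = [x[0]]
    for v in x[1:]:
        y.append(y[-1] + math.exp(v))
    return y


def ordered_inverse(y):
    """Recover x from successive differences, which are positive by construction."""
    return [y[0]] + [math.log(b - a) for a, b in zip(y, y[1:])]


x = [0.2, -1.0, 3.0]
y = ordered(x)
x_back = ordered_inverse(y)
```

Any real input yields a strictly increasing output, and the inverse exists for every strictly increasing vector, so the map is a bijection onto ordered vectors.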

bijective
= True¶

codomain
= OrderedVector()¶

domain
= IndependentConstraint(Real(), 1)¶

Permute¶

class
Permute
(permutation, *, dim=-1, cache_size=1)[source]¶ Bases:
torch.distributions.transforms.Transform
A bijection that reorders the input dimensions, that is, multiplies the input by a permutation matrix. This is useful in between
AffineAutoregressive
transforms to increase the flexibility of the resulting distribution and stabilize learning. Whilst not being an autoregressive transform, the log absolute determinant of the Jacobian is easily calculable as 0. Note that reordering the input dimension between two layers of AffineAutoregressive
is not equivalent to reordering the dimension inside the MADE networks that those IAFs use; using a Permute
transform results in a distribution with more flexibility.
Example usage:
>>> import torch
>>> import pyro.distributions as dist
>>> from pyro.nn import AutoRegressiveNN
>>> from pyro.distributions.transforms import AffineAutoregressive, Permute
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> iaf1 = AffineAutoregressive(AutoRegressiveNN(10, [40]))
>>> ff = Permute(torch.randperm(10, dtype=torch.long))
>>> iaf2 = AffineAutoregressive(AutoRegressiveNN(10, [40]))
>>> flow_dist = dist.TransformedDistribution(base_dist, [iaf1, ff, iaf2])
>>> flow_dist.sample()  # doctest: +SKIP
Parameters:  permutation (torch.LongTensor) – a permutation ordering that is applied to the inputs.
 dim (int) – the tensor dimension to permute. This value must be negative and defines the event dim as abs(dim).
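As a minimal sketch of why the log absolute determinant is zero, the plain-Python code below applies a permutation, builds its inverse, and computes the sign of the corresponding permutation matrix by counting inversions. Lists stand in for tensors and all names are hypothetical.

```python
import math


def permute(x, permutation):
    """Output position i takes the input element at index permutation[i]."""
    return [x[i] for i in permutation]


def inverse_permutation(permutation):
    """Build the ordering that undoes `permutation`."""
    inverse = [0] * len(permutation)
    for position, source in enumerate(permutation):
        inverse[source] = position
    return inverse


def permutation_sign(permutation):
    """Determinant of the permutation matrix: +1 or -1 by inversion parity."""
    n = len(permutation)
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n)
        if permutation[i] > permutation[j]
    )
    return -1 if inversions % 2 else 1


permutation = [2, 0, 3, 1]
x = [10.0, 20.0, 30.0, 40.0]
y = permute(x, permutation)
restored = permute(y, inverse_permutation(permutation))
log_abs_det = math.log(abs(permutation_sign(permutation)))
```

Whatever the parity, the absolute determinant is 1, so its logarithm is 0 for every permutation.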

bijective
= True¶

codomain
¶

domain
¶

log_abs_det_jacobian
(x, y)[source]¶ Calculates the elementwise determinant of the log Jacobian, i.e. log(abs([dy_0/dx_0, …, dy_{N-1}/dx_{N-1}])). Note that this type of transform is not autoregressive, so the log Jacobian is not the sum of the previous expression. However, it turns out it’s always 0 (since the determinant is -1 or +1), and so returning a vector of zeros works.

volume_preserving
= True¶
PositivePowerTransform¶

class
PositivePowerTransform
(exponent, *, cache_size=0, validate_args=None)[source]¶ Bases:
torch.distributions.transforms.Transform
Transform via the mapping \(y=\operatorname{sign}(x)|x|^{\text{exponent}}\).
Whereas
PowerTransform
allows arbitrary exponent
and restricts domain and codomain to positive values, this class restricts exponent > 0
and allows real domain and codomain.
Warning
The Jacobian is typically zero or infinite at the origin.
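A scalar plain-Python sketch of the sign-preserving power map is given below. The function names are hypothetical and floats stand in for tensors.

```python
import math


def positive_power(x, exponent):
    """Apply |x| ** exponent and restore the sign of x (requires exponent > 0)."""
    return math.copysign(abs(x) ** exponent, x)


def positive_power_inverse(y, exponent):
    """The inverse is the same map with the reciprocal exponent."""
    return math.copysign(abs(y) ** (1.0 / exponent), y)


cubed = positive_power(-2.0, 3.0)
rooted = positive_power(4.0, 0.5)
recovered = positive_power_inverse(positive_power(-1.7, 2.5), 2.5)
```

Since the derivative is exponent * |x| ** (exponent - 1), it vanishes at the origin when exponent > 1 and diverges there when exponent < 1, which is the situation the warning refers to.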

bijective
= True¶

codomain
= Real()¶

domain
= Real()¶

sign
= 1¶

SoftplusLowerCholeskyTransform¶

class
SoftplusLowerCholeskyTransform
(cache_size=0)[source]¶ Bases:
torch.distributions.transforms.Transform
Transform from unconstrained matrices to lower-triangular matrices with nonnegative diagonal entries. This is useful for parameterizing positive definite matrices in terms of their Cholesky factorization.
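The mapping can be sketched in plain Python as follows: entries above the diagonal are discarded, entries below it are kept, and the diagonal is passed through softplus. Nested lists stand in for tensors and the helper names are hypothetical.

```python
import math


def softplus(v):
    """Numerically stable log(1 + exp(v))."""
    return math.log1p(math.exp(-abs(v))) + max(v, 0.0)


def softplus_lower_cholesky(m):
    """Keep the strict lower triangle, softplus the diagonal, zero the rest."""
    d = len(m)
    return [
        [
            softplus(m[i][j]) if i == j else (m[i][j] if j < i else 0.0)
            for j in range(d)
        ]
        for i in range(d)
    ]


unconstrained = [[-3.0, 9.0], [0.5, 2.0]]
factor = softplus_lower_cholesky(unconstrained)
```

Multiplying such a factor by its transpose gives a positive definite matrix, which is why the transform is useful for learning covariance parameters by unconstrained optimization.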

codomain
= LowerCholesky()¶

domain
= IndependentConstraint(Real(), 2)¶

SoftplusTransform¶
TransformModules¶
AffineAutoregressive¶

class
AffineAutoregressive
(autoregressive_nn, log_scale_min_clip=-5.0, log_scale_max_clip=3.0, sigmoid_bias=2.0, stable=False)[source]¶ Bases:
pyro.distributions.torch_transform.TransformModule
An implementation of the bijective transform of Inverse Autoregressive Flow (IAF), using by default Eq (10) from Kingma et al., 2016,
\(\mathbf{y} = \mu_t + \sigma_t\odot\mathbf{x}\) where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, \(\mu_t,\sigma_t\) are calculated from an autoregressive network on \(\mathbf{x}\), and \(\sigma_t>0\).
If the stable keyword argument is set to True then the transformation used is,
\(\mathbf{y} = \sigma_t\odot\mathbf{x} + (1-\sigma_t)\odot\mu_t\) where \(\sigma_t\) is restricted to \((0,1)\). This variant of IAF is claimed by the authors to be more numerically stable than one using Eq (10), although in practice it leads to a restriction on the distributions that can be represented, presumably since the input is restricted to rescaling by a number on \((0,1)\).
Together with
TransformedDistribution
this provides a way to create richer variational approximations.
Example usage:
>>> import torch
>>> import pyro
>>> import pyro.distributions as dist
>>> from pyro.nn import AutoRegressiveNN
>>> from pyro.distributions.transforms import AffineAutoregressive
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = AffineAutoregressive(AutoRegressiveNN(10, [40]))
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP
The inverse of the Bijector is required when, e.g., scoring the log density of a sample with
TransformedDistribution
. This implementation caches the inverse of the Bijector when its forward operation is called, e.g., when sampling from TransformedDistribution
. However, if the cached value isn’t available, either because it was overwritten during sampling a new value or an arbitrary value is being scored, it will calculate it manually. Note that this is an operation that scales as O(D) where D is the input dimension, and so should be avoided for large dimensional uses. So in general, it is cheap to sample from IAF and score a value that was sampled by IAF, but expensive to score an arbitrary value.
Parameters:  autoregressive_nn (callable) – an autoregressive neural network whose forward call returns a real-valued mean and logit-scale as a tuple
 log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the autoregressive NN
 log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the autoregressive NN
 sigmoid_bias (float) – A term to add to the logit of the input when using the stable transform.
 stable (bool) – When true, uses the alternative “stable” version of the transform (see above).
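The asymmetry between the cheap forward pass and the O(D) inverse can be seen in a plain-Python sketch. Here toy_autoregressive_nn is a hypothetical stand-in for a real autoregressive network, the clipping bounds of -5.0 and 3.0 and the sigmoid bias of 2.0 are assumed defaults, and lists stand in for tensors. This illustrates the two update rules and is not Pyro's implementation.

```python
import math


def toy_autoregressive_nn(x_prefix):
    """Hypothetical network: outputs for position t depend only on x[:t]."""
    s = sum(x_prefix)
    return 0.5 * s, 0.1 * s  # (mean, unconstrained scale parameter)


def scale_and_shift(mean, raw, stable, lo=-5.0, hi=3.0, sigmoid_bias=2.0):
    """Return (sigma, shift) so that y = sigma * x + shift."""
    if stable:
        sigma = 1.0 / (1.0 + math.exp(-(raw + sigmoid_bias)))  # in (0, 1)
        return sigma, (1.0 - sigma) * mean
    return math.exp(max(lo, min(hi, raw))), mean  # clipped log-scale


def forward(x, stable=False):
    """Every output needs only the known input, so positions are independent."""
    y = []
    for t in range(len(x)):
        mean, raw = toy_autoregressive_nn(x[:t])
        sigma, shift = scale_and_shift(mean, raw, stable)
        y.append(sigma * x[t] + shift)
    return y


def inverse(y, stable=False):
    """x[t] needs x[:t] first, so the loop is inherently sequential: O(D)."""
    x = []
    for t in range(len(y)):
        mean, raw = toy_autoregressive_nn(x)
        sigma, shift = scale_and_shift(mean, raw, stable)
        x.append((y[t] - shift) / sigma)
    return x


x = [0.3, -1.2, 0.7, 2.0]
y_default = forward(x)
y_stable = forward(x, stable=True)
x_default = inverse(y_default)
x_stable = inverse(y_stable, stable=True)
```

In a tensor implementation the forward loop collapses into a single network call, whereas the inverse still needs one call per dimension. This is why caching the input during sampling matters.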
References:
[1] Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling. Improving Variational Inference with Inverse Autoregressive Flow. [arXiv:1606.04934]
[2] Danilo Jimenez Rezende, Shakir Mohamed. Variational Inference with Normalizing Flows. [arXiv:1505.05770]
[3] Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle. MADE: Masked Autoencoder for Distribution Estimation. [arXiv:1502.03509]

autoregressive
= True¶

bijective
= True¶

codomain
= IndependentConstraint(Real(), 1)¶

domain
= IndependentConstraint(Real(), 1)¶

sign
= 1¶
AffineCoupling¶

class
AffineCoupling
(split_dim, hypernet, *, dim=-1, log_scale_min_clip=-5.0, log_scale_max_clip=3.0)[source]¶ Bases:
pyro.distributions.torch_transform.TransformModule
An implementation of the affine coupling layer of RealNVP (Dinh et al., 2017) that uses the bijective transform,
\(\mathbf{y}_{1:d} = \mathbf{x}_{1:d}\) \(\mathbf{y}_{(d+1):D} = \mu + \sigma\odot\mathbf{x}_{(d+1):D}\) where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, e.g. \(\mathbf{x}_{1:d}\) represents the first \(d\) elements of the inputs, and \(\mu,\sigma\) are shift and scale parameters calculated as the output of a function inputting only \(\mathbf{x}_{1:d}\).
That is, the first \(d\) components remain unchanged, and the subsequent \(D-d\) are shifted and scaled by a function of the previous components.
Together with
TransformedDistribution
this provides a way to create richer variational approximations.
Example usage:
>>> import torch
>>> import pyro
>>> import pyro.distributions as dist
>>> from pyro.nn import DenseNN
>>> from pyro.distributions.transforms import AffineCoupling
>>> input_dim = 10
>>> split_dim = 6
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [input_dim-split_dim, input_dim-split_dim]
>>> hypernet = DenseNN(split_dim, [10*input_dim], param_dims)
>>> transform = AffineCoupling(split_dim, hypernet)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()  # doctest: +SKIP
The inverse of the Bijector is required when, e.g., scoring the log density of a sample with
TransformedDistribution
. This implementation caches the inverse of the Bijector when its forward operation is called, e.g., when sampling from TransformedDistribution
. However, if the cached value isn’t available, either because it was overwritten during sampling a new value or an arbitrary value is being scored, it will calculate it manually. This is an operation that scales as O(1), i.e. constant in the input dimension. So in general, it is cheap to sample and score (an arbitrary value) from
AffineCoupling
.
Parameters:  split_dim (int) – Zero-indexed dimension \(d\) upon which to perform input/output split for transformation.
 hypernet (callable) – a neural network whose forward call returns a real-valued mean and logit-scale as a tuple. The input should have final dimension split_dim and the output final dimension input_dim-split_dim for each member of the tuple.
 dim (int) – the tensor dimension on which to split. This value must be negative and defines the event dim as abs(dim).
 log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the autoregressive NN
 log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the autoregressive NN
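The reason both directions are cheap can be seen in a plain-Python sketch of a coupling layer: the unchanged first part is available in both directions, so the shift and scale can be recomputed with one network call either way. Here toy_hypernet is a hypothetical stand-in for a real network, clipping of the log-scale is omitted for brevity, and lists stand in for tensors.

```python
import math


def toy_hypernet(x1, out_dim):
    """Hypothetical network: maps the first part to (mean, log_scale) lists."""
    s = sum(x1)
    mean = [0.1 * s * (k + 1) for k in range(out_dim)]
    log_scale = [0.05 * s] * out_dim
    return mean, log_scale


def coupling_forward(x, split_dim):
    """Copy x[:split_dim]; shift and scale the rest. Returns (y, log|det J|)."""
    x1, x2 = x[:split_dim], x[split_dim:]
    mean, log_scale = toy_hypernet(x1, len(x2))
    y2 = [m + math.exp(ls) * v for m, ls, v in zip(mean, log_scale, x2)]
    return x1 + y2, sum(log_scale)


def coupling_inverse(y, split_dim):
    """y[:split_dim] equals x[:split_dim], so the same network call applies."""
    y1, y2 = y[:split_dim], y[split_dim:]
    mean, log_scale = toy_hypernet(y1, len(y2))
    x2 = [(v - m) * math.exp(-ls) for m, ls, v in zip(mean, log_scale, y2)]
    return y1 + x2


x = [0.5, -1.0, 2.0, 0.1, 0.3]
y, log_det = coupling_forward(x, split_dim=2)
x_back = coupling_inverse(y, split_dim=2)
```

Because only part of the input is transformed per layer, coupling layers are usually stacked with a reordering of dimensions in between so that every component is eventually updated.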
References:
[1] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. ICLR 2017.

bijective
= True¶

codomain
¶

domain
¶
BatchNorm¶

class
BatchNorm
(input_dim, momentum=0.1, epsilon=1e-05)[source]¶ Bases:
pyro.distributions.torch_transform.TransformModule
A type of batch normalization that can be used to stabilize training in normalizing flows. The inverse operation is defined as
\(x = (y - \hat{\mu}) \oslash \sqrt{\hat{\sigma^2}} \otimes \gamma + \beta\) that is, the standard batch norm equation, where \(x\) is the input, \(y\) is the output, \(\gamma,\beta\) are learnable parameters, and \(\hat{\mu}\)/\(\hat{\sigma^2}\) are smoothed running averages of the sample mean and variance, respectively. The constraint \(\gamma>0\) is enforced to ease calculation of the log-det-Jacobian term.
This is an elementwise transform, and when applied to a vector, learns two parameters (\(\gamma,\beta\)) for each dimension of the input.
When the module is set to training mode, the moving averages of the sample mean and variance are updated every time the inverse operator is called, e.g., when a normalizing flow scores a minibatch with the log_prob method.
Also, when the module is set to training mode, the sample mean and variance on the current minibatch are used in place of the smoothed averages, \(\hat{\mu}\) and \(\hat{\sigma^2}\), for the inverse operator. For this reason it is not the case that \(x=g(g^{-1}(x))\) during training, i.e., that the inverse operation is the inverse of the forward one.
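The training-mode mismatch can be illustrated with a scalar plain-Python sketch. It assumes that a small epsilon is added to the variance for numerical stability and that the minibatch variance is the population variance. Both are assumptions of this sketch, and the function names are hypothetical.

```python
import math


def bn_inverse(y, mean, var, gamma, beta, eps=1e-5):
    """Standard batch-norm equation: normalize y, then scale and shift."""
    return (y - mean) / math.sqrt(var + eps) * gamma + beta


def bn_forward(x, mean, var, gamma, beta, eps=1e-5):
    """Algebraic inverse of bn_inverse for the same statistics."""
    return (x - beta) / gamma * math.sqrt(var + eps) + mean


gamma, beta = 1.5, 0.2
running_mean, running_var = 0.0, 1.0  # smoothed averages

# Evaluation mode: both directions use the running statistics.
y_eval = 0.8
roundtrip_eval = bn_forward(
    bn_inverse(y_eval, running_mean, running_var, gamma, beta),
    running_mean, running_var, gamma, beta,
)

# Training mode: the inverse uses minibatch statistics instead.
batch = [0.5, 1.5, 4.0]
batch_mean = sum(batch) / len(batch)
batch_var = sum((v - batch_mean) ** 2 for v in batch) / len(batch)
roundtrip_train = [
    bn_forward(
        bn_inverse(v, batch_mean, batch_var, gamma, beta),
        running_mean, running_var, gamma, beta,
    )
    for v in batch
]
```

With matching statistics the round trip is exact up to floating point error. With minibatch statistics on one side and running averages on the other, the recovered values differ from the originals.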
Example usage:
>>> import torch
>>> import pyro.distributions as dist
>>> from pyro.nn import AutoRegressiveNN
>>> from pyro.distributions.transforms import AffineAutoregressive, BatchNorm
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> iafs = [AffineAutoregressive(AutoRegressiveNN(10, [40])) for _ in range(2)]
>>> bn = BatchNorm(10)
>>> flow_dist = dist.TransformedDistribution(base_dist, [iafs[0], bn, iafs[1]])
>>> flow_dist.sample()  # doctest: +SKIP
Parameters:
References:
[1] Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In International Conference on Machine Learning, 2015. https://arxiv.org/abs/1502.03167
[2] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density Estimation using Real NVP. In International Conference on Learning Representations, 2017. https://arxiv.org/abs/1605.08803
[3] George Papamakarios, Theo Pavlakou, and Iain Murray. Masked Autoregressive Flow for Density Estimation. In Neural Information Processing Systems, 2017. https://arxiv.org/abs/1705.07057

bijective
= True¶

codomain
= Real()¶

constrained_gamma
¶

domain
= Real()¶

BlockAutoregressive¶

class
BlockAutoregressive
(input_dim, hidden_factors=[8, 8], activation='tanh', residual=None)[source]¶ Bases:
pyro.distributions.torch_transform.TransformModule
An implementation of Block Neural Autoregressive Flow (block-NAF) (De Cao et al., 2019) bijective transform. Block-NAF uses a similar transformation to deep dense NAF, building the autoregressive NN into the structure of the transform, in a sense.
Together with
TransformedDistribution
this provides a way to create richer variational approximations.
Example usage:
>>> import torch
>>> import pyro
>>> import pyro.distributions as dist
>>> from pyro.distributions.transforms import BlockAutoregressive
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> naf = BlockAutoregressive(input_dim=10)
>>> pyro.module("my_naf", naf)  # doctest: +SKIP
>>> naf_dist = dist.TransformedDistribution(base_dist, [naf])
>>> naf_dist.sample()  # doctest: +SKIP
The inverse operation is not implemented. This would require numerical inversion, e.g., using a root-finding method, which is a possibility for a future implementation.
Parameters:  input_dim (int) – The dimensionality of the input and output variables.
 hidden_factors (list) – Hidden layer i has hidden_factors[i] hidden units per input dimension. This corresponds to both \(a\) and \(b\) in De Cao et al. (2019). The elements of hidden_factors must be integers.
 activation (string) – Activation function to use. One of ‘ELU’, ‘LeakyReLU’, ‘sigmoid’, or ‘tanh’.
 residual (string) – Type of residual connections to use. Choices are “None”, “normal” for \(\mathbf{y}+f(\mathbf{y})\), and “gated” for \(\alpha\mathbf{y} + (1 - \alpha\mathbf{y})\) for learnable parameter \(\alpha\).
References:
[1] Nicola De Cao, Ivan Titov, Wilker Aziz. Block Neural Autoregressive Flow. [arXiv:1904.04676]

autoregressive
= True¶

bijective
= True¶

codomain
= IndependentConstraint(Real(), 1)¶

domain
= IndependentConstraint(Real(), 1)¶
ConditionalAffineAutoregressive¶

class
ConditionalAffineAutoregressive
(autoregressive_nn, **kwargs)[source]¶ Bases:
pyro.distributions.conditional.ConditionalTransformModule
An implementation of the bijective transform of Inverse Autoregressive Flow (IAF) that conditions on an additional context variable and uses, by default, Eq (10) from Kingma et al., 2016,
\(\mathbf{y} = \mu_t + \sigma_t\odot\mathbf{x}\) where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, \(\mu_t,\sigma_t\) are calculated from an autoregressive network on \(\mathbf{x}\) and context \(\mathbf{z}\in\mathbb{R}^M\), and \(\sigma_t>0\).
If the stable keyword argument is set to True then the transformation used is,
\(\mathbf{y} = \sigma_t\odot\mathbf{x} + (1-\sigma_t)\odot\mu_t\) where \(\sigma_t\) is restricted to \((0,1)\). This variant of IAF is claimed by the authors to be more numerically stable than one using Eq (10), although in practice it leads to a restriction on the distributions that can be represented, presumably since the input is restricted to rescaling by a number on \((0,1)\).
Together with
ConditionalTransformedDistribution
this provides a way to create richer variational approximations.
Example usage:
>>> import torch
>>> import pyro
>>> import pyro.distributions as dist
>>> from pyro.nn import ConditionalAutoRegressiveNN
>>> from pyro.distributions.transforms import ConditionalAffineAutoregressive
>>> input_dim = 10
>>> context_dim = 4
>>> batch_size = 3
>>> hidden_dims = [10*input_dim, 10*input_dim]
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> hypernet = ConditionalAutoRegressiveNN(input_dim, context_dim, hidden_dims)
>>> transform = ConditionalAffineAutoregressive(hypernet)
>>> pyro.module("my_transform", transform)  # doctest: +SKIP
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
...     [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))  # doctest: +SKIP
The inverse of the Bijector is required when, e.g., scoring the log density of a sample with
TransformedDistribution
. This implementation caches the inverse of the Bijector when its forward operation is called, e.g., when sampling from TransformedDistribution
. However, if the cached value isn’t available, either becaus