SVI

class SVI(model, guide, optim, loss, loss_and_grads=None, num_samples=10, num_steps=0, **kwargs)
Bases: pyro.infer.abstract_infer.TracePosterior
Parameters:
 model – the model (callable containing Pyro primitives)
 guide – the guide (callable containing Pyro primitives)
 optim (pyro.optim.PyroOptim) – a wrapper for a PyTorch optimizer
 loss (pyro.infer.elbo.ELBO) – an instance of a subclass of ELBO. Pyro provides three built-in losses: Trace_ELBO, TraceGraph_ELBO, and TraceEnum_ELBO. See the ELBO docs to learn how to implement a custom loss.
 num_samples – the number of samples for Monte Carlo posterior approximation
 num_steps – the number of optimization steps to take in run()
A unified interface for stochastic variational inference in Pyro. The most commonly used loss is loss=Trace_ELBO(). See the tutorial SVI Part I for a discussion.
ELBO

class ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: object
ELBO is the top-level interface for stochastic variational inference via optimization of the evidence lower bound.
Most users will not interact with this base class ELBO directly; instead they will create instances of derived classes: Trace_ELBO, TraceGraph_ELBO, or TraceEnum_ELBO.
Parameters:
 num_particles – The number of particles/samples used to form the ELBO (gradient) estimators.
 max_plate_nesting (int) – Optional bound on max number of nested pyro.plate() contexts. This is only required when enumerating over sample sites in parallel, e.g. if a site sets infer={"enumerate": "parallel"}. If omitted, ELBO may guess a valid value by running the (model, guide) pair once; however, this guess may be incorrect if the model or guide structure is dynamic.
 vectorize_particles (bool) – Whether to vectorize the ELBO computation over num_particles. Defaults to False. This requires static structure in model and guide.
 strict_enumeration_warning (bool) – Whether to warn about possible misuse of enumeration, i.e. that pyro.infer.traceenum_elbo.TraceEnum_ELBO is used iff there are enumerated sample sites.
 ignore_jit_warnings (bool) – Flag to ignore warnings from the JIT tracer. When this is True, all torch.jit.TracerWarning will be ignored. Defaults to False.
 jit_options (dict) – Optional dict of options to pass to torch.jit.trace(), e.g. {"optimize": False}.
 retain_graph (bool) – Whether to retain autograd graph during an SVI step. Defaults to None (False).
 tail_adaptive_beta (float) – Exponent beta with -1.0 <= beta < 0.0 for use with TraceTailAdaptive_ELBO.
References
 [1] Automated Variational Inference in Probabilistic Programming, David Wingate, Theo Weber
 [2] Black Box Variational Inference, Rajesh Ranganath, Sean Gerrish, David M. Blei
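
To make the role of num_particles concrete, the following plain-Python sketch estimates an ELBO by averaging log p(x, z) - log q(z) over draws from the guide. It does not use Pyro; the conjugate normal model, the helper names, and the particle counts are assumptions chosen so that the exact evidence is known.

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of a univariate normal distribution."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)


def elbo_estimate(x, guide_mean, guide_std, num_particles, rng):
    """Average log p(x, z) - log q(z) over num_particles draws z ~ q."""
    total = 0.0
    for _ in range(num_particles):
        z = rng.gauss(guide_mean, guide_std)
        # Model: z ~ N(0, 1), x | z ~ N(z, 1).
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
        total += log_joint - log_normal(z, guide_mean, guide_std)
    return total / num_particles


rng = random.Random(0)
x = 1.0
log_evidence = log_normal(x, 0.0, math.sqrt(2.0))  # marginally x ~ N(0, 2)

# Guide equal to the exact posterior N(x / 2, 1 / 2): the bound is tight.
tight = elbo_estimate(x, x / 2.0, math.sqrt(0.5), num_particles=10, rng=rng)
# Guide equal to the prior N(0, 1): the bound is loose by KL(q || posterior).
loose = elbo_estimate(x, 0.0, 1.0, num_particles=5000, rng=rng)
```

With the exact posterior as guide every particle contributes exactly log_evidence, so a single particle suffices. With a mismatched guide, more particles reduce the variance of the estimate but not the gap to the evidence.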

class Trace_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.elbo.ELBO
A trace implementation of ELBO-based SVI. The estimator is constructed along the lines of references [1] and [2]. There are no restrictions on the dependency structure of the model or the guide. The gradient estimator includes partial Rao-Blackwellization for reducing the variance of the estimator when non-reparameterizable random variables are present. The Rao-Blackwellization is partial in that it only uses conditional independence information that is marked by plate contexts. For more fine-grained Rao-Blackwellization, see TraceGraph_ELBO.
References
 [1] Automated Variational Inference in Probabilistic Programming, David Wingate, Theo Weber
 [2] Black Box Variational Inference, Rajesh Ranganath, Sean Gerrish, David M. Blei

loss(model, guide, *args, **kwargs)
Returns: an estimate of the ELBO
Return type: float
Evaluates the ELBO with an estimator that uses num_particles many samples/particles.

class JitTrace_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.trace_elbo.Trace_ELBO
Like Trace_ELBO but uses pyro.ops.jit.compile() to compile loss_and_grads().
This works only for a limited set of models:
 Models must have static structure.
 Models must not depend on any global data (except the param store).
 All model inputs that are tensors must be passed in via *args.
 All model inputs that are not tensors must be passed in via **kwargs, and compilation will be triggered once per unique **kwargs.
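
The last restriction amounts to a cache keyed on the keyword arguments. The sketch below illustrates that behaviour in plain Python. It is not Pyro's implementation; compile_loss, jit_loss, stats, and the offset keyword are hypothetical names used only for this example.

```python
stats = {"compilations": 0}
_cache = {}


def compile_loss(kwargs_key):
    """Stand-in for an expensive compilation step (hypothetical)."""
    stats["compilations"] += 1
    offset = dict(kwargs_key).get("offset", 0.0)
    return lambda *args: sum(args) + offset


def jit_loss(*args, **kwargs):
    """Look up, or build, the compiled function for these keyword arguments."""
    key = tuple(sorted(kwargs.items()))
    if key not in _cache:
        _cache[key] = compile_loss(key)
    return _cache[key](*args)


jit_loss(1.0, 2.0, offset=0.5)  # compiles
jit_loss(3.0, 4.0, offset=0.5)  # reuses: only the positional inputs changed
jit_loss(1.0, 2.0, offset=1.5)  # compiles again: new keyword arguments
```

Under this scheme, values that change on every call belong in *args, since each new **kwargs combination would pay the compilation cost again.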

class TraceGraph_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.elbo.ELBO
A TraceGraph implementation of ELBO-based SVI. The gradient estimator is constructed along the lines of reference [1] specialized to the case of the ELBO. It supports arbitrary dependency structure for the model and guide as well as baselines for non-reparameterizable random variables. Where possible, conditional dependency information as recorded in the Trace is used to reduce the variance of the gradient estimator. In particular two kinds of conditional dependency information are used to reduce variance:
 the sequential order of samples (z is sampled after y => y does not depend on z)
 plate generators
References
 [1] Gradient Estimation Using Stochastic Computation Graphs, John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel
 [2] Neural Variational Inference and Learning in Belief Networks, Andriy Mnih, Karol Gregor

loss(model, guide, *args, **kwargs)
Returns: an estimate of the ELBO
Return type: float
Evaluates the ELBO with an estimator that uses num_particles many samples/particles.

loss_and_grads(model, guide, *args, **kwargs)
Returns: an estimate of the ELBO
Return type: float
Computes the ELBO as well as the surrogate ELBO that is used to form the gradient estimator. Performs backward on the latter. num_particles many samples are used to form the estimators. If baselines are present, a baseline loss is also constructed and differentiated.
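
The effect of a baseline on a score-function gradient estimator can be seen on a single Bernoulli variable. This is a plain-Python sketch, not Pyro's estimator; the cost function f, the value of theta, and the constant baseline are assumptions for illustration.

```python
import random
import statistics


def f(z):
    """Hypothetical cost whose expectation under Bernoulli(theta) is differentiated."""
    return (z - 0.3) ** 2


def score(z, theta):
    """Derivative with respect to theta of log Bernoulli(z; theta)."""
    return z / theta - (1 - z) / (1 - theta)


def grad_samples(theta, baseline, num_particles, rng):
    """Per-particle estimates (f(z) - baseline) * score(z, theta)."""
    samples = []
    for _ in range(num_particles):
        z = 1 if rng.random() < theta else 0
        samples.append((f(z) - baseline) * score(z, theta))
    return samples


rng = random.Random(0)
theta = 0.5
# Exact gradient of theta * f(1) + (1 - theta) * f(0).
exact_grad = f(1) - f(0)

plain = grad_samples(theta, baseline=0.0, num_particles=20000, rng=rng)
# Baseline set to E[f(z)]; a learned baseline would approximate this value.
mean_f = theta * f(1) + (1 - theta) * f(0)
with_baseline = grad_samples(theta, baseline=mean_f, num_particles=20000, rng=rng)

plain_mean, plain_var = statistics.mean(plain), statistics.pvariance(plain)
base_mean, base_var = statistics.mean(with_baseline), statistics.pvariance(with_baseline)
```

Both estimators target exact_grad because the score has zero mean, so subtracting a constant does not introduce bias. In this two-outcome example the baseline removes essentially all of the variance; in general it only reduces it.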

class JitTraceGraph_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.tracegraph_elbo.TraceGraph_ELBO
Like TraceGraph_ELBO but uses torch.jit.trace() to compile loss_and_grads().
This works only for a limited set of models:
 Models must have static structure.
 Models must not depend on any global data (except the param store).
 All model inputs that are tensors must be passed in via *args.
 All model inputs that are not tensors must be passed in via **kwargs, and compilation will be triggered once per unique **kwargs.

class BackwardSampleMessenger(enum_trace, guide_trace)
Bases: pyro.poutine.messenger.Messenger
Implements forward filtering / backward sampling for sampling from the joint posterior distribution.

class TraceEnum_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.elbo.ELBO
A trace implementation of ELBO-based SVI that supports
 exhaustive enumeration over discrete sample sites, and
 local parallel sampling over any sample site.
To enumerate over a sample site in the guide, mark the site with either infer={'enumerate': 'sequential'} or infer={'enumerate': 'parallel'}. To configure all guide sites at once, use config_enumerate(). To enumerate over a sample site in the model, mark the site infer={'enumerate': 'parallel'} and ensure the site does not appear in the guide.
This assumes restricted dependency structure on the model and guide: variables outside of a plate can never depend on variables inside that plate.

loss(model, guide, *args, **kwargs)
Returns: an estimate of the ELBO
Return type: float
Estimates the ELBO using num_particles many samples (particles).

differentiable_loss(model, guide, *args, **kwargs)
Returns: a differentiable estimate of the ELBO
Return type: torch.Tensor
Raises: ValueError – if the ELBO is not differentiable (e.g. is identically zero)
Estimates a differentiable ELBO using num_particles many samples (particles). The result should be infinitely differentiable (as long as underlying derivatives have been implemented).

loss_and_grads(model, guide, *args, **kwargs)
Returns: an estimate of the ELBO
Return type: float
Estimates the ELBO using num_particles many samples (particles). Performs backward on the ELBO of each particle.
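
What exhaustive enumeration buys over sampling can be sketched without Pyro on a small mixture: summing over every value of the discrete site gives the marginal likelihood exactly, while sampling that site gives a noisy estimate of the same number. The mixture weights, component means, observation, and sample count below are illustrative assumptions.

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of a univariate normal distribution."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


weights = [0.2, 0.5, 0.3]   # prior over the discrete assignment k
means = [-2.0, 0.0, 3.0]    # component means; x | k ~ N(means[k], 1)
x = 0.5

# Exhaustive enumeration: sum over every value of the discrete site.
enumerated = logsumexp([math.log(w) + log_normal(x, m, 1.0)
                        for w, m in zip(weights, means)])

# Sampling the discrete site gives a noisy estimate of the same quantity.
rng = random.Random(0)
draws = rng.choices(range(len(weights)), weights=weights, k=2000)
sampled = math.log(sum(math.exp(log_normal(x, means[k], 1.0)) for k in draws)
                   / len(draws))
```

The enumerated value has no Monte Carlo error, at a cost that grows with the size of the support of the discrete site.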


class JitTraceEnum_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.traceenum_elbo.TraceEnum_ELBO
Like TraceEnum_ELBO but uses pyro.ops.jit.compile() to compile loss_and_grads().
This works only for a limited set of models:
 Models must have static structure.
 Models must not depend on any global data (except the param store).
 All model inputs that are tensors must be passed in via *args.
 All model inputs that are not tensors must be passed in via **kwargs, and compilation will be triggered once per unique **kwargs.

class TraceMeanField_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.trace_elbo.Trace_ELBO
A trace implementation of ELBO-based SVI. This is currently the only ELBO estimator in Pyro that uses analytic KL divergences when those are available.
In contrast to, e.g., TraceGraph_ELBO and Trace_ELBO, this estimator places restrictions on the dependency structure of the model and guide. In particular it assumes that the guide has a mean-field structure, i.e. that it factorizes across the different latent variables present in the guide. It also assumes that all of the latent variables in the guide are reparameterized. This latter condition is satisfied for, e.g., the Normal distribution but is not satisfied for, e.g., the Categorical distribution.
Warning
This estimator may give incorrect results if the mean-field condition is not satisfied.
Note for advanced users:
The mean-field condition is a sufficient but not necessary condition for this estimator to be correct. The precise condition is that for every latent variable z in the guide, its parents in the model must not include any latent variables that are descendants of z in the guide. Here ‘parents in the model’ and ‘descendants in the guide’ are with respect to the corresponding (statistical) dependency structure. For example, this condition is always satisfied if the model and guide have identical dependency structures.
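
To show what an analytic KL divergence replaces, the sketch below compares the closed-form KL between two univariate normals with a Monte Carlo estimate of the same quantity. It is plain Python rather than Pyro, and the particular means, standard deviations, and particle count are assumptions.

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of a univariate normal distribution."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)


def kl_normal(m_q, s_q, m_p, s_p):
    """Closed-form KL(N(m_q, s_q^2) || N(m_p, s_p^2))."""
    return (math.log(s_p / s_q)
            + (s_q ** 2 + (m_q - m_p) ** 2) / (2.0 * s_p ** 2) - 0.5)


def kl_monte_carlo(m_q, s_q, m_p, s_p, num_particles, rng):
    """Average of log q(z) - log p(z) over draws z ~ q."""
    total = 0.0
    for _ in range(num_particles):
        z = rng.gauss(m_q, s_q)
        total += log_normal(z, m_q, s_q) - log_normal(z, m_p, s_p)
    return total / num_particles


rng = random.Random(0)
analytic = kl_normal(0.5, 0.8, 0.0, 1.0)
sampled = kl_monte_carlo(0.5, 0.8, 0.0, 1.0, num_particles=5000, rng=rng)
```

The analytic term carries no sampling noise, which is the motivation for using it whenever a closed form exists for the guide and prior pair.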

class JitTraceMeanField_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.trace_mean_field_elbo.TraceMeanField_ELBO
Like TraceMeanField_ELBO but uses pyro.ops.jit.trace() to compile loss_and_grads().
This works only for a limited set of models:
 Models must have static structure.
 Models must not depend on any global data (except the param store).
 All model inputs that are tensors must be passed in via *args.
 All model inputs that are not tensors must be passed in via **kwargs, and compilation will be triggered once per unique **kwargs.

class TraceTailAdaptive_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.trace_elbo.Trace_ELBO
Interface for Stochastic Variational Inference with an adaptive f-divergence as described in ref. [1]. Users should specify num_particles > 1 and vectorize_particles==True. The argument tail_adaptive_beta can be specified to modify how the adaptive f-divergence is constructed. See reference for details.
Note that this interface does not support computing the variational objective itself; rather it only supports computing gradients of the variational objective. Consequently, one might want to use another SVI interface (e.g. RenyiELBO) in order to monitor convergence.
Note that this interface only supports models in which all the latent variables are fully reparameterized. It also does not support data subsampling.
References
 [1] “Variational Inference with Tail-adaptive f-Divergence”, Dilin Wang, Hao Liu, Qiang Liu, NeurIPS 2018 https://papers.nips.cc/paper/7816-variational-inference-with-tail-adaptive-f-divergence

class RenyiELBO(alpha=0, num_particles=2, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True)
Bases: pyro.infer.elbo.ELBO
An implementation of Renyi’s \(\alpha\)-divergence variational inference following reference [1].
In order for the objective to be a strict lower bound, we require \(\alpha \ge 0\). Note, however, that according to reference [1], depending on the dataset \(\alpha < 0\) might give better results. In the special case \(\alpha = 0\), the objective function is that of the importance weighted autoencoder derived in reference [2].
Note
Setting \(\alpha < 1\) gives a better bound than the usual ELBO. For \(\alpha = 1\), it is better to use the Trace_ELBO class because it helps reduce the variance of the gradient estimates.
Warning
Mini-batch training is not supported yet.
Parameters:
 alpha (float) – The order of \(\alpha\)-divergence. Here \(\alpha \neq 1\). Default is 0.
 num_particles – The number of particles/samples used to form the objective (gradient) estimator. Default is 2.
 max_plate_nesting (int) – Bound on max number of nested pyro.plate() contexts. Default is infinity.
 strict_enumeration_warning (bool) – Whether to warn about possible misuse of enumeration, i.e. that TraceEnum_ELBO is used iff there are enumerated sample sites.
References:
 [1] Renyi Divergence Variational Inference, Yingzhen Li, Richard E. Turner
 [2] Importance Weighted Autoencoders, Yuri Burda, Roger Grosse, Ruslan Salakhutdinov
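
The relationship between the Renyi bound, the importance weighted bound, and the ordinary ELBO can be checked numerically. The sketch below is plain Python and assumes the Monte Carlo form (1 / (1 - alpha)) * log mean_k w_k^(1 - alpha) with w_k = p(x, z_k) / q(z_k), which is the estimator described in reference [1]. The toy model, guide, and particle count are illustrative.

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of a univariate normal distribution."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)


def log_weights(x, guide_mean, guide_std, num_particles, rng):
    """log w_k = log p(x, z_k) - log q(z_k) for z_k ~ q."""
    out = []
    for _ in range(num_particles):
        z = rng.gauss(guide_mean, guide_std)
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
        out.append(log_joint - log_normal(z, guide_mean, guide_std))
    return out


def renyi_bound(log_w, alpha):
    """(1 / (1 - alpha)) * log mean_k w_k^(1 - alpha), in log space; alpha != 1."""
    scaled = [(1.0 - alpha) * lw for lw in log_w]
    m = max(scaled)
    log_mean = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean / (1.0 - alpha)


rng = random.Random(0)
x = 1.0
log_evidence = log_normal(x, 0.0, math.sqrt(2.0))  # marginally x ~ N(0, 2)
log_w = log_weights(x, 0.0, 1.0, num_particles=2000, rng=rng)

iwae = renyi_bound(log_w, alpha=0.0)   # log of the average importance weight
half = renyi_bound(log_w, alpha=0.5)
elbo = sum(log_w) / len(log_w)         # the alpha -> 1 limit
```

On any fixed set of particles the estimate is non-increasing in alpha, so iwae >= half >= elbo, which matches the note that alpha < 1 gives a tighter bound than the usual ELBO.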
Importance

class Importance(model, guide=None, num_samples=None)
Bases: pyro.infer.abstract_infer.TracePosterior
Parameters:
 model – probabilistic model defined as a function
 guide – guide used for sampling defined as a function
 num_samples – number of samples to draw from the guide (default 10)
This method performs posterior inference by importance sampling using the guide as the proposal distribution. If no guide is provided, it defaults to proposing from the model’s prior.

psis_diagnostic(*args, **kwargs)
Computes the Pareto tail index k for a model/guide pair using the technique described in [1], which builds on previous work in [2]. If \(0 < k < 0.5\) the guide is a good approximation to the model posterior, in the sense described in [1]. If \(0.5 \le k \le 0.7\), the guide provides a suboptimal approximation to the posterior, but may still be useful in practice. If \(k > 0.7\) the guide program provides a poor approximation to the full posterior, and caution should be used when using the guide. Note, however, that a guide may be a poor fit to the full posterior while still yielding reasonable model predictions. If \(k < 0.0\) the importance weights corresponding to the model and guide appear to be bounded from above; this would be a bizarre outcome for a guide trained via ELBO maximization. Please see [1] for a more complete discussion of how the tail index k should be interpreted.
Please be advised that a large number of samples may be required for an accurate estimate of k.
Note that we assume that the model and guide are both vectorized and have static structure. As is canonical in Pyro, the args and kwargs are passed to the model and guide.
References
 [1] ‘Yes, but Did It Work?: Evaluating Variational Inference.’ Yuling Yao, Aki Vehtari, Daniel Simpson, Andrew Gelman
 [2] ‘Pareto Smoothed Importance Sampling.’ Aki Vehtari, Andrew Gelman, Jonah Gabry
Parameters:
 model (callable) – the model program.
 guide (callable) – the guide program.
 num_particles (int) – the total number of times we run the model and guide in order to compute the diagnostic. Defaults to 1000.
 max_simultaneous_particles – the maximum number of simultaneous samples drawn from the model and guide. Defaults to num_particles. num_particles must be divisible by max_simultaneous_particles.
 max_plate_nesting (int) – optional bound on max number of nested pyro.plate() contexts in the model/guide. Defaults to 7.
Returns float: the PSIS diagnostic k
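
The thresholds above can be written as a small lookup. interpret_psis_k is a hypothetical helper, not part of Pyro, and the returned labels are paraphrases. The text does not say how to read k = 0 exactly, so grouping it with the good range is an assumption.

```python
def interpret_psis_k(k):
    """Map a Pareto tail index k to the qualitative reading given in the text."""
    if k < 0.0:
        return "importance weights appear bounded from above"
    if k < 0.5:
        # Assumption: k == 0 is treated like the 0 < k < 0.5 range.
        return "good approximation"
    if k <= 0.7:
        return "suboptimal but possibly useful"
    return "poor approximation"


readings = {k: interpret_psis_k(k) for k in (-0.1, 0.3, 0.6, 0.9)}
```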

vectorized_importance_weights(model, guide, *args, **kwargs)
Parameters:
 model – probabilistic model defined as a function
 guide – guide used for sampling defined as a function
 num_samples – number of samples to draw from the guide (default 1)
 max_plate_nesting (int) – Bound on max number of nested pyro.plate() contexts.
 normalized (bool) – set to True to return self-normalized importance weights
Returns: a (num_samples,) shaped tensor of importance weights and the model and guide traces that produced them
Vectorized computation of importance weights for models with static structure:
    log_weights, model_trace, guide_trace = \
        vectorized_importance_weights(model, guide, *args,
                                      num_samples=1000,
                                      max_plate_nesting=4,
                                      normalized=False)
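
What self-normalized importance weights are used for can be sketched in plain Python: propose latents, weight each by p(x, z) / q(z), normalize, and form weighted averages. This is not Pyro code. The toy model, the choice of the prior as proposal, and the sample count are assumptions, and the effective sample size is a standard add-on diagnostic rather than something the function above returns.

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of a univariate normal distribution."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)


def importance_log_weights(x, num_samples, rng):
    """Propose z from the prior N(0, 1); the log weight is then log p(x | z)."""
    zs = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    log_w = [log_normal(x, z, 1.0) for z in zs]
    return zs, log_w


def normalize(log_w):
    """Exponentiate stably and rescale so that the weights sum to one."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


rng = random.Random(0)
zs, log_w = importance_log_weights(1.0, num_samples=20000, rng=rng)
weights = normalize(log_w)
posterior_mean = sum(w * z for w, z in zip(weights, zs))
effective_sample_size = 1.0 / sum(w * w for w in weights)
```

For this model the exact posterior mean is x / 2, which gives a direct check on the weighted average.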
Discrete Inference

infer_discrete(fn=None, first_available_dim=None, temperature=1)
A poutine that samples discrete sites marked with site["infer"]["enumerate"] = "parallel" from the posterior, conditioned on observations.
Example:
    import torch
    import pyro
    import pyro.distributions as dist
    from pyro.infer import config_enumerate, infer_discrete

    @infer_discrete(first_available_dim=-1, temperature=0)
    @config_enumerate
    def viterbi_decoder(data, hidden_dim=10):
        transition = 0.3 / hidden_dim + 0.7 * torch.eye(hidden_dim)
        means = torch.arange(float(hidden_dim))
        states = [0]
        for t in pyro.markov(range(len(data))):
            # Each state depends only on the previous one.
            states.append(pyro.sample("states_{}".format(t),
                                      dist.Categorical(transition[states[-1]])))
            pyro.sample("obs_{}".format(t),
                        dist.Normal(means[states[-1]], 1.),
                        obs=data[t])
        return states  # returns maximum likelihood states
Parameters:
 fn – a stochastic function (callable containing Pyro primitive calls)
 first_available_dim (int) – The first tensor dimension (counting from the right) that is available for parallel enumeration. This dimension and all dimensions to its left may be used internally by Pyro. This should be a negative integer.
 temperature (int) – Either 1 (sample via forward-filter backward-sample) or 0 (optimize via Viterbi-like MAP inference). Defaults to 1 (sample).
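
For readers unfamiliar with the temperature=0 case, here is a plain-Python sketch of Viterbi decoding for a small hidden Markov model. It shows the kind of MAP computation meant by "Viterbi-like", not Pyro's implementation; the two-state model and its probabilities are made up for the example.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Most probable state path of an HMM, with all inputs as log probabilities.

    log_emit[t][s] is the log likelihood of observation t under state s.
    """
    num_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(num_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(num_states):
            # Best predecessor of state s at time t.
            best_prev = max(range(num_states),
                            key=lambda r: score[r] + log_trans[r][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final state.
    path = [max(range(num_states), key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))


def path_log_prob(path, log_init, log_trans, log_emit):
    """Joint log probability of a state path and the observations."""
    total = log_init[path[0]] + log_emit[0][path[0]]
    for t in range(1, len(path)):
        total += log_trans[path[t - 1]][path[t]] + log_emit[t][path[t]]
    return total


log = math.log
log_init = [log(0.6), log(0.4)]
log_trans = [[log(0.7), log(0.3)], [log(0.4), log(0.6)]]
log_emit = [[log(0.9), log(0.2)], [log(0.1), log(0.8)],
            [log(0.5), log(0.5)], [log(0.9), log(0.1)]]

map_path = viterbi(log_init, log_trans, log_emit)
```

With temperature=1 the maximization over predecessors would be replaced by sampling, which is the forward-filter backward-sample procedure.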

class TraceEnumSample_ELBO(num_particles=1, max_plate_nesting=inf, max_iarange_nesting=None, vectorize_particles=False, strict_enumeration_warning=True, ignore_jit_warnings=False, jit_options=None, retain_graph=None, tail_adaptive_beta=-1.0)
Bases: pyro.infer.traceenum_elbo.TraceEnum_ELBO
This extends TraceEnum_ELBO to make it cheaper to sample from discrete latent states during SVI.
The following are equivalent but the first is cheaper, sharing work between the computations of loss and z:
    # Version 1.
    elbo = TraceEnumSample_ELBO(max_plate_nesting=1)
    loss = elbo.loss(*args, **kwargs)
    z = elbo.sample_saved()

    # Version 2.
    elbo = TraceEnum_ELBO(max_plate_nesting=1)
    loss = elbo.loss(*args, **kwargs)
    guide_trace = poutine.trace(guide).get_trace(*args, **kwargs)
    z = infer_discrete(poutine.replay(model, guide_trace),
                       first_available_dim=-2)(*args, **kwargs)
Inference Utilities

class EmpiricalMarginal(trace_posterior, sites=None, validate_args=None)
Bases: pyro.distributions.empirical.Empirical
Marginal distribution over a single site (or multiple, provided they have the same shape) from the TracePosterior’s model.
Note
If multiple sites are specified, they must have the same tensor shape. Samples from each site will be stacked and stored within a single tensor. See Empirical. To hold the marginal distribution of sites having different shapes, use Marginals instead.
Parameters:
 trace_posterior (TracePosterior) – a TracePosterior instance representing a Monte Carlo posterior.
 sites (list) – optional list of sites for which we need to generate the marginal distribution.

class Marginals(trace_posterior, sites=None, validate_args=None)
Bases: object
Holds the marginal distribution over one or more sites from the TracePosterior’s model. This is a convenience container class, which can be extended by TracePosterior subclasses, e.g. for implementing diagnostics.
Parameters:
 trace_posterior (TracePosterior) – a TracePosterior instance representing a Monte Carlo posterior.
 sites (list) – optional list of sites for which we need to generate the marginal distribution.

empirical
A dictionary of sites’ names and their corresponding EmpiricalMarginal distribution.
Type: OrderedDict

support(flatten=False)
Gets support of this marginal distribution.
Parameters: flatten (bool) – A flag to decide if we want to flatten batch_shape when the marginal distribution is collected from the posterior with num_chains > 1. Defaults to False.
Returns: a dict whose keys are sites’ names and whose values are sites’ supports.
Return type: OrderedDict

class TracePosterior(num_chains=1)
Bases: object
Abstract TracePosterior object from which posterior inference algorithms inherit. When run, collects a bag of execution traces from the approximate posterior. This is designed to be used by other utility classes, such as EmpiricalMarginal, that need access to the collected execution traces.

information_criterion(pointwise=False)
Computes information criterion of the model. Currently, returns only the “Widely Applicable/Watanabe-Akaike Information Criterion” (WAIC) and the corresponding effective number of parameters.
Reference:
[1] Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC, Aki Vehtari, Andrew Gelman, and Jonah Gabry
Parameters: pointwise (bool) – a flag to decide if we want to get a vectorized WAIC or not. When pointwise=False, returns the sum.
Returns: a dictionary containing values of WAIC and its effective number of parameters.
Return type: OrderedDict
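
As a rough guide to what such a computation involves, the sketch below evaluates WAIC from a table of pointwise log likelihoods following the definition in reference [1]. It is plain Python, not Pyro's implementation. The function name, the dictionary keys, the use of the sample variance, and the example numbers are assumptions.

```python
import math
import statistics


def waic(log_lik, pointwise=False):
    """WAIC from log_lik[s][i]: log likelihood of observation i under draw s.

    Requires at least two posterior draws so the variance is defined.
    """
    num_draws = len(log_lik)
    num_obs = len(log_lik[0])
    values, p_values = [], []
    for i in range(num_obs):
        column = [log_lik[s][i] for s in range(num_draws)]
        m = max(column)
        # Log pointwise predictive density: log of the mean likelihood.
        lpd = m + math.log(sum(math.exp(v - m) for v in column) / num_draws)
        # Effective number of parameters: variance of the log likelihood.
        p_waic = statistics.variance(column)
        values.append(-2.0 * (lpd - p_waic))
        p_values.append(p_waic)
    if pointwise:
        return {"waic": values, "p_waic": p_values}
    return {"waic": sum(values), "p_waic": sum(p_values)}


# Three posterior draws, two observations (illustrative numbers).
log_lik = [[-1.0, -2.0], [-1.5, -2.5], [-0.5, -1.0]]
total = waic(log_lik)
per_observation = waic(log_lik, pointwise=True)
```

The pointwise form returns one value per observation; summing those values reproduces the non-pointwise result.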


class TracePredictive(model, posterior, num_samples, keep_sites=None)
Bases: pyro.infer.abstract_infer.TracePosterior
Generates and holds traces from the posterior predictive distribution, given model execution traces from the approximate posterior. This is achieved by constraining latent sites to randomly sampled parameter values from the model execution traces and running the model forward to generate traces with new response (“_RETURN”) sites.
Parameters:
 model – arbitrary Python callable containing Pyro primitives.
 posterior (TracePosterior) – trace posterior instance holding samples from the model’s approximate posterior.
 num_samples (int) – number of samples to generate.
 keep_sites – The sites which should be sampled from posterior distribution (default: all)
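
The procedure described here, fixing latents to randomly chosen posterior draws and running the model forward, can be sketched in plain Python. The stand-in posterior draws, the model_forward helper, and the sample counts are assumptions for illustration and do not use Pyro traces.

```python
import random
import statistics


def model_forward(z, rng):
    """Run the model forward with the latent fixed to z: x_new ~ N(z, 1)."""
    return rng.gauss(z, 1.0)


rng = random.Random(0)
# Stand-in for latent values stored in posterior execution traces,
# here drawn from N(0.5, 0.5) (variance 0.5).
posterior_draws = [rng.gauss(0.5, 0.5 ** 0.5) for _ in range(5000)]

num_samples = 5000
predictive = [model_forward(rng.choice(posterior_draws), rng)
              for _ in range(num_samples)]
predictive_mean = statistics.mean(predictive)
predictive_var = statistics.variance(predictive)
```

The predictive variance combines posterior uncertainty in z with the observation noise, so it is larger than either alone.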