MCMC

class MCMC(kernel, num_samples, warmup_steps=None, initial_params=None, num_chains=1, hook_fn=None, mp_context=None, disable_progbar=False, disable_validation=True, transforms=None)
Bases: object
Wrapper class for Markov Chain Monte Carlo algorithms. Specific MCMC algorithms are MCMCKernel instances and need to be supplied as the kernel argument to the constructor.
Note
When num_chains > 1, Python multiprocessing is used to run parallel chains in multiple processes. The usual caveats around multiprocessing in Python apply: the model used to initialize the kernel must be serializable via pickle, and performance and constraints are platform dependent (e.g. only the "spawn" context is available on Windows). This has also not been extensively tested on the Windows platform.
Parameters:
 kernel – An instance of the MCMCKernel class, which when given an execution trace returns another sample trace from the target (posterior) distribution.
 num_samples (int) – The number of samples that need to be generated, excluding the samples discarded during the warmup phase.
 warmup_steps (int) – Number of warmup iterations. The samples generated during the warmup phase are discarded. If not provided, default is half of num_samples.
 num_chains (int) – Number of MCMC chains to run in parallel. Depending on whether num_chains is 1 or more than 1, this class internally dispatches to either _UnarySampler or _MultiSampler.
 initial_params (dict) – dict containing initial tensors in unconstrained space to initiate the Markov chain. The size of the leading dimension must match num_chains. If not specified, parameter values will be sampled from the prior.
 hook_fn – Python callable that takes in (kernel, samples, stage, i) as arguments. stage is either sample or warmup and i refers to the i’th sample for the given stage. This can be used to implement additional logging, or more generally, run arbitrary code per generated sample.
 mp_context (str) – Multiprocessing context to use when num_chains > 1. Only applicable for Python 3.5 and above. Use mp_context="spawn" for CUDA.
 disable_progbar (bool) – Disable progress bar and diagnostics update.
 disable_validation (bool) – Disables the distribution validation check. Defaults to True (validation disabled), since divergent transitions would otherwise lead to exceptions. Switch to False for debugging purposes.
 transforms (dict) – dictionary that specifies a transform for a sample site with constrained support to unconstrained space.
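A minimal sketch of a hook_fn, assuming only the (kernel, samples, stage, i) call signature described above. The helper name make_logging_hook and the stand-in loop that calls the hook are hypothetical. The exact stage strings passed by the sampler are not specified here, so the hook compares them case-insensitively.

```python
def make_logging_hook(log, every=100):
    """Build a hook_fn that records (phase, i) every `every` iterations.

    `make_logging_hook` and `log` are illustrative names, not library API.
    """
    def hook_fn(kernel, samples, stage, i):
        # Assumption: stage is a string that starts with "warmup" or "sample"
        # in some capitalization.
        phase = "warmup" if stage.lower().startswith("warmup") else "sample"
        if i % every == 0:
            log.append((phase, i))
    return hook_fn


# Stand-in for the sampler's loop, shown only to illustrate the calling convention.
log = []
hook = make_logging_hook(log, every=2)
for stage, n in (("Warmup", 3), ("Sample", 4)):
    for i in range(n):
        hook(None, {}, stage, i)
```

With a real kernel, the hook would be supplied through the constructor's hook_fn argument.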

diagnostics()
Gets diagnostic statistics such as effective sample size, split Gelman-Rubin, or divergent transitions from the sampler.

get_samples(num_samples=None, group_by_chain=False)
Get samples from the MCMC run, potentially resampling with replacement.
Returns: dictionary of samples keyed by site name.
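The resampling behaviour can be pictured with the plain-Python sketch below. It assumes that num_samples=None returns every draw in its original order and that any other value draws indices uniformly with replacement. The function resample and the list-based storage are illustrative stand-ins, not the library's code, and chains are assumed to be already flattened.

```python
import random


def resample(samples, num_samples=None, rng=None):
    """Return draws per site, optionally resampled with replacement.

    `samples` maps a site name to a list of draws (chains already flattened).
    """
    if num_samples is None:
        # Keep every draw in its original order.
        return {site: list(draws) for site, draws in samples.items()}
    rng = rng or random.Random()
    total = len(next(iter(samples.values())))
    # One shared set of indices keeps draws from different sites aligned.
    idx = [rng.randrange(total) for _ in range(num_samples)]
    return {site: [draws[j] for j in idx] for site, draws in samples.items()}
```

Sharing the indices across sites is a design choice of this sketch: it keeps each returned row a joint draw from the posterior rather than mixing values from different iterations.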

summary(prob=0.9)
Prints a summary table displaying diagnostics of samples obtained from the posterior. The diagnostics displayed are mean, standard deviation, median, the 90% credibility interval, effective_sample_size(), and split_gelman_rubin().
Parameters: prob (float) – the probability mass of samples within the credibility interval.
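For intuition about prob, here is a sketch of one common way to build an interval that holds a given probability mass: the narrowest window of sorted samples containing that fraction of draws (a highest-density interval). Whether summary uses this construction or an equal-tailed interval is not stated here, so narrowest_interval should be read as a hypothetical helper.

```python
import math


def narrowest_interval(samples, prob=0.9):
    """Return (low, high) of the narrowest interval covering `prob` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    # Number of draws the interval must contain.
    k = max(1, math.ceil(prob * n))
    # Slide a window of k consecutive sorted draws and keep the shortest one.
    best = min(range(n - k + 1), key=lambda s: ordered[s + k - 1] - ordered[s])
    return ordered[best], ordered[best + k - 1]
```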
HMC

class HMC(model=None, potential_fn=None, step_size=1, trajectory_length=None, num_steps=None, adapt_step_size=True, adapt_mass_matrix=True, full_mass=False, transforms=None, max_plate_nesting=None, jit_compile=False, jit_options=None, ignore_jit_warnings=False, target_accept_prob=0.8)
Bases: pyro.infer.mcmc.mcmc_kernel.MCMCKernel
Simple Hamiltonian Monte Carlo kernel, where step_size and num_steps need to be explicitly specified by the user.
References
[1] MCMC Using Hamiltonian Dynamics, Radford M. Neal
Parameters:
 model – Python callable containing Pyro primitives.
 potential_fn – Python callable that calculates the potential energy; its input is a dict of real-support parameters.
 step_size (float) – Determines the size of a single step taken by the Verlet integrator while computing the trajectory using Hamiltonian dynamics. If not specified, it will be set to 1.
 trajectory_length (float) – Length of an MCMC trajectory. If not specified, it will be set to step_size x num_steps. In case num_steps is not specified, it will be set to \(2\pi\).
 num_steps (int) – The number of discrete steps over which to simulate Hamiltonian dynamics. The state at the end of the trajectory is returned as the proposal. This value is always equal to int(trajectory_length / step_size).
 adapt_step_size (bool) – A flag to decide if we want to adapt step_size during the warmup phase using the Dual Averaging scheme.
 adapt_mass_matrix (bool) – A flag to decide if we want to adapt the mass matrix during the warmup phase using the Welford scheme.
 full_mass (bool) – A flag to decide if the mass matrix is dense or diagonal.
 transforms (dict) – Optional dictionary that specifies a transform for a sample site with constrained support to unconstrained space. The transform should be invertible, and implement log_abs_det_jacobian. If not specified and the model has sites with constrained support, automatic transformations will be applied, as specified in torch.distributions.constraint_registry.
 max_plate_nesting (int) – Optional bound on max number of nested pyro.plate() contexts. This is required if model contains discrete sample sites that can be enumerated over in parallel.
 jit_compile (bool) – Optional parameter denoting whether to use the PyTorch JIT to trace the log density computation, and use this optimized executable trace in the integrator.
 jit_options (dict) – A dictionary containing optional arguments for the torch.jit.trace() function.
 ignore_jit_warnings (bool) – Flag to ignore warnings from the JIT tracer when jit_compile=True. Default is False.
 target_accept_prob (float) – Increasing this value will lead to a smaller step size, hence the sampling will be slower and more robust. Defaults to 0.8.
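How step_size, trajectory_length and num_steps interact can be seen in a one-dimensional sketch of a single HMC transition. This is an illustration with a unit mass matrix and hypothetical function names (leapfrog, hmc_step), not the library's implementation. The max(1, ...) guard around the step count is an added assumption.

```python
import math
import random


def leapfrog(q, p, grad_potential, step_size, num_steps):
    """Velocity Verlet integration of Hamiltonian dynamics with unit mass."""
    p = p - 0.5 * step_size * grad_potential(q)    # initial half step for momentum
    for step in range(num_steps):
        q = q + step_size * p                      # full step for position
        if step != num_steps - 1:
            p = p - step_size * grad_potential(q)  # full step for momentum
    p = p - 0.5 * step_size * grad_potential(q)    # final half step for momentum
    return q, p


def hmc_step(q, potential, grad_potential, step_size, trajectory_length, rng):
    """One HMC transition: simulate a trajectory, then accept or reject its end point."""
    num_steps = max(1, int(trajectory_length / step_size))
    p0 = rng.gauss(0.0, 1.0)                       # fresh momentum draw
    q_new, p_new = leapfrog(q, p0, grad_potential, step_size, num_steps)
    h_old = potential(q) + 0.5 * p0 ** 2
    h_new = potential(q_new) + 0.5 * p_new ** 2
    # Metropolis correction for the integrator's discretization error.
    accept = rng.random() < math.exp(min(0.0, h_old - h_new))
    return q_new if accept else q
```

For a standard normal target one would pass potential=lambda q: 0.5 * q * q and grad_potential=lambda q: q. A smaller step_size lowers the energy error, and so raises the acceptance rate, at the cost of more gradient evaluations per trajectory.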
Note
Internally, the mass matrix will be ordered according to the order of the names of latent variables, not the order of their appearance in the model.
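To make the ordering note concrete, this sketch pairs Welford's online variance update (the scheme named for adapt_mass_matrix) with a name-sorted site ordering. It is a scalar-per-site illustration with hypothetical names (WelfordVariance, and the sites "sigma" and "beta"), not the library's adapter.

```python
class WelfordVariance:
    """Online mean and variance of a stream of scalars (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.count - 1)


# Warmup draws keyed by site name; suppose the model declared "sigma" before "beta".
warmup_draws = {"sigma": [0.9, 1.1, 1.0, 1.2], "beta": [2.0, 4.0, 6.0, 8.0]}

# Diagonal entries follow sorted site names, not declaration order.
site_order = sorted(warmup_draws)
diagonal = []
for name in site_order:
    estimator = WelfordVariance()
    for draw in warmup_draws[name]:
        estimator.update(draw)
    diagonal.append(estimator.variance())
```

Here the first diagonal entry belongs to "beta" even though "sigma" was declared first, which matters when reading inverse_mass_matrix entry by entry.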
Example:
>>> import torch
>>> import pyro
>>> import pyro.distributions as dist
>>> from pyro.infer.mcmc import MCMC, HMC
>>> true_coefs = torch.tensor([1., 2., 3.])
>>> data = torch.randn(2000, 3)
>>> dim = 3
>>> labels = dist.Bernoulli(logits=(true_coefs * data).sum(1)).sample()
>>>
>>> def model(data):
...     coefs_mean = torch.zeros(dim)
...     coefs = pyro.sample('beta', dist.Normal(coefs_mean, torch.ones(3)))
...     y = pyro.sample('y', dist.Bernoulli(logits=(coefs * data).sum(1)), obs=labels)
...     return y
>>>
>>> hmc_kernel = HMC(model, step_size=0.0855, num_steps=4)
>>> mcmc = MCMC(hmc_kernel, num_samples=500, warmup_steps=100)
>>> mcmc.run(data)
>>> mcmc.get_samples()['beta'].mean(0)  # doctest: +SKIP
tensor([ 0.9819,  1.9258,  2.9737])

initial_params

inverse_mass_matrix

num_steps

step_size
NUTS

class NUTS(model=None, potential_fn=None, step_size=1, adapt_step_size=True, adapt_mass_matrix=True, full_mass=False, use_multinomial_sampling=True, transforms=None, max_plate_nesting=None, jit_compile=False, jit_options=None, ignore_jit_warnings=False, target_accept_prob=0.8, max_tree_depth=10)
Bases: pyro.infer.mcmc.hmc.HMC
No-U-Turn Sampler kernel, which provides an efficient and convenient way to run Hamiltonian Monte Carlo. The number of steps taken by the integrator is dynamically adjusted on each call to sample to ensure an optimal length for the Hamiltonian trajectory [1]. As such, the samples generated will typically have lower autocorrelation than those generated by the HMC kernel. Optionally, the NUTS kernel also provides the ability to adapt step size during the warmup phase.
Refer to the baseball example to see how to do Bayesian inference in Pyro using NUTS.
References
 [1] The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo, Matthew D. Hoffman and Andrew Gelman
 [2] A Conceptual Introduction to Hamiltonian Monte Carlo, Michael Betancourt
 [3] Slice Sampling, Radford M. Neal
Parameters:
 model – Python callable containing Pyro primitives.
 potential_fn – Python callable that calculates the potential energy; its input is a dict of real-support parameters.
 step_size (float) – Determines the size of a single step taken by the Verlet integrator while computing the trajectory using Hamiltonian dynamics. If not specified, it will be set to 1.
 adapt_step_size (bool) – A flag to decide if we want to adapt step_size during warmup phase using Dual Averaging scheme.
 adapt_mass_matrix (bool) – A flag to decide if we want to adapt mass matrix during warmup phase using Welford scheme.
 full_mass (bool) – A flag to decide if mass matrix is dense or diagonal.
 use_multinomial_sampling (bool) – A flag to decide if we want to sample candidates along its trajectory using “multinomial sampling” or using “slice sampling”. Slice sampling is used in the original NUTS paper [1], while multinomial sampling is suggested in [2]. By default, this flag is set to True. If it is set to False, NUTS uses slice sampling.
 transforms (dict) – Optional dictionary that specifies a transform for a sample site with constrained support to unconstrained space. The transform should be invertible, and implement log_abs_det_jacobian. If not specified and the model has sites with constrained support, automatic transformations will be applied, as specified in torch.distributions.constraint_registry.
 max_plate_nesting (int) – Optional bound on max number of nested pyro.plate() contexts. This is required if model contains discrete sample sites that can be enumerated over in parallel.
 jit_compile (bool) – Optional parameter denoting whether to use the PyTorch JIT to trace the log density computation, and use this optimized executable trace in the integrator.
 jit_options (dict) – A dictionary containing optional arguments for the torch.jit.trace() function.
 ignore_jit_warnings (bool) – Flag to ignore warnings from the JIT tracer when jit_compile=True. Default is False.
 target_accept_prob (float) – Target acceptance probability of the step size adaptation scheme. Increasing this value will lead to a smaller step size, so the sampling will be slower but more robust. Defaults to 0.8.
 max_tree_depth (int) – Max depth of the binary tree created during the doubling scheme of the NUTS sampler. Defaults to 10.
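The relationship between target_accept_prob and the adapted step size can be illustrated with a sketch of the dual averaging update from Hoffman and Gelman [1]. The constants gamma, t0 and kappa are the values suggested in that paper and are assumed here rather than taken from the library. adapt_step_size is a hypothetical function, and accept_prob_fn is a toy stand-in for the acceptance statistic a real sampler would report.

```python
import math


def adapt_step_size(accept_prob_fn, target_accept_prob=0.8, init_step_size=1.0,
                    num_warmup=1000, gamma=0.05, t0=10.0, kappa=0.75):
    """Dual averaging of log(step_size) toward a target acceptance probability."""
    mu = math.log(10.0 * init_step_size)  # point the iterates are shrunk toward
    h_bar = 0.0          # running average of (target - observed acceptance)
    log_step_bar = 0.0   # weighted average of the log step size iterates
    step_size = init_step_size
    for m in range(1, num_warmup + 1):
        accept_prob = accept_prob_fn(step_size)
        h_bar += ((target_accept_prob - accept_prob) - h_bar) / (m + t0)
        log_step = mu - math.sqrt(m) / gamma * h_bar
        eta = m ** (-kappa)
        log_step_bar = eta * log_step + (1.0 - eta) * log_step_bar
        step_size = math.exp(log_step)
    # The averaged iterate is the value kept after warmup.
    return math.exp(log_step_bar)


def toy_accept(step_size):
    """Toy model: acceptance probability decays as the step size grows."""
    return math.exp(-step_size)
```

Under the toy acceptance model, calling adapt_step_size(toy_accept, 0.9) settles on a smaller step than adapt_step_size(toy_accept, 0.6), which is the trade-off the parameter description refers to.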
Example:
>>> import torch
>>> import pyro
>>> import pyro.distributions as dist
>>> from pyro.infer.mcmc import MCMC, NUTS
>>> true_coefs = torch.tensor([1., 2., 3.])
>>> data = torch.randn(2000, 3)
>>> dim = 3
>>> labels = dist.Bernoulli(logits=(true_coefs * data).sum(1)).sample()
>>>
>>> def model(data):
...     coefs_mean = torch.zeros(dim)
...     coefs = pyro.sample('beta', dist.Normal(coefs_mean, torch.ones(3)))
...     y = pyro.sample('y', dist.Bernoulli(logits=(coefs * data).sum(1)), obs=labels)
...     return y
>>>
>>> nuts_kernel = NUTS(model, adapt_step_size=True)
>>> mcmc = MCMC(nuts_kernel, num_samples=500, warmup_steps=300)
>>> mcmc.run(data)
>>> mcmc.get_samples()['beta'].mean(0)  # doctest: +SKIP
tensor([ 0.9221,  1.9464,  2.9228])
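The difference between the two candidate-selection modes controlled by use_multinomial_sampling can be sketched on a finished list of trajectory states. A real NUTS implementation selects progressively while the tree is built, so this flattened version is a simplification. The names pick_multinomial and pick_slice are hypothetical, and energies is assumed to include the initial state.

```python
import math
import random


def pick_multinomial(energies, rng):
    """Pick an index with probability proportional to exp(-energy)."""
    lowest = min(energies)  # shift for numerical stability
    weights = [math.exp(lowest - e) for e in energies]
    return rng.choices(range(len(energies)), weights=weights, k=1)[0]


def pick_slice(energies, initial_energy, rng):
    """Pick uniformly among states whose density exceeds a random slice level."""
    # Slice variable u ~ Uniform(0, exp(-initial_energy)], kept in log space.
    log_u = math.log(1.0 - rng.random()) - initial_energy
    in_slice = [i for i, e in enumerate(energies) if -e >= log_u]
    return rng.choice(in_slice)
```

Multinomial selection weights every state by its density, while slice selection treats all states above the slice level as equally likely and discards the rest.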
Utilities

initialize_model(model, model_args=(), model_kwargs={}, transforms=None, max_plate_nesting=None, jit_compile=False, jit_options=None, skip_jit_warnings=False, num_chains=1)
Given a Python callable with Pyro primitives, generates the following model-specific properties needed for inference using HMC/NUTS kernels:
 initial parameters to be sampled using an HMC kernel,
 a potential function whose input is a dict of parameters in unconstrained space,
 transforms to transform latent sites of model to unconstrained space,
 a prototype trace to be used in MCMC to consume traces from sampled parameters.
Parameters:
 model – a Pyro model which contains Pyro primitives.
 model_args (tuple) – optional args taken by model.
 model_kwargs (dict) – optional kwargs taken by model.
 transforms (dict) – Optional dictionary that specifies a transform for a sample site with constrained support to unconstrained space. The transform should be invertible, and implement log_abs_det_jacobian. If not specified and the model has sites with constrained support, automatic transformations will be applied, as specified in torch.distributions.constraint_registry.
 max_plate_nesting (int) – Optional bound on max number of nested pyro.plate() contexts. This is required if model contains discrete sample sites that can be enumerated over in parallel.
 jit_compile (bool) – Optional parameter denoting whether to use the PyTorch JIT to trace the log density computation, and use this optimized executable trace in the integrator.
 jit_options (dict) – A dictionary containing optional arguments for the torch.jit.trace() function.
 skip_jit_warnings (bool) – Flag to ignore warnings from the JIT tracer when jit_compile=True. Default is False.
 num_chains (int) – Number of parallel chains. If num_chains > 1, the returned initial_params will be a list with num_chains elements.
Returns: a tuple of (initial_params, potential_fn, transforms, prototype_trace)
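What a potential function over unconstrained parameters looks like can be shown with a scalar sketch. It assumes a single positive site with an Exponential(1) prior and a log transform. The site name "rate" and all function names are hypothetical stand-ins for the objects initialize_model returns, and log_abs_det_jacobian is simplified to a one-argument scalar function.

```python
import math


def to_unconstrained(x):
    """Log transform: maps a positive value to the real line."""
    return math.log(x)


def to_constrained(u):
    """Inverse transform: maps a real value back to a positive value."""
    return math.exp(u)


def log_abs_det_jacobian(u):
    """log |dx/du| for x = exp(u)."""
    return u


def potential_fn(params):
    """Potential energy -log p(u) for x ~ Exponential(1), expressed in u = log(x)."""
    u = params["rate"]
    x = to_constrained(u)
    log_prob_x = -x  # log density of Exponential(1) at x
    # The Jacobian term keeps the density properly normalized in u.
    return -(log_prob_x + log_abs_det_jacobian(u))
```

Without the Jacobian term, exp(-potential_fn) would not integrate to one over the unconstrained variable, and a sampler working in that space would target the wrong distribution.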

diagnostics(samples, num_chains=1)
Gets diagnostic statistics such as effective sample size and split Gelman-Rubin using the samples drawn from the posterior distribution.
Returns: dictionary of diagnostic stats for each sample site.
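As a rough guide to what the split Gelman-Rubin statistic measures, the sketch below computes it for a single scalar site in plain Python: each chain is cut in half, and the between-half variance is compared with the within-half variance. split_r_hat is a hypothetical helper written from the textbook definition, not the library's routine.

```python
import random
import statistics


def split_r_hat(chains):
    """Split Gelman-Rubin statistic for a list of equal-length chains of floats."""
    half = len(chains[0]) // 2
    halves = []
    for chain in chains:
        # Treat the first and second half of each chain as separate chains.
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = half
    means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean([statistics.variance(h) for h in halves])
    between = n * statistics.variance(means)
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5
```

Values close to 1 indicate that the halves agree with each other. Values well above 1 suggest the chains have not mixed or have not reached stationarity.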

predictive(model, posterior_samples, *args, **kwargs)
Run model by sampling latent parameters from posterior_samples, and return values at sample sites from the forward run. By default, only sites not contained in posterior_samples are returned. This can be modified by changing the return_sites keyword argument.
Warning
The interface for the predictive class is experimental and might change in the future, for example to provide a unified interface for prediction with SVI.
Parameters:
 model – Python callable containing Pyro primitives.
 posterior_samples (dict) – dictionary of samples from the posterior.
 args – model arguments.
 kwargs – model kwargs; and other keyword arguments (see below).
Keyword Arguments:
 num_samples (int) – number of samples to draw from the predictive distribution. This argument has no effect if posterior_samples is non-empty, in which case the leading dimension size of samples in posterior_samples is used.
 return_sites (list) – sites to return; by default only sample sites not present in posterior_samples are returned.
 return_trace (bool) – whether to return the full trace. Note that this is vectorized over num_samples.
 parallel (bool) – predict in parallel by wrapping the existing model in an outermost plate messenger. Note that this requires that the model has all batch dims correctly annotated via plate. Default is False.
Returns: dict of samples from the predictive distribution, or a single vectorized trace (if return_trace=True).
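The default return_sites behaviour can be pictured with a plain-Python analogue that runs a toy model once per posterior draw. Everything here is a hypothetical stand-in (toy_model, toy_predictive, and the dict-of-lists sample format). It mirrors only the rule that sites already present in posterior_samples are fixed to their sampled values and omitted from the result unless requested.

```python
import random


def toy_model(params, x, rng):
    """Forward run of a toy regression: returns values at every sample site."""
    beta = params["beta"] if "beta" in params else rng.gauss(0.0, 1.0)
    y = rng.gauss(beta * x, 1.0)
    return {"beta": beta, "y": y}


def toy_predictive(model, posterior_samples, *args, return_sites=None, rng=None):
    """Run `model` once per posterior draw and collect the requested sites."""
    rng = rng or random.Random()
    num_draws = len(next(iter(posterior_samples.values())))
    collected = {}
    for i in range(num_draws):
        # Fix the latent sites to the i-th posterior draw.
        params = {site: draws[i] for site, draws in posterior_samples.items()}
        values = model(params, *args, rng)
        # Default: keep only sites that were not supplied by the posterior.
        sites = return_sites or [s for s in values if s not in posterior_samples]
        for site in sites:
            collected.setdefault(site, []).append(values[site])
    return collected
```

Calling toy_predictive(toy_model, {"beta": [1.0, 2.0, 3.0]}, 10.0) returns a dict with three draws under the "y" key only, and passing return_sites=["beta", "y"] adds the posterior values back in.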