MCMC

MCMC

class MCMC(kernel, num_samples, warmup_steps=None, initial_params=None, num_chains=1, hook_fn=None, mp_context=None, disable_progbar=False, disable_validation=True, transforms=None, save_params=None)[source]

Bases: pyro.infer.mcmc.api.AbstractMCMC

Wrapper class for Markov Chain Monte Carlo algorithms. Specific MCMC algorithms are TraceKernel instances and need to be supplied as a kernel argument to the constructor.

Note

The case of num_chains > 1 uses Python multiprocessing to run parallel chains in multiple processes. This comes with the usual caveats around multiprocessing in Python, e.g. the model used to initialize the kernel must be serializable via pickle, and performance / constraints will be platform dependent (e.g. only the "spawn" context is available on Windows). This has also not been extensively tested on the Windows platform.

Parameters
  • kernel – An instance of the TraceKernel class, which when given an execution trace returns another sample trace from the target (posterior) distribution.

  • num_samples (int) – The number of samples that need to be generated, excluding the samples discarded during the warmup phase.

  • warmup_steps (int) – Number of warmup iterations. The samples generated during the warmup phase are discarded. If not provided, this defaults to num_samples.

  • num_chains (int) – Number of MCMC chains to run in parallel. Depending on whether num_chains is 1 or more than 1, this class internally dispatches to either _UnarySampler or _MultiSampler.

  • initial_params (dict) – dict containing initial tensors in unconstrained space to initiate the Markov chain. The leading dimension's size must match that of num_chains. If not specified, parameter values will be sampled from the prior.

  • hook_fn – Python callable that takes (kernel, samples, stage, i) as arguments, where stage is either sample or warmup and i refers to the i'th sample of the given stage. This can be used to implement additional logging or, more generally, to run arbitrary code per generated sample (see the sketch after this parameter list).

  • mp_context (str) – Multiprocessing context to use when num_chains > 1. Only applicable for Python 3.5 and above. Use mp_context="spawn" for CUDA.

  • disable_progbar (bool) – Disable progress bar and diagnostics update.

  • disable_validation (bool) – Disables the distribution validation check. Defaults to True (validation disabled), since divergent transitions would otherwise raise exceptions. Set to False to enable validation, or to None to preserve the existing global setting.

  • transforms (dict) – dictionary that specifies a transform for a sample site with constrained support to unconstrained space.

  • save_params (List[str]) – Optional list of a subset of parameter names to save during sampling and diagnostics. This is useful in models with large nuisance variables. Defaults to None, saving all params.
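
A minimal sketch of multi-chain usage with a per-sample hook, assuming a model and data as in the example under run() below:

def log_hook(kernel, samples, stage, i):
    # stage is either "warmup" or "sample"; i counts samples within that stage
    if i % 100 == 0:
        print("{} iteration {}".format(stage, i))

nuts_kernel = NUTS(model)
mcmc = MCMC(
    nuts_kernel,
    num_samples=500,
    warmup_steps=200,
    num_chains=4,         # chains run in separate processes
    mp_context="spawn",   # required for CUDA; the only available context on Windows
    hook_fn=log_hook,
)
mcmc.run(data)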

diagnostics()[source]

Gets diagnostic statistics, such as effective sample size, split Gelman-Rubin, or divergent transitions, from the sampler.

get_samples(num_samples=None, group_by_chain=False)[source]

Get samples from the MCMC run, potentially resampling with replacement.

For parameter details see: select_samples.
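
For example (a sketch based on the descriptions above, assuming a multi-chain run):

samples = mcmc.get_samples()                     # chains concatenated along the leading dim
chains = mcmc.get_samples(group_by_chain=True)   # leading dims: num_chains x num_samples
subset = mcmc.get_samples(num_samples=100)       # 100 draws, resampled with replacement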

run(*args, **kwargs)[source]

Run MCMC to generate samples and populate self._samples.

Example usage:

def model(data):
    ...

nuts_kernel = NUTS(model)
mcmc = MCMC(nuts_kernel, num_samples=500)
mcmc.run(data)
samples = mcmc.get_samples()

Parameters
  • args – optional arguments taken by MCMCKernel.setup().

  • kwargs – optional keyword arguments taken by MCMCKernel.setup().

summary(prob=0.9)[source]

Prints a summary table displaying diagnostics of the samples obtained from the posterior. The diagnostics displayed are the mean, standard deviation, median, the credibility interval with probability mass prob (90% by default), effective_sample_size(), and split_gelman_rubin().

Parameters

prob (float) – the probability mass of samples within the credibility interval.
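
For example, to display 95% credibility intervals instead:

mcmc.summary(prob=0.95)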

StreamingMCMC

class StreamingMCMC(kernel, num_samples, warmup_steps=None, initial_params=None, statistics=None, num_chains=1, hook_fn=None, disable_progbar=False, disable_validation=True, transforms=None, save_params=None)[source]

Bases: pyro.infer.mcmc.api.AbstractMCMC

MCMC that computes required statistics in a streaming fashion. No samples are retained by this class; only aggregated statistics are kept. This is useful for running memory-expensive models where only specific statistics are of interest (especially in memory-constrained environments such as GPUs).

For available streaming ops please see streaming.
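
A minimal sketch, assuming the streaming ops StatsOfDict and CountMeanVarianceStats from pyro.ops.streaming (check the streaming module of your Pyro version for the exact names):

from pyro.ops.streaming import CountMeanVarianceStats, StatsOfDict

kernel = NUTS(model)
mcmc = StreamingMCMC(
    kernel,
    num_samples=500,
    warmup_steps=100,
    # aggregate a running count/mean/variance per site instead of storing samples
    statistics=StatsOfDict(default=CountMeanVarianceStats),
)
mcmc.run(data)
stats = mcmc.get_statistics(group_by_chain=False)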

diagnostics()[source]

Gets diagnostics. Currently only the split Gelman-Rubin diagnostic is supported, and it requires the 'mean' and 'variance' streaming statistics to be present.

get_statistics(group_by_chain=True)[source]

Returns a dict of statistics defined by those passed to the class constructor.

Parameters

group_by_chain (bool) – Whether statistics should be chain-wise or merged together.

run(*args, **kwargs)[source]

Run StreamingMCMC to compute required self._statistics.

MCMCKernel

class MCMCKernel[source]

Bases: object

cleanup()[source]

Optional method to clean up any residual state on termination.

diagnostics()[source]

Returns a dict of useful diagnostics after the sampling process has finished.

end_warmup()[source]

Optional method to tell the kernel that the warm-up phase has finished.

property initial_params

Returns a dict of initial params (by default, from the prior) to initiate the MCMC run.

Returns

dict of parameter values keyed by their name.

logging()[source]

Relevant logging information to be printed at regular intervals of the MCMC run. Returns None by default.

Returns

String containing the diagnostic summary, e.g. acceptance rate.

Return type

str

abstract sample(params)[source]

Samples parameters from the posterior distribution, when given existing parameters.

Parameters

params (dict) – Current parameter values.

Returns

New parameters from the posterior distribution.

setup(warmup_steps, *args, **kwargs)[source]

Optional method to set up any state required at the start of the simulation run.

Parameters
  • warmup_steps (int) – Number of warmup iterations.

  • *args – Algorithm specific positional arguments.

  • **kwargs – Algorithm specific keyword arguments.
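
Putting these methods together, a minimal sketch of a custom gradient-free Metropolis kernel (illustrative only, not a Pyro built-in; here potential_fn and init_params are assumed to come from initialize_model() described under Utilities below):

import torch

from pyro.infer.mcmc.mcmc_kernel import MCMCKernel

class GaussianMetropolis(MCMCKernel):
    def __init__(self, potential_fn, init_params, step_size=0.1):
        self.potential_fn = potential_fn
        self._initial_params = init_params
        self.step_size = step_size
        self._accepted = 0
        self._t = 0

    @property
    def initial_params(self):
        return self._initial_params

    @initial_params.setter
    def initial_params(self, params):
        self._initial_params = params

    def setup(self, warmup_steps, *args, **kwargs):
        # cache the potential energy of the initial state
        self._energy = self.potential_fn(self._initial_params)

    def sample(self, params):
        # propose an isotropic Gaussian perturbation in unconstrained space
        proposal = {k: v + self.step_size * torch.randn(v.shape)
                    for k, v in params.items()}
        new_energy = self.potential_fn(proposal)
        # accept with probability min(1, exp(current_energy - new_energy))
        if torch.rand(()) < torch.exp(self._energy - new_energy):
            self._energy, params = new_energy, proposal
            self._accepted += 1
        self._t += 1
        return params

    def logging(self):
        return "acc. rate: {:.3f}".format(self._accepted / max(self._t, 1))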

HMC

class HMC(model=None, potential_fn=None, step_size=1, trajectory_length=None, num_steps=None, adapt_step_size=True, adapt_mass_matrix=True, full_mass=False, transforms=None, max_plate_nesting=None, jit_compile=False, jit_options=None, ignore_jit_warnings=False, target_accept_prob=0.8, init_strategy=<function init_to_uniform>, *, min_stepsize: float = 1e-10, max_stepsize: float = 10000000000.0)[source]

Bases: pyro.infer.mcmc.mcmc_kernel.MCMCKernel

Simple Hamiltonian Monte Carlo kernel, where step_size and num_steps need to be explicitly specified by the user.

References

[1] MCMC Using Hamiltonian Dynamics, Radford M. Neal

Parameters
  • model – Python callable containing Pyro primitives.

  • potential_fn – Python callable that computes the potential energy given a dict of real-support parameters as input.

  • step_size (float) – Determines the size of a single step taken by the Verlet integrator while computing the trajectory using Hamiltonian dynamics. If not specified, it will be set to 1.

  • trajectory_length (float) – Length of an MCMC trajectory. If not specified, it will be set to step_size x num_steps. If num_steps is also not specified, it will be set to \(2\pi\).

  • num_steps (int) – The number of discrete steps over which to simulate Hamiltonian dynamics. The state at the end of the trajectory is returned as the proposal. This value is always equal to int(trajectory_length / step_size).

  • adapt_step_size (bool) – A flag to decide if we want to adapt step_size during warm-up phase using Dual Averaging scheme.

  • adapt_mass_matrix (bool) – A flag to decide if we want to adapt mass matrix during warm-up phase using Welford scheme.

  • full_mass (bool) – A flag to decide if mass matrix is dense or diagonal.

  • transforms (dict) – Optional dictionary that specifies a transform for a sample site with constrained support to unconstrained space. The transform should be invertible, and implement log_abs_det_jacobian. If not specified and the model has sites with constrained support, automatic transformations will be applied, as specified in torch.distributions.constraint_registry.

  • max_plate_nesting (int) – Optional bound on max number of nested pyro.plate() contexts. This is required if model contains discrete sample sites that can be enumerated over in parallel.

  • jit_compile (bool) – Optional parameter denoting whether to use the PyTorch JIT to trace the log density computation, and use this optimized executable trace in the integrator.

  • jit_options (dict) – A dictionary containing optional arguments for the torch.jit.trace() function.

  • ignore_jit_warnings (bool) – Flag to ignore warnings from the JIT tracer when jit_compile=True. Default is False.

  • target_accept_prob (float) – Target acceptance probability of the step size adaptation scheme. Increasing this value leads to a smaller step size, so sampling will be slower but more robust. Defaults to 0.8.

  • init_strategy (callable) – A per-site initialization function. See Initialization section for available functions.

  • min_stepsize (float) – Lower bound on stepsize in adaptation strategy.

  • max_stepsize (float) – Upper bound on stepsize in adaptation strategy.

Note

Internally, the mass matrix will be ordered according to the order of the names of latent variables, not the order of their appearance in the model.

Example

>>> true_coefs = torch.tensor([1., 2., 3.])
>>> data = torch.randn(2000, 3)
>>> dim = 3
>>> labels = dist.Bernoulli(logits=(true_coefs * data).sum(-1)).sample()
>>>
>>> def model(data):
...     coefs_mean = torch.zeros(dim)
...     coefs = pyro.sample('beta', dist.Normal(coefs_mean, torch.ones(3)))
...     y = pyro.sample('y', dist.Bernoulli(logits=(coefs * data).sum(-1)), obs=labels)
...     return y
>>>
>>> hmc_kernel = HMC(model, step_size=0.0855, num_steps=4)
>>> mcmc = MCMC(hmc_kernel, num_samples=500, warmup_steps=100)
>>> mcmc.run(data)
>>> mcmc.get_samples()['beta'].mean(0)  
tensor([ 0.9819,  1.9258,  2.9737])

cleanup()[source]
clear_cache()[source]
diagnostics()[source]
property initial_params
property inverse_mass_matrix
logging()[source]
property mass_matrix_adapter
property num_steps
sample(params)[source]
setup(warmup_steps, *args, **kwargs)[source]
property step_size

NUTS

class NUTS(model=None, potential_fn=None, step_size=1, adapt_step_size=True, adapt_mass_matrix=True, full_mass=False, use_multinomial_sampling=True, transforms=None, max_plate_nesting=None, jit_compile=False, jit_options=None, ignore_jit_warnings=False, target_accept_prob=0.8, max_tree_depth=10, init_strategy=<function init_to_uniform>)[source]

Bases: pyro.infer.mcmc.hmc.HMC

No-U-Turn Sampler kernel, which provides an efficient and convenient way to run Hamiltonian Monte Carlo. The number of steps taken by the integrator is dynamically adjusted on each call to sample to ensure an optimal length for the Hamiltonian trajectory [1]. As such, the samples generated will typically have lower autocorrelation than those generated by the HMC kernel. Optionally, the NUTS kernel also provides the ability to adapt step size during the warmup phase.

Refer to the baseball example to see how to do Bayesian inference in Pyro using NUTS.

References

[1] The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo, Matthew D. Hoffman and Andrew Gelman.

[2] A Conceptual Introduction to Hamiltonian Monte Carlo, Michael Betancourt.

[3] Slice Sampling, Radford M. Neal.

Parameters
  • model – Python callable containing Pyro primitives.

  • potential_fn – Python callable that computes the potential energy given a dict of real-support parameters as input.

  • step_size (float) – Determines the size of a single step taken by the Verlet integrator while computing the trajectory using Hamiltonian dynamics. If not specified, it will be set to 1.

  • adapt_step_size (bool) – A flag to decide if we want to adapt step_size during warm-up phase using Dual Averaging scheme.

  • adapt_mass_matrix (bool) – A flag to decide if we want to adapt mass matrix during warm-up phase using Welford scheme.

  • full_mass (bool) – A flag to decide if mass matrix is dense or diagonal.

  • use_multinomial_sampling (bool) – A flag to decide whether to sample candidates along the trajectory using "multinomial sampling" or "slice sampling". Slice sampling is used in the original NUTS paper [1], while multinomial sampling is suggested in [2]. By default, this flag is set to True. If set to False, NUTS uses slice sampling.

  • transforms (dict) – Optional dictionary that specifies a transform for a sample site with constrained support to unconstrained space. The transform should be invertible, and implement log_abs_det_jacobian. If not specified and the model has sites with constrained support, automatic transformations will be applied, as specified in torch.distributions.constraint_registry.

  • max_plate_nesting (int) – Optional bound on max number of nested pyro.plate() contexts. This is required if model contains discrete sample sites that can be enumerated over in parallel.

  • jit_compile (bool) – Optional parameter denoting whether to use the PyTorch JIT to trace the log density computation, and use this optimized executable trace in the integrator.

  • jit_options (dict) – A dictionary containing optional arguments for the torch.jit.trace() function.

  • ignore_jit_warnings (bool) – Flag to ignore warnings from the JIT tracer when jit_compile=True. Default is False.

  • target_accept_prob (float) – Target acceptance probability of the step size adaptation scheme. Increasing this value leads to a smaller step size, so sampling will be slower but more robust. Defaults to 0.8.

  • max_tree_depth (int) – Max depth of the binary tree created during the doubling scheme of the NUTS sampler. Defaults to 10.

  • init_strategy (callable) – A per-site initialization function. See Initialization section for available functions.

Example

>>> true_coefs = torch.tensor([1., 2., 3.])
>>> data = torch.randn(2000, 3)
>>> dim = 3
>>> labels = dist.Bernoulli(logits=(true_coefs * data).sum(-1)).sample()
>>>
>>> def model(data):
...     coefs_mean = torch.zeros(dim)
...     coefs = pyro.sample('beta', dist.Normal(coefs_mean, torch.ones(3)))
...     y = pyro.sample('y', dist.Bernoulli(logits=(coefs * data).sum(-1)), obs=labels)
...     return y
>>>
>>> nuts_kernel = NUTS(model, adapt_step_size=True)
>>> mcmc = MCMC(nuts_kernel, num_samples=500, warmup_steps=300)
>>> mcmc.run(data)
>>> mcmc.get_samples()['beta'].mean(0)  
tensor([ 0.9221,  1.9464,  2.9228])

sample(params)[source]

RandomWalkKernel

class RandomWalkKernel(model, init_step_size: float = 0.1, target_accept_prob: float = 0.234)[source]

Bases: pyro.infer.mcmc.mcmc_kernel.MCMCKernel

Simple gradient-free kernel that utilizes an isotropic Gaussian random walk in the unconstrained latent space of the model. The step size that controls the variance of the kernel is adapted during warm-up with a simple adaptation scheme that targets a user-provided acceptance probability.

Parameters
  • model – Python callable containing Pyro primitives.

  • init_step_size (float) – A positive float that controls the initial step size. Defaults to 0.1.

  • target_accept_prob (float) – The target acceptance probability used during adaptation of the step size. Defaults to 0.234.

Example

>>> true_coefs = torch.tensor([1., 2., 3.])
>>> data = torch.randn(2000, 3)
>>> dim = 3
>>> labels = dist.Bernoulli(logits=(true_coefs * data).sum(-1)).sample()
>>>
>>> def model(data):
...     coefs_mean = torch.zeros(dim)
...     coefs = pyro.sample('beta', dist.Normal(coefs_mean, torch.ones(3)))
...     y = pyro.sample('y', dist.Bernoulli(logits=(coefs * data).sum(-1)), obs=labels)
...     return y
>>>
>>> kernel = RandomWalkKernel(model, init_step_size=0.2)
>>> mcmc = MCMC(kernel, num_samples=200, warmup_steps=100)
>>> mcmc.run(data)
>>> mcmc.get_samples()['beta'].mean(0)  
tensor([ 0.9819,  1.9258,  2.9737])

diagnostics()[source]
property initial_params
logging()[source]
sample(params)[source]
setup(warmup_steps, *args, **kwargs)[source]

BlockMassMatrix

class BlockMassMatrix(init_scale=1.0)[source]

Bases: object

EXPERIMENTAL This class is used to adapt the (inverse) mass matrix and provides useful methods for calculating algebraic terms that involve the mass matrix.

The mass matrix will have block structure, which can be specified by using the method configure() with the corresponding structured mass_matrix_shape arg.

Parameters

init_scale (float) – initial scale to construct the initial mass matrix.

configure(mass_matrix_shape, adapt_mass_matrix=True, options={})[source]

Sets up an initial mass matrix.

Parameters
  • mass_matrix_shape (dict) – a dict that maps tuples of site names to the shape of the corresponding mass matrix. Each tuple of site names corresponds to a block.

  • adapt_mass_matrix (bool) – a flag to decide whether an adaptation scheme will be used.

  • options (dict) – tensor options to construct the initial mass matrix.
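
A sketch of configure() usage, under the assumption that a shape tuple of length 1 denotes a diagonal block and length 2 a dense block (hypothetical site names x, y, z):

import torch

mm = BlockMassMatrix(init_scale=0.5)
mm.configure(
    {
        ("x", "y"): (5,),  # one diagonal block over the flattened sites x and y
        ("z",): (3, 3),    # one dense 3x3 block for site z
    },
    adapt_mass_matrix=True,
    options={"dtype": torch.float64},  # tensor options for the initial mass matrix
)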

end_adaptation()[source]

Updates the current mass matrix using the adaptation scheme.

property inverse_mass_matrix
kinetic_grad(r)[source]

Computes the gradient of the kinetic energy w.r.t. the momentum r. Since the kinetic energy is \(\frac{1}{2} r^T M^{-1} r\), this is equivalent to computing the velocity \(v = M^{-1} r\) for the momentum r.

Parameters

r (dict) – a dictionary that maps site names to momentum tensors.

Returns

a dictionary that maps site names to the corresponding gradient tensors

property mass_matrix_size

A dict that maps site names to the size of the corresponding mass matrix.

scale(r_unscaled, r_prototype)[source]

Computes M^{1/2} @ r_unscaled.

Note that r is generated from a Gaussian with scale mass_matrix_sqrt. This method scales it.

Parameters
  • r_unscaled (dict) – a dictionary that maps site names to momentum tensors.

  • r_prototype (dict) – a dictionary that maps site names to prototype momenta; these prototype values are used to obtain the shapes of the scaled version.

Returns

a dictionary that maps site names to the corresponding tensors

unscale(r)[source]

Computes inv(M^{1/2}) @ r.

Note that r is generated from a Gaussian with scale mass_matrix_sqrt. This method unscales it.

Parameters

r (dict) – a dictionary that maps site names to momentum tensors.

Returns

a dictionary that maps site names to the corresponding tensors
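
For instance, continuing the hypothetical configure() sketch above, scale() inverts unscale():

r = {"z": torch.randn(3)}          # momentum drawn with scale mass_matrix_sqrt
r_unscaled = mm.unscale(r)         # inv(M^{1/2}) @ r
r_again = mm.scale(r_unscaled, r)  # M^{1/2} @ r_unscaled recovers r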

update(z, z_grad)[source]

Updates the adaptation scheme using the new sample z or its grad z_grad.

Parameters
  • z (dict) – the current value.

  • z_grad (dict) – grad of the current value.

Utilities

initialize_model(model, model_args=(), model_kwargs={}, transforms=None, max_plate_nesting=None, jit_compile=False, jit_options=None, skip_jit_warnings=False, num_chains=1, init_strategy=<function init_to_uniform>, initial_params=None)[source]

Given a Python callable with Pyro primitives, generates the following model-specific properties needed for inference using HMC/NUTS kernels:

  • initial parameters to be sampled using an HMC kernel,

  • a potential function whose input is a dict of parameters in unconstrained space,

  • transforms to transform latent sites of model to unconstrained space,

  • a prototype trace to be used in MCMC to consume traces from sampled parameters.

Parameters
  • model – a Pyro model which contains Pyro primitives.

  • model_args (tuple) – optional args taken by model.

  • model_kwargs (dict) – optional kwargs taken by model.

  • transforms (dict) – Optional dictionary that specifies a transform for a sample site with constrained support to unconstrained space. The transform should be invertible, and implement log_abs_det_jacobian. If not specified and the model has sites with constrained support, automatic transformations will be applied, as specified in torch.distributions.constraint_registry.

  • max_plate_nesting (int) – Optional bound on max number of nested pyro.plate() contexts. This is required if model contains discrete sample sites that can be enumerated over in parallel.

  • jit_compile (bool) – Optional parameter denoting whether to use the PyTorch JIT to trace the log density computation, and use this optimized executable trace in the integrator.

  • jit_options (dict) – A dictionary containing optional arguments for the torch.jit.trace() function.

  • skip_jit_warnings (bool) – Flag to skip warnings from the JIT tracer when jit_compile=True. Default is False.

  • num_chains (int) – Number of parallel chains. If num_chains > 1, the returned initial_params will be a list with num_chains elements.

  • init_strategy (callable) – A per-site initialization function. See Initialization section for available functions.

  • initial_params (dict) – dict containing initial tensors in unconstrained space to initiate the Markov chain.

Returns

a tuple of (initial_params, potential_fn, transforms, prototype_trace)
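
A sketch of wiring these outputs into an HMC kernel built from a potential function rather than a model (model and data as in the HMC example above):

from pyro.infer.mcmc import MCMC, HMC
from pyro.infer.mcmc.util import initialize_model

init_params, potential_fn, transforms, _ = initialize_model(model, model_args=(data,))
hmc_kernel = HMC(potential_fn=potential_fn, transforms=transforms,
                 step_size=0.0855, num_steps=4)
mcmc = MCMC(hmc_kernel, num_samples=500, warmup_steps=100,
            initial_params=init_params)
mcmc.run()  # no model args needed; potential_fn closes over the data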

diagnostics(samples, group_by_chain=True)[source]

Gets diagnostic statistics, such as effective sample size and split Gelman-Rubin, using the samples drawn from the posterior distribution.

Parameters
  • samples (dict) – dictionary of samples keyed by site name.

  • group_by_chain (bool) – If True, each variable in samples will be treated as having shape num_chains x num_samples x sample_shape. Otherwise, the corresponding shape will be num_samples x sample_shape (i.e. without chain dimension).

Returns

dictionary of diagnostic stats for each sample site.
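
For example (a sketch; the per-site keys are assumed to be n_eff and r_hat):

from pyro.infer.mcmc.util import diagnostics

stats = diagnostics(mcmc.get_samples(group_by_chain=True))
print(stats["beta"]["n_eff"], stats["beta"]["r_hat"])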

select_samples(samples, num_samples=None, group_by_chain=False)[source]

Performs selection from given MCMC samples.

Parameters
  • samples (dict) – dictionary of samples to select from, keyed by site name.

  • num_samples (int) – Number of samples to return. If None, all the samples from an MCMC chain are returned in their original ordering.

  • group_by_chain (bool) – Whether to preserve the chain dimension. If True, all samples will have num_chains as the size of their leading dimension.

Returns

dictionary of samples keyed by site name.
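
For example, to draw a 100-sample subset from stored samples (a sketch; the import path is assumed):

from pyro.infer.mcmc.util import select_samples

subset = select_samples(mcmc.get_samples(group_by_chain=True), num_samples=100)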