Optimal Experiment Design

Tasks such as choosing the next question to ask in a psychology study, designing an election polling strategy, and deciding which compounds to synthesize and test in the biological sciences all fundamentally ask the same question: how do we design an experiment to maximize the information gathered? Pyro is designed to support automated optimal experiment design: specifying a model and guide is enough to obtain optimal designs for many different kinds of experiment scenarios. Check out our experimental design tutorials, which use Pyro to design an adaptive psychology study that selects the next question based on past data, and an election polling strategy that aims to give the strongest prediction about the eventual winner of the election.

Bayesian optimal experimental design (BOED) is a powerful methodology for tackling experimental design problems and is the framework adopted by Pyro. In the BOED framework, we begin with a Bayesian model with a likelihood \(p(y|\theta,d)\) and a prior \(p(\theta)\) on the target latent variables. In Pyro, any fully Bayesian model can be used in the BOED framework. The sample sites corresponding to experimental outcomes are the observation sites; those corresponding to latent variables of interest are the target sites. The design \(d\) is the argument to the model, and is not a random variable.

In the BOED framework, we choose the design that optimizes the expected information gain (EIG) on the targets \(\theta\) from running the experiment

\(\text{EIG}(d) = \mathbf{E}_{p(y|d)} \left[ H[p(\theta)] - H[p(\theta|y, d)] \right]\) ,

where \(H[·]\) represents the entropy and \(p(\theta|y, d) \propto p(\theta)p(y|\theta, d)\) is the posterior we get from running the experiment with design \(d\) and observing \(y\). In other words, the optimal design is the one that, in expectation over possible future observations, most reduces posterior entropy over the target latent variables. If the predictive model is correct, this forms a design strategy that is (one-step) optimal from an information-theoretic viewpoint. For further details, see [1, 2].
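
Equivalently, the EIG is the mutual information between \(\theta\) and \(y\) under design \(d\),

\(\text{EIG}(d) = \mathbf{E}_{p(y, \theta|d)} \left[ \log p(y|\theta, d) - \log p(y|d) \right]\) ,

which is the form targeted by the nested Monte Carlo and variational estimators described below.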

The pyro.contrib.oed module provides tools to create optimal experimental designs for Pyro models. In particular, it provides estimators for the expected information gain (EIG).

To estimate the EIG for a particular design, we first set up our Pyro model. For example:

import torch
import pyro
import pyro.distributions as dist

def model(design):

    # This line allows batching of designs, treating all batch dimensions as independent
    with pyro.plate_stack("plate_stack", design.shape):

        # We use a Normal prior for theta
        theta = pyro.sample("theta", dist.Normal(torch.tensor(0.0), torch.tensor(1.0)))

        # We use a simple logistic regression model for the likelihood
        logit_p = theta - design
        y = pyro.sample("y", dist.Bernoulli(logits=logit_p))

        return y

We then select an appropriate EIG estimator, such as:

from pyro.contrib.oed.eig import nmc_eig

eig = nmc_eig(model, design, observation_labels=["y"], target_labels=["theta"], N=2500, M=50)

It is also possible to estimate the EIG across a grid of designs, for example

designs = torch.stack([design1, design2], dim=0)

and then compare the resulting estimates to find the best design from a number of options.
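
For instance, here is a minimal sketch using the model defined above with a hypothetical 1-D grid of scalar candidate designs:

# A hypothetical grid of 11 scalar candidate designs in [-2, 2]
candidate_designs = torch.linspace(-2.0, 2.0, 11)

# nmc_eig vectorises over the batch dimensions of the design tensor,
# giving one EIG estimate per candidate design
eig_estimates = nmc_eig(model, candidate_designs, observation_labels=["y"],
                        target_labels=["theta"], N=2500, M=50)

best_design = candidate_designs[torch.argmax(eig_estimates)]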

[1] Chaloner, Kathryn, and Isabella Verdinelli. “Bayesian experimental design: A review.” Statistical Science (1995): 273-304.

[2] Foster, Adam, et al. “Variational Bayesian Optimal Experimental Design.” arXiv preprint arXiv:1903.05480 (2019).

Expected Information Gain

laplace_eig(model, design, observation_labels, target_labels, guide, loss, optim, num_steps, final_num_samples, y_dist=None, eig=True, **prior_entropy_kwargs)[source]

Estimates the expected information gain (EIG) by making repeated Laplace approximations to the posterior.
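
Roughly speaking, for each sampled pseudo-observation \(y_n\) the posterior \(p(\theta | y_n, d)\) is approximated by a Gaussian fitted at its mode with covariance \(\Sigma_n\), whose entropy has the closed form \(\tfrac{1}{2}\log\det(2\pi e \, \Sigma_n)\); averaging these entropies over the sampled \(y_n\) gives an estimate of the average posterior entropy, and hence of the EIG.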

Parameters
  • model (function) – Pyro stochastic function taking design as only argument.

  • design (torch.Tensor) – Tensor of possible designs.

  • observation_labels (list) – labels of sample sites to be regarded as observables.

  • target_labels (list) – labels of sample sites to be regarded as latent variables of interest, i.e. the sites that we wish to gain information about.

  • guide (function) – Pyro stochastic function corresponding to model.

  • loss – a Pyro loss such as pyro.infer.Trace_ELBO().differentiable_loss.

  • optim – Optimiser for the loss.

  • num_steps (int) – Number of gradient steps to take per sampled pseudo-observation.

  • final_num_samples (int) – Number of y samples (pseudo-observations) to take.

  • y_dist – Distribution to sample y from; if None, the Bayesian marginal distribution is used.

  • eig (bool) – Whether to compute the EIG or the average posterior entropy (APE). The EIG is given by EIG = prior entropy - APE. If True, the prior entropy will be estimated analytically, or by Monte Carlo as appropriate for the model. If False the APE is returned.

  • prior_entropy_kwargs (dict) – parameters for estimating the prior entropy: num_prior_samples indicating the number of samples for a MC estimate of prior entropy, and mean_field indicating if an analytic form for a mean-field prior should be tried.

Returns

EIG estimate, optionally includes full optimization history

Return type

torch.Tensor

vi_eig(model, design, observation_labels, target_labels, vi_parameters, is_parameters, y_dist=None, eig=True, **prior_entropy_kwargs)[source]

Deprecated since version 0.4.1: Use posterior_eig instead.

Estimates the expected information gain (EIG) using variational inference (VI).

The APE is defined as

\(APE(d)=E_{Y\sim p(y|d)}[H[p(\theta|Y, d)]]\)

where \(H[\cdot]\) denotes the differential entropy. The APE is related to the expected information gain (EIG) by

\(EIG(d)=H[p(\theta)]-APE(d)\)

so minimising the APE is equivalent to maximising the EIG.

Parameters
  • model (function) – A pyro model accepting design as only argument.

  • design (torch.Tensor) – Tensor representation of design

  • observation_labels (list) – A subset of the sample sites present in model. These sites are regarded as future observations and other sites are regarded as latent variables over which a posterior is to be inferred.

  • target_labels (list) – A subset of the sample sites over which the posterior entropy is to be measured.

  • vi_parameters (dict) – Variational inference parameters which should include: optim: an instance of pyro.Optim, guide: a guide function compatible with model, num_steps: the number of VI steps to make, and loss: the loss function to use for VI

  • is_parameters (dict) – Importance sampling parameters for the marginal distribution of \(Y\). May include num_samples: the number of samples to draw from the marginal.

  • y_dist (pyro.distributions.Distribution) – (optional) the distribution assumed for the response variable \(Y\)

  • eig (bool) – Whether to compute the EIG or the average posterior entropy (APE). The EIG is given by EIG = prior entropy - APE. If True, the prior entropy will be estimated analytically, or by Monte Carlo as appropriate for the model. If False the APE is returned.

  • prior_entropy_kwargs (dict) – parameters for estimating the prior entropy: num_prior_samples indicating the number of samples for a MC estimate of prior entropy, and mean_field indicating if an analytic form for a mean-field prior should be tried.

Returns

EIG estimate, optionally includes full optimization history

Return type

torch.Tensor

nmc_eig(model, design, observation_labels, target_labels=None, N=100, M=10, M_prime=None, independent_priors=False)[source]

Nested Monte Carlo estimate of the expected information gain (EIG). When there are no random effects, the estimate is

\[\frac{1}{N}\sum_{n=1}^N \log p(y_n | \theta_n, d) - \frac{1}{N}\sum_{n=1}^N \log \left(\frac{1}{M}\sum_{m=1}^M p(y_n | \theta_m, d)\right)\]

where \(\theta_n, y_n \sim p(\theta, y | d)\) and \(\theta_m \sim p(\theta)\). The estimate in the presence of random effects is

\[\frac{1}{N}\sum_{n=1}^N \log \left(\frac{1}{M'}\sum_{m=1}^{M'} p(y_n | \theta_n, \widetilde{\theta}_{nm}, d)\right)- \frac{1}{N}\sum_{n=1}^N \log \left(\frac{1}{M}\sum_{m=1}^{M} p(y_n | \theta_m, \widetilde{\theta}_{m}, d)\right)\]

where \(\widetilde{\theta}\) are the random effects with \(\widetilde{\theta}_{nm} \sim p(\widetilde{\theta}|\theta=\theta_n)\) and \(\theta_m,\widetilde{\theta}_m \sim p(\theta,\widetilde{\theta})\). The latter form is used when M_prime != None.

Parameters
  • model (function) – A pyro model accepting design as only argument.

  • design (torch.Tensor) – Tensor representation of design

  • observation_labels (list) – A subset of the sample sites present in model. These sites are regarded as future observations and other sites are regarded as latent variables over which a posterior is to be inferred.

  • target_labels (list) – A subset of the sample sites over which the posterior entropy is to be measured.

  • N (int) – Number of outer expectation samples.

  • M (int) – Number of inner expectation samples for p(y|d).

  • M_prime (int) – Number of samples for p(y | theta, d) if required.

  • independent_priors (bool) – Only used when M_prime is not None. Indicates whether the prior distributions for the target variables and the nuisance variables are independent. In this case, it is not necessary to sample the targets conditional on the nuisance variables.

Returns

EIG estimate, optionally includes full optimization history

Return type

torch.Tensor
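
For example, here is a minimal sketch of the random-effects case, using a hypothetical variant of the introductory model with an additional nuisance site "b" that we do not wish to learn about; M_prime controls the inner sum over such sites:

import torch
import pyro
import pyro.distributions as dist
from pyro.contrib.oed.eig import nmc_eig

def mixed_model(design):
    with pyro.plate_stack("plate_stack", design.shape):
        # Target latent variable
        theta = pyro.sample("theta", dist.Normal(torch.tensor(0.0), torch.tensor(1.0)))
        # Hypothetical nuisance random effect, deliberately omitted from
        # target_labels below
        b = pyro.sample("b", dist.Normal(torch.tensor(0.0), torch.tensor(0.5)))
        return pyro.sample("y", dist.Bernoulli(logits=theta + b - design))

# theta and b have independent priors here, hence independent_priors=True
eig = nmc_eig(mixed_model, design, observation_labels=["y"],
              target_labels=["theta"], N=2000, M=50, M_prime=20,
              independent_priors=True)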

donsker_varadhan_eig(model, design, observation_labels, target_labels, num_samples, num_steps, T, optim, return_history=False, final_design=None, final_num_samples=None)[source]

Donsker-Varadhan estimate of the expected information gain (EIG).

The Donsker-Varadhan representation of EIG is

\[\sup_T E_{p(y, \theta | d)}[T(y, \theta)] - \log E_{p(y|d)p(\theta)}[\exp(T(\bar{y}, \bar{\theta}))]\]

where \(T\) is any (measurable) function.

This method optimises the loss function over a pre-specified class of functions \(T\).

Parameters
  • model (function) – A pyro model accepting design as only argument.

  • design (torch.Tensor) – Tensor representation of design

  • observation_labels (list) – A subset of the sample sites present in model. These sites are regarded as future observations and other sites are regarded as latent variables over which a posterior is to be inferred.

  • target_labels (list) – A subset of the sample sites over which the posterior entropy is to be measured.

  • num_samples (int) – Number of samples per iteration.

  • num_steps (int) – Number of optimization steps.

  • T (function or torch.nn.Module) – optimisable function T for use in the Donsker-Varadhan loss function.

  • optim (pyro.optim.Optim) – Optimiser to use.

  • return_history (bool) – If True, also returns a tensor giving the loss function at each step of the optimization.

  • final_design (torch.Tensor) – The final design tensor to evaluate at. If None, uses design.

  • final_num_samples (int) – The number of samples to use at the final evaluation. If None, uses num_samples.

Returns

EIG estimate, optionally includes full optimization history

Return type

torch.Tensor or tuple

posterior_eig(model, design, observation_labels, target_labels, num_samples, num_steps, guide, optim, return_history=False, final_design=None, final_num_samples=None, eig=True, prior_entropy_kwargs={}, *args, **kwargs)[source]

Posterior estimate of expected information gain (EIG) computed from the average posterior entropy (APE) using \(EIG(d) = H[p(\theta)] - APE(d)\). See [1] for full details.

The posterior representation of APE is

\(\sup_{q}\ E_{p(y, \theta | d)}[\log q(\theta | y, d)]\)

where \(q\) is any distribution on \(\theta\).

This method optimises the loss over a given guide family representing \(q\).

[1] Foster, Adam, et al. “Variational Bayesian Optimal Experimental Design.” arXiv preprint arXiv:1903.05480 (2019).

Parameters
  • model (function) – A pyro model accepting design as only argument.

  • design (torch.Tensor) – Tensor representation of design

  • observation_labels (list) – A subset of the sample sites present in model. These sites are regarded as future observations and other sites are regarded as latent variables over which a posterior is to be inferred.

  • target_labels (list) – A subset of the sample sites over which the posterior entropy is to be measured.

  • num_samples (int) – Number of samples per iteration.

  • num_steps (int) – Number of optimization steps.

  • guide (function) – guide family for use in the (implicit) posterior estimation. The parameters of guide are optimised to maximise the posterior objective.

  • optim (pyro.optim.Optim) – Optimiser to use.

  • return_history (bool) – If True, also returns a tensor giving the loss function at each step of the optimization.

  • final_design (torch.Tensor) – The final design tensor to evaluate at. If None, uses design.

  • final_num_samples (int) – The number of samples to use at the final evaluation. If None, uses num_samples.

  • eig (bool) – Whether to compute the EIG or the average posterior entropy (APE). The EIG is given by EIG = prior entropy - APE. If True, the prior entropy will be estimated analytically, or by Monte Carlo as appropriate for the model. If False the APE is returned.

  • prior_entropy_kwargs (dict) – parameters for estimating the prior entropy: num_prior_samples indicating the number of samples for a MC estimate of prior entropy, and mean_field indicating if an analytic form for a mean-field prior should be tried.

Returns

EIG estimate, optionally includes full optimization history

Return type

torch.Tensor or tuple
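
As a minimal sketch for the logistic model from the introduction: the guide below is a hypothetical amortised posterior \(q(\theta | y, d)\) with learnable parameters w_posterior, b_posterior and sd_posterior (one per candidate design), and it assumes the guide is called with a dict of sampled observations as its first argument:

import torch
import pyro
import pyro.distributions as dist
from pyro.optim import Adam
from pyro.contrib.oed.eig import posterior_eig

def posterior_guide(y_dict, design, observation_labels, target_labels):
    # Sampled outcome(s) for this batch of designs
    y = y_dict["y"]
    # design.shape[-1:] assumes a 1-D grid of scalar designs; the estimator
    # may pass a design expanded with extra leading sample dimensions
    w = pyro.param("w_posterior", torch.zeros(design.shape[-1:]))
    b = pyro.param("b_posterior", torch.zeros(design.shape[-1:]))
    sd = pyro.param("sd_posterior", torch.ones(design.shape[-1:]),
                    constraint=dist.constraints.positive)
    # Amortised Normal approximation to the posterior over theta
    pyro.sample("theta", dist.Normal(b + w * y, sd))

eig = posterior_eig(model, design, ["y"], ["theta"], num_samples=100,
                    num_steps=1000, guide=posterior_guide,
                    optim=Adam({"lr": 0.01}), final_num_samples=10000)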

marginal_eig(model, design, observation_labels, target_labels, num_samples, num_steps, guide, optim, return_history=False, final_design=None, final_num_samples=None)[source]

Estimates the EIG by estimating the entropy of the marginal distribution \(p(y|d)\). See [1] for full details.

The marginal representation of EIG is

\(\inf_{q}\ E_{p(y, \theta | d)}\left[\log \frac{p(y | \theta, d)}{q(y | d)} \right]\)

where \(q\) is any distribution on \(y\). A variational family for \(q\) is specified in the guide.

Warning

This method does not estimate the correct quantity in the presence of random effects.

[1] Foster, Adam, et al. “Variational Bayesian Optimal Experimental Design.” arXiv preprint arXiv:1903.05480 (2019).

Parameters
  • model (function) – A pyro model accepting design as only argument.

  • design (torch.Tensor) – Tensor representation of design

  • observation_labels (list) – A subset of the sample sites present in model. These sites are regarded as future observations and other sites are regarded as latent variables over which a posterior is to be inferred.

  • target_labels (list) – A subset of the sample sites over which the posterior entropy is to be measured.

  • num_samples (int) – Number of samples per iteration.

  • num_steps (int) – Number of optimization steps.

  • guide (function) – guide family for use in the marginal estimation. The parameters of guide are optimised to maximise the log-likelihood objective.

  • optim (pyro.optim.Optim) – Optimiser to use.

  • return_history (bool) – If True, also returns a tensor giving the loss function at each step of the optimization.

  • final_design (torch.Tensor) – The final design tensor to evaluate at. If None, uses design.

  • final_num_samples (int) – The number of samples to use at the final evaluation. If None, uses num_samples.

Returns

EIG estimate, optionally includes full optimization history

Return type

torch.Tensor or tuple
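
As a minimal sketch for the logistic model from the introduction: the guide below is a hypothetical variational family \(q(y | d)\) with one learnable logit per candidate design:

import torch
import pyro
import pyro.distributions as dist
from pyro.optim import Adam
from pyro.contrib.oed.eig import marginal_eig

def marginal_guide(design, observation_labels, target_labels):
    # design.shape[-1:] assumes a 1-D grid of scalar designs; the estimator
    # may pass a design expanded with extra leading sample dimensions
    logit_q = pyro.param("logit_q", torch.zeros(design.shape[-1:]))
    # Learnable approximation to the marginal distribution of y
    pyro.sample("y", dist.Bernoulli(logits=logit_q))

eig = marginal_eig(model, design, ["y"], ["theta"], num_samples=100,
                   num_steps=1000, guide=marginal_guide,
                   optim=Adam({"lr": 0.01}), final_num_samples=10000)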

lfire_eig(model, design, observation_labels, target_labels, num_y_samples, num_theta_samples, num_steps, classifier, optim, return_history=False, final_design=None, final_num_samples=None)[source]

Estimates the EIG using the method of Likelihood-Free Inference by Ratio Estimation (LFIRE) as in [1]. LFIRE is run separately for several samples of \(\theta\).

[1] Kleinegesse, Steven, and Michael Gutmann. “Efficient Bayesian Experimental Design for Implicit Models.” arXiv preprint arXiv:1810.09912 (2018).

Parameters
  • model (function) – A pyro model accepting design as only argument.

  • design (torch.Tensor) – Tensor representation of design

  • observation_labels (list) – A subset of the sample sites present in model. These sites are regarded as future observations and other sites are regarded as latent variables over which a posterior is to be inferred.

  • target_labels (list) – A subset of the sample sites over which the posterior entropy is to be measured.

  • num_y_samples (int) – Number of samples to take in \(y\) for each \(\theta\).

  • num_theta_samples (int) – Number of initial samples in \(\theta\) to take. The likelihood ratio is estimated by LFIRE for each sample.

  • num_steps (int) – Number of optimization steps.

  • classifier (function) – a Pytorch or Pyro classifier used to distinguish between samples of \(y\) under \(p(y|d)\) and samples under \(p(y|\theta,d)\) for some \(\theta\).

  • optim (pyro.optim.Optim) – Optimiser to use.

  • return_history (bool) – If True, also returns a tensor giving the loss function at each step of the optimization.

  • final_design (torch.Tensor) – The final design tensor to evaluate at. If None, uses design.

  • final_num_samples (int) – The number of samples to use at the final evaluation. If None, uses num_samples.

Returns

EIG estimate, optionally includes full optimization history

Return type

torch.Tensor or tuple

vnmc_eig(model, design, observation_labels, target_labels, num_samples, num_steps, guide, optim, return_history=False, final_design=None, final_num_samples=None)[source]

Estimates the EIG using Variational Nested Monte Carlo (VNMC). The VNMC estimate [1] is

\[\frac{1}{N}\sum_{n=1}^N \left[ \log p(y_n | \theta_n, d) - \log \left(\frac{1}{M}\sum_{m=1}^M \frac{p(\theta_{mn})p(y_n | \theta_{mn}, d)} {q(\theta_{mn} | y_n)} \right) \right]\]

where \(q(\theta | y)\) is the learned variational posterior approximation and \(\theta_n, y_n \sim p(\theta, y | d)\), \(\theta_{mn} \sim q(\theta|y=y_n)\).

As \(N \to \infty\) this is an upper bound on EIG. We minimise this upper bound by stochastic gradient descent.

Warning

This method cannot be used in the presence of random effects.

[1] Foster, Adam, et al. “Variational Bayesian Optimal Experimental Design.” arXiv preprint arXiv:1903.05480 (2019).

Parameters
  • model (function) – A pyro model accepting design as only argument.

  • design (torch.Tensor) – Tensor representation of design

  • observation_labels (list) – A subset of the sample sites present in model. These sites are regarded as future observations and other sites are regarded as latent variables over which a posterior is to be inferred.

  • target_labels (list) – A subset of the sample sites over which the posterior entropy is to be measured.

  • num_samples (tuple) – Number of (\(N, M\)) samples per iteration.

  • num_steps (int) – Number of optimization steps.

  • guide (function) – guide family for use in the posterior estimation. The parameters of guide are optimised to minimise the VNMC upper bound.

  • optim (pyro.optim.Optim) – Optimiser to use.

  • return_history (bool) – If True, also returns a tensor giving the loss function at each step of the optimization.

  • final_design (torch.Tensor) – The final design tensor to evaluate at. If None, uses design.

  • final_num_samples (tuple) – The number of \((N, M)\) samples to use at the final evaluation. If None, uses num_samples.

Returns

EIG estimate, optionally includes full optimization history

Return type

torch.Tensor or tuple
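
As a minimal sketch, reusing the hypothetical posterior-style guide sketched for posterior_eig above (the guide signature and site names are the same assumptions made there); note that num_samples and final_num_samples are \((N, M)\) tuples:

from pyro.optim import Adam
from pyro.contrib.oed.eig import vnmc_eig

eig = vnmc_eig(model, design, ["y"], ["theta"], num_samples=(40, 40),
               num_steps=1000, guide=posterior_guide,
               optim=Adam({"lr": 0.01}), final_num_samples=(1000, 100))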

Generalised Linear Mixed Models

Warning

This module will eventually be deprecated in favor of brmp

The pyro.contrib.oed.glmm module provides models and guides for generalised linear mixed models (GLMM). It also includes the Normal-inverse-gamma family.

To create a classical Bayesian linear model, use:

import torch

from pyro.contrib.oed.glmm import known_covariance_linear_model

# Note: coef is a p-vector, observation_sd is a scalar
# Here, p=1 (one feature)
model = known_covariance_linear_model(coef_mean=torch.tensor([0.]),
                                      coef_sd=torch.tensor([10.]),
                                      observation_sd=torch.tensor(2.))

# An n x p design tensor
# Here, n=2 (two observations)
design = torch.tensor([[1.], [-1.]])

model(design)
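
Models built this way can be passed directly to the EIG estimators above; the following sketch assumes the module's default site names ("w" for the coefficient vector and "y" for the response):

from pyro.contrib.oed.eig import nmc_eig

# One EIG estimate for the n x p design above
eig = nmc_eig(model, design, observation_labels=["y"], target_labels=["w"],
              N=1000, M=50)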

A non-linear link function may be introduced, for instance:

from pyro.contrib.oed.glmm import logistic_regression_model

# No observation_sd is needed for logistic models
model = logistic_regression_model(coef_mean=torch.tensor([0.]),
                                  coef_sd=torch.tensor([10.]))

Random effects may be incorporated as regular Bayesian regression coefficients. For random effects with a shared covariance matrix, see pyro.contrib.oed.glmm.lmer_model().