Miscellaneous Ops
The pyro.ops module implements high-level utilities that are mostly independent of the rest of Pyro.

class DualAveraging(prox_center=0, t0=10, kappa=0.75, gamma=0.05)
Bases: object
Dual averaging is a scheme for solving convex optimization problems. It belongs to the class of subgradient methods, which use subgradients to update the parameters of a model (in the primal space). Under some conditions, the averages of the parameters generated during the scheme are guaranteed to converge to an optimal value. However, a counterintuitive aspect of traditional subgradient methods is that “new subgradients enter the model with decreasing weights” (see [1]). The dual averaging scheme avoids this by giving all subgradients (which lie in a dual space) equal weight when updating the parameters, hence the name “dual averaging”.
This class implements a dual averaging scheme adapted for Markov chain Monte Carlo (MCMC) algorithms. More precisely, subgradients are replaced by statistics calculated during an MCMC trajectory. In addition, introducing free parameters such as t0 and kappa is helpful and still guarantees the convergence of the scheme.
References
[1] Primal-dual subgradient methods for convex problems, Yurii Nesterov
[2] The No-U-Turn Sampler: adaptively setting path lengths in Hamiltonian Monte Carlo, Matthew D. Hoffman, Andrew Gelman
Parameters:
 prox_center (float) – A “prox-center” parameter introduced in [1] which pulls the primal sequence towards it.
 t0 (float) – A free parameter introduced in [2] that stabilizes the initial steps of the scheme.
 kappa (float) – A free parameter introduced in [2] that controls the weights of steps of the scheme. For a small kappa, the scheme will quickly forget states from early steps. This should be a number in \((0.5, 1]\).
 gamma (float) – A free parameter which controls the speed of the convergence of the scheme.
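The roles of the four parameters can be illustrated with a minimal sketch of the update described in [2]. The class name DualAveragingSketch, its attribute names, and the toy acceptance function are hypothetical and exist only for this illustration; the sketch uses plain floats and is not the class documented above.

```python
import math


class DualAveragingSketch:
    """Illustrative dual averaging state; not the pyro.ops implementation."""

    def __init__(self, prox_center=0.0, t0=10.0, kappa=0.75, gamma=0.05):
        self.prox_center = prox_center
        self.t0 = t0
        self.kappa = kappa
        self.gamma = gamma
        self.t = 0                # number of statistics seen so far
        self.g_avg = 0.0          # running average of statistics, damped by t0
        self.x_t = prox_center    # latest primal iterate
        self.x_avg = prox_center  # weighted average of primal iterates

    def step(self, g):
        """Fold one statistic g (standing in for a subgradient) into the state."""
        self.t += 1
        # t0 enlarges the denominator, so early statistics move g_avg less.
        eta = 1.0 / (self.t + self.t0)
        self.g_avg = (1.0 - eta) * self.g_avg + eta * g
        # Primal iterate: starts from prox_center and moves against g_avg.
        self.x_t = self.prox_center - (self.t ** 0.5) / self.gamma * self.g_avg
        # Averaging weight t^(-kappa): a smaller kappa keeps the weight of new
        # iterates larger for longer, so early iterates are forgotten faster.
        weight = self.t ** (-self.kappa)
        self.x_avg = (1.0 - weight) * self.x_avg + weight * self.x_t
        return self.x_t, self.x_avg


def toy_accept_prob(log_step_size):
    # Hypothetical stand-in for an MCMC acceptance statistic:
    # it decreases as the step size grows.
    return 1.0 / (1.0 + math.exp(log_step_size))


scheme = DualAveragingSketch()
log_step_size = 0.0
for _ in range(200):
    # Statistic: target acceptance rate minus observed acceptance rate.
    g = 0.8 - toy_accept_prob(log_step_size)
    log_step_size, log_step_size_avg = scheme.step(g)
```

In this usage the latest iterate drives the next trajectory, while the averaged iterate is the value one would keep once adaptation ends.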

velocity_verlet(z, r, potential_fn, step_size, num_steps=1)
Second-order symplectic integrator that uses the velocity Verlet algorithm.
Parameters:
 z (dict) – dictionary of sample site names and their current values (type Tensor).
 r (dict) – dictionary of sample site names and corresponding momenta (type Tensor).
 potential_fn (callable) – function that returns potential energy given z for each sample site. The negative gradient of the function with respect to z determines the rate of change of the corresponding sites’ momenta r.
 step_size (float) – step size for each time step iteration.
 num_steps (int) – number of discrete time steps over which to integrate.
Return tuple (z_next, r_next): final position and momenta, having same types as (z, r).
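The half-step, full-step, half-step structure of velocity Verlet can be sketched in plain Python. This is an illustration under stated assumptions, not the library routine: it uses floats instead of tensors, assumes unit mass, and takes a hypothetical grad_fn that returns the gradient of the potential directly rather than differentiating potential_fn. The names velocity_verlet_sketch, harmonic_grad and energy are made up for the example.

```python
def velocity_verlet_sketch(z, r, grad_fn, step_size, num_steps=1):
    """Integrate (z, r) with unit mass; grad_fn(z) returns dU/dz per site."""
    z_next = dict(z)  # copy so the caller's dictionaries are left untouched
    r_next = dict(r)
    grads = grad_fn(z_next)
    for _ in range(num_steps):
        for name in r_next:
            # Half step for momentum: dr/dt = -dU/dz.
            r_next[name] = r_next[name] - 0.5 * step_size * grads[name]
        for name in z_next:
            # Full step for position: dz/dt = r (unit mass).
            z_next[name] = z_next[name] + step_size * r_next[name]
        grads = grad_fn(z_next)
        for name in r_next:
            # Second half step for momentum, using the gradient at the new position.
            r_next[name] = r_next[name] - 0.5 * step_size * grads[name]
    return z_next, r_next


def harmonic_grad(z):
    # Gradient of the toy potential U(z) = sum(z_i^2) / 2.
    return {name: value for name, value in z.items()}


def energy(z, r):
    # Total energy: potential plus kinetic, for the toy potential above.
    potential = 0.5 * sum(v * v for v in z.values())
    kinetic = 0.5 * sum(v * v for v in r.values())
    return potential + kinetic


z0, r0 = {"x": 1.0}, {"x": 0.0}
z1, r1 = velocity_verlet_sketch(z0, r0, harmonic_grad, step_size=0.1, num_steps=100)
energy_drift = abs(energy(z1, r1) - energy(z0, r0))
```

Because the integrator is symplectic and time-reversible, the energy drift stays small over many steps, and integrating again from (z1, -r1) retraces the path back to the starting position.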

single_step_velocity_verlet(z, r, potential_fn, step_size, z_grads=None)
A special case of the velocity_verlet integrator where num_steps=1. It is particularly helpful for the NUTS kernel.
Parameters: z_grads (torch.Tensor) – optional gradients of potential energy at current z.
Return tuple (z_next, r_next, z_grads, potential_energy): next position and momenta, together with the potential energy and its gradient w.r.t. z_next.
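One reason to return the gradient at z_next is that it can be passed back in as z_grads on the next call, which saves one gradient evaluation per step. The sketch below illustrates this with a scalar site and unit mass. The names single_step_sketch and counted_potential_and_grad are hypothetical, and the potential supplies its own gradient instead of relying on automatic differentiation.

```python
def single_step_sketch(z, r, potential_and_grad, step_size, z_grad=None):
    """One velocity Verlet step for a scalar site with unit mass."""
    if z_grad is None:
        # No cached gradient was supplied: evaluate at the current position.
        _, z_grad = potential_and_grad(z)
    r_half = r - 0.5 * step_size * z_grad
    z_next = z + step_size * r_half
    # This evaluation is reused both for the second half step and by the caller.
    potential_energy, z_grad_next = potential_and_grad(z_next)
    r_next = r_half - 0.5 * step_size * z_grad_next
    return z_next, r_next, z_grad_next, potential_energy


calls = {"count": 0}


def counted_potential_and_grad(z):
    # Toy potential U(z) = z^2 / 2 with gradient z; counts its evaluations.
    calls["count"] += 1
    return 0.5 * z * z, z


z, r, z_grad = 1.0, 0.0, None
for _ in range(10):
    # Feed the returned gradient back in so it is not recomputed.
    z, r, z_grad, potential_energy = single_step_sketch(
        z, r, counted_potential_and_grad, step_size=0.1, z_grad=z_grad
    )
evaluations_with_reuse = calls["count"]
```

With reuse, ten chained steps need eleven evaluations of the potential; without passing z_grad back in, each step would need two.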

newton_step_2d(loss, x, trust_radius=None)
Performs a Newton update step to minimize loss on a batch of 2-dimensional variables, optionally regularizing to constrain to a trust region.
loss must be twice-differentiable as a function of x. If loss is 2+d-times differentiable, then the return value of this function is d-times differentiable.
When loss is interpreted as a negative log probability density, then the return value of this function can be used to construct a Laplace approximation MultivariateNormal(mode, cov).
Warning
Take care to detach the result of this function when used in an optimization loop. If you forget to detach the result of this function during optimization, then backprop will propagate through the entire iteration process, and worse will compute two extra derivatives for each step.
Example use inside a loop:
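    import torch

    # my_loss_function is a user-supplied placeholder returning a scalar loss.
    x = torch.zeros(1000, 2)  # arbitrary initial value
    for step in range(100):
        x = x.detach()          # block gradients through previous steps
        x.requires_grad = True  # ensure loss is differentiable wrt x
        loss = my_loss_function(x)
        # newton_step_2d returns a pair (mode, cov); keep the mode as the new x
        x, cov = newton_step_2d(loss, x, trust_radius=1.0)
    # the final x is still differentiable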
Parameters:
 loss (torch.Tensor) – A scalar function of x to be minimized.
 x (torch.Tensor) – A dependent variable with rightmost size of 2.
 trust_radius (float) – An optional trust region radius. The updated value mode of this function will be within trust_radius of the input x.
Returns: A pair (mode, cov) where mode is an updated tensor of the same shape as the original value x, and cov is an estimate of the 2x2 covariance matrix with cov.shape == x.shape[:-1] + (2, 2).
Return type: tuple
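To make the Newton update and the trust-region constraint concrete, here is a minimal sketch for a single 2-vector in plain Python. It assumes the gradient and Hessian are supplied analytically, whereas the function documented above takes a loss tensor. The eigenvalue-shift regularization shown is one common way to bound the step length and is an assumption of this example, not a statement about the library's internals. The name newton_step_2d_sketch and the quadratic test problem are hypothetical.

```python
import math


def newton_step_2d_sketch(grad, hess, x, trust_radius=None):
    """One Newton step for a single 2-vector x.

    grad is (g0, g1); hess is ((h00, h01), (h01, h11)), assumed symmetric.
    Returns (mode, cov), where cov is the inverse of the (regularized) Hessian.
    """
    g0, g1 = grad
    (h00, h01), (_, h11) = hess
    if trust_radius is not None:
        # Smallest eigenvalue of the symmetric 2x2 Hessian, in closed form.
        mean_eig = 0.5 * (h00 + h11)
        det = h00 * h11 - h01 * h01
        min_eig = mean_eig - math.sqrt(max(mean_eig * mean_eig - det, 0.0))
        # Shift the spectrum so the smallest eigenvalue is at least
        # |grad| / trust_radius, which bounds the step length by trust_radius.
        shift = max(math.hypot(g0, g1) / trust_radius - min_eig, 1e-8)
        h00, h11 = h00 + shift, h11 + shift
    det = h00 * h11 - h01 * h01
    # Closed-form inverse of a 2x2 matrix.
    cov = ((h11 / det, -h01 / det), (-h01 / det, h00 / det))
    step = (cov[0][0] * g0 + cov[0][1] * g1, cov[1][0] * g0 + cov[1][1] * g1)
    mode = (x[0] - step[0], x[1] - step[1])
    return mode, cov


# Quadratic loss 0.5 * (x - m)^T A (x - m), whose minimum is at m.
A = ((2.0, 0.5), (0.5, 1.0))
m = (1.0, -2.0)
x0 = (0.0, 0.0)
d = (x0[0] - m[0], x0[1] - m[1])
grad0 = (A[0][0] * d[0] + A[0][1] * d[1], A[1][0] * d[0] + A[1][1] * d[1])

mode, cov = newton_step_2d_sketch(grad0, A, x0)  # unconstrained step
clipped_mode, _ = newton_step_2d_sketch(grad0, A, x0, trust_radius=0.5)
```

For a quadratic loss the unconstrained step lands on the minimizer m in one update, and cov is the inverse of A, which is the covariance a Laplace approximation would use. With trust_radius set, the update moves at most that distance from x0.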