Source code for pyro.ops.dual_averaging

# Copyright (c) 2017-2019 Uber Technologies, Inc.
# SPDX-License-Identifier: Apache-2.0


[docs]class DualAveraging(object): """ Dual Averaging is a scheme to solve convex optimization problems. It belongs to a class of subgradient methods which uses subgradients to update parameters (in primal space) of a model. Under some conditions, the averages of generated parameters during the scheme are guaranteed to converge to an optimal value. However, a counter-intuitive aspect of traditional subgradient methods is "new subgradients enter the model with decreasing weights" (see :math:`[1]`). Dual Averaging scheme solves that phenomenon by updating parameters using weights equally for subgradients (which lie in a dual space), hence we have the name "dual averaging". This class implements a dual averaging scheme which is adapted for Markov chain Monte Carlo (MCMC) algorithms. To be more precise, we will replace subgradients by some statistics calculated during an MCMC trajectory. In addition, introducing some free parameters such as ``t0`` and ``kappa`` is helpful and still guarantees the convergence of the scheme. References [1] `Primal-dual subgradient methods for convex problems`, Yurii Nesterov [2] `The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo`, Matthew D. Hoffman, Andrew Gelman :param float prox_center: A "prox-center" parameter introduced in :math:`[1]` which pulls the primal sequence towards it. :param float t0: A free parameter introduced in :math:`[2]` that stabilizes the initial steps of the scheme. :param float kappa: A free parameter introduced in :math:`[2]` that controls the weights of steps of the scheme. For a small ``kappa``, the scheme will quickly forget states from early steps. This should be a number in :math:`(0.5, 1]`. :param float gamma: A free parameter which controls the speed of the convergence of the scheme. """ def __init__(self, prox_center=0, t0=10, kappa=0.75, gamma=0.05): self.prox_center = prox_center self.t0 = t0 self.kappa = kappa self.gamma = gamma self.reset()
[docs] def reset(self): self._x_avg = 0 # average of primal sequence self._g_avg = 0 # average of dual sequence self._t = 0
[docs] def step(self, g): """ Updates states of the scheme given a new statistic/subgradient ``g``. :param float g: A statistic calculated during an MCMC trajectory or subgradient. """ self._t += 1 # g_avg = (g_1 + ... + g_t) / t self._g_avg = (1 - 1/(self._t + self.t0)) * self._g_avg + g / (self._t + self.t0) # According to formula (3.4) of [1], we have # x_t = argmin{ g_avg . x + loc_t . |x - x0|^2 }, # where loc_t := beta_t / t, beta_t := (gamma/2) * sqrt(t) self._x_t = self.prox_center - (self._t ** 0.5) / self.gamma * self._g_avg # weight for the new x_t weight_t = self._t ** (-self.kappa) self._x_avg = (1 - weight_t) * self._x_avg + weight_t * self._x_t
[docs] def get_state(self): r""" Returns the latest :math:`x_t` and average of :math:`\left\{x_i\right\}_{i=1}^t` in primal space. """ return self._x_t, self._x_avg