Optimizers

class geoopt.optim.RiemannianAdam(*args, stabilize=None, **kwargs)[source]

Riemannian Adam with the same API as torch.optim.Adam.

Parameters:
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
  • lr (float (optional)) – learning rate (default: 1e-3)
  • betas (Tuple[float, float] (optional)) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))
  • eps (float (optional)) – term added to the denominator to improve numerical stability (default: 1e-8)
  • weight_decay (float (optional)) – weight decay (L2 penalty) (default: 0)
  • amsgrad (bool (optional)) – whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond (default: False)
Other Parameters:
  • stabilize (int (optional)) – re-project parameters onto the manifold every stabilize steps, in case numerical error has pushed them off it (default: None – no stabilization)
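
Example (a minimal sketch; the Sphere manifold, the dimensions, and the toy loss are illustrative choices, not part of the API):

    import torch
    import geoopt

    # Constrain a parameter to the unit sphere; ManifoldParameter records
    # which manifold it lives on, so the optimizer picks the right update.
    sphere = geoopt.manifolds.Sphere()
    x = geoopt.ManifoldParameter(sphere.projx(torch.randn(10)), manifold=sphere)

    # Same constructor arguments as torch.optim.Adam, plus stabilize.
    optimizer = geoopt.optim.RiemannianAdam([x], lr=1e-3, stabilize=100)

    target = sphere.projx(torch.ones(10))
    for _ in range(100):
        optimizer.zero_grad()
        loss = (x - target).pow(2).sum()
        loss.backward()
        optimizer.step()  # the Riemannian update keeps x on the sphere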

step(closure=None)[source]

Performs a single optimization step.

Parameters:
  • closure (callable (optional)) – A closure that reevaluates the model and returns the loss.
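
When an algorithm needs to re-evaluate the loss inside step, pass it as a closure (a self-contained sketch; the parameter and loss are illustrative):

    import torch
    import geoopt

    sphere = geoopt.manifolds.Sphere()
    x = geoopt.ManifoldParameter(sphere.projx(torch.randn(10)), manifold=sphere)
    optimizer = geoopt.optim.RiemannianAdam([x], lr=1e-3)

    def closure():
        # Re-evaluate the model and return the loss, as torch.optim expects.
        optimizer.zero_grad()
        loss = x.pow(2).sum()
        loss.backward()
        return loss

    optimizer.step(closure)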
class geoopt.optim.RiemannianSGD(params, lr, momentum=0, dampening=0, weight_decay=0, nesterov=False, stabilize=None)[source]

Riemannian Stochastic Gradient Descent with the same API as torch.optim.SGD.

Parameters:
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
  • lr (float) – learning rate
  • momentum (float (optional)) – momentum factor (default: 0)
  • weight_decay (float (optional)) – weight decay (L2 penalty) (default: 0)
  • dampening (float (optional)) – dampening for momentum (default: 0)
  • nesterov (bool (optional)) – enables Nesterov momentum (default: False)
Other Parameters:
  • stabilize (int (optional)) – re-project parameters onto the manifold every stabilize steps, in case numerical error has pushed them off it (default: None – no stabilization)
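
Example (a minimal sketch; the Poincaré-ball manifold, the small initialization, and the toy loss are illustrative choices):

    import torch
    import geoopt

    # A point in the Poincaré ball; a small initialization keeps it
    # well inside the boundary.
    ball = geoopt.manifolds.PoincareBall()
    w = geoopt.ManifoldParameter(ball.projx(torch.randn(8) * 0.1), manifold=ball)

    # Same hyperparameters as torch.optim.SGD, plus stabilize.
    optimizer = geoopt.optim.RiemannianSGD(
        [w], lr=1e-2, momentum=0.9, nesterov=True, stabilize=10
    )

    for _ in range(100):
        optimizer.zero_grad()
        loss = ball.dist(w, torch.zeros(8)) ** 2  # pull w toward the origin
        loss.backward()
        optimizer.step()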

step(closure=None)[source]

Performs a single optimization step (parameter update).

Parameters:
  • closure (callable (optional)) – A closure that reevaluates the model and returns the loss.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.

class geoopt.optim.SparseRiemannianAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, amsgrad=False, stabilize=None)[source]

Implements a lazy version of the Adam algorithm, suitable for sparse gradients.

In this variant, only moments that show up in the gradient get updated, and only those portions of the gradient get applied to the parameters.

Parameters:
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
  • lr (float (optional)) – learning rate (default: 1e-3)
  • betas (Tuple[float, float] (optional)) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))
  • eps (float (optional)) – term added to the denominator to improve numerical stability (default: 1e-8)
  • amsgrad (bool (optional)) – whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond (default: False)
Other Parameters:
  • stabilize (int (optional)) – re-project parameters onto the manifold every stabilize steps, in case numerical error has pushed them off it (default: None – no stabilization)
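
Example (a minimal sketch; it assumes an ordinary torch.nn.Embedding with sparse=True is an acceptable input, treated as a Euclidean parameter, and the sizes are illustrative):

    import torch
    import geoopt

    # sparse=True makes the embedding emit sparse gradients: only rows
    # referenced in the batch appear in .grad.
    emb = torch.nn.Embedding(10000, 32, sparse=True)
    optimizer = geoopt.optim.SparseRiemannianAdam(emb.parameters(), lr=1e-3)

    idx = torch.tensor([3, 17, 42])
    loss = emb(idx).pow(2).sum()
    loss.backward()
    optimizer.step()       # updates rows 3, 17 and 42 only
    optimizer.zero_grad()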

step(closure=None)[source]

Performs a single optimization step (parameter update).

Parameters:
  • closure (callable (optional)) – A closure that reevaluates the model and returns the loss.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.

class geoopt.optim.SparseRiemannianSGD(params, lr, momentum=0, dampening=0, nesterov=False, stabilize=None)[source]

Implements a lazy version of the SGD algorithm, suitable for sparse gradients.

In this variant, only the momentum buffers for entries that appear in the gradient get updated, and only those portions of the gradient get applied to the parameters.

Parameters:
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
  • lr (float) – learning rate
  • momentum (float (optional)) – momentum factor (default: 0)
  • dampening (float (optional)) – dampening for momentum (default: 0)
  • nesterov (bool (optional)) – enables Nesterov momentum (default: False)
Other Parameters:
  • stabilize (int (optional)) – re-project parameters onto the manifold every stabilize steps, in case numerical error has pushed them off it (default: None – no stabilization)
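
Example (a sketch of sparse hyperbolic embeddings; re-registering the embedding weight as a ManifoldParameter is a common pattern, shown here as an assumption rather than a prescribed API):

    import torch
    import geoopt

    ball = geoopt.manifolds.PoincareBall()
    emb = torch.nn.Embedding(10000, 32, sparse=True)
    # Replace the weight with a manifold parameter on the Poincaré ball;
    # scaling down first keeps all rows inside the ball.
    emb.weight = geoopt.ManifoldParameter(
        ball.projx(emb.weight.data * 1e-3), manifold=ball
    )

    optimizer = geoopt.optim.SparseRiemannianSGD(
        emb.parameters(), lr=0.1, momentum=0.9, stabilize=10
    )

    idx = torch.tensor([3, 17, 42])
    loss = ball.dist(emb(idx), torch.zeros(32)).pow(2).sum()
    loss.backward()
    optimizer.step()       # Riemannian update applied only to the rows in idx
    optimizer.zero_grad()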

step(closure=None)[source]

Performs a single optimization step (parameter update).

Parameters:
  • closure (callable (optional)) – A closure that reevaluates the model and returns the loss.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.