Algorithms
This section will introduce you to the basic workflow that you'll use to optimise functions, as well as the algorithms we've implemented.
Generally speaking, optimising proceeds in a few steps. The first is to specify the function you want to optimise. Say,
julia> f(x) = sum(x .^ 2)

Then, you want to create a Hook with the SemioticOpt.IsStoppingCondition trait. Otherwise, optimisation will get stuck in an infinite loop.
julia> using LinearAlgebra
julia> h = StopWhen((a; locals...) -> norm(x(a) - locals[:z]) < 1e-6) # Stop when the residual is less than the tolerance

Then, you want to choose the algorithm you want to use for minimisation and specify its parameters.
julia> a = GradientDescent(; x=[100.0, 50.0], η=1e-1, hooks=[h,]) # Specify parameters for optimisation

Finally, you'll run minimize! or minimize.
julia> sol = minimize!(f, a) # Optimise

SemioticOpt.minimize! — Function

minimize!(f::Function, a::OptAlgorithm)

Minimize f using a.
Does in-place updates of a.x. This will generally be more performant than SemioticOpt.minimize. However, there are cases in which this will be worse, so we provide both.
If you don't provide any hook with the IsStoppingCondition trait, this will loop forever.
SemioticOpt.minimize — Function

minimize(f::Function, a::OptAlgorithm)

Minimize f using a.
This will generally be less performant than SemioticOpt.minimize!. However, there are cases in which this will be better, so we provide it as an option.
If you don't provide any hook with the IsStoppingCondition trait, this will loop forever.
sol here is a struct containing various metadata. If you only care about the optimal value of x, then grab it using
julia> SemioticOpt.x(sol)

Gradient Descent
From Wikipedia:
[Gradient Descent]... is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent.
If the function we're optimising f is convex, then a local minimum is also a global minimum. The update rule for gradient descent is $x_{n+1}=x_n - η∇f(x_n)$.
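This update rule can be sketched in a few lines, independently of SemioticOpt. The names `∇f` and `graddescent` below are illustrative, not part of the package, and we assume f(x) = sum(x .^ 2) so that the gradient is simply 2x.

```julia
# A minimal, self-contained sketch of the gradient descent update rule,
# assuming f(x) = sum(x .^ 2), for which ∇f(x) = 2x.
∇f(x) = 2 .* x

function graddescent(x, η; iters=100)
    for _ in 1:iters
        x = x .- η .* ∇f(x)  # xₙ₊₁ = xₙ - η∇f(xₙ)
    end
    return x
end

graddescent([100.0, 50.0], 1e-1)  # approaches the minimiser [0.0, 0.0]
```

With η = 0.1, each step multiplies the iterate by 0.8, so the iterates shrink geometrically towards the origin.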
SemioticOpt.GradientDescent — Type

GradientDescent(;x::V, η::T, hooks::S) where {
    T<:Real,V<:AbstractVector{T},S<:AbstractVecOrTuple{<:Hook}
}

Parameters for gradient descent learning.
Fields
η::T is the learning rate/step size.
x::V is the current best guess for the solution.
hooks::S are the hooks.
A Quick Interlude: Choosing Step Size
If the gradient of your function is L-Lipschitz (i.e., the function is L-smooth) and the optimal value $f^* > -\infty$, then a step size $\eta < 2 / L$ will converge to an optimal point. See Theorem 4.2 for details. So that you don't have to remember this relationship, just know that if you can tell us the Lipschitz constant of the function you're trying to minimise, we'll tell you what the step size is.
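As a worked check of this bound (self-contained, not using SemioticOpt): for f(x) = sum(x .^ 2), the gradient ∇f(x) = 2x is Lipschitz with constant L = 2, so any η < 2 / L = 1 converges, while η > 1 diverges. The names `gdstep` and `descend` are illustrative.

```julia
# One gradient descent update for f(x) = sum(x .^ 2), where ∇f(x) = 2x.
gdstep(x, η) = x .- η .* 2 .* x
# Run `iters` updates starting from x.
descend(x, η; iters=50) = foldl((xₙ, _) -> gdstep(xₙ, η), 1:iters; init=x)

x₀ = [100.0, 50.0]
descend(x₀, 0.9)  # η < 2 / L = 1: iterates shrink towards the minimiser
descend(x₀, 1.1)  # η > 1: iterates blow up
```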
SemioticOpt.stepsize — Function

stepsize(l::Real)

Return the optimal step size for a convex function with Lipschitz constant l.
Projected Gradient Descent
Projected gradient descent is, in a sense, a generalisation of gradient descent. Whereas gradient descent itself has no constraints, projected gradient descent allows you to specify constraints. In particular, here, we are interested in solving the problem $\min_x f(x)$ subject to some constraints. Those constraints define the feasible region $\mathcal{C}$. Thus, the full problem we're trying to solve is $\min_x f(x) \textrm{ subject to } x\in\mathcal{C}$. After the standard gradient descent step, we project the result onto the feasible set. In other words,
\[\begin{align*} y_{n} &= x_n - η∇f(x_n) \\ x_{n+1} &= \textrm{Pr}_{\mathcal{C}}(y_n) \end{align*}\]
To use PGD, you'll need to specify t, the projection function. Any projection function must take only x as input. To get more complex behaviour, you may find it useful to leverage currying (partial function application).
Let's look at an example of this. Say we want to minimise $f(x)=\sum x^2$ subject to a unit simplex constraint. We provide a function SemioticOpt.σsimplex. However, this function doesn't only take x as input, but also takes σ. In order to create a projection function in terms of only x, we will apply currying by creating a new function.
julia> using SemioticOpt
julia> unitsimplex(x) = σsimplex(x, 1)

Now, we can create a PGD parameters struct and solve our problem.
julia> using LinearAlgebra
julia> f(x) = sum(x.^2)
julia> a = ProjectedGradientDescent(;
x=[100.0, 50.0],
η=1e-1,
hooks=[StopWhen((a; kws...) -> norm(x(a) - kws[:z]) < 1e-6)],
t=unitsimplex,
)
julia> aopt = minimize(f, a)
julia> SemioticOpt.x(aopt)
2-element Vector{Float64}:
0.5000023384026198
0.49999766159738035

SemioticOpt.ProjectedGradientDescent — Type

ProjectedGradientDescent(;
    x::AbstractVector{T}, η::T, hooks::AbstractVecOrTuple{<:Hook}, t::F
) where {T<:Real,F<:Function}
ProjectedGradientDescent(g::G, t::F) where {G<:GradientDescent,F<:Function}

Specifies parameters for SemioticOpt.GradientDescent and the projection function t.
t takes as input the vector to be projected x and returns the projected vector.
Forwarded Methods
Predefined Projection Functions
SemioticOpt.σsimplex — Function

σsimplex(x::AbstractVector{T}, σ::Real) where {T<:Real}

Project x onto the σ-simplex.
In other words, project x so that its entries are non-negative and sum to σ.
This operation is precision-sensitive, so we convert the data to BigFloat within the function. We then convert back to T before returning.
We use RoundDown while converting from BigFloat to T to ensure that the sum of the projected vector is less than or equal to σ.
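To make the projection concrete, here is an illustrative sort-based simplex projection (a hypothetical helper named `simplexproj`, not the package's BigFloat implementation): find the threshold τ such that max.(x .+ τ, 0) sums to σ, then clip.

```julia
# Sort-based projection onto the σ-simplex: shift all entries by a common
# threshold τ and clip negatives so the result is non-negative and sums to σ.
function simplexproj(x::AbstractVector{<:Real}, σ::Real)
    u = sort(x; rev=true)
    css = cumsum(u)
    # Largest ρ such that the ρ-th sorted entry survives the shift.
    ρ = findlast(i -> u[i] + (σ - css[i]) / i > 0, eachindex(u))
    τ = (σ - css[ρ]) / ρ
    return max.(x .+ τ, 0)
end

simplexproj([100.0, 50.0], 1.0)  # returns [1.0, 0.0]
```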
SemioticOpt.gssp — Function

gssp(x::AbstractVector{<:Real}, k::Int, σ::Real)

Project x onto the intersection of the set of k-sparse vectors and the σ-simplex.
Reference: http://proceedings.mlr.press/v28/kyrillidis13.pdf
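The reference's two-stage scheme can be sketched as follows: greedily select the k largest entries of x, project that subvector onto the σ-simplex, and zero out everything else. The names `gsspsketch` and `projsimplex` are illustrative, not the package's implementation.

```julia
# Illustrative sort-based projection onto the σ-simplex.
function projsimplex(v::AbstractVector{<:Real}, σ::Real)
    u = sort(v; rev=true)
    css = cumsum(u)
    ρ = findlast(i -> u[i] + (σ - css[i]) / i > 0, eachindex(u))
    return max.(v .+ (σ - css[ρ]) / ρ, 0)
end

# Greedy selector and simplex projector: the result is k-sparse,
# non-negative, and sums to σ.
function gsspsketch(x::AbstractVector{<:Real}, k::Int, σ::Real)
    out = zero(x)
    ix = partialsortperm(x, 1:k; rev=true)  # indices of the k largest entries
    out[ix] = projsimplex(x[ix], σ)
    return out
end

gsspsketch([3.0, 1.0, 2.0, 0.5], 2, 1.0)  # at most 2 nonzeros, sums to 1
```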
Halpern Iteration
Halpern iteration is a method of anchoring an iterative optimisation algorithm to some anchor point $x_0$. The general form of the algorithm is given by
\[x_{k+1} = \lambda_{k+1}x_0 + (1 - \lambda_{k+1})T(x_k)\]
Here, $T: \mathbb{R}^d \to \mathbb{R}^d$ is a non-expansive map (i.e., $\forall x,y \in \mathbb{R}^d: ||T(x) - T(y)|| \leq ||x - y||$), and $\lambda_k$ is a step size that must be chosen to satisfy the following properties:
\[\begin{align*} \lim_{k\to\infty} \lambda_k &= 0 \\ \sum_{k=1}^\infty \lambda_k &= \infty \\ \sum_{k=1}^\infty |\lambda_{k+1} - \lambda_k| &< \infty \\ \end{align*}\]
A reasonable first guess for $\lambda_k$ might be $\frac{1}{k}$ if you're unsure about what to pick.
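A self-contained sketch of this update with λₖ = 1 / k (not the package's Hook-based implementation): as the non-expansive map T we use a gradient step on f(x) = sum(x .^ 2), i.e. T(x) = x - η·2x, which is non-expansive for η ≤ 1. The names `T`, `λ`, and `halpern` are illustrative.

```julia
# Non-expansive map: one gradient step on f(x) = sum(x .^ 2) with η = 0.1.
T(x; η=1e-1) = x .- η .* 2 .* x
λ(k) = 1 / k

function halpern(x₀; iters=1000)
    x = x₀
    for k in 1:iters
        # Anchor towards x₀ with weight λₖ₊₁, otherwise follow T.
        x = λ(k + 1) .* x₀ .+ (1 - λ(k + 1)) .* T(x)
    end
    return x
end

halpern([10.0, 10.0])  # drifts from the anchor x₀ towards T's fixed point, 0
```

Note that anchoring slows convergence relative to plain iteration of T: the iterates approach the fixed point at a rate governed by λₖ rather than geometrically.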
In the formula for Halpern iteration above, the step size is indexed by $k+1$. In our code, you should still define $\lambda$ as a function of $k$; we'll add the one for you.
This is an implicitly regularised method. If you use an algorithm like gradient descent with Halpern iteration, you'll converge to the solution with the minimum $\ell_2$ distance from $x_0$.
Implementation-wise, we've actually defined Halpern Iteration using a Hook that exhibits the SemioticOpt.RunAfterIteration trait. To use this, you'll want to add this hook to an existing algorithm. For example,
julia> using LinearAlgebra
julia> f(x) = sum(x.^2)
julia> a = GradientDescent(;
x=[100.0, 50.0],
η=1e-1,
hooks=[
StopWhen((a; kws...) -> norm(x(a) - kws[:z]) < 1e-6),
HalpernIteration(; x₀=[10, 10], λ=k -> 1 / k),
],
)
julia> o = minimize!(f, a)
julia> x(o)
2-element Vector{Float64}:
0.005945303210463734
0.005945303210463734

SemioticOpt.HalpernIteration — Type

HalpernIteration{T<:Real,V<:AbstractVector{T}} <: Hook

A hook for using Halpern iteration, in which you should specify x₀ and λ.