P. L. Combettes and J.-C. Pesquet.Proximal Splitting Methods in Signal Processing. Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer, 2011.

Combettes and Pesquet give a thorough introduction/overview of proximal splitting algorithms to solve problems of the form

$\min_{x \in \mathbb{R}^n} f_1(x) + \ldots + f_m(x)$

for $f_1,\ldots,f_m$ convex functions.

For the following discussion, let

$\text{prox}_{\gamma f}(x) := \arg \min_{\hat{x} \in \mathbb{R}^n} f(\hat{x}) + \frac{1}{2\gamma} \|x - \hat{x}\|_2^2$

the proximal mapping for a proper, convex and lower semi-continuous function $f$.

They discuss multiple approaches including forward-backward splitting applied to the following problem: For $m = 2$, let $f_1$ be proper convex and lower-semi-continuous and $f_2$ be proper and differentiable with $L$-Lipschitz continuous gradient $\nabla f_1$ (note the similarity to the problem considered by Ochs et al. in [1]). Furthermore, $f_1 + f_2$ is required to be coercive, i.e. for $\|x\| \rightarrow \infty$ it is $f_1(x) + f_2(x) \rightarrow \infty$. For this problem, it can be shown that at least one solution exists and the following fix point equation can be derived:

$x^\ast = \text{prox}_{\gamma f_1}(x^\ast - \gamma \nabla f_2(x^\ast))$.

This formulation suggests the following iterative optimization scheme:

$x^{(n + 1)} = \text{prox}_{\gamma_n f_1}(x^{(n)} - \gamma_n \nabla f_2(x^{(n)}))$

where $x^{(n)}$ denotes the $n$-th iterate and $\gamma_n$ the learning rate in the $n$-th iteration. Here, the application of the proximal mapping $\text{prox}_{\gamma_n f_1}$ is called the backward step and $x^{(n)} - \gamma_n \nabla f_2(x^{(n)})$ is called the forward step. The convergence of this approach is obviously depending on the chosen learning rate $\gamma_n$. For Algorithms 1 and 2, convergence has been shown (in [2] and [3], respectively.

function constant_step_forward_backward(
        $f_1$, $f_2$,
        $L$ // Lipschitz constant of $\nabla f_2$
    choose $\epsilon \in (0, \frac{3}{4})$
    initialize $x^{(0)}$
    for $n = 1,\ldots$
        choose $\lambda_n \in [\epsilon, \frac{3}{2} - \epsilon]$
        $x^{(n + 1)} = x^{(n)} + \lambda_n(\text{prox}_{L^{-1} f_1}(x^{(n)} - L^{-1} \nabla f_2(x^{(n)})) - x^{(n)})$
Algorithm 1: Constant-step forward-backward algorithm.
function forward_backward(
        $f_1$, $f_2$,
        $L$ // Lipschitz constant of $\nabla f_2$
    choose $\epsilon \in (0, \min\{1, \frac{1}{L}\})$
    initialize $x^{(0)}$
    for $n = 1,\ldots$
        choose $\gamma_n \in [\epsilon, \frac{2}{L} - \epsilon]$
        choose $\lambda_n \in [\epsilon, \frac{3}{2} - \epsilon]$
        $x^{(n + 1)} = x^{(n)} + \lambda_n(\text{prox}_{\gamma_n f_1}(x^{(n)} - \gamma_n \nabla f_2(x^{(n)})) - x^{(n)})$
Algorithm 2: Forward-backward algorithm.

The forward-backward splitting algorithm can also be generalized to $f_2$ not being convex and an additional inertial force can be integrated as done by Ochs et al. [1]. While Combettes and Pesquet do not discuss signal processing or computer vision applications, Ochs et al. apply their variant of forward-backward splitting to image denoising and image compression.

Beneath the discussion of popular proximal splitting algorithms, Combettes and Pesquet also give an exhaustive list of proximal mappings for various, commonly used functions as well as useful properties of the proximal mapping operator, some of which are summarized in Table 1.

Table 1: Useful properties of the proximal mapping operator.

Translation$\phi(x - z)$, $z \in \mathbb{R}^n$$z + \text{prox}_\phi(x - z)$
Scaling$\phi(\frac{x}{\rho})$, $\rho \neq 0$$\rho\text{prox}_{\frac{\phi}{\rho^2}}(\frac{x}{\rho}$
  • [1] P. Ochs, Y. Chen,T. Brox, T. Pock. iPiano: Inertial Proximal Algorithm for Non-Convex Optimization. Computing Research Repository, abs/1404.4805, 2014.
  • [2] H. H. Bauschke, P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Sapces. Springer, 2011.
  • [3] P. L. Combettes, V. R. Wajs. Signal Recovery by Proximal Forward-Backward Splitting. Multiscale Model. Simul. 4, 2005.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.