11^{th}JANUARY2018

Antonin Chambolle, Thomas Pock. *A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging*. Journal of Mathematical Imaging and Vision, 2011.

Chambolle und Pock describe a first-order primal-dual algorithm for non-smooth, convex optimization which is closely related to the Arrow-Hurwicz algorithm []. In the following, I want to briefly sum up their discussion while focussing on the a simple example, the ROF functional [] for image denoising, and the algorithm without convergence analysis.

The algorithm tires to address the saddle-point problem

$\min_x \max_y \langle Kx, y \rangle + G(x) -F^*(y)$

where $K: X \mapsto Y$ is a continuous linear operator, $G:X \mapsto [0,\infty)$ and $F^∗:Y\mapsto[0,\infty)$ for finite dimensional real vector spaces $X$ and $Y$. Furthermore, $G$ and $F^∗$ being a proper, convex, lower-semicontinuous functions and $F^∗$ being the convex conjugate of a convex lower-semicontinuous function $F$. Overall, this problem is the primal dual formulation of the convex optimization problem

$\min_x F(Kx) + G(x)$

Note that a solution $(\hat{x}, \hat{y})$ of the problem, which is assumed to exist, must satisfy

$K\hat{x} \in \partial F^*(\hat{y})$

$- (K*\hat{y}) \in \partial G(\hat{x})$

which represents the first-order condition. Here, $\partial F^*$ and $\partial G$ represent the subgradients of $F^*$ and $G$, respectively. An important underlying assumption for the presented algorithm is that $F$ and $G$ are simple, meaning that the proximal operator, i.e. the resolvent operator, defined through

$x = (I + \tau \partial F)^{-1}(y) = \arg \min_x \left\{\frac{\|x - y\|^2}{2\tau} + F(x)\right\}$

is easy to compute. This may mean it is available in closed form, or is easy to approximate using any optimization algorithm. Note that Moreau's identity states that

$x = (I + \tau \partial F)^{-1}(x) + \tau\left(I + \frac{1}{\tau} \partial F^*\right)^{-1} \left(\frac{x}{\tau}\right)$

such that $(I+\tau\partial F^∗)^{−1}$ can be computed from $(I + \tau \partial F)^{-1}$ and vice-versa (a simple proof and an example can be found in the lecture notes by Candes). The proposed algorithm is summarized in Algorithm 1 and based on the iterative scheme

$y^{n + 1} = (I + \sigma \partial F^*)^{-1}(y^n + \sigma K \bar{x}^n)$

$x^{n + 1} = (I + \tau \partial G)^{-1}(x^n - \tau K^* y^{n + 1})$

$\bar{x}^{n + 1} = x^{n + 1} + \theta(x^{n + 1} - x^n)$

For some $\theta$. The Arrow-Hurwicz algorithm is a special case where $\theta = 0$. The convergence analysis considers the case where $\theta=1$ and holds for $L = \|K\|$ and $\tau \sigma L^2 < 1$ where $\tau$ and $\sigma$ are as in Algorithm 1. Note that for $K$ being the identity, the algorithm resembles the Douglas-Rachford splitting algorithm [refb 7 refs] (which is a special case of the general proximal point algorithm).

As simple example, Chambolle and Pock discuss image denoising using the ROF functional []:

$\min_u \int_\Omega |Du| + \frac{\lambda}{2}\|u - g\|_2^2$

where $\int_\Omega |Du|$ describes the total variation. $Du$ refers to the distributional derivative of $u$ such that for sufficiently smooth $u$ the total variation becomes $\int_\Omega |\nabla u|$. In the discrete setting, i.e. $u$ being a two-dimensional array, the function is written as

$\min_u \|\nabla u\|_1 + \frac{\lambda}{2} \|u - g\|_2^2$

The total variation in the discrete case reduces to the sum over the Euclidean norm of the gradients at each array position. In order to apply the proposed algorithm, $G$ and $F^∗$ need to be identified. But first, we give the primal-dual formulation:

$\min_u \max_p - \langle u, \text{div} p\rangle + \frac{\lambda}{2} \|u - g\|_2^2 - \delta_P(p)$

where now $G(u)=\frac{\lambda}{2} \|u - g\|_2^2$ and $F^∗(p) = \delta_P (p)$ after having substituted the convex conjugate of the total variation (details can be found in this great introduction to total variation, see Section 3.2.3). The set $P$ is given by $\{p \in Y: \|p\|_\infty \leq 1\}$ and $\delta_P$ and $\delta_P$ denotes the indicator function of the set $P$. It remains to formulate the proximal operators for $G$ and $F^∗$. These can be found in most textbooks on variational methods for image processing and are given by:

$p = (I + \sigma \partial F^*)^{-1}(\tilde{p}) \Leftrightarrow p_{i,j} = \frac{\tilde{p}_{i,j}}{\max(1, |\tilde{p}_{i,j}|}$

$u = (I + \tau \partial G)^{-1} (\tilde{u}) \Leftrightarrow u_{i,j} = \frac{\tilde{u}_{i,j} + \tau \lambda g_{i,j}}{1 + \tau \lambda}$

Now the algorithm can be applied. Examples are given in Figure 1.

your opinionon this article?Let me knowyour thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.