Antonin Chambolle, Thomas Pock. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging. Journal of Mathematical Imaging and Vision, 2011.

Chambolle and Pock describe a first-order primal-dual algorithm for non-smooth, convex optimization problems that is closely related to the Arrow-Hurwicz algorithm []. In the following, I want to briefly summarize their discussion, focusing on a simple example, the ROF functional [] for image denoising, and on the algorithm itself without its convergence analysis.

The algorithm addresses the saddle-point problem

$\min_x \max_y \langle Kx, y \rangle + G(x) -F^*(y)$

where $K: X \to Y$ is a continuous linear operator, and $G: X \to [0, \infty)$ and $F^*: Y \to [0, \infty)$ are proper, convex, lower-semicontinuous functions on finite-dimensional real vector spaces $X$ and $Y$, with $F^*$ being the convex conjugate of a convex, lower-semicontinuous function $F$. Overall, this problem is the primal-dual formulation of the convex optimization problem

$\min_x F(Kx) + G(x)$
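
To see the connection, note that $F$ is convex and lower-semicontinuous, so $F = F^{**}$, such that

$F(Kx) = \max_y \langle Kx, y \rangle - F^*(y)$

Substituting this expression into the primal problem yields the saddle-point formulation above.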

Note that a solution $(\hat{x}, \hat{y})$ of the problem, which is assumed to exist, must satisfy

$K\hat{x} \in \partial F^*(\hat{y})$

$-(K^* \hat{y}) \in \partial G(\hat{x})$

which represent the first-order optimality conditions. Here, $\partial F^*$ and $\partial G$ denote the subdifferentials of $F^*$ and $G$, respectively. An important underlying assumption for the presented algorithm is that $F$ and $G$ are simple, meaning that the proximal operator, i.e. the resolvent operator, defined through

$x = (I + \tau \partial F)^{-1}(y) = \arg \min_x \left\{\frac{\|x - y\|^2}{2\tau} + F(x)\right\}$

is easy to compute. This may mean that it is available in closed form or that it can be approximated efficiently by an iterative method. Note that Moreau's identity states that

$x = (I + \tau \partial F)^{-1}(x) + \tau\left(I + \frac{1}{\tau} \partial F^*\right)^{-1} \left(\frac{x}{\tau}\right)$

such that $(I + \tau \partial F^*)^{-1}$ can be computed from $(I + \tau \partial F)^{-1}$ and vice versa (a simple proof and an example can be found in the lecture notes by Candès). The proposed algorithm is summarized in Algorithm 1 and is based on the iterative scheme

$y^{n + 1} = (I + \sigma \partial F^*)^{-1}(y^n + \sigma K \bar{x}^n)$

$x^{n + 1} = (I + \tau \partial G)^{-1}(x^n - \tau K^* y^{n + 1})$

$\bar{x}^{n + 1} = x^{n + 1} + \theta(x^{n + 1} - x^n)$

for some $\theta \in [0, 1]$. The Arrow-Hurwicz algorithm is a special case where $\theta = 0$. The convergence analysis considers the case $\theta = 1$ and requires $\tau \sigma L^2 < 1$ with $L = \|K\|$, where $\tau$ and $\sigma$ are the step sizes as in Algorithm 1. Note that for $K$ being the identity, the algorithm resembles the Douglas-Rachford splitting algorithm [] (which is a special case of the general proximal point algorithm).

Algorithm 1: The proposed primal-dual, first-order algorithm for the non-smooth but convex optimization problem as formulated above.
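
To make the scheme concrete, the following is a minimal Python sketch of the iterations in Algorithm 1 (not the authors' code); the operator, its adjoint, the two resolvents, and the step sizes are assumed to be supplied by the caller, and all names are my own:

```python
def chambolle_pock(K, K_adj, prox_G, prox_Fstar, x0, y0,
                   tau, sigma, theta=1.0, iterations=100):
    """Sketch of the primal-dual iterations of Algorithm 1.

    K, K_adj: the linear operator and its adjoint, as callables
        on NumPy arrays.
    prox_G, prox_Fstar: the resolvents (I + tau dG)^{-1} and
        (I + sigma dF*)^{-1}, as callables taking (point, step size).
    Convergence (for theta = 1) requires tau * sigma * ||K||^2 < 1.
    """
    x, y = x0.copy(), y0.copy()
    x_bar = x.copy()
    for _ in range(iterations):
        # Dual step on y, using the extrapolated primal point.
        y = prox_Fstar(y + sigma * K(x_bar), sigma)
        # Primal step on x.
        x_new = prox_G(x - tau * K_adj(y), tau)
        # Extrapolation (over-relaxation) step.
        x_bar = x_new + theta * (x_new - x)
        x = x_new
    return x, y
```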

As a simple example, Chambolle and Pock discuss image denoising using the ROF functional []:

$\min_u \int_\Omega |Du| + \frac{\lambda}{2}\|u - g\|_2^2$

where $\int_\Omega |Du|$ denotes the total variation; $Du$ refers to the distributional derivative of $u$, such that for sufficiently smooth $u$ the total variation becomes $\int_\Omega |\nabla u|$. In the discrete setting, i.e. for $u$ being a two-dimensional array, the problem reads

$\min_u \|\nabla u\|_1 + \frac{\lambda}{2} \|u - g\|_2^2$

The total variation in the discrete case thus reduces to the sum of the Euclidean norms of the gradients at all array positions. In order to apply the proposed algorithm, $G$ and $F^*$ need to be identified. To this end, we first give the primal-dual formulation:

$\min_u \max_p - \langle u, \text{div} p\rangle + \frac{\lambda}{2} \|u - g\|_2^2 - \delta_P(p)$

where now $G(u) = \frac{\lambda}{2} \|u - g\|_2^2$ and $F^*(p) = \delta_P(p)$ after having substituted the convex conjugate of the total variation (details can be found in this great introduction to total variation, see Section 3.2.3). The set $P$ is given by $\{p \in Y: \|p\|_\infty \leq 1\}$ and $\delta_P$ denotes the indicator function of the set $P$. It remains to formulate the proximal operators for $G$ and $F^*$. These can be found in most textbooks on variational methods for image processing and are given by:

$p = (I + \sigma \partial F^*)^{-1}(\tilde{p}) \Leftrightarrow p_{i,j} = \frac{\tilde{p}_{i,j}}{\max(1, |\tilde{p}_{i,j}|)}$

$u = (I + \tau \partial G)^{-1} (\tilde{u}) \Leftrightarrow u_{i,j} = \frac{\tilde{u}_{i,j} + \tau \lambda g_{i,j}}{1 + \tau \lambda}$
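
For the ROF example, these pieces can be spelled out in code. The following is a sketch assuming forward differences for the discrete gradient, with the divergence as its negative adjoint; the function names are my own, not from the paper:

```python
import numpy as np

def grad(u):
    """Discrete gradient via forward differences (Neumann boundary)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack([gx, gy])

def div(p):
    """Discrete divergence, defined as the negative adjoint of grad."""
    px, py = p[0], p[1]
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[0, :] = px[0, :]
    dx[1:-1, :] = px[1:-1, :] - px[:-2, :]
    dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]
    dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]
    dy[:, -1] = -py[:, -2]
    return dx + dy

def prox_Fstar(p, sigma):
    """Pointwise projection onto P; sigma is unused since the
    projection does not depend on the step size."""
    norm = np.maximum(1.0, np.sqrt(p[0]**2 + p[1]**2))
    return p / norm

def prox_G(u, tau, g, lam):
    """Closed-form resolvent of G(u) = lam/2 * ||u - g||_2^2."""
    return (u + tau * lam * g) / (1.0 + tau * lam)
```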

Now the algorithm can be applied. Examples are given in Figure 1.
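
A hypothetical end-to-end run, building on the sketches above with a synthetic test image, might look as follows; the standard bound $\|\nabla\|^2 \leq 8$ for forward differences justifies the chosen step sizes:

```python
import numpy as np

# Piecewise-constant test image with additive Gaussian noise.
g = np.zeros((64, 64))
g[16:48, 16:48] = 1.0
g += 0.1 * np.random.randn(64, 64)

# Since ||grad||^2 <= 8, these step sizes satisfy tau * sigma * L^2 < 1.
tau = sigma = 0.99 / np.sqrt(8.0)
lam = 8.0  # data fidelity weight, chosen ad hoc for illustration

u, p = chambolle_pock(
    K=grad,
    K_adj=lambda q: -div(q),  # K = grad, so K^* = -div
    prox_G=lambda v, t: (v + t * lam * g) / (1.0 + t * lam),
    prox_Fstar=prox_Fstar,
    x0=g.copy(), y0=np.zeros((2,) + g.shape),
    tau=tau, sigma=sigma, theta=1.0, iterations=200)
```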

Figure 1: Denoising using the ROF functional and the proposed algorithm.

  • [] K. J. Arrow, L. Hurwicz, and H. Uzawa. Studies in linear and non-linear programming. With contributions by H. B. Chenery, S. M. Johnson, S. Karlin, T. Marschak, R. M. Solow. Stanford Mathematical Studies in the Social Sciences, vol. II. Stanford University Press, Stanford, Calif., 1958.
  • [] A. Chambolle. Total variation minimization and a class of binary MRF models. In Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 136–152, 2005.
  • [] L. Rudin, S. J. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992. [also in Experimental Mathematics: Computational Issues in Nonlinear Science (Proc. Los Alamos Conf. 1991)].
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.