# DAVIDSTUTZ

Check out our latest research on adversarial robustness and generalization of deep networks.
11thJANUARY2018

Antonin Chambolle, Thomas Pock. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging. Journal of Mathematical Imaging and Vision, 2011.

Chambolle und Pock describe a first-order primal-dual algorithm for non-smooth, convex optimization which is closely related to the Arrow-Hurwicz algorithm []. In the following, I want to briefly sum up their discussion while focussing on the a simple example, the ROF functional [] for image denoising, and the algorithm without convergence analysis.

$\min_x \max_y \langle Kx, y \rangle + G(x) -F^*(y)$

where $K: X \mapsto Y$ is a continuous linear operator, $G:X \mapsto [0,\infty)$ and $F^∗:Y\mapsto[0,\infty)$ for finite dimensional real vector spaces $X$ and $Y$. Furthermore, $G$ and $F^∗$ being a proper, convex, lower-semicontinuous functions and $F^∗$ being the convex conjugate of a convex lower-semicontinuous function $F$. Overall, this problem is the primal dual formulation of the convex optimization problem

$\min_x F(Kx) + G(x)$

Note that a solution $(\hat{x}, \hat{y})$ of the problem, which is assumed to exist, must satisfy

$K\hat{x} \in \partial F^*(\hat{y})$

$- (K*\hat{y}) \in \partial G(\hat{x})$

which represents the first-order condition. Here, $\partial F^*$ and $\partial G$ represent the subgradients of $F^*$ and $G$, respectively. An important underlying assumption for the presented algorithm is that $F$ and $G$ are simple, meaning that the proximal operator, i.e. the resolvent operator, defined through

$x = (I + \tau \partial F)^{-1}(y) = \arg \min_x \left\{\frac{\|x - y\|^2}{2\tau} + F(x)\right\}$

is easy to compute. This may mean it is available in closed form, or is easy to approximate using any optimization algorithm. Note that Moreau's identity states that

$x = (I + \tau \partial F)^{-1}(x) + \tau\left(I + \frac{1}{\tau} \partial F^*\right)^{-1} \left(\frac{x}{\tau}\right)$

such that $(I+\tau\partial F^∗)^{−1}$ can be computed from $(I + \tau \partial F)^{-1}$ and vice-versa (a simple proof and an example can be found in the lecture notes by Candes). The proposed algorithm is summarized in Algorithm 1 and based on the iterative scheme

$y^{n + 1} = (I + \sigma \partial F^*)^{-1}(y^n + \sigma K \bar{x}^n)$

$x^{n + 1} = (I + \tau \partial G)^{-1}(x^n - \tau K^* y^{n + 1})$

$\bar{x}^{n + 1} = x^{n + 1} + \theta(x^{n + 1} - x^n)$

For some $\theta$. The Arrow-Hurwicz algorithm is a special case where $\theta = 0$. The convergence analysis considers the case where $\theta=1$ and holds for $L = \|K\|$ and $\tau \sigma L^2 < 1$ where $\tau$ and $\sigma$ are as in Algorithm 1. Note that for $K$ being the identity, the algorithm resembles the Douglas-Rachford splitting algorithm [refb 7 refs] (which is a special case of the general proximal point algorithm).

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below or get in touch with me: