2D TV Denosing with ADMM -- Mathematics && Implementation

December 25, 2020

ADMM

For 2D Total variation denosing, we have the following objective, where $x$ is the clean image we want to optimize, $y$ is the origin noisy image, and $D_r $and $D_c$ are doubly block circulant matrices for two 2D convolution.

In detail, both of $x$ and $y$ are vectorized into 1D vectors, and the total variation terms are expressed as two convolutions in Matrix-vector form.

$$ \operatorname{minimize} \enspace \frac{1}{2} \|x-y\|_{2}^{2} + \lambda\|D_r x\|_{1} + \lambda\|D_c x\|_{1} $$

With ADMM, we substitute$D_* x$ with$ z_*$, and add two constraints.

$$ \begin{array}{ll} \operatorname{minimize} & \frac{1}{2} \|x-y\|_{2}^{2} + \lambda\|z_r\|_{1} + \lambda\|z_c\|_{1} \\ \text { subject to } & D_r x-z_r=0 \\ \text { subject to } & D_c x-z_c=0 \end{array} $$

Then, we use augmented lagrangian method to remove constraints：

$$ \begin{aligned} L_{\rho}( x , z_r , \nu_r, z_c, \nu_c ) = \frac{1}{2}\|x-y\|_{2}^{2} & + \lambda\|z_r\|_{1}+ \nu_r ^{ T }(D_r x-z_r)+\frac{\rho}{2}\|D_r x -z_r\|_{2}^{2} \\ & + \lambda\|z_c\|_{1}+ \nu_c ^{ T }(D_c x-z_c)+\frac{\rho}{2}\|D_c x -z_c\|_{2}^{2} \end{aligned} $$

Let $\mu_r = \nu_r / \rho$，$\mu_c = \nu_c / \rho$，transform the above equation into the following one:

$$ \begin{aligned} L_{\rho}( x , z_r , \nu_r, z_c, \nu_c )= \frac{1}{2}\|x-y\|_{2}^{2} & + \lambda\|z_r\|_{1}+ \frac{\rho}{2}\|D_r x-z_r+ \mu_r \|_{2}^{2}-\frac{\rho}{2}\| \mu_r \|_{2}^{2} \\ & + \lambda\|z_c\|_{1}+ \frac{\rho}{2}\|D_c x-z_c+ \mu_c \|_{2}^{2}-\frac{\rho}{2}\| \mu_c \|_{2}^{2} \end{aligned} $$

X subproblem #

For x subproblem, we need to optimize the following equation:

$$ \begin{aligned} x ^{(k+1)} =\arg \min _{ x } \enspace \|x^{(k)}-y\|_{2}^{2} & + \left\|\sqrt{\rho}D_r x^{(k)}- \sqrt{\rho}(z_r ^{(k)}- \mu_r ^{(k)})\right\|_{2}^{2} \\ & + \left\|\sqrt{\rho}D_c x^{(k)}- \sqrt{\rho}(z_c ^{(k)}- \mu_c ^{(k)})\right\|_{2}^{2} \end{aligned} $$

This is a least square problem

$$ \min _{x}\left\|\left[\begin{array}{c} I \\ \sqrt{\rho} D_r \\ \sqrt{\rho} D_c \\ \end{array}\right] x^{(k)} -\left[\begin{array}{c} y \\ \sqrt{\rho}\left( z_r ^{(k)}- \mu_r ^{(k)}\right) \\ \sqrt{\rho}\left( z_c ^{(k)}- \mu_c ^{(k)}\right) \\ \end{array}\right]\right\|_{2}^{2} $$

And we can solve it with the solution of least square $(X^TX)^{-1}X^TY$：

$$ \begin{aligned} x ^{(k+1)} &=\left( I +\rho (D_r^TD_r + D_c^TD_c) \right)^{-1}\left[ I, \sqrt{\rho} D_r^T, \sqrt{\rho} D_c^T \right]\left[\begin{array}{c} y \\ \left.\sqrt{\rho}\left( z_r ^{(k)}- \mu_r ^{(k)}\right)\right] \\ \left.\sqrt{\rho}\left( z_c ^{(k)}- \mu_c ^{(k)}\right)\right] \\ \end{array}\right] \\ &=\left( I +\rho (D_r^TD_r + D_c^TF_c) \right)^{-1} \left( y + \rho \left[ D_r^T\left( z_r ^{(k)}- \mu_r ^{(k)}\right) + D_c^T\left( z_c ^{(k)}- \mu_c ^{(k)}\right) \right] \right) \end{aligned} $$

DFT Speedup #

The inverse of matrix $I +\rho (D_r^TD_r + D_c^TD_c)$ is computationally expensive. To see why, suppose there is a 200 * 300 image, then the shape of $D_r$ and $D_c$are both (200*300, 200*300), and it means we have to take inverse of a giant (200*300, 200*300) matrix, which is impractical.

Thanks to convolution theorem, we could solve this equation purely in frequency domain.

For detail, we start with moving $(I +\rho (D_r^TD_r + D_c^TD_c))^{-1}$ to the left hand side:

$$ \left( I +\rho (D_r^TD_r + D_c^TF_c) \right) x ^{(k+1)} = \left( y + \rho \left[ D_r^T\left( z_r ^{(k)}- \mu_r ^{(k)}\right) + D_c^T\left( z_c ^{(k)}- \mu_c ^{(k)}\right) \right] \right) $$

Suppose $\mathcal{F}$ is the Fourier matrix and $\mathcal{F}^{-1}$ is the inverse Fourier matrix. Perform fourier transform on the both sides of above equation gives us the following result (using the linearity of Fourier transform):

$$ \left( \mathcal{F}I +\rho (\mathcal{F}D_r^TD_r + \mathcal{F}D_c^TF_c) \right) x ^{(k+1)} = \left( \mathcal{F}y + \rho \left[ \mathcal{F}D_r^T\left( z_r ^{(k)}- \mu_r ^{(k)}\right) + \mathcal{F}D_c^T\left( z_c ^{(k)}- \mu_c ^{(k)}\right) \right] \right) $$

If we use the property of $\mathcal{F}^{-1}F = \mathcal{F}^H\mathcal{F} = I$, we could get:

$$ \begin{aligned} \left( \mathcal{F}I +\rho (\mathcal{F}D_r^TD_r + \mathcal{F}D_c^TF_c) \right) \mathcal{F}^H\mathcal{F} x ^{(k+1)} = LHS \\ \text{=>} \left( \mathcal{F}I\mathcal{F}^H +\rho (\mathcal{F}D_r^TD_r\mathcal{F}^H + \mathcal{F}D_c^TF_c\mathcal{F}^H) \right) \mathcal{F} x ^{(k+1)} = LHS \end{aligned} $$

From the facts below:

$\mathcal{F}I\mathcal{F}^H = I$
Fourier matrix can diagonize any circulant matrix in the way of $\mathcal{F}D\mathcal{F}^H$, See [Link1], [Link2].

We could get:

$$ \left( I +\rho (\Lambda_r + \Lambda_c) \right) \mathcal{F} x ^{(k+1)} = \left( \mathcal{F}y + \rho \left[ \mathcal{F}D_r^T\left( z_r ^{(k)}- \mu_r ^{(k)}\right) + \mathcal{F}D_c^T\left( z_c ^{(k)}- \mu_c ^{(k)}\right) \right] \right) $$

In frequency domain, convolution is element-wise multiplication, so the final result is:

$$ x ^{(k+1)} = \mathcal{F}^{-1} \frac{\left( \mathcal{F}y + \rho \left[ \mathcal{F}D_r^T\left( z_r ^{(k)}- \mu_r ^{(k)}\right) + \mathcal{F}D_c^T\left( z_c ^{(k)}- \mu_c ^{(k)}\right) \right] \right)} {\left( I +\rho (\Lambda_r + \Lambda_c) \right)} $$

Implementation #

Recall that any doubly block circulant matrices $D$ can be diagonized by Fourier matrix. The column of $F^H$ are the eigen vectors and the corresponding eigen values are the DFT values of the signal generating the circulant matrix.

And with the following equations, we know the eigen value of $D^TD$ are just the multiplication of the eigen matrix of $D$ and its conjugate.

$$ H = \mathcal{F}^{H} D \mathcal{F} \\ H^H = \mathcal{F}^{H} D^T \mathcal{F} \\ H^HH = \mathcal{F}^{H} D^T \mathcal{F}\mathcal{F}^{H} D \mathcal{F} = \mathcal{F}^{H} D^T D \mathcal{F} $$

Summarize all the fact above, we could caluclate $\Lambda_r + \Lambda_c$ with the following Python snippet.

eigDtD = np.abs(np.fft.fft2(np.array([[1, -1]]), (row, col))) ** 2 + \
         np.abs(np.fft.fft2(np.array([[1, -1]]).transpose(), (row, col))) ** 2

Note that we use the fact that $A^HA = ||A||_2^2$

Since $\Lambda_r + \Lambda_c$ is diagonal, addition and multiplication of $I +\rho (\Lambda_r + \Lambda_c)$ could be done in element-wise way.

lhs = 1 + rho * eigDtD