Applying Adjoints Twice: An Efficient Gradient Implementation for Models with Linear Structure with Applications in Reconstruction for EPMA

Tamme Claus1, Gaurav Achuda2, Silvia Richter2, Manuel Torrilhon1

1 ACoM, Applied and Computational Mathematics, RWTH Aachen University

2 GFE, Central Facility for Electron Microscopy, RWTH Aachen University

Computational Methods for Inverse Problems, ECCOMAS 2024

Motivation: Electron Probe Microanalysis (EPMA)

"Material imaging based on characteristic x-ray emission"

  • Material Science, Quality control, Mineralogy, Semiconductors
  • Ionization of material sample using high-energy focused electron beam
  • Wavelength-dispersive spectrometers measure characteristic x-radiation

Microprobe at GFE (source: gfe.rwth-aachen.de)

Material / Interaction Volume
  • Example Material: two-phase (A, B)
  • Electron propagation affected by scattering (animated: different energies)
  • Ionization and emission of characteristic x-rays (wavelength of A-rays and B-rays differ)
  • Spatial information gained via different beam positions (linescan), beam angles and beam energies
  • inverse problem of material reconstruction \begin{equation} p^* = \underset{p \in P}{\text{arg min}} \underbrace{|| \Sigma(p) - \tilde{\Sigma} ||^2}_{=C(p)} \end{equation}
  • gradient-based iterative methods (Steepest Descent, BFGS, or HMC)
  • we focus on computing $\nabla_p C(p)$ efficiently via adjoints

Motivation: Matrix Product Bracketing

Consider Matrix Multiplication ($N \gg I, J$):
\[ \underbrace{\begin{pmatrix} & & \\ & \Sigma^{(ji)} & \\ & & \end{pmatrix}}_{J \times I} = \underbrace{\begin{pmatrix} — & h^{(1)} & — \\ — & h^{(j)} & — \\ — & h^{(J)} & — \\ \end{pmatrix}}_{J \times N} \cdot{} \underbrace{\boldsymbol{A}}_{N \times N} \cdot{} \underbrace{ \begin{pmatrix} | & | & | \\ g^{(1)} & g^{(i)} & g^{(I)} \\ | & | & | \\ \end{pmatrix}}_{N \times I} \]

Two possible bracketings (colors as in the cost comparison below): $\color{blue}{(H \cdot{} \boldsymbol{A}) \cdot{} G}$ or $\color{red}{H \cdot{} (\boldsymbol{A} \cdot{} G)}$, where $H$ collects the rows $h^{(j)}$ and $G$ the columns $g^{(i)}$.

where to bracket?

different computational costs: $\color{red}{I\times N^2 + IJ\times N} \text{ or } \color{blue}{J\times N^2 + IJ\times N}$

  • what happens if $\boldsymbol A$ is more general, e.g. "solution of a linear system / (linear, linearized) weak form" instead of "matrix multiplication"? \[ \boldsymbol A g^{(i)} = \{u \in V \, | \, a(u, v) + \langle g^{(i)}, v \rangle = 0 \quad \forall v \in V\} \]
  • what happens if $\boldsymbol A$ is a derivative (structure: chain rule, product rule, ...)? \[ \boldsymbol A g^{(i)} = \frac{\partial f(v)}{\partial v}\Big|_{v = h(w)}\frac{\partial h(w)}{\partial w}g^{(i)} \]
In general (reduced cost if $J < I$): \[ \underbrace{\Sigma^T}_{I \times J} = \underbrace{G^T}_{I \times \infty} \cdot{} \underbrace{A^*}_{\infty \times \infty} \cdot{} \underbrace{H^T}_{\infty \times J} \] where $\boldsymbol{A}^*$ is the adjoint operator.
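To make the bracketing trade-off concrete, here is a minimal Julia sketch with hypothetical sizes and random data: both bracketings give the same $\Sigma$, but the operator is applied to $I$ columns in one case and to $J$ rows in the other.

```julia
# hypothetical sizes with N >> I, J; random data just to compare the two bracketings
N, I, J = 1000, 50, 5
A = randn(N, N)
H = randn(J, N)                # rows h^(j)
G = randn(N, I)                # columns g^(i)

Σ_red  = H * (A * G)           # red bracketing:  I·N² + I·J·N multiplications
Σ_blue = (H * A) * G           # blue bracketing: J·N² + I·J·N multiplications

@assert Σ_red ≈ Σ_blue         # same result, different cost
```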

Gradients in Inverse Problems

"Determine the model parameters $p$ such that the model $\Sigma(p)$ produces the observed data $\tilde{\Sigma}$."

classical approach

\begin{align*} p^* = \underset{p \in P}{\text{arg min}} || \Sigma(p) - \tilde{\Sigma} ||^2 \end{align*}
  • iterative approximation of the minimum based on the gradient of the objective function $p \mapsto ||\Sigma(p) - \tilde{\Sigma}||^2$
  • Gradient Descent, CG, BFGS, ...
  • regularization: addition of a penalty term $p \mapsto ||\Sigma(p) - \tilde{\Sigma}||^2 + \mathcal{R}(p)$
  • we will instead compose the model with a parametrization: $\Sigma = \Sigma \circ \gamma$ (cf. $m = \gamma_p$ below)

Bayesian approach

  • approximate the posterior $\pi(p|\tilde{\Sigma})$ (typically: iterative sampling, MCMC)
  • Hamiltonian Monte-Carlo (based on the gradient of the target density)

Efficient evaluation of the objective function and its gradient is crucial!

Applying Adjoints Twice: Ingredients

An adjoint method

Definition of the Adjoint ($H, G$ Hilbert spaces, $A: G \to H$ continuous, linear) \[ \langle h, A(g) \rangle_H := \langle A^*(h), g \rangle_G \quad \forall h \in H \, \forall g \in G \]

  • given multiple $g^{(1)}, ... g^{(I)} \in G$, $h^{(1)}, ... h^{(J)} \in H$ we want to compute
\[ \Sigma^{(ji)} = \langle h^{(j)}, A(g^{(i)}) \rangle_H = \langle A^*(h^{(j)}), g^{(i)} \rangle_G \quad \forall i=1, ...I \, j=1, ...J \]

non-adjoint implementation ($\color{green}{J} > \color{orange}{I}$)

\begin{align*} &\text{foreach } \color{orange}{i = 1, ..., I} \text{ do} \\ &\qquad \color{orange}{v^{(i)} \leftarrow A(g^{(i)})}\\ &\qquad \text{foreach } \color{green}{j = 1, ..., J} \text{ do}\\ &\qquad \qquad \Sigma^{(ji)} \leftarrow \langle \color{green}{h^{(j)}}, \color{orange}{v^{(i)}} \rangle_H\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • cost: $\color{orange}{I} \times \mathcal{C}(A) + \color{orange}{I}\color{green}{J} \times \mathcal{C}(\langle \cdot{}, \cdot{} \rangle_H)$

adjoint implementation ($\color{orange}{I} > \color{green}{J}$)

\begin{align*} &\text{foreach } \color{green}{j = 1, ..., J} \text{ do} \\ &\qquad \color{green}{\lambda^{(j)} \leftarrow A^*(h^{(j)})}\\ &\qquad \text{foreach } \color{orange}{i = 1, ..., I} \text{ do}\\ &\qquad \qquad \Sigma^{(ji)} \leftarrow \langle \color{green}{\lambda^{(j)}}, \color{orange}{g^{(i)}} \rangle_G\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • cost: $\color{green}{J} \times \mathcal{C}(A^*) + \color{orange}{I}\color{green}{J} \times \mathcal{C}(\langle \cdot{}, \cdot{} \rangle_G)$
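The same pattern as a minimal Julia sketch, with $A$ standing in for a generic linear operator as a plain matrix (so $A^* = A^T$) and hypothetical sizes: both loop orders produce the same $\Sigma^{(ji)}$, but the expensive operator is applied $I$ times in one case and $J$ times in the other.

```julia
using LinearAlgebra: dot

N, I, J = 1000, 200, 3                   # hypothetical sizes
A = randn(N, N)                          # stands in for a generic linear operator
g = [randn(N) for _ in 1:I]
h = [randn(N) for _ in 1:J]

# non-adjoint implementation: I applications of A
Σ_na = zeros(J, I)
for i in 1:I
    v = A * g[i]
    for j in 1:J
        Σ_na[j, i] = dot(h[j], v)
    end
end

# adjoint implementation: J applications of A* (here simply Aᵀ)
Σ_ad = zeros(J, I)
for j in 1:J
    λ = A' * h[j]
    for i in 1:I
        Σ_ad[j, i] = dot(λ, g[i])
    end
end

@assert Σ_na ≈ Σ_ad
```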


The adjoint of the solution of a linear weak form (linear pde)

$A: G \to H$ is the solution operator of a linear weak form ($U, G, H$ Hilbert spaces: $U \subseteq G, H$)

  • $\color{orange}{A(g) = (u \in U | a(u, v) + b(v) = 0\, \forall v \in U)}$ where $b(v) = \langle v, g\rangle_G$
  • $\color{green}{A^*(h) = (\lambda \in U \, | \, a(v, \lambda) + c(v) = 0 \, \forall v \in U)}$ where $c(v) = \langle h, v \rangle_H$

Derivation: we introduce $\lambda \in U$ (in continuous adjoint method (for $\partial_p$): "Lagrange-multiplier") \begin{align*} \langle h, A(g) \rangle_H = \langle h, u \rangle_H &= c(u) \\ &=c(u) + \color{orange}{\underbrace{a(u, \lambda) + b(\lambda)}_{=0 \, \forall \lambda \in U}} \\ &=\color{green}{\underbrace{c(u) + a(u, \lambda)}_{=0 \, \forall u \in U}} + b(\lambda) \\ & \hphantom{= c(u) + a(u, \lambda)} = b(\lambda) = \langle \lambda, g \rangle_G = \langle A^*(h), g \rangle_G \end{align*}

  • given multiple $g^{(i)} \in G$ ($b^{(i)} \in L(G, \mathbb{R})$) and $h^{(j)} \in H$ ($c^{(j)} \in L(H, \mathbb{R})$)

non-adjoint implementation

\begin{align*} &\text{foreach } \color{orange}{i = 1, ... I} \\ &\qquad \color{orange}{\text{solve } a(u^{(i)}, v) + b^{(i)}(v) = 0 \quad \forall v \in U}\\ &\qquad \text{foreach } \color{green}{j = 1, ...J}\\ &\qquad \qquad \Sigma^{(ij)} = \color{green}{c^{(j)}}(\color{orange}{u^{(i)}})\\ &\qquad \text{end}\\ &\text{end} \end{align*}

adjoint implementation

\begin{align*} &\text{foreach } \color{green}{j = 1, ...J} \\ &\qquad \color{green}{\text{solve } a(v, \lambda^{(j)}) + c^{(j)}(v) = 0 \quad \forall v \in U}\\ &\qquad \text{foreach } \color{orange}{i = 1, ...I}\\ &\qquad \qquad \Sigma^{(ij)} = \color{orange}{b^{(i)}}(\color{green}{\lambda^{(j)}})\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • switching trial and test function is the continuous analog to solving the transposed system
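A discrete sketch of the same statement, with hypothetical matrices (a Galerkin discretization would give $a(u,v) = v^T K u$, $b^{(i)}(v) = v^T b^{(i)}$, $c^{(j)}(u) = (c^{(j)})^T u$): switching trial and test function amounts to solving with $K^T$, and the resulting measurement matrix is identical.

```julia
using LinearAlgebra: dot

# discrete analogue of the weak form (hypothetical n dofs, nb excitations, nc extractions)
n, nb, nc = 500, 100, 4
K  = randn(n, n)                           # hypothetical (invertible) system matrix
bs = [randn(n) for _ in 1:nb]              # excitations  b^(i)
cs = [randn(n) for _ in 1:nc]              # extractions  c^(j)

# non-adjoint: solve K u^(i) = -b^(i) once per excitation (nb solves)
us   = [-(K \ b) for b in bs]
Σ_na = [dot(cs[j], us[i]) for j in 1:nc, i in 1:nb]

# adjoint ("switch trial and test function"): solve Kᵀ λ^(j) = -c^(j) per extraction (nc solves)
λs   = [-(K' \ c) for c in cs]
Σ_ad = [dot(bs[i], λs[j]) for j in 1:nc, i in 1:nb]

@assert Σ_na ≈ Σ_ad
```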

Riesz Representation Theorem/Definition of the Adjoint

Riesz Representation Theorem

Let $(H, \langle \cdot{}, \cdot{} \rangle_H)$ be a (real) Hilbert space with inner product $\langle \cdot{}, \cdot{} \rangle_H$. For every continuous linear functional $f_h \in H^*$ (the dual of $H$), there exists a unique vector $h \in H$, called the Riesz Representation of $f_h$, such that \[ f_h(x) = \langle h, x \rangle_H \quad \forall x \in H. \]

Definition: Adjoint Operator

Let $(G, \langle \cdot{}, \cdot{} \rangle_G)$ be another (real) Hilbert space, $A:G\to H$ a continuous linear operator, and $f_h \in H^*$ a continuous linear functional. Then \[ f_h(A(g)) = \langle h, A(g) \rangle_H \quad \forall g \in G. \] But $g \mapsto f_h(A(g))$ is also a continuous linear functional $f_\lambda$ on $G$ with a Riesz Representation $\lambda \in G$ \[ f_h(A(g)) = f_\lambda(g) = \langle \lambda, g \rangle_G := \langle A^*(h), g \rangle_G. \]

Examples of (real) adjoint operators

Scalar Multiplication

  • $G, H = \mathbb{R}$
  • $A: \mathbb{R} \to \mathbb{R}, g \mapsto Ag = a \cdot{} g,\quad a \in \mathbb{R}$
  • $A^*: \mathbb{R} \to \mathbb{R}, \mu \mapsto A^* \mu = a \cdot{} \mu$

Derivation: \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}} = \langle \mu, a \cdot{} g \rangle_{\mathbb{R}} = \langle a \cdot{} \mu, g \rangle_{\mathbb{R}} = \langle A^*\mu , g \rangle_{\mathbb{R}} \quad \forall g \in \mathbb{R}\, \forall \mu \in \mathbb{R} \end{equation*}

Matrix Multiplication

  • $G = \mathbb{R}^n, H = \mathbb{R}^m$
  • $A: \mathbb{R}^n \to \mathbb{R}^m, g \mapsto Ag = M \cdot{} g, \quad M \in \mathbb{R}^{m \times n}$
  • $A^*: \mathbb{R}^m \to \mathbb{R}^n, \mu \mapsto A^*\mu = M^T \cdot{} \mu$

Derivation: \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}^m} = \langle \mu, M \cdot{} g \rangle_{\mathbb{R}^m} = \langle M^T \cdot{} \mu, g \rangle_{\mathbb{R}^n} = \langle A^*\mu, g \rangle_{\mathbb{R}^n} \quad \forall g \in \mathbb{R}^n \, \forall \mu \in \mathbb{R}^m \end{equation*}

Integration

  • $G = L^2(\Omega)$ (with $\langle \cdot{}, \cdot{} \rangle_{L^2(\Omega)} = \int_{\Omega} \cdot{} \cdot{} d x$), $H = \mathbb{R}$
  • $A: L^2(\Omega) \to \mathbb{R}, g(\cdot{}) \mapsto Ag = \int_{\Omega} g(x) d x$
  • $A^*: \mathbb{R} \to L^2(\Omega), \mu \mapsto (A^* \mu)(\cdot{}) = (x \mapsto \mu \cdot{} 1_{\Omega}(x))$

Derivation: \begin{equation*} \langle \mu, Ag \rangle_\mathbb{R} = \mu \cdot{} \int_\Omega g(x) dx = \int_\Omega \mu \cdot{} 1_\Omega(x) g(x) dx = \langle (A^*\mu)(\cdot{}), g(\cdot{}) \rangle_{L^2(\Omega)} \quad \forall g \in L^2(\Omega) \, \forall \mu \in \mathbb{R} \end{equation*}
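These identities are easy to check numerically. A small Julia sketch for the matrix example and for the integration example (here discretized with a hypothetical midpoint rule on $\Omega = (0,1)$):

```julia
using LinearAlgebra: dot

# matrix multiplication: A g = M·g, adjoint A* μ = Mᵀ·μ
m, n = 3, 5
M, g, μ = randn(m, n), randn(n), randn(m)
@assert dot(μ, M * g) ≈ dot(M' * μ, g)

# integration (midpoint rule on Ω = (0,1)): A g = ∫ g dx, adjoint A* μ = constant function μ·1_Ω
nq = 100; Δx = 1 / nq
gvals = randn(nq)                              # samples of g at the quadrature points
μ0    = 2.3
Ag    = sum(gvals) * Δx                        # A g
Aμ    = fill(μ0, nq)                           # A* μ0, sampled at the quadrature points
@assert μ0 * Ag ≈ sum(Aμ .* gvals) * Δx        # ⟨μ, Ag⟩_ℝ = ⟨A*μ, g⟩_{L²}
```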

Examples of (real) adjoint operators

Linear Solve

  • $G, H = \mathbb{R}^n$ and $M \in \mathbb{R}^{n\times n}$ invertible
  • $A: \mathbb{R}^n \to \mathbb{R}^n, g \mapsto (h \in \mathbb{R}^n | M \cdot{} h = g)$, (or $A = M^{-1}$)
  • $A^*: \mathbb{R}^n \to \mathbb{R}^n, \mu \mapsto (\lambda \in \mathbb{R}^n | M^T \cdot{} \lambda = \mu)$, (or $A^* = M^{-T}$)

Derivation (intentionally complicated): We reinterpret \begin{equation*} M \cdot{} h = g \quad \Leftrightarrow \quad \langle v, M \cdot{} h \rangle = \langle v, g \rangle \quad \forall v \in G \end{equation*} in particular (for a fixed but unspecified $\lambda \in G$) \begin{equation} 0 = \langle \lambda, g \rangle - \langle \lambda, M \cdot{} h \rangle \quad \Leftrightarrow \quad 0 = \langle \lambda, g \rangle - \langle M^T \cdot{} \lambda, h \rangle \end{equation} \begin{align*} \langle \mu, Ag \rangle &= \langle \mu, h \rangle\\ &= \langle \mu, h \rangle - \langle M^T \cdot{} \lambda, h \rangle + \langle \lambda, g \rangle \\ & \quad \text{ with } \langle \mu, h \rangle - \langle M^T \cdot{} \lambda, h \rangle = 0 \quad \forall h \in H \\ & \quad \text{ or } M^T \cdot{} \lambda = \mu \\ &= \langle A^* \mu, g \rangle \end{align*} (alternatively) \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}^n} = \langle \mu, M^{-1} g \rangle_{\mathbb{R}^n} = \langle M^{-T} \mu, g \rangle_{\mathbb{R}^n} = \langle A^*\mu, g \rangle_{\mathbb{R}^n} \end{equation*}
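Again as a short numerical check with a hypothetical random invertible $M$: the adjoint of $g \mapsto M^{-1} g$ is $\mu \mapsto M^{-T} \mu$, i.e. a solve with the transposed matrix.

```julia
using LinearAlgebra: dot

# linear solve: A g = M⁻¹ g, adjoint A* μ = M⁻ᵀ μ
n = 200
M = randn(n, n)                 # hypothetical invertible matrix
g, μ = randn(n), randn(n)
@assert dot(μ, M \ g) ≈ dot(M' \ μ, g)
```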

Examples of (real) adjoint operators

Weak form

  • $G, H$ (real Hilbert spaces)
  • $A: G \to H, g \mapsto \{h \in H | a(h, v) + \langle g, v \rangle_G = 0 \quad \forall v \in G\}$, $\quad a(\cdot{}, \cdot{}) \text{ coercive bilinear form}$
  • $A^*: H \to G, \mu \mapsto \{\lambda \in G | a(v, \lambda) + \langle \mu,v \rangle_H = 0 \quad \forall v \in H\}$

Derivation: \begin{align*} \langle \mu, Ag \rangle_H = f_\mu(h) &= f_\mu(h) + \underbrace{a(h, \lambda) + f_g(\lambda)}_{=0\quad \forall \lambda \in G} \\ &= \underbrace{f_\mu(h) + a(h, \lambda)}_{\overset{!}{=}\, 0 \quad \forall h \in H} + f_g(\lambda) = f_g(\lambda) = \langle \lambda, g \rangle_G \end{align*}

Derivative Notation (from Algorithmic Differentiation)

  • given $f_{(\cdot{})}: X \to Y, x \mapsto f_x$ ($X, Y$ Hilbert) non-linear, Fréchet-differentiable

we define:

  • the (continuous, linear) tangent operator $\dot{f}_x \in L(X, Y)$ (directional derivative) \begin{align*} \dot{f}_x(\dot{x}) = \frac{\partial f_x}{\partial x}(\dot{x}) = \lim_{h \to 0} \frac{f_{x + h\dot{x}} - f_{x}}{h} \end{align*}
  • the (continuous, linear) adjoint operator $\bar{f}_x \in L(Y, X)$ \begin{align*} \langle \bar{y} , \dot{f}_x(\dot{x}) \rangle_Y = \langle \bar{f}_x (\bar{y}), \dot{x} \rangle_X \quad \forall \bar{y} \in Y, \dot{x} \in X \end{align*}

Also we define ($y = f_x$):

  • tangent variables $\dot{x} \in X, \dot{y} \in Y$: given $\dot{x} \in X$, $\dot{y} = \dot{f}_x(\dot{x})$
  • adjoint variables $\bar{x} \in X, \bar{y} \in Y$: given $\bar{y} \in Y$, $\bar{x} = \bar{f}_x(\bar{y})$

Applying this to a composition $ f_x = \varphi^{(N)}_{v^{(N-1)}} \circ \, ...\, \circ\, \varphi^{(1)}_x$ where the $\varphi^{(n)}$ are single assignments (chain rule!) is tangent/adjoint mode automatic differentiation (neglecting all implementation details). A finite-dimensional sketch follows below.
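As a minimal finite-dimensional sketch (a hypothetical map $f$ with a hand-coded Jacobian): the tangent operator is the directional derivative, and the adjoint operator satisfies the defining inner-product identity.

```julia
using LinearAlgebra: dot

# hypothetical nonlinear map f: ℝ² → ℝ² with hand-coded tangent and adjoint operators
f(x)  = [sin(x[1]) * x[2], x[1]^2 + x[2]]
Jf(x) = [cos(x[1])*x[2]  sin(x[1]);
         2*x[1]          1.0       ]          # Fréchet derivative (Jacobian) of f at x

f_tan(x, dx)   = Jf(x)  * dx                  # tangent operator:  ḟ_x(ẋ)
f_adj(x, ybar) = Jf(x)' * ybar                # adjoint operator:  f̄_x(ȳ)

x, dx, ybar = randn(2), randn(2), randn(2)

# tangent = directional derivative (finite-difference check)
h = 1e-6
@assert isapprox(f_tan(x, dx), (f(x + h*dx) - f(x)) / h; atol = 1e-4)

# defining property of the adjoint: ⟨ȳ, ḟ_x(ẋ)⟩_Y = ⟨f̄_x(ȳ), ẋ⟩_X
@assert dot(ybar, f_tan(x, dx)) ≈ dot(f_adj(x, ybar), dx)
```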

Automatic/Algorithmic Differentiation

Example

\begin{equation} y = f_x = g_{h_x}, \quad v = h_x \end{equation} By the chain rule, the tangent operator $\dot{f}_x$ is \begin{equation} \dot{y} = \dot{f}_x(\dot{x}) = \dot{g}_v(\dot{h}_x(\dot{x})) \end{equation} For the adjoint operator $\bar{f}_x$ we use \begin{align*} \langle \bar{y} , \dot{f}_x(\dot{x}) \rangle_Y = \langle \bar{y}, \dot{g}_v(\dot{h}_x(\dot{x})) \rangle_Y = \langle \bar{g}_v (\bar{y}), \dot{h}_x(\dot{x}) \rangle_V = \langle \bar{h}_x(\bar{g}_v(\bar{y})), \dot{x} \rangle_X \quad \forall \bar{y} \in Y, \dot{x} \in X \end{align*} and find \begin{equation} \bar{f}_x(\bar{y}) = \bar{h}_x(\bar{g}_v(\bar{y})) \end{equation}

Tangent mode

\begin{align*} &\text{foreach } j = 1, ... J \\ &\qquad \dot{x} \leftarrow e_j \\ &\qquad \dot{y} \leftarrow \dot{g}_v(\dot{h}_x(\dot{x}))\\ &\qquad \text{foreach } i = 1, ... I \\ &\qquad \qquad (Df)^{(i, j)} = \langle e_i, \dot{y} \rangle\\ &\qquad \text{end}\\ &\text{end} \end{align*}

Adjoint mode

\begin{align*} &\text{foreach } i = 1, ... I \\ &\qquad \bar{y} \leftarrow e_i \\ &\qquad \bar{x} \leftarrow \bar{h}_x(\bar{g}_v(\bar{y}))\\ &\qquad \text{foreach } j = 1, ... J \\ &\qquad \qquad (Df)^{(i, j)} = \langle \bar{x}, e_j \rangle\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • AD is even more systematic..
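A minimal Julia sketch of both modes for a hypothetical composition $y = g(h(x))$ with hand-coded tangent/adjoint operators: tangent mode builds $Df$ column by column (one sweep per input direction), adjoint mode builds it row by row (one sweep per output direction).

```julia
# hypothetical composition y = g(h(x)) with hand-coded Jacobians and tangent/adjoint operators
h(x) = [x[1] * x[2], x[1] + x[2], x[2]^2]           # ℝ² → ℝ³
g(v) = [sin(v[1]) + v[3], v[2] * v[3]]              # ℝ³ → ℝ²
Jh(x) = [x[2] x[1]; 1.0 1.0; 0.0 2*x[2]]
Jg(v) = [cos(v[1]) 0.0 1.0; 0.0 v[3] v[2]]

h_tan(x, dx)   = Jh(x)  * dx;   h_adj(x, vbar) = Jh(x)' * vbar
g_tan(v, dv)   = Jg(v)  * dv;   g_adj(v, ybar) = Jg(v)' * ybar

x = randn(2); v = h(x)
nX, nY = 2, 2

# tangent mode: one sweep per input direction e_j (columns of Df)
Df_tan = zeros(nY, nX)
for j in 1:nX
    dx = zeros(nX); dx[j] = 1.0
    Df_tan[:, j] = g_tan(v, h_tan(x, dx))
end

# adjoint mode: one sweep per output direction e_i (rows of Df)
Df_adj = zeros(nY, nX)
for i in 1:nY
    ybar = zeros(nY); ybar[i] = 1.0
    Df_adj[i, :] = h_adj(x, g_adj(v, ybar))
end

@assert Df_tan ≈ Df_adj ≈ Jg(v) * Jh(x)
```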

Automatic/Algorithmic Differentiation

Every numerical program $f: \mathbb{R}^{n+1} \to \mathbb{R}^{m+1}$ can be decomposed into single assignments $(\varphi^{(1)}, ... \varphi^{(N)})$. \begin{align*} &(v^{(0)}, v^{(-1)}, ..., v^{(-n)})^T = x \\ &v^{(n)} = \varphi^{(n)}_{(v^{(j)})_{j \prec n}} \quad \forall n = 1, ... N \\ &y = (v^{(N)}, v^{(N-1)}, ..., v^{(N-m)})^T \end{align*}

For every single assignment $\varphi^{(n)}_{(v^{(j)})_{j \prec n}}$ we know

  • tangent operator $\dot{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\dot{v}^{(j)})_{j \prec n})$ and
  • adjoint operator $\bar{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\bar{v}^{(k)})_{k \succ n})$

Tangent mode

\begin{align*} &(v^{(0)}, v^{(-1)}, ...)^T = x \\ &(\dot{v}^{(0)}, \dot{v}^{(-1)}, ...)^T = \dot{x} \\ &\hspace{-30px}\begin{cases} v^{(n)} = \varphi^{(n)}_{(v^{(j)})_{j \prec n}} \\ \dot{v}^{(n)} = \dot{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\dot{v}^{(j)})_{j \prec n}) \end{cases} \small{\quad \forall n = 1, ... N}\\ &y = (v^{(N)}, v^{(N-1)}, ...)^T\\ &\dot{y} = (\dot{v}^{(N)}, \dot{v}^{(N-1)}, ...)^T \end{align*}

Adjoint mode

\begin{alignat*}{2} &(v^{(0)}, v^{(-1)}, ...)^T = x \\ &v^{(n)} = \varphi^{(n)}_{(v^{(j)})_{j \prec n}} &&\small{\quad \forall n = 1, ..., N}\\ &y = (v^{(N)}, v^{(N-1)}, ...)^T\\ &(\bar{v}^{(N)}, \bar{v}^{(N-1)}, ...)^T = \bar{y} \\ &\bar{v}^{(n)} = \bar{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\bar{v}^{(k)})_{k \succ n}) &&\small{\quad \forall n = N-m-1, ..., -n}\\ &\bar{x} = (\bar{v}^{(0)}, \bar{v}^{(-1)}, ...)^T \end{alignat*}

Applying Adjoints Twice

  • parameters $p \in \mathbb{R}^N$
  • domain $\mathcal{R}$
  • system $a_p \in \text{Bil}(U(\mathcal{R}) \times U(\mathcal{R}), \mathbb{R})$
  • excitations $\boldsymbol{b} = (b^{(1)}, ... b^{(I)})^T \in L(U(\mathcal{R}), \mathbb{R}^I)$
  • extraction $c \in L(U(\mathcal{R}), \mathbb{R})$
  • measurements $\boldsymbol{\Sigma} = (\Sigma^{(1)}, ..., \Sigma^{(I)})^T \in \mathbb{R}^I$
  • discrepancy $g: \mathbb{R}^I \to \mathbb{R}$
  • $N, I \gg 1$, ($J=1$)
Notation: $f_{(\cdot{})}: X \to Y$ non-linear, differentiable
  • tangent operator $\dot{f}_x(\dot{x}) = \lim_{h \to 0} \frac{f_{x+h\dot{x}} - f_x}{h}$
  • adjoint operator $\bar{f}_x: \langle \bar{y}, \dot{f}_x(\dot{x}) \rangle_Y = \langle \bar{f}_x(\bar{y}), \dot{x}\rangle_X \, \forall \bar{y}, \dot{x}$
  • tangent variables $\dot{x}, \dot{y}$, where $\dot{y} = \dot{f}_x(\dot{x})$
  • adjoint variables $\bar{x}, \bar{y}$, where $\bar{x} = \bar{f}_x(\bar{y})$

Model ($u^{(i)} \in U(\mathcal{R}))$:

\begin{align*} a_p(u^{(i)}, v) + b^{(i)}(v) &= 0 \quad \forall v \in U(\mathcal{R}) \\ \Sigma^{(i)} &= c(u^{(i)}) \\ C &= g_{\boldsymbol \Sigma} \\ \end{align*}

1st "adjoint method" ($\lambda \in U(\mathcal{R})$):

\begin{align*} a_p(v, \lambda) + c(v) &= 0 \quad \forall v \in U(\mathcal{R}) \\ \Sigma^{(i)} &= b^{(i)}(\lambda) \\ C &= g_{\boldsymbol \Sigma} \end{align*}

Tangent model/sensitivity ($\dot{\lambda}^{(n)} \in U(\mathcal{R})$):

\begin{align*} a_p(v, \dot{\lambda}^{(n)}) + \overbrace{\dot{a}_p(v, \lambda, e^{(n)})}^{=\beta^{(n)}(v)} &= 0 \quad \forall v \in U(\mathcal{R})\\ \dot{C}^{(n)} &= \underbrace{\dot{g}_{\boldsymbol \Sigma}(\boldsymbol b(\dot{\lambda}^{(n)}))}_{=\alpha(\dot{\lambda}^{(n)})} \end{align*}

2nd (continuous) "adjoint method" ($\bar{\lambda} \in U(\mathcal{R})$):

\begin{align*} a_p(\bar{\lambda}, v) + \dot{g}_{\boldsymbol \Sigma}(\boldsymbol b(v)) &= 0 \quad \forall v \in U(\mathcal{R})\\ \dot{C}^{(n)} &= \dot{a}_p(\bar{\lambda}, \lambda, e^{(n)}) \end{align*}
\begin{align*} \bar{\Sigma} &= \bar{g}_{\boldsymbol \Sigma}(\bar{C})\\ a_p(\bar{\lambda}, v) + \bar{\boldsymbol \Sigma}^T \boldsymbol b(v) &= 0 \quad \forall v \in U(\mathcal{R})\\ \bar{p} &= \bar{a}_p(\bar{\lambda}, \lambda, \bar{C})\\ \nabla_p C &= \bar{p} \end{align*}
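A discrete analogue of the whole scheme as a Julia sketch (hypothetical small matrices, $a_p(u,v) = v^T K(p) u$ with $K(p) = K_0 + \sum_n p_n K_n$, single extraction $J=1$): one adjoint solve gives all $I$ measurements, one further adjoint solve gives the full gradient, checked against a finite difference.

```julia
using LinearAlgebra: dot, I

# discrete analogue (hypothetical small sizes): a_p(u,v) = vᵀK(p)u, K(p) = K0 + Σₙ pₙ Kₙ
n, nI, nN = 40, 30, 10                       # dofs, excitations, parameters
K0 = randn(n, n) + 10n * I                   # keep K(p) safely invertible
Ks = [randn(n, n) for _ in 1:nN]
K(p) = K0 + sum(p[m] * Ks[m] for m in 1:nN)

B = randn(n, nI)                             # excitations b^(i) as columns
c = randn(n)                                 # single extraction (J = 1)
Σdata = randn(nI)                            # "measured" data Σ̃
p = 0.1 * randn(nN)

# adjoint forward: one solve gives all measurements and the objective
λ = -(K(p)' \ c)                             # a_p(v, λ) + c(v) = 0  ∀v
Σ = B' * λ                                   # Σ^(i) = b^(i)(λ)
C = 0.5 * sum(abs2, Σ - Σdata)               # C = g_Σ

# adjoint derivative: one more solve gives the full gradient ∇_p C
Σbar = Σ - Σdata                             # Σ̄ = ḡ_Σ(C̄), with C̄ = 1
λbar = -(K(p) \ (B * Σbar))                  # a_p(λ̄, v) + Σ̄ᵀ b(v) = 0  ∀v
∇C   = [dot(λ, Ks[m] * λbar) for m in 1:nN]  # p̄^(n) = ā_p(λ̄, λ, C̄) = λᵀ Kₙ λ̄

# finite-difference check of one gradient component
C_of(q) = 0.5 * sum(abs2, B' * (-(K(q)' \ c)) - Σdata)
h = 1e-6; e1 = zeros(nN); e1[1] = 1.0
@assert isapprox(∇C[1], (C_of(p + h * e1) - C_of(p - h * e1)) / (2h); rtol = 1e-3)
```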

Applying Adjoints Twice: Summary

adjoint forward + adjoint derivative

\begin{align*} a_p(u, \lambda) + c(u) &= 0 \quad \forall u \in U(\mathcal{R})\\ \boldsymbol \Sigma &= \boldsymbol b(\lambda) \\ C &= g_{\boldsymbol \Sigma} \\ \bar{\boldsymbol \Sigma} &= \bar{g}_{\boldsymbol \Sigma}(\bar{C})\\ a_p(\bar{\lambda}, \dot{\lambda}) + \bar{\boldsymbol{\Sigma}}^T \boldsymbol b(\dot{\lambda}) &= 0 \quad \forall \dot{\lambda} \in U(\mathcal{R}) \\ \bar{p} &= \bar{a}_p(\bar{\lambda}, \lambda, \bar{C})\\ (\nabla_p C)^{(n)} &= \bar{p}^{(n)} \end{align*}

non-adjoint forward + tangent derivative

\begin{align*} a_p(u^{(i)}, v) + b^{(i)}(v) &= 0 \quad \forall v \in U(\mathcal{R}) \\ \Sigma^{(i)} &= c(u^{(i)}) \\ C &= g_{\boldsymbol \Sigma} \\ a_p(\dot{u}^{(i, n)}, v) + \dot{a}_p(u^{(i)}, v, e^{(n)}) &= 0 \quad \forall v \in U(\mathcal{R})\\ \dot{\Sigma}^{(i, n)} &= c(\dot{u}^{(i, n)}) \\ (\nabla_p C)^{(n)} &= \dot{g}_{\boldsymbol \Sigma}(\dot{\boldsymbol \Sigma}^{(n)}) \end{align*}

Generalizations:

  • $p$-dependent excitations $b^{(i)}(v) \rightsquigarrow b_p^{(i)}(v)$
  • $p$-dependent extractions $c(u) \rightsquigarrow c_p(u)$
  • multiple extractions $c_p(u) \rightsquigarrow c^{(j)}_p(u)$ ($I, N \gg J$)

Assumptions:

  • the cost $\mathcal{C}(a)$ of solving $a(\cdot{}, \cdot{}) + ... = 0$ dominates over $\mathcal{C}(\langle \cdot{} \rangle)$
  • $a(\cdot{}, \cdot{})$ cannot be factorized once and reused (e.g. no direct LU factorization)

Costs:

  • non-adjoint forward + tangent derivative: $(I + IN)\times \mathcal{C}(a) + ... \mathcal{C}(\langle\cdot\rangle)$
  • adjoint forward + adjoint derivative: $2J \times \mathcal{C}(a) + ... \times \mathcal{C}(\langle\cdot\rangle)$
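With the sizes of the Poisson example below ($I = 800$, $N = 757$, $J = 7$), the dominant solve counts are roughly
\begin{align*} \text{non-adjoint forward + tangent derivative:} \quad &(I + IN) \times \mathcal{C}(a) = (800 + 800 \cdot 757) \times \mathcal{C}(a) \approx 6.1 \cdot 10^5 \times \mathcal{C}(a)\\ \text{adjoint forward + adjoint derivative:} \quad &2J \times \mathcal{C}(a) = 14 \times \mathcal{C}(a) \end{align*}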

Example: An inverse problem based on the Poisson equation

forward (strong form)

\begin{align*} &\begin{cases} -\nabla \cdot{} m \nabla u^{(i)} = 0 \quad &\forall x \in \mathcal{R}\\ u^{(i)} = g^{(i)} \quad &\forall x \in \partial\mathcal{R} \end{cases}\\ &\Sigma^{(i, j)} = \int_{\mathcal{R}} h^{(j)} u^{(i)} dx\\ &C = \frac{1}{2 IJ}\sum_{i, j=1}^{I,J} (\Sigma^{(i, j)} - \tilde{\Sigma}^{(i, j)})^2 \end{align*}
  • $m, h^{(j)} \in L^2(\mathcal{R})$, $g^{(i)} \in L^2(\partial \mathcal{R})$
  • weak enforcement of boundary conditions $u, v \in H^1_h(\mathcal{R})$
  • Nitsche's method: stable bilinear form \[a(u^{(i)}, v) + b^{(i)}(v) = 0 \quad \forall v \in H^1_h(\mathcal{R})\]
  • $I, N \gg J$ (many bc $g^{(i)}$, many parameters $p \in \mathbb{R}^N$, some extractions $h^{(j)}$)
  • here: $I=800$, $N=757$, $J=7$

Figures: forward solutions $u^{(i)}$, measurements $\Sigma^{(i, j)}$

Example: Derivation

weak forms

\begin{align*} a_m(u, v) &= \int_{\mathcal{R}} m \nabla u \cdot{} \nabla v dx - \int_{\partial \mathcal{R}} m \nabla_n u v + u \nabla_n v \, d\Gamma + \int_{\partial \mathcal{R}} \alpha u v \, d \Gamma \\ b^{(i)}(v) &= \int_{\partial \mathcal{R}} \nabla_n v g^{(i)} \, d\Gamma - \int_{\partial \mathcal{R}} \alpha g^{(i)} v \, d\Gamma\\ c^{(j)}(u) &= \int_{\mathcal{R}} h^{(j)} u \, dx \end{align*}

adjoint forward

\begin{align*} a_m(u, \lambda^{(j)}) + c^{(j)}(u) = 0 \quad \forall u\\ \int_{\mathcal{R}} m \nabla u \cdot{} \nabla \lambda^{(j)} dx - \int_{\partial \mathcal{R}} m \nabla_n u \lambda^{(j)} + u \nabla_n \lambda^{(j)} \, d\Gamma + \int_{\partial \mathcal{R}} \alpha u \lambda^{(j)} \, d \Gamma + \int_{\mathcal{R}} h^{(j)} u \, dx = 0 \quad \forall u \end{align*}

adjoint forward (strong form)

\begin{align*} \begin{cases} -\nabla \cdot{} m \nabla \lambda^{(j)} = -h^{(j)} \quad &\forall x \in \mathcal{R}\\ \lambda^{(j)} = 0 \quad &\forall x \in \partial \mathcal{R} \end{cases} \end{align*}

Example: Derivation

adjoint derivative

\begin{align*} &a_m(\bar{\lambda}^{(j)}, \dot{\lambda}) + \bar{\boldsymbol \Sigma}^{(j)T} \boldsymbol{b}(\dot{\lambda}) = 0 \quad \forall \dot{\lambda} \\ &\int_{\mathcal{R}} m \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \dot{\lambda} dx - \int_{\partial \mathcal{R}} m \nabla_n \bar{\lambda}^{(j)} \dot{\lambda} + \bar{\lambda}^{(j)} \nabla_n \dot{\lambda} \, d\Gamma + \int_{\partial \mathcal{R}} \alpha \bar{\lambda}^{(j)} \dot{\lambda} \, d \Gamma \\ &+ \int_{\partial \mathcal{R}} \nabla_n \dot{\lambda} \sum_{i=1}^{I}(\bar{\Sigma}^{(i, j)} g^{(i)}) \, d\Gamma - \int_{\partial \mathcal{R}} \alpha \sum_{i=1}^{I} (\bar{\Sigma}^{(i, j)} g^{(i)}) \dot{\lambda} \, d\Gamma = 0 \quad \forall \dot{\lambda} \end{align*}

adjoint derivative (strong form)

\begin{align*} \begin{cases} -\nabla \cdot{} m \nabla \bar{\lambda}^{(j)} = 0 \quad &\forall x \in \mathcal{R} \\ \bar{\lambda}^{(j)} = \sum_{i=1}^{I} \bar{\Sigma}^{(i, j)} g^{(i)} \quad &\forall x \in \partial \mathcal{R} \end{cases} \end{align*}

tangent model

\begin{align*} \dot{C} = \sum_{j=1}^{J} \dot{a}_m(\bar{\lambda}^{(j)}, \lambda^{(j)}, \dot{m}) = \sum_{j=1}^{J} \int_{\mathcal{R}} \dot{m} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \, dx - \int_{\partial \mathcal{R}} \dot{m} \nabla_n \bar{\lambda}^{(j)} \underbrace{\lambda^{(j)}}_{=0} \, d \Gamma \end{align*}

adjoint model

\begin{align*} \langle \bar{C}, \sum_{j=1}^{J} \int_{\mathcal{R}} \dot{m} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \, dx \rangle = \langle \bar{C} \sum_{j=1}^{J} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)}, \dot{m} \rangle_{L^2(\mathcal{R})} \\ \bar{a}_m(\bar{\lambda}^{(:)}, \lambda^{(:)}, \bar{C}) = \sum_{j=1}^{J} \bar{C} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \end{align*}

Example: Derivation

adjoint parametrization

\begin{align*} \langle \bar{m}, \dot{m} \rangle_{L^2(\mathcal{R})} = \langle \bar{m}, \dot{\gamma}_p(\dot{p}) \rangle_{L^2(\mathcal{R})} = \langle \int_{\mathcal{R}} \frac{\partial \gamma_p}{\partial p} \bar{m} \, dx, \dot{p} \rangle_{\mathbb{R}^N} \end{align*}

Example: An inverse problem based on the Poisson equation

adjoint forward (strong form)

\begin{align*} &\begin{cases} -\nabla \cdot{} m \nabla \lambda^{(j)} = -h^{(j)} &\forall x \in \mathcal{R} \\ \lambda^{(j)} = 0 \quad &\forall x \in \partial \mathcal{R} \end{cases}\\ &\Sigma^{(i, j)} = \int_{\partial \mathcal{R}} \nabla_n \lambda^{(j)} g^{(i)} \, d \Gamma\\ &C = \frac{1}{2 I J} \sum_{i, j}^{I, J} (\Sigma^{(i, j)} - \tilde{\Sigma}^{(i, j)})^2 \end{align*}
adjoint forward $\lambda^{(j)}$
  • rhs: extractions $h^{(j)}$
  • numerical costs:
    • (adjoint) $\sim 52ms \approx 7 \times 7ms$
    • (non-adjoint) $\sim 5.3s \approx 800 \times 7ms$
  • adjoint and non-adjoint measurements agree (numerical precision)

adjoint derivative (strong form)

\begin{align*} &\bar{\Sigma}^{(i, j)} = \frac{1}{IJ} (\Sigma^{(i, j)} - \tilde{\Sigma}^{(i, j)}) \bar{C}\\ &\begin{cases} -\nabla \cdot{} m \nabla \bar{\lambda}^{(j)} = 0 \quad &\forall x \in \mathcal{R} \\ \bar{\lambda}^{(j)} = \sum_{i=1}^{I} \bar{\Sigma}^{(i, j)} g^{(i)} \quad &\forall x \in \partial \mathcal{R}\\ \end{cases} \\ &\bar{m} = \bar{C} \sum_{j=1}^{J} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \\ \end{align*}
gradient $\bar{m}$
  • rhs: augmented excitations $\bar{\boldsymbol \Sigma}^{(j)T}{\boldsymbol g}$
  • numerical costs:
    • (adjoint) $\sim 98 ms \approx 2 \times 7 \times 7ms$
    • (finite differences) $\sim 45 s \approx 757 \times 55ms$
    • (finite differences + non-adjoint) $\sim 1h\, 10min \approx 757 \times 800 \times 7ms$
  • gradient agrees with finite differences (approximately)

Example: An inverse problem based on the Poisson equation

\begin{align*} m = \gamma_p = x \mapsto (0.1, 0.9, 0.4)^T \cdot{} \mathcal{NN}^{2 \to 20 \text{(tanh)} \to 20\text{(tanh)} \to 3\text{(softmax)}}_p(x) \end{align*}
  • inverting on $m$ directly is ill-posed
  • motivated from SciML: use a $\mathcal{NN}$ as the parametrization $\gamma_p$
  • we use the parametrization $\gamma_p$ to "regularize" (here: phase classification)
  • number of parameters: $544$
  • coarser discretization for the inversion ($757$ material cells vs $2972$ for the true measurements)
  • $1$% random noise added to the true measurements
Figures: measurements $\Sigma^{(i, j)}$ and $\tilde{\Sigma}^{(i, j)}$, optimized material $m$, objective $\text{MSE}(\Sigma^{(i, j)}, \tilde{\Sigma}^{(i,j)})$, true material $\tilde{m}$
  • convergence depends on the initial guess (does not always converge)
  • optimizer: BFGS
  • tailor the parametrization $\gamma_p$ to the specific problem (e.g. layers, other geometries, phase values)
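For reference, a minimal sketch of such a parametrization $\gamma_p$ in plain Julia (no ML framework; dense layers with the sizes from the formula above, random initialization, hypothetical names `γ` and `phases`). The actual implementation in the Pluto.jl notebook may differ in details.

```julia
# minimal sketch of m(x) = (0.1, 0.9, 0.4)ᵀ · NN_p(x) with a 2 → 20(tanh) → 20(tanh) → 3(softmax) net
softmax(z) = exp.(z .- maximum(z)) ./ sum(exp.(z .- maximum(z)))

struct γ
    W1; b1; W2; b2; W3; b3          # weights and biases of the three dense layers
end
γ_init() = γ(randn(20, 2), randn(20), randn(20, 20), randn(20), randn(3, 20), randn(3))

phases = [0.1, 0.9, 0.4]            # material values of the three phases

function (g::γ)(x)                  # m(x) = phasesᵀ · NN_p(x)
    h1 = tanh.(g.W1 * x .+ g.b1)
    h2 = tanh.(g.W2 * h1 .+ g.b2)
    return phases' * softmax(g.W3 * h2 .+ g.b3)
end

γ_p = γ_init()
m_at_x = γ_p([0.3, -0.2])           # material coefficient at a point x ∈ ℝ²
```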

Outlook: Material Reconstruction in EPMA

  • model: (stationary) radiative transfer/linear Boltzmann equation in continuous-slowing-down approximation (CSD)
  • adjoint model: reversed in time and direction (or space), $a_m(\psi, \phi)$ vs. $a_m(\phi, \psi)$

Figures: forward $\psi^{(i)}$, adjoint forward $\lambda^{(j)}$, gradient $\bar{\rho}_e$

References

  • J. Bünger: Three-dimensional modelling of x-ray emission in electron probe microanalysis based on deterministic transport equations. PhD thesis, RWTH Aachen University (2021)
    doi:10.18154/RWTH-2021-05180
  • T. Bui-Thanh: Adjoint and Its Roles in Sciences, Engineering, and Mathematics: A Tutorial. arXiv (2023)
    doi:10.48550/arXiv.2306.09917
  • J.A. Halbleib and J.E. Morel: Adjoint Monte Carlo Electron Transport in the Continuous-Slowing-Down-Approximation. J. Comput. Phys. (1980)
    doi:10.1016/0021-9991(80)90106-0
  • U. Naumann: The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation. SIAM (2012)
    doi:10.1137/1.9781611972078
  • R.E. Plessix: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophys. J. Int. (2006)
    doi:10.1111/j.1365-246X.2006.02978.x

Slides + Pluto.jl notebook (Poisson example): github.com/tam724

Material Reconstruction in EPMA: Optimization Example

  • $\mathcal{NN}$ parametrization along a single dimension (but with an angle)

  • final measurements (not perfect..)

Material Reconstruction in EPMA

  • (stationary) radiative transfer/linear Boltzmann equation in continuous-slowing-down approximation (CSD) ($x \in \mathcal{R} \subset \mathbb{R}^3$, $\epsilon \in \mathbb{R}^+$, $\Omega \in S^2$) \begin{align} -\partial_\epsilon (s(x, \epsilon) \psi^{(i)}(x, \epsilon, \Omega)) + \Omega \cdot{} \nabla_x \psi^{(i)}(x, \epsilon, \Omega) = \int_{S^2} k(x, \epsilon, \Omega \cdot{} \Omega') \psi^{(i)}(x, \epsilon, \Omega') \, d\Omega' - \tau(x, \epsilon) \psi^{(i)}(x, \epsilon, \Omega) \end{align}
  • beam is modeled by boundary conditions (excitations) \begin{align} \psi^{(i)}(x, \epsilon, \Omega) = g^{(i)}(x, \epsilon, \Omega) \quad \forall x \in \partial \mathcal{R} | n \cdot{} \Omega < 0 \end{align}
  • $i = 1, ..., I$: beam position, beam energy, beam direction
  • additivity approximation (for $\sigma = s$, $k$ and $\tau$), $p \in \mathbb{R}^N$ \begin{equation} \sigma(x, \cdot{}) = \sum_{e=1}^{n_e} \rho_{e}(x) \sigma_e(\cdot{})\quad \text{where} \quad (\rho_1, ... \rho_{n_e})^T = \gamma_p \end{equation}
  • x-ray generation and detection (extraction) \begin{align} \Sigma^{(i, j)} = \int_{\mathcal{R}} \int_{0}^{\infty} \sigma^{ion}_j(\epsilon) \rho_j(x) e^{-\int_{d(x)} \sum_{e=1}^{n_e} \mu_e^{(j)} \rho_e(y) \,d y}\int_{S^2} \psi^{(i)}(x, \epsilon, \Omega) \,d \Omega \, d \epsilon \,d x \end{align}
  • $j = 1, ..., J$: number of measured x-ray lines/materials

Material Reconstruction in EPMA

  • compute measurements $\Sigma^{(i, j)}$ with the extraction as the source to the adjoint RT-CSD
  • adjoint RT-CSD: reversed in energy and direction (or space), $a_m(\psi, \phi)$ vs. $a_m(\phi, \psi)$
  • forward $\psi^{(i)}$, cost: $I \times \mathcal{C}(a) \sim I \times 2\text{min}$
  • adjoint forward $\lambda^{(j)}$, cost: $J \times \mathcal{C}(a) \sim J \times 2\text{min}$
  • orange/green marker: measurements from forward
  • green lines: measurements from adjoint forward

Material Reconstruction in EPMA

  • compute the gradient $\bar{\rho}_e$ using the "adjoint augmented" excitations as the source to the RT-CSD

Figures: adjoint derivative $\bar{\lambda}^{(j)}$, gradient $\bar{\rho}_e$