Applying Adjoints Twice: An Efficient Gradient Implementation for Models with Linear Structure with Applications in Reconstruction for EPMA

Tamme Claus1, Gaurav Achuda2, Silvia Richter2, Manuel Torrilhon1

1 ACoM, Applied and Computational Mathematics, RWTH Aachen University

2 GFE, Central Facility for Electron Microscopy, RWTH Aachen University

Computational Methods for Inverse Problems, ECCOMAS 2024

Motivation: Electron Probe Microanalysis (EPMA)

"Material imaging based on characteristic x-ray emission"

  • Material Science, Quality control, Mineralogy, Semiconductors
  • Ionization of material sample using high-energy focused electron beam
  • Wavelength-dispersive spectrometers measure characteristic x-radiation

Microprobe at GFE (source: gfe.rwth-aachen.de)

Material / Interaction Volume
  • Example Material: two-phase (A, B)
  • Electron propagation affected by scattering (animated: different energies)
  • Ionization and emission of characteristic x-rays (wavelength of A-rays and B-rays differ)
  • Spatial information gained via different beam positions (linescan), beam angles and beam energies
  • inverse problem of material reconstruction \begin{equation} p^* = \underset{p \in P}{\text{arg min}} \underbrace{|| \Sigma(p) - \tilde{\Sigma} ||^2}_{=C(p)} \end{equation}
  • gradient-based iterative methods (Steepest Descent, BFGS, or HMC)
  • we focus on computing $\nabla_p C(p)$ efficiently via adjoints

Motivation: Matrix Product Bracketing

Consider Matrix Multiplication ($N \gg I, J$):
\[ \underbrace{\begin{pmatrix} & & \\ & \Sigma^{(ji)} & \\ & & \end{pmatrix}}_{J \times I} = \underbrace{\begin{pmatrix} — & h^{(1)} & — \\ — & h^{(j)} & — \\ — & h^{(J)} & — \\ \end{pmatrix}}_{J \times N} \cdot{} \underbrace{\boldsymbol{A}}_{N \times N} \cdot{} \underbrace{ \begin{pmatrix} | & | & | \\ g^{(1)} & g^{(i)} & g^{(I)} \\ | & | & | \\ \end{pmatrix}}_{N \times I} \]

Two possible bracketings (colors as in the cost comparison below): $\color{blue}{(H \cdot{} \boldsymbol{A}) \cdot{} G}$ or $\color{red}{H \cdot{} (\boldsymbol{A} \cdot{} G)}$, where $H$ collects the rows $h^{(j)}$ and $G$ the columns $g^{(i)}$.

where to bracket?

different computational costs: $\color{red}{I\times N^2 + IJ\times N} \text{ or } \color{blue}{J\times N^2 + IJ\times N}$

  • what happens if $\boldsymbol A$ is more general, e.g. "solution of a linear system / (linear, linearized) weak form" instead of "matrix multiplication"? \[ \boldsymbol A g^{(i)} = \{u \in V \, | \, a(u, v) + \langle g^{(i)}, v \rangle = 0 \quad \forall v \in V\} \]
  • what happens if $\boldsymbol A$ is a derivative (structure: chain rule, product rule, ...)? \[ \boldsymbol A g^{(i)} = \frac{\partial f(v)}{\partial v}\Big|_{v = h(w)}\frac{\partial h(w)}{\partial w}g^{(i)} \]
In general (reduced cost if $J < I$): \[ \underbrace{\Sigma^T}_{I \times J} = \underbrace{G^T}_{I \times \infty} \cdot{} \underbrace{A^*}_{\infty \times \infty} \cdot{} \underbrace{H^T}_{\infty \times J} \] where $\boldsymbol{A}^*$ is the adjoint operator.
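To make the bracketing trade-off concrete, here is a minimal Julia sketch with hypothetical sizes and random data: both bracketings give the same $\Sigma$, but the operator is applied to $I$ columns in one case and to $J$ rows in the other.

```julia
# hypothetical sizes with N >> I, J; random data just to compare the two bracketings
N, I, J = 1000, 50, 5
A = randn(N, N)
H = randn(J, N)                # rows h^(j)
G = randn(N, I)                # columns g^(i)

Σ_red  = H * (A * G)           # red bracketing:  I·N² + I·J·N multiplications
Σ_blue = (H * A) * G           # blue bracketing: J·N² + I·J·N multiplications

@assert Σ_red ≈ Σ_blue         # same result, different cost
```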

Gradients in Inverse Problems

"Determine the model parameters $p$ such that the model $\Sigma(p)$ produces the observed data $\tilde{\Sigma}$."

classical approach

\begin{align*} p^* = \underset{p \in P}{\text{arg min}} || \Sigma(p) - \tilde{\Sigma} ||^2 \end{align*}
  • iterative approximation of the minimum based on the gradient of the objective function $p \mapsto ||\Sigma(p) - \tilde{\Sigma}||^2$
  • Gradient Descent, CG, BFGS, ...
  • regularization: addition of a penalty term $p \mapsto ||\Sigma(p) - \tilde{\Sigma}||^2 + \mathcal{R}(p)$
  • we will instead compose the model with a parametrization: $\Sigma = \Sigma \circ \gamma$ (cf. $m = \gamma_p$ below)

Bayesian approach

  • approximate the posterior $\pi(p|\tilde{\Sigma})$ (typically: iterative sampling, MCMC)
  • Hamiltonian Monte-Carlo (based on the gradient of the target density)

Efficient evaluation of the objective function and its gradient is crucial!

Applying Adjoints Twice: Ingredients

An adjoint method

Definition of the Adjoint ($H, G$ Hilbert spaces, $A: G \to H$ continuous, linear) \[ \langle h, A(g) \rangle_H := \langle A^*(h), g \rangle_G \quad \forall h \in H \, \forall g \in G \]

  • given multiple $g^{(1)}, ... g^{(I)} \in G$, $h^{(1)}, ... h^{(J)} \in H$ we want to compute
\[ \Sigma^{(ji)} = \langle h^{(j)}, A(g^{(i)}) \rangle_H = \langle A^*(h^{(j)}), g^{(i)} \rangle_G \quad \forall i=1, ...I \, j=1, ...J \]

non-adjoint implementation ($\color{green}{J} > \color{orange}{I}$)

\begin{align*} &\text{foreach } \color{orange}{i = 1, ..., I} \text{ do} \\ &\qquad \color{orange}{v^{(i)} \leftarrow A(g^{(i)})}\\ &\qquad \text{foreach } \color{green}{j = 1, ..., J} \text{ do}\\ &\qquad \qquad \Sigma^{(ji)} \leftarrow \langle \color{green}{h^{(j)}}, \color{orange}{v^{(i)}} \rangle_H\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • cost: $\color{orange}{I} \times \mathcal{C}(A) + \color{orange}{I}\color{green}{J} \times \mathcal{C}(\langle \cdot{}, \cdot{} \rangle_H)$

adjoint implementation ($\color{orange}{I} > \color{green}{J}$)

\begin{align*} &\text{foreach } \color{green}{j = 1, ..., J} \text{ do} \\ &\qquad \color{green}{\lambda^{(j)} \leftarrow A^*(h^{(j)})}\\ &\qquad \text{foreach } \color{orange}{i = 1, ..., I} \text{ do}\\ &\qquad \qquad \Sigma^{(ji)} \leftarrow \langle \color{green}{\lambda^{(j)}}, \color{orange}{g^{(i)}} \rangle_G\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • cost: $\color{green}{J} \times \mathcal{C}(A^*) + \color{orange}{I}\color{green}{J} \times \mathcal{C}(\langle \cdot{}, \cdot{} \rangle_G)$
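The same pattern as a minimal Julia sketch, with $A$ standing in for a generic linear operator as a plain matrix (so $A^* = A^T$) and hypothetical sizes: both loop orders produce the same $\Sigma^{(ji)}$, but the expensive operator is applied $I$ times in one case and $J$ times in the other.

```julia
using LinearAlgebra: dot

N, I, J = 1000, 200, 3                   # hypothetical sizes
A = randn(N, N)                          # stands in for a generic linear operator
g = [randn(N) for _ in 1:I]
h = [randn(N) for _ in 1:J]

# non-adjoint implementation: I applications of A
Σ_na = zeros(J, I)
for i in 1:I
    v = A * g[i]
    for j in 1:J
        Σ_na[j, i] = dot(h[j], v)
    end
end

# adjoint implementation: J applications of A* (here simply Aᵀ)
Σ_ad = zeros(J, I)
for j in 1:J
    λ = A' * h[j]
    for i in 1:I
        Σ_ad[j, i] = dot(λ, g[i])
    end
end

@assert Σ_na ≈ Σ_ad
```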


The adjoint of the solution of a linear weak form (linear pde)

$A: G \to H$ is the solution operator of a linear weak form ($U, G, H$ Hilbert spaces: $U \subseteq G, H$)

  • $\color{orange}{A(g) = (u \in U | a(u, v) + b(v) = 0\, \forall v \in U)}$ where $b(v) = \langle v, g\rangle_G$
  • $\color{green}{A^*(h) = (\lambda \in U \, | \, a(v, \lambda) + c(v) = 0 \, \forall v \in U)}$ where $c(v) = \langle h, v \rangle_H$

Derivation: we introduce $\lambda \in U$ (in continuous adjoint method (for $\partial_p$): "Lagrange-multiplier") \begin{align*} \langle h, A(g) \rangle_H = \langle h, u \rangle_H &= c(u) \\ &=c(u) + \color{orange}{\underbrace{a(u, \lambda) + b(\lambda)}_{=0 \, \forall \lambda \in U}} \\ &=\color{green}{\underbrace{c(u) + a(u, \lambda)}_{=0 \, \forall u \in U}} + b(\lambda) \\ & \hphantom{= c(u) + a(u, \lambda)} = b(\lambda) = \langle \lambda, g \rangle_G = \langle A^*(h), g \rangle_G \end{align*}

  • given multiple $g^{(i)} \in G$ ($b^{(i)} \in L(G, \mathbb{R})$) and $h^{(j)} \in H$ ($c^{(j)} \in L(H, \mathbb{R})$)

non-adjoint implementation

\begin{align*} &\text{foreach } \color{orange}{i = 1, ... I} \\ &\qquad \color{orange}{\text{solve } a(u^{(i)}, v) + b^{(i)}(v) = 0 \quad \forall v \in U}\\ &\qquad \text{foreach } \color{green}{j = 1, ...J}\\ &\qquad \qquad \Sigma^{(ij)} = \color{green}{c^{(j)}}(\color{orange}{u^{(i)}})\\ &\qquad \text{end}\\ &\text{end} \end{align*}

adjoint implementation

\begin{align*} &\text{foreach } \color{green}{j = 1, ...J} \\ &\qquad \color{green}{\text{solve } a(v, \lambda^{(j)}) + c^{(j)}(v) = 0 \quad \forall v \in U}\\ &\qquad \text{foreach } \color{orange}{i = 1, ...I}\\ &\qquad \qquad \Sigma^{(ij)} = \color{orange}{b^{(i)}}(\color{green}{\lambda^{(j)}})\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • switching trial and test function is the continuous analog to solving the transposed system
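A discrete sketch of the same statement, with hypothetical matrices (a Galerkin discretization would give $a(u,v) = v^T K u$, $b^{(i)}(v) = v^T b^{(i)}$, $c^{(j)}(u) = (c^{(j)})^T u$): switching trial and test function amounts to solving with $K^T$, and the resulting measurement matrix is identical.

```julia
using LinearAlgebra: dot

# discrete analogue of the weak form (hypothetical n dofs, nb excitations, nc extractions)
n, nb, nc = 500, 100, 4
K  = randn(n, n)                           # hypothetical (invertible) system matrix
bs = [randn(n) for _ in 1:nb]              # excitations  b^(i)
cs = [randn(n) for _ in 1:nc]              # extractions  c^(j)

# non-adjoint: solve K u^(i) = -b^(i) once per excitation (nb solves)
us   = [-(K \ b) for b in bs]
Σ_na = [dot(cs[j], us[i]) for j in 1:nc, i in 1:nb]

# adjoint ("switch trial and test function"): solve Kᵀ λ^(j) = -c^(j) per extraction (nc solves)
λs   = [-(K' \ c) for c in cs]
Σ_ad = [dot(bs[i], λs[j]) for j in 1:nc, i in 1:nb]

@assert Σ_na ≈ Σ_ad
```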

Riesz Representation Theorem/Definition of the Adjoint

Riesz Representation Theorem

Let $(H, \langle \cdot{}, \cdot{} \rangle_H)$ be a (real) Hilbert space with inner product $\langle \cdot{}, \cdot{} \rangle_H$. For every continuous linear functional $f_h \in H^*$ (the dual of $H$), there exists a unique vector $h \in H$, called the Riesz Representation of $f_h$, such that \[ f_h(x) = \langle h, x \rangle_H \quad \forall x \in H. \]

Definition: Adjoint Operator

Let $(G, \langle \cdot{}, \cdot{} \rangle_G)$ be another (real) Hilbert space, $A:G\to H$ a continuous linear operator, and $f_h \in H^*$ a continuous linear functional. Then \[ f_h(A(g)) = \langle h, A(g) \rangle_H \quad \forall g \in G. \] But $g \mapsto f_h(A(g))$ is also a continuous linear functional $f_\lambda$ on $G$ with a Riesz Representation $\lambda \in G$ \[ f_h(A(g)) = f_\lambda(g) = \langle \lambda, g \rangle_G := \langle A^*(h), g \rangle_G. \]

Examples of (real) adjoint operators

Scalar Multiplication

  • $G, H = \mathbb{R}$
  • $A: \mathbb{R} \to \mathbb{R}, g \mapsto Ag = a \cdot{} g,\quad a \in \mathbb{R}$
  • $A^*: \mathbb{R} \to \mathbb{R}, \mu \mapsto A^* \mu = a \cdot{} \mu$

Derivation: \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}} = \langle \mu, a \cdot{} g \rangle_{\mathbb{R}} = \langle a \cdot{} \mu, g \rangle_{\mathbb{R}} = \langle A^*\mu , g \rangle_{\mathbb{R}} \quad \forall g \in \mathbb{R}\, \forall \mu \in \mathbb{R} \end{equation*}

Matrix Multiplication

  • $G = \mathbb{R}^n, H = \mathbb{R}^m$
  • $A: \mathbb{R}^n \to \mathbb{R}^m, g \mapsto Ag = M \cdot{} g, \quad M \in \mathbb{R}^{m \times n}$
  • $A^*: \mathbb{R}^m \to \mathbb{R}^n, \mu \mapsto A^*\mu = M^T \cdot{} \mu$

Derivation: \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}^m} = \langle \mu, M \cdot{} g \rangle_{\mathbb{R}^m} = \langle M^T \cdot{} \mu, g \rangle_{\mathbb{R}^n} = \langle A^*\mu, g \rangle_{\mathbb{R}^n} \quad \forall g \in \mathbb{R}^n \, \forall \mu \in \mathbb{R}^m \end{equation*}

Integration

  • $G = L^2(\Omega)$ (with $\langle \cdot{}, \cdot{} \rangle_{L^2(\Omega)} = \int_{\Omega} \cdot{} \cdot{} d x$), $H = \mathbb{R}$
  • $A: L^2(\Omega) \to \mathbb{R}, g(\cdot{}) \mapsto Ag = \int_{\Omega} g(x) d x$
  • $A^*: \mathbb{R} \to L^2(\Omega), \mu \mapsto (A^* \mu)(\cdot{}) = (x \mapsto \mu \cdot{} 1_{\Omega}(x))$

Derivation: \begin{equation*} \langle \mu, Ag \rangle_\mathbb{R} = \mu \cdot{} \int_\Omega g(x) dx = \int_\Omega \mu \cdot{} 1_\Omega(x) g(x) dx = \langle (A^*\mu)(\cdot{}), g(\cdot{}) \rangle_{L^2(\Omega)} \quad \forall g \in L^2(\Omega) \, \forall \mu \in \mathbb{R} \end{equation*}
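These identities are easy to check numerically. A small Julia sketch for the matrix example and for the integration example (here discretized with a hypothetical midpoint rule on $\Omega = (0,1)$):

```julia
using LinearAlgebra: dot

# matrix multiplication: A g = M·g, adjoint A* μ = Mᵀ·μ
m, n = 3, 5
M, g, μ = randn(m, n), randn(n), randn(m)
@assert dot(μ, M * g) ≈ dot(M' * μ, g)

# integration (midpoint rule on Ω = (0,1)): A g = ∫ g dx, adjoint A* μ = constant function μ·1_Ω
nq = 100; Δx = 1 / nq
gvals = randn(nq)                              # samples of g at the quadrature points
μ0    = 2.3
Ag    = sum(gvals) * Δx                        # A g
Aμ    = fill(μ0, nq)                           # A* μ0, sampled at the quadrature points
@assert μ0 * Ag ≈ sum(Aμ .* gvals) * Δx        # ⟨μ, Ag⟩_ℝ = ⟨A*μ, g⟩_{L²}
```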

Examples of (real) adjoint operators

Linear Solve

  • $G, H = \mathbb{R}^n$ and $M \in \mathbb{R}^{n\times n}$ invertible
  • $A: \mathbb{R}^n \to \mathbb{R}^n, g \mapsto (h \in \mathbb{R}^n | M \cdot{} h = g)$, (or $A = M^{-1}$)
  • $A^*: \mathbb{R}^n \to \mathbb{R}^n, \mu \mapsto (\lambda \in \mathbb{R}^n | M^T \cdot{} \lambda = \mu)$, (or $A^* = M^{-T}$)

Derivation (intentionally complicated): We reinterpret \begin{equation*} M \cdot{} h = g \quad \Leftrightarrow \quad \langle v, M \cdot{} h \rangle = \langle v, g \rangle \quad \forall v \in G \end{equation*} in particular (for a fixed but unspecified $\lambda \in G$) \begin{equation} 0 = \langle \lambda, g \rangle - \langle \lambda, M \cdot{} h \rangle \quad \Leftrightarrow \quad 0 = \langle \lambda, g \rangle - \langle M^T \cdot{} \lambda, h \rangle \end{equation} \begin{align*} \langle \mu, Ag \rangle &= \langle \mu, h \rangle\\ &= \langle \mu, h \rangle - \langle M^T \cdot{} \lambda, h \rangle + \langle \lambda, g \rangle \\ & \quad \text{ with } \langle \mu, h \rangle - \langle M^T \cdot{} \lambda, h \rangle = 0 \quad \forall h \in H \\ & \quad \text{ or } M^T \cdot{} \lambda = \mu \\ &= \langle A^* \mu, g \rangle \end{align*} (alternatively) \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}^n} = \langle \mu, M^{-1} g \rangle_{\mathbb{R}^n} = \langle M^{-T} \mu, g \rangle_{\mathbb{R}^n} = \langle A^*\mu, g \rangle_{\mathbb{R}^n} \end{equation*}
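Again as a short numerical check with a hypothetical random invertible $M$: the adjoint of $g \mapsto M^{-1} g$ is $\mu \mapsto M^{-T} \mu$, i.e. a solve with the transposed matrix.

```julia
using LinearAlgebra: dot

# linear solve: A g = M⁻¹ g, adjoint A* μ = M⁻ᵀ μ
n = 200
M = randn(n, n)                 # hypothetical invertible matrix
g, μ = randn(n), randn(n)
@assert dot(μ, M \ g) ≈ dot(M' \ μ, g)
```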

Examples of (real) adjoint operators

Weak form

  • $G, H$ (real Hilbert spaces)
  • $A: G \to H, g \mapsto \{h \in H | a(h, v) + \langle g, v \rangle_G = 0 \quad \forall v \in G\}$, $\quad a(\cdot{}, \cdot{}) \text{ coercive bilinear form}$
  • $A^*: H \to G, \mu \mapsto \{\lambda \in G | a(v, \lambda) + \langle \mu,v \rangle_H = 0 \quad \forall v \in H\}$

Derivation: \begin{align*} \langle \mu, Ag \rangle_H = f_\mu(h) &= f_\mu(h) + \underbrace{a(h, \lambda) + f_g(\lambda)}_{=0\quad \forall \lambda \in G} \\ &= \underbrace{f_\mu(h) + a(h, \lambda)}_{\overset{!}{=}\, 0 \quad \forall h \in H} + f_g(\lambda) = f_g(\lambda) = \langle \lambda, g \rangle_G \end{align*}

Derivative Notation (from Algorithmic Differentiation)

  • given $f_{(\cdot{})}: X \to Y, x \mapsto f_x$ ($X, Y$ Hilbert) non-linear, Fréchet-differentiable

we define:

  • the (continuous, linear) tangent operator $\dot{f}_x \in L(X, Y)$ (directional derivative) \begin{align*} \dot{f}_x(\dot{x}) = \frac{\partial f_x}{\partial x}(\dot{x}) = \lim_{h \to 0} \frac{f_{x + h\dot{x}} - f_{x}}{h} \end{align*}
  • the (continuous, linear) adjoint operator $\bar{f}_x \in L(Y, X)$ \begin{align*} \langle \bar{y} , \dot{f}_x(\dot{x}) \rangle_Y = \langle \bar{f}_x (\bar{y}), \dot{x} \rangle_X \quad \forall \bar{y} \in Y, \dot{x} \in X \end{align*}

Also we define ($y = f_x$):

  • tangent variables $\dot{x} \in X, \dot{y} \in Y$: given $\dot{x} \in X$, $\dot{y} = \dot{f}_x(\dot{x})$
  • adjoint variables $\bar{x} \in X, \bar{y} \in Y$: given $\bar{y} \in Y$, $\bar{x} = \bar{f}_x(\bar{y})$

Applying this to a composition $ f_x = \varphi^{(N)}_{v^{(N-1)}} \circ \, ...\, \circ\, \varphi^{(1)}_x$ where the $\varphi^{(n)}$ are single assignments (chain rule!) is tangent/adjoint mode automatic differentiation (neglecting all implementation details). A finite-dimensional sketch follows below.
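As a minimal finite-dimensional sketch (a hypothetical map $f$ with a hand-coded Jacobian): the tangent operator is the directional derivative, and the adjoint operator satisfies the defining inner-product identity.

```julia
using LinearAlgebra: dot

# hypothetical nonlinear map f: ℝ² → ℝ² with hand-coded tangent and adjoint operators
f(x)  = [sin(x[1]) * x[2], x[1]^2 + x[2]]
Jf(x) = [cos(x[1])*x[2]  sin(x[1]);
         2*x[1]          1.0       ]          # Fréchet derivative (Jacobian) of f at x

f_tan(x, dx)   = Jf(x)  * dx                  # tangent operator:  ḟ_x(ẋ)
f_adj(x, ybar) = Jf(x)' * ybar                # adjoint operator:  f̄_x(ȳ)

x, dx, ybar = randn(2), randn(2), randn(2)

# tangent = directional derivative (finite-difference check)
h = 1e-6
@assert isapprox(f_tan(x, dx), (f(x + h*dx) - f(x)) / h; atol = 1e-4)

# defining property of the adjoint: ⟨ȳ, ḟ_x(ẋ)⟩_Y = ⟨f̄_x(ȳ), ẋ⟩_X
@assert dot(ybar, f_tan(x, dx)) ≈ dot(f_adj(x, ybar), dx)
```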

Automatic/Algorithmic Differentiation

Example

\begin{equation} y = f_x = g_{h_x}, \quad v = h_x \end{equation} By the chain rule, the tangent operator $\dot{f}_x$ is \begin{equation} \dot{y} = \dot{f}_x(\dot{x}) = \dot{g}_v(\dot{h}_x(\dot{x})) \end{equation} For the adjoint operator $\bar{f}_x$ we use \begin{align*} \langle \bar{y} , \dot{f}_x(\dot{x}) \rangle_Y = \langle \bar{y}, \dot{g}_v(\dot{h}_x(\dot{x})) \rangle_Y = \langle \bar{g}_v (\bar{y}), \dot{h}_x(\dot{x}) \rangle_V = \langle \bar{h}_x(\bar{g}_v(\bar{y})), \dot{x} \rangle_X \quad \forall \bar{y} \in Y, \dot{x} \in X \end{align*} and find \begin{equation} \bar{f}_x(\bar{y}) = \bar{h}_x(\bar{g}_v(\bar{y})) \end{equation}

Tangent mode

\begin{align*} &\text{foreach } j = 1, ... J \\ &\qquad \dot{x} \leftarrow e_j \\ &\qquad \dot{y} \leftarrow \dot{g}_v(\dot{h}_x(\dot{x}))\\ &\qquad \text{foreach } i = 1, ... I \\ &\qquad \qquad (Df)^{(i, j)} = \langle e_i, \dot{y} \rangle\\ &\qquad \text{end}\\ &\text{end} \end{align*}

Adjoint mode

\begin{align*} &\text{foreach } i = 1, ... I \\ &\qquad \bar{y} \leftarrow e_i \\ &\qquad \bar{x} \leftarrow \bar{h}_x(\bar{g}_v(\bar{y}))\\ &\qquad \text{foreach } j = 1, ... J \\ &\qquad \qquad (Df)^{(i, j)} = \langle \bar{x}, e_j \rangle\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • AD is even more systematic..
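A minimal Julia sketch of both modes for a hypothetical composition $y = g(h(x))$ with hand-coded tangent/adjoint operators: tangent mode builds $Df$ column by column (one sweep per input direction), adjoint mode builds it row by row (one sweep per output direction).

```julia
# hypothetical composition y = g(h(x)) with hand-coded Jacobians and tangent/adjoint operators
h(x) = [x[1] * x[2], x[1] + x[2], x[2]^2]           # ℝ² → ℝ³
g(v) = [sin(v[1]) + v[3], v[2] * v[3]]              # ℝ³ → ℝ²
Jh(x) = [x[2] x[1]; 1.0 1.0; 0.0 2*x[2]]
Jg(v) = [cos(v[1]) 0.0 1.0; 0.0 v[3] v[2]]

h_tan(x, dx)   = Jh(x)  * dx;   h_adj(x, vbar) = Jh(x)' * vbar
g_tan(v, dv)   = Jg(v)  * dv;   g_adj(v, ybar) = Jg(v)' * ybar

x = randn(2); v = h(x)
nX, nY = 2, 2

# tangent mode: one sweep per input direction e_j (columns of Df)
Df_tan = zeros(nY, nX)
for j in 1:nX
    dx = zeros(nX); dx[j] = 1.0
    Df_tan[:, j] = g_tan(v, h_tan(x, dx))
end

# adjoint mode: one sweep per output direction e_i (rows of Df)
Df_adj = zeros(nY, nX)
for i in 1:nY
    ybar = zeros(nY); ybar[i] = 1.0
    Df_adj[i, :] = h_adj(x, g_adj(v, ybar))
end

@assert Df_tan ≈ Df_adj ≈ Jg(v) * Jh(x)
```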

Automatic/Algorithmic Differentiation

Every numerical program $f: \mathbb{R}^{n+1} \to \mathbb{R}^{m+1}$ can be decomposed into single assignments $(\varphi^{(1)}, ... \varphi^{(N)})$. \begin{align*} &(v^{(0)}, v^{(-1)}, ..., v^{(-n)})^T = x \\ &v^{(n)} = \varphi^{(n)}_{(v^{(j)})_{j \prec n}} \quad \forall n = 1, ... N \\ &y = (v^{(N)}, v^{(N-1)}, ..., v^{(N-m)})^T \end{align*}

For every single assignment $\varphi^{(n)}_{(v^{(j)})_{j \prec n}}$ we know

  • tangent operator $\dot{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\dot{v}^{(j)})_{j \prec n})$ and
  • adjoint operator $\bar{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\bar{v}^{(k)})_{k \succ n})$

Tangent mode

\begin{align*} &(v^{(0)}, v^{(-1)}, ...)^T = x \\ &(\dot{v}^{(0)}, \dot{v}^{(-1)}, ...)^T = \dot{x} \\ &\hspace{-30px}\begin{cases} v^{(n)} = \varphi^{(n)}_{(v^{(j)})_{j \prec n}} \\ \dot{v}^{(n)} = \dot{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\dot{v}^{(j)})_{j \prec n}) \end{cases} \small{\quad \forall n = 1, ... N}\\ &y = (v^{(N)}, v^{(N-1)}, ...)^T\\ &\dot{y} = (\dot{v}^{(N)}, \dot{v}^{(N-1)}, ...)^T \end{align*}

Adjoint mode

\begin{alignat*}{2} &(v^{(0)}, v^{(-1)}, ...)^T = x \\ &v^{(n)} = \varphi^{(n)}_{(v^{(j)})_{j \prec n}} &&\small{\quad \forall n = 1, ..., N}\\ &y = (v^{(N)}, v^{(N-1)}, ...)^T\\ &(\bar{v}^{(N)}, \bar{v}^{(N-1)}, ...)^T = \bar{y} \\ &\bar{v}^{(n)} = \bar{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\bar{v}^{(k)})_{k \succ n}) &&\small{\quad \forall n = N-m-1, ..., -n}\\ &\bar{x} = (\bar{v}^{(0)}, \bar{v}^{(-1)}, ...)^T \end{alignat*}

Applying Adjoints Twice

  • parameters $p \in \mathbb{R}^N$
  • domain $\mathcal{R}$
  • system $a_p \in \text{Bil}(U(\mathcal{R}) \times U(\mathcal{R}), \mathbb{R})$
  • excitations $\boldsymbol{b} = (b^{(1)}, ... b^{(I)})^T \in L(U(\mathcal{R}), \mathbb{R}^I)$
  • extraction $c \in L(U(\mathcal{R}), \mathbb{R})$
  • measurements $\boldsymbol{\Sigma} = (\Sigma^{(1)}, ..., \Sigma^{(I)})^T \in \mathbb{R}^I$
  • discrepancy $g: \mathbb{R}^I \to \mathbb{R}$
  • $N, I \gg 1$, ($J=1$)
Notation: $f_{(\cdot{})}: X \to Y$ non-linear, differentiable
  • tangent operator $\dot{f}_x(\dot{x}) = \lim_{h \to 0} \frac{f_{x+h\dot{x}} - f_x}{h}$
  • adjoint operator $\bar{f}_x: \langle \bar{y}, \dot{f}_x(\dot{x}) \rangle_Y = \langle \bar{f}_x(\bar{y}), \dot{x}\rangle_X \, \forall \bar{y}, \dot{x}$
  • tangent variables $\dot{x}, \dot{y}$, where $\dot{y} = \dot{f}_x(\dot{x})$
  • adjoint variables $\bar{x}, \bar{y}$, where $\bar{x} = \bar{f}_x(\bar{y})$

Model ($u^{(i)} \in U(\mathcal{R}))$:

\begin{align*} a_p(u^{(i)}, v) + b^{(i)}(v) &= 0 \quad \forall v \in U(\mathcal{R}) \\ \Sigma^{(i)} &= c(u^{(i)}) \\ C &= g_{\boldsymbol \Sigma} \\ \end{align*}

1st "adjoint method" ($\lambda \in U(\mathcal{R})$):

\begin{align*} a_p(v, \lambda) + c(v) &= 0 \quad \forall v \in U(\mathcal{R}) \\ \Sigma^{(i)} &= b^{(i)}(\lambda) \\ C &= g_{\boldsymbol \Sigma} \end{align*}

Tangent model/sensitivity ($\dot{\lambda}^{(n)} \in U(\mathcal{R})$):

\begin{align*} a_p(v, \dot{\lambda}^{(n)}) + \overbrace{\dot{a}_p(v, \lambda, e^{(n)})}^{=\beta^{(n)}(v)} &= 0 \quad \forall v \in U(\mathcal{R})\\ \dot{C}^{(n)} &= \underbrace{\dot{g}_{\boldsymbol \Sigma}(\boldsymbol b(\dot{\lambda}^{(n)}))}_{=\alpha(\dot{\lambda}^{(n)})} \end{align*}

2nd (continuous) "adjoint method" ($\bar{\lambda} \in U(\mathcal{R})$):

\begin{align*} a_p(\bar{\lambda}, v) + \dot{g}_{\boldsymbol \Sigma}(\boldsymbol b(v)) &= 0 \quad \forall v \in U(\mathcal{R})\\ \dot{C}^{(n)} &= \dot{a}_p(\bar{\lambda}, \lambda, e^{(n)}) \end{align*}
\begin{align*} \bar{\Sigma} &= \bar{g}_{\boldsymbol \Sigma}(\bar{C})\\ a_p(\bar{\lambda}, v) + \bar{\boldsymbol \Sigma}^T \boldsymbol b(v) &= 0 \quad \forall v \in U(\mathcal{R})\\ \bar{p} &= \bar{a}_p(\bar{\lambda}, \lambda, \bar{C})\\ \nabla_p C &= \bar{p} \end{align*}
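A discrete analogue of the whole scheme as a Julia sketch (hypothetical small matrices, $a_p(u,v) = v^T K(p) u$ with $K(p) = K_0 + \sum_n p_n K_n$, single extraction $J=1$): one adjoint solve gives all $I$ measurements, one further adjoint solve gives the full gradient, checked against a finite difference.

```julia
using LinearAlgebra: dot, I

# discrete analogue (hypothetical small sizes): a_p(u,v) = vᵀK(p)u, K(p) = K0 + Σₙ pₙ Kₙ
n, nI, nN = 40, 30, 10                       # dofs, excitations, parameters
K0 = randn(n, n) + 10n * I                   # keep K(p) safely invertible
Ks = [randn(n, n) for _ in 1:nN]
K(p) = K0 + sum(p[m] * Ks[m] for m in 1:nN)

B = randn(n, nI)                             # excitations b^(i) as columns
c = randn(n)                                 # single extraction (J = 1)
Σdata = randn(nI)                            # "measured" data Σ̃
p = 0.1 * randn(nN)

# adjoint forward: one solve gives all measurements and the objective
λ = -(K(p)' \ c)                             # a_p(v, λ) + c(v) = 0  ∀v
Σ = B' * λ                                   # Σ^(i) = b^(i)(λ)
C = 0.5 * sum(abs2, Σ - Σdata)               # C = g_Σ

# adjoint derivative: one more solve gives the full gradient ∇_p C
Σbar = Σ - Σdata                             # Σ̄ = ḡ_Σ(C̄), with C̄ = 1
λbar = -(K(p) \ (B * Σbar))                  # a_p(λ̄, v) + Σ̄ᵀ b(v) = 0  ∀v
∇C   = [dot(λ, Ks[m] * λbar) for m in 1:nN]  # p̄^(n) = ā_p(λ̄, λ, C̄) = λᵀ Kₙ λ̄

# finite-difference check of one gradient component
C_of(q) = 0.5 * sum(abs2, B' * (-(K(q)' \ c)) - Σdata)
h = 1e-6; e1 = zeros(nN); e1[1] = 1.0
@assert isapprox(∇C[1], (C_of(p + h * e1) - C_of(p - h * e1)) / (2h); rtol = 1e-3)
```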

Applying Adjoints Twice: Summary

adjoint forward + adjoint derivative

\begin{align*} a_p(u, \lambda) + c(u) &= 0 \quad \forall u \in U(\mathcal{R})\\ \boldsymbol \Sigma &= \boldsymbol b(\lambda) \\ C &= g_{\boldsymbol \Sigma} \\ \bar{\boldsymbol \Sigma} &= \bar{g}_{\boldsymbol \Sigma}(\bar{C})\\ a_p(\bar{\lambda}, \dot{\lambda}) + \bar{\boldsymbol{\Sigma}}^T \boldsymbol b(\dot{\lambda}) &= 0 \quad \forall \dot{\lambda} \in U(\mathcal{R}) \\ \bar{p} &= \bar{a}_p(\bar{\lambda}, \lambda, \bar{C})\\ (\nabla_p C)^{(n)} &= \bar{p}^{(n)} \end{align*}

non-adjoint forward + tangent derivative

\begin{align*} a_p(u^{(i)}, v) + b^{(i)}(v) &= 0 \quad \forall v \in U(\mathcal{R}) \\ \Sigma^{(i)} &= c(u^{(i)}) \\ C &= g_{\boldsymbol \Sigma} \\ a_p(\dot{u}^{(i, n)}, v) + \dot{a}_p(u^{(i)}, v, e^{(n)}) &= 0 \quad \forall v \in U(\mathcal{R})\\ \dot{\Sigma}^{(i, n)} &= c(\dot{u}^{(i, n)}) \\ (\nabla_p C)^{(n)} &= \dot{g}_{\boldsymbol \Sigma}(\dot{\boldsymbol \Sigma}^{(n)}) \end{align*}

Generalizations:

  • $p$-dependent excitations $b^{(i)}(v) \rightsquigarrow b_p^{(i)}(v)$
  • $p$-dependent extractions $c(u) \rightsquigarrow c_p(u)$
  • multiple extractions $c_p(u) \rightsquigarrow c^{(j)}_p(u)$ ($I, N \gg J$)

Assumptions:

  • the cost $\mathcal{C}(a)$ of solving $a(\cdot{}, \cdot{}) + ... = 0$ dominates over $\mathcal{C}(\langle \cdot{} \rangle)$
  • $a(\cdot{}, \cdot{})$ cannot be factorized once and reused (e.g. no direct LU factorization)

Costs:

  • non-adjoint forward + tangent derivative: $(I + IN)\times \mathcal{C}(a) + ... \mathcal{C}(\langle\cdot\rangle)$
  • adjoint forward + adjoint derivative: $2J \times \mathcal{C}(a) + ... \times \mathcal{C}(\langle\cdot\rangle)$
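With the sizes of the Poisson example below ($I = 800$, $N = 757$, $J = 7$), the dominant solve counts are roughly
\begin{align*} \text{non-adjoint forward + tangent derivative:} \quad &(I + IN) \times \mathcal{C}(a) = (800 + 800 \cdot 757) \times \mathcal{C}(a) \approx 6.1 \cdot 10^5 \times \mathcal{C}(a)\\ \text{adjoint forward + adjoint derivative:} \quad &2J \times \mathcal{C}(a) = 14 \times \mathcal{C}(a) \end{align*}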

Example: An inverse problem based on the Poisson equation

forward (strong form)

\begin{align*} &\begin{cases} -\nabla \cdot{} m \nabla u^{(i)} = 0 \quad &\forall x \in \mathcal{R}\\ u^{(i)} = g^{(i)} \quad &\forall x \in \partial\mathcal{R} \end{cases}\\ &\Sigma^{(i, j)} = \int_{\mathcal{R}} h^{(j)} u^{(i)} dx\\ &C = \frac{1}{2 IJ}\sum_{i, j=1}^{I,J} (\Sigma^{(i, j)} - \tilde{\Sigma}^{(i, j)})^2 \end{align*}
  • $m, h^{(j)} \in L^2(\mathcal{R})$, $g^{(i)} \in L^2(\partial \mathcal{R})$
  • weak enforcement of boundary conditions $u, v \in H^1_h(\mathcal{R})$
  • Nitsche's method: stable bilinear form \[a(u^{(i)}, v) + b^{(i)}(v) = 0 \quad \forall v \in H^1_h(\mathcal{R})\]
  • $I, N \gg J$ (many bc $g^{(i)}$, many parameters $p \in \mathbb{R}^N$, some extractions $h^{(j)}$)
  • here: $I=800$, $N=757$, $J=7$

Figures: forward solutions $u^{(i)}$, measurements $\Sigma^{(i, j)}$

Example: Derivation

weak forms

\begin{align*} a_m(u, v) &= \int_{\mathcal{R}} m \nabla u \cdot{} \nabla v dx - \int_{\partial \mathcal{R}} m \nabla_n u v + u \nabla_n v \, d\Gamma + \int_{\partial \mathcal{R}} \alpha u v \, d \Gamma \\ b^{(i)}(v) &= \int_{\partial \mathcal{R}} \nabla_n v g^{(i)} \, d\Gamma - \int_{\partial \mathcal{R}} \alpha g^{(i)} v \, d\Gamma\\ c^{(j)}(u) &= \int_{\mathcal{R}} h^{(j)} u \, dx \end{align*}

adjoint forward

\begin{align*} a_m(u, \lambda^{(j)}) + c^{(j)}(u) = 0 \quad \forall u\\ \int_{\mathcal{R}} m \nabla u \cdot{} \nabla \lambda^{(j)} dx - \int_{\partial \mathcal{R}} m \nabla_n u \lambda^{(j)} + u \nabla_n \lambda^{(j)} \, d\Gamma + \int_{\partial \mathcal{R}} \alpha u \lambda^{(j)} \, d \Gamma + \int_{\mathcal{R}} h^{(j)} u \, dx = 0 \quad \forall u \end{align*}

adjoint forward (strong form)

\begin{align*} \begin{cases} -\nabla \cdot{} m \nabla \lambda^{(j)} = -h^{(j)} \quad &\forall x \in \mathcal{R}\\ \lambda^{(j)} = 0 \quad &\forall x \in \partial \mathcal{R} \end{cases} \end{align*}

Example: Derivation

adjoint derivative

\begin{align*} &a_m(\bar{\lambda}^{(j)}, \dot{\lambda}) + \bar{\boldsymbol \Sigma}^{(j)T} \boldsymbol{b}(\dot{\lambda}) = 0 \quad \forall \dot{\lambda} \\ &\int_{\mathcal{R}} m \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \dot{\lambda} dx - \int_{\partial \mathcal{R}} m \nabla_n \bar{\lambda}^{(j)} \dot{\lambda} + \bar{\lambda}^{(j)} \nabla_n \dot{\lambda} \, d\Gamma + \int_{\partial \mathcal{R}} \alpha \bar{\lambda}^{(j)} \dot{\lambda} \, d \Gamma \\ &+ \int_{\partial \mathcal{R}} \nabla_n \dot{\lambda} \sum_{i=1}^{I}(\bar{\Sigma}^{(i, j)} g^{(i)}) \, d\Gamma - \int_{\partial \mathcal{R}} \alpha \sum_{i=1}^{I} (\bar{\Sigma}^{(i, j)} g^{(i)}) \dot{\lambda} \, d\Gamma = 0 \quad \forall \dot{\lambda} \end{align*}

adjoint derivative (strong form)

\begin{align*} \begin{cases} -\nabla \cdot{} m \nabla \bar{\lambda}^{(j)} = 0 \quad &\forall x \in \mathcal{R} \\ \bar{\lambda}^{(j)} = \sum_{i=1}^{I} \bar{\Sigma}^{(i, j)} g^{(i)} \quad &\forall x \in \partial \mathcal{R} \end{cases} \end{align*}

tangent model

\begin{align*} \dot{C} = \sum_{j=1}^{J} \dot{a}_m(\bar{\lambda}^{(j)}, \lambda^{(j)}, \dot{m}) = \sum_{j=1}^{J} \int_{\mathcal{R}} \dot{m} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \, dx - \int_{\partial \mathcal{R}} \dot{m} \nabla_n \bar{\lambda}^{(j)} \underbrace{\lambda^{(j)}}_{=0} \, d \Gamma \end{align*}

adjoint model

\begin{align*} \langle \bar{C}, \sum_{j=1}^{J} \int_{\mathcal{R}} \dot{m} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \, dx \rangle = \langle \bar{C} \sum_{j=1}^{J} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)}, \dot{m} \rangle_{L^2(\mathcal{R})} \\ \bar{a}_m(\bar{\lambda}^{(:)}, \lambda^{(:)}, \bar{C}) = \sum_{j=1}^{J} \bar{C} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \end{align*}

Example: Derivation

adjoint parametrization

\begin{align*} \langle \bar{m}, \dot{m} \rangle_{L^2(\mathcal{R})} = \langle \bar{m}, \dot{\gamma}_p(\dot{p}) \rangle_{L^2(\mathcal{R})} = \langle \int_{\mathcal{R}} \frac{\partial \gamma_p}{\partial p} \bar{m} \, dx, \dot{p} \rangle_{\mathbb{R}^N} \end{align*}

Example: An inverse problem based on the Poisson equation

adjoint forward (strong form)

\begin{align*} &\begin{cases} -\nabla \cdot{} m \nabla \lambda^{(j)} = -h^{(j)} &\forall x \in \mathcal{R} \\ \lambda^{(j)} = 0 \quad &\forall x \in \partial \mathcal{R} \end{cases}\\ &\Sigma^{(i, j)} = \int_{\partial \mathcal{R}} \nabla_n \lambda^{(j)} g^{(i)} \, d \Gamma\\ &C = \frac{1}{2 I J} \sum_{i, j}^{I, J} (\Sigma^{(i, j)} - \tilde{\Sigma}^{(i, j)})^2 \end{align*}
adjoint forward $\lambda^{(j)}$
  • rhs: extractions $h^{(j)}$
  • numerical costs:
    • (adjoint) $\sim 52ms \approx 7 \times 7ms$
    • (non-adjoint) $\sim 5.3s \approx 800 \times 7ms$
  • adjoint and non-adjoint measurements agree (numerical precision)

adjoint derivative (strong form)

\begin{align*} &\bar{\Sigma}^{(i, j)} = \frac{1}{IJ} (\Sigma^{(i, j)} - \tilde{\Sigma}^{(i, j)}) \bar{C}\\ &\begin{cases} -\nabla \cdot{} m \nabla \bar{\lambda}^{(j)} = 0 \quad &\forall x \in \mathcal{R} \\ \bar{\lambda}^{(j)} = \sum_{i=1}^{I} \bar{\Sigma}^{(i, j)} g^{(i)} \quad &\forall x \in \partial \mathcal{R}\\ \end{cases} \\ &\bar{m} = \bar{C} \sum_{j=1}^{J} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \\ \end{align*}
gradient $\bar{m}$
  • rhs: augmented excitations $\bar{\boldsymbol \Sigma}^{(j)T}{\boldsymbol g}$
  • numerical costs:
    • (adjoint) $\sim 98 ms \approx 2 \times 7 \times 7ms$
    • (finite differences) $\sim 45 s \approx 757 \times 55ms$
    • (finite differences + non-adjoint) $\sim 1h\, 10min \approx 757 \times 800 \times 7ms$
  • gradient agrees with finite differences (approximately)

Example: An inverse problem based on the Poisson equation

\begin{align*} m = \gamma_p = x \mapsto (0.1, 0.9, 0.4)^T \cdot{} \mathcal{NN}^{2 \to 20 \text{(tanh)} \to 20\text{(tanh)} \to 3\text{(softmax)}}_p(x) \end{align*}
  • inverting on $m$ directly is ill-posed
  • motivated from SciML: use a $\mathcal{NN}$ as the parametrization $\gamma_p$
  • we use the parametrization $\gamma_p$ to "regularize" (here: phase classification)
  • number of parameters: $544$
  • coarser discretization for the inversion ($757$ material cells vs $2972$ for the true measurements)
  • $1$% random noise added to the true measurements
Figures: measurements $\Sigma^{(i, j)}$ and $\tilde{\Sigma}^{(i, j)}$, optimized material $m$, objective $\text{MSE}(\Sigma^{(i, j)}, \tilde{\Sigma}^{(i,j)})$, true material $\tilde{m}$
  • convergence depends on the initial guess (does not always converge)
  • optimizer: BFGS
  • tailor the parametrization $\gamma_p$ to the specific problem (e.g. layers, other geometries, phase values)
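For reference, a minimal sketch of such a parametrization $\gamma_p$ in plain Julia (no ML framework; dense layers with the sizes from the formula above, random initialization, hypothetical names `γ` and `phases`). The actual implementation in the Pluto.jl notebook may differ in details.

```julia
# minimal sketch of m(x) = (0.1, 0.9, 0.4)ᵀ · NN_p(x) with a 2 → 20(tanh) → 20(tanh) → 3(softmax) net
softmax(z) = exp.(z .- maximum(z)) ./ sum(exp.(z .- maximum(z)))

struct γ
    W1; b1; W2; b2; W3; b3          # weights and biases of the three dense layers
end
γ_init() = γ(randn(20, 2), randn(20), randn(20, 20), randn(20), randn(3, 20), randn(3))

phases = [0.1, 0.9, 0.4]            # material values of the three phases

function (g::γ)(x)                  # m(x) = phasesᵀ · NN_p(x)
    h1 = tanh.(g.W1 * x .+ g.b1)
    h2 = tanh.(g.W2 * h1 .+ g.b2)
    return phases' * softmax(g.W3 * h2 .+ g.b3)
end

γ_p = γ_init()
m_at_x = γ_p([0.3, -0.2])           # material coefficient at a point x ∈ ℝ²
```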

Outlook: Material Reconstruction in EPMA

  • model: (stationary) radiative transfer/linear Boltzmann equation in continuous-slowing-down approximation (CSD)
  • adjoint model: reversed in time and direction (or space), $a_m(\psi, \phi)$ vs. $a_m(\phi, \psi)$

Figures: forward $\psi^{(i)}$, adjoint forward $\lambda^{(j)}$, gradient $\bar{\rho}_e$

References

  • J. Bünger: Three-dimensional modelling of x-ray emission in electron probe microanalysis based on deterministic transport equations. PhD thesis, RWTH Aachen University (2021)
    doi:10.18154/RWTH-2021-05180
  • T. Bui-Thanh: Adjoint and Its Roles in Sciences, Engineering, and Mathematics: A Tutorial. arXiv (2023)
    doi:10.48550/arXiv.2306.09917
  • J.A. Halbleib and J.E. Morel: Adjoint Monte Carlo Electron Transport in the Continuous-Slowing-Down-Approximation. J. Comput. Phys. (1980)
    doi:10.1016/0021-9991(80)90106-0
  • U. Naumann: The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation. SIAM (2012)
    doi:10.1137/1.9781611972078
  • R.E. Plessix: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophys. J. Int. (2006)
    doi:10.1111/j.1365-246X.2006.02978.x

Slides + Pluto.jl notebook (Poisson example): github.com/tam724

Material Reconstruction in EPMA: Optimization Example

  • $\mathcal{NN}$ parametrization along a single dimension (but with an angle)

  • final measurements (not perfect..)

Material Reconstruction in EPMA

  • (stationary) radiative transfer/linear Boltzmann equation in continuous-slowing-down approximation (CSD) ($x \in \mathcal{R} \subset \mathbb{R}^3$, $\epsilon \in \mathbb{R}^+$, $\Omega \in S^2$) \begin{align} -\partial_\epsilon (s(x, \epsilon) \psi^{(i)}(x, \epsilon, \Omega)) + \Omega \cdot{} \nabla_x \psi^{(i)}(x, \epsilon, \Omega) = \int_{S^2} k(x, \epsilon, \Omega \cdot{} \Omega') \psi^{(i)}(x, \epsilon, \Omega') \, d\Omega' - \tau(x, \epsilon) \psi^{(i)}(x, \epsilon, \Omega) \end{align}
  • beam is modeled by boundary conditions (excitations) \begin{align} \psi^{(i)}(x, \epsilon, \Omega) = g^{(i)}(x, \epsilon, \Omega) \quad \forall x \in \partial \mathcal{R} | n \cdot{} \Omega < 0 \end{align}
  • $i = 1, ..., I$: beam position, beam energy, beam direction
  • additivity approximation (for $\sigma = s$, $k$ and $\tau$), $p \in \mathbb{R}^N$ \begin{equation} \sigma(x, \cdot{}) = \sum_{e=1}^{n_e} \rho_{e}(x) \sigma_e(\cdot{})\quad \text{where} \quad (\rho_1, ... \rho_{n_e})^T = \gamma_p \end{equation}
  • x-ray generation and detection (extraction) \begin{align} \Sigma^{(i, j)} = \int_{\mathcal{R}} \int_{0}^{\infty} \sigma^{ion}_j(\epsilon) \rho_j(x) e^{-\int_{d(x)} \sum_{e=1}^{n_e} \mu_e^{(j)} \rho_e(y) \,d y}\int_{S^2} \psi^{(i)}(x, \epsilon, \Omega) \,d \Omega \, d \epsilon \,d x \end{align}
  • $j = 1, ..., J$: number of measured x-ray lines/materials

Material Reconstruction in EPMA

  • compute measurements $\Sigma^{(i, j)}$ with the extraction as the source to the adjoint RT-CSD
  • adjoint RT-CSD: reversed in energy and direction (or space), $a_m(\psi, \phi)$ vs. $a_m(\phi, \psi)$
  • forward $\psi^{(i)}$, cost: $I \times \mathcal{C}(a) \sim I \times 2\text{min}$
  • adjoint forward $\lambda^{(j)}$, cost: $J \times \mathcal{C}(a) \sim J \times 2\text{min}$
  • orange/green marker: measurements from forward
  • green lines: measurements from adjoint forward

Material Reconstruction in EPMA

  • compute the gradient $\bar{\rho}_e$ using the "adjoint augmented" excitations as the source to the RT-CSD

Figures: adjoint derivative $\bar{\lambda}^{(j)}$, gradient $\bar{\rho}_e$