Applying Adjoints Twice: Efficient Gradients for Inverse Modeling with Radiative Transfer in EPMA

Tamme Claus1, Gaurav Achuda2, Silvia Richter2, Manuel Torrilhon1

1 ACoM, Applied and Computational Mathematics, RWTH Aachen University

2 GFE, Central Facility for Electron Microscopy, RWTH Aachen University

IGPM Seminar 2025, RWTH Aachen

23.01.2025

Motivation: Electron Probe Microanalysis (EPMA)

"Material imaging based on characteristic x-ray emission"

  • Material Science, Quality control, Mineralogy, Semiconductors
  • Ionization of material sample using focused electron beam
  • Wavelength-dispersive spectrometers measure characteristic x-radiation

Microprobe at GFE (source: gfe.rwth-aachen.de)

Material
Interaction Volume
  • Example Material: two-phase (A, B)
  • Electron propagation affected by scattering (animated: different energies)
  • Ionization and emission of characteristic x-rays (wavelength of A-rays and B-rays differ)
  • Spatial information gained via different beam positions (linescan), beam angles and beam energies
  • inverse problem of material reconstruction
    \begin{equation} p^* = \underset{p \in P}{\text{arg min}} \underbrace{|| \Sigma(p) - \tilde{\Sigma} ||^2}_{=C(p)} \end{equation}
  • gradient-based iterative methods (Steepest Descent, BFGS, or HMC)
  • while (∇ₚC != 0) foreach δp foreach beam solve_pde() end end end
  • we focus on computing $\nabla_p C(p)$ efficiently via adjoints

Gradients in Inverse Problems

"Determine the model parameters $p$ such that the model $\Sigma(p)$ produces the observed data $\tilde{\Sigma}$."

classical approach

\begin{align*} p^* = \underset{p \in P}{\text{arg min}} || \Sigma(p) - \tilde{\Sigma} ||^2 \end{align*}
  • iterative approximation of the minimum based on the gradient of the objective function $p \mapsto ||\Sigma(p) - \tilde{\Sigma}||^2$
  • Gradient Descent, CG, BFGS, ...
  • regularization: addition of a penalty term $p \mapsto ||\Sigma(p) - \tilde{\Sigma}||^2 + \mathcal{R}(p)$
  • here: regularization via a parametrization $\gamma$ of the material, i.e. $\Sigma = \Sigma \circ \gamma$

Bayesian approach

  • approximate the posterior $\pi(p|\tilde{\Sigma})$ (typically: iterative sampling, MCMC)
  • Hamiltonian Monte-Carlo (based on the gradient of the target density)

Efficient evaluation of the objective function and its gradient is crucial!
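A minimal sketch of such an iterative loop in Julia (illustrative only: grad_C stands for the adjoint-based gradient developed in the rest of the talk, and the linear toy model $\Sigma(p) = M p$ is purely hypothetical):

using LinearAlgebra

# steepest descent on C(p) = ½||Σ(p) - Σ̃||²; runs until maxiter or tolerance
function invert(grad_C, p₀; α = 1e-2, maxiter = 5000, tol = 1e-8)
    p = copy(p₀)
    for _ in 1:maxiter
        g = grad_C(p)              # the expensive part of every iteration
        norm(g) < tol && break     # stationarity test: ∇ₚC ≈ 0
        p .-= α .* g               # steepest-descent update
    end
    return p
end

# illustrative linear toy model Σ(p) = M⋅p
M = randn(6, 4); Σ̃ = randn(6)
grad_C(p) = M' * (M * p - Σ̃)       # ∇ₚ ½||Σ(p) - Σ̃||²
p̂ = invert(grad_C, zeros(4))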

Motivation: Matrix Product Bracketing

Consider Matrix Multiplication ($N \gg I, J$):
\[ \underbrace{\begin{pmatrix} & & \\ & \Sigma^{(ji)} & \\ & & \end{pmatrix}}_{J \times I} = \underbrace{\begin{pmatrix} — & h^{(1)} & — \\ — & h^{(j)} & — \\ — & h^{(J)} & — \end{pmatrix}}_{J \times N} \cdot{} \underbrace{\boldsymbol{A}}_{N \times N} \cdot{} \underbrace{\begin{pmatrix} | & | & | \\ g^{(1)} & g^{(i)} & g^{(I)} \\ | & | & | \end{pmatrix}}_{N \times I} \]
with the two possible bracketings
\[ \Sigma = \color{green}{(H \boldsymbol{A})} \, G \quad \text{or} \quad \Sigma = H \, \color{orange}{(\boldsymbol{A} G)} \]

Where to bracket?

computational costs: $\color{orange}{I\times N^2 + IJ\times N} \text{ or } \color{green}{J\times N^2 + IJ\times N}$
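A quick numerical check of the two bracketings (sizes are illustrative; the result $\Sigma$ is identical either way, only the cost differs):

# N ≫ I, J
N, I, J = 2000, 100, 5
H = randn(J, N); A = randn(N, N); G = randn(N, I)

Σ_orange = H * (A * G)   # I applications of A:  ≈ I⋅N² + I⋅J⋅N flops
Σ_green  = (H * A) * G   # J applications of Aᵀ: ≈ J⋅N² + I⋅J⋅N flops

Σ_orange ≈ Σ_green       # true: same matrix, different cost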

Generalizations:

  • $\boldsymbol A$ is solution of a linear system / linear (partial) differential equation? \[ \boldsymbol A g^{(i)} = (u \in U \, | \, a(u, v) + \langle g^{(i)}, v \rangle = 0 \quad \forall v \in V) \]
  • $\boldsymbol A$ is a (Fréchet-) derivative (structure: chain rule, product rule, ...)? \[ \boldsymbol A g^{(i)} = \frac{\partial f(y(x))}{\partial x} g^{(i)} = \frac{\partial f(y(x))}{\partial y}\frac{\partial y(x)}{\partial x}g^{(i)} \]

Two equivalent formulations

using the adjoint operator $\boldsymbol A^*$.

An Abstract Adjoint Method in a Computational Context

Definition of the adjoint operator ($H, G$ Hilbert spaces, $A: G \to H$ continuous, linear operator) \[ \langle h, A(g) \rangle_H = \langle A^*(h), g \rangle_G \quad \forall h \in H \,, g \in G \]

  • given multiple $g^{(1)}, ... g^{(I)} \in G$ and $h^{(1)}, ... h^{(J)} \in H$ we want to compute
\[ \Sigma^{(ji)} = \langle h^{(j)}, A(g^{(i)}) \rangle_H = \langle A^*(h^{(j)}), g^{(i)} \rangle_G \quad \forall i = 1, \dots, I, \; j = 1, \dots, J \]

Non-adjoint Implementation

\begin{align*} &\text{foreach } \color{orange}{i = 1, ..., I} \text{ do} \\ &\qquad \color{orange}{v^{(i)} \leftarrow A(g^{(i)})}\\ &\qquad \text{foreach } \color{green}{j = 1, ..., J} \text{ do}\\ &\qquad \qquad \Sigma^{(ji)} \leftarrow \langle \color{green}{h^{(j)}}, \color{orange}{v^{(i)}} \rangle_H\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • cost: $\color{orange}{I} \times \mathcal{C}(A) + \color{orange}{I}\color{green}{J} \times \mathcal{C}(\langle \cdot{}, \cdot{} \rangle_H)$

Adjoint Implementation

\begin{align*} &\text{foreach } \color{green}{j = 1, ..., J} \text{ do} \\ &\qquad \color{green}{\lambda^{(j)} \leftarrow A^*(h^{(j)})}\\ &\qquad \text{foreach } \color{orange}{i = 1, ..., I} \text{ do}\\ &\qquad \qquad \Sigma^{(ji)} \leftarrow \langle \color{green}{\lambda^{(j)}}, \color{orange}{g^{(i)}} \rangle_G\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • cost: $\color{green}{J} \times \mathcal{C}(A^*) + \color{orange}{I}\color{green}{J} \times \mathcal{C}(\langle \cdot{}, \cdot{} \rangle_G)$
  • typically: $\mathcal{C}(A) \approx \mathcal{C}(A^*)$ and $\mathcal{C}(\langle \cdot{}, \cdot{} \rangle_G) \approx \mathcal{C}(\langle \cdot{}, \cdot{} \rangle_H)$ and $\mathcal{C}(A) \gg \mathcal{C}(\langle \cdot{}, \cdot{} \rangle_G)$
    • $\color{green}{J} > \color{orange}{I}$: non-adjoint implementation
    • $\color{orange}{I} > \color{green}{J}$: adjoint implementation
Instead of computing/approximating the "solution" $v^{(i)}= A(g^{(i)})$, we approximate the Riesz representation $\lambda^{(j)} \in G$ of the linear functional $\cdot \mapsto \langle h^{(j)}, A(\cdot{}) \rangle_H$
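The same two loop orders in Julia, with a matrix $A$ standing in for the expensive operator (sizes illustrative, $I \gg J$):

using LinearAlgebra
N, nI, nJ = 1000, 50, 3
A = randn(N, N)
g = [randn(N) for _ in 1:nI];  h = [randn(N) for _ in 1:nJ]

# non-adjoint: nI expensive applications of A
v  = [A * gᵢ for gᵢ in g]
Σ₁ = [dot(h[j], v[i]) for j in 1:nJ, i in 1:nI]

# adjoint: nJ expensive applications of A* = Aᵀ
λ  = [A' * hⱼ for hⱼ in h]
Σ₂ = [dot(λ[j], g[i]) for j in 1:nJ, i in 1:nI]

Σ₁ ≈ Σ₂   # identical up to round-off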

Riesz Representation Theorem/Definition of the Adjoint

Riesz Representation Theorem

Let $(H, \langle \cdot{}, \cdot{} \rangle_H)$ be a (real) Hilbert space with inner product $\langle \cdot{}, \cdot{} \rangle_H$. For every continuous linear functional $f_h \in H^*$ (the dual of $H$), there exists a unique vector $h \in H$, called the Riesz Representation of $f_h$, such that \[ f_h(x) = \langle h, x \rangle_H \quad \forall x \in H. \]

Definition: Adjoint Operator

Let $(G, \langle \cdot{}, \cdot{} \rangle_G)$ be another (real) Hilbert space, $A:G\to H$ a continuous linear operator between $G$ and $H$ and $f_h \in H^*$ a continuous linear functional. \[ f_h(A(g)) = \langle h, A(g) \rangle_H \quad \forall g \in G \] But $f_h(A(g))$ is also a continuous linear functional $f_\lambda(g)$ in $G$ with a Riesz Representation $\lambda \in G$ \[ f_h(A(g)) = f_\lambda(g) = \langle \lambda, g \rangle_G := \langle A^*(h), g \rangle_G. \]

Examples of (real) adjoint operators

Scalar Multiplication

  • $G, H = \mathbb{R}$
  • $A: \mathbb{R} \to \mathbb{R}, g \mapsto Ag = a \cdot{} g,\quad a \in \mathbb{R}$
  • $A^*: \mathbb{R} \to \mathbb{R}, \mu \mapsto A^* \mu = a \cdot{} \mu$

Derivation: \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}} = \langle \mu, a \cdot{} g \rangle_{\mathbb{R}} = \langle a \cdot{} \mu, g \rangle_{\mathbb{R}} = \langle A^*\mu , g \rangle_{\mathbb{R}} \quad \forall g \in \mathbb{R}\, \forall \mu \in \mathbb{R} \end{equation*}

Matrix Multiplication

  • $G = \mathbb{R}^n, H = \mathbb{R}^m$
  • $A: \mathbb{R}^n \to \mathbb{R}^m, g \mapsto Ag = M \cdot{} g, \quad M \in \mathbb{R}^{m \times n}$
  • $A^*: \mathbb{R}^m \to \mathbb{R}^n, \mu \mapsto A^*\mu = M^T \cdot{} \mu$

Derivation: \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}^m} = \langle \mu, M \cdot{} g \rangle_{\mathbb{R}^m} = \langle M^T \cdot{} \mu, g \rangle_{\mathbb{R}^n} = \langle A^*\mu, g \rangle_{\mathbb{R}^n} \quad \forall g \in \mathbb{R}^n \, \forall \mu \in \mathbb{R}^m \end{equation*}

Integration

  • $G = L^2(\Omega)$ (with $\langle \cdot{}, \cdot{} \rangle_{L^2(\Omega)} = \int_{\Omega} \cdot{} \cdot{} d x$), $H = \mathbb{R}$
  • $A: L^2(\Omega) \to \mathbb{R}, g(\cdot{}) \mapsto Ag = \int_{\Omega} g(x) d x$
  • $A^*: \mathbb{R} \to L^2(\Omega), \mu \mapsto (A^* \mu)(\cdot{}) = (x \mapsto \mu \cdot{} 1_{\Omega}(x))$

Derivation: \begin{equation*} \langle \mu, Ag \rangle_\mathbb{R} = \mu \cdot{} \int_\Omega g(x) dx = \int_\Omega \mu \cdot{} 1_\Omega(x) g(x) dx = \langle (A^*\mu)(\cdot{}), g(\cdot{}) \rangle_{L^2(\Omega)} \quad \forall g \in L^2(\Omega) \, \forall \mu \in \mathbb{R} \end{equation*}
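These adjoint pairs can be checked numerically; a small sketch for the matrix and integration examples (the $L^2$ inner product is discretized by a midpoint rule, all values are illustrative):

using LinearAlgebra

# matrix multiplication: A = M, A* = Mᵀ
M = randn(3, 2); g = randn(2); μ = randn(3)
dot(μ, M * g) ≈ dot(M' * μ, g)          # ⟨μ, Ag⟩ = ⟨A*μ, g⟩

# integration on Ω = (0, 1): Ag = ∫g dx, (A*μ)(x) = μ⋅1_Ω(x)
n = 100; Δx = 1 / n
gv = rand(n); μ₀ = randn()
λ = fill(μ₀, n)                          # Riesz representation of g ↦ μ₀⋅∫g dx
μ₀ * (Δx * sum(gv)) ≈ Δx * dot(λ, gv)    # ⟨μ, Ag⟩_ℝ = ⟨A*μ, g⟩_L²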

Examples of (real) adjoint operators

Linear Solve

  • $G, H = \mathbb{R}^n$ and $M \in \mathbb{R}^{n\times n}$ invertible
  • $A: \mathbb{R}^n \to \mathbb{R}^n, g \mapsto (h \in \mathbb{R}^n | M \cdot{} h = g)$, (or $A = M^{-1}$)
  • $A^*: \mathbb{R}^n \to \mathbb{R}^n, \mu \mapsto (\lambda \in \mathbb{R}^n | M^T \cdot{} \lambda = \mu)$, (or $A^* = M^{-T}$)

Derivation (intentionally complicated): We reinterpret \begin{equation*} M \cdot{} h = g \quad \Leftrightarrow \quad \langle v, M \cdot{} h \rangle = \langle v, g \rangle \quad \forall v \in G \end{equation*} in particular (for a fixed but unspecified $\lambda \in G$) \begin{equation} 0 = \langle \lambda, g \rangle - \langle \lambda, M \cdot{} h \rangle \quad \Leftrightarrow \quad 0 = \langle \lambda, g \rangle - \langle M^T \cdot{} \lambda, h \rangle \end{equation} \begin{align*} \langle \mu, Ag \rangle &= \langle \mu, h \rangle\\ &= \langle \mu, h \rangle - \langle M^T \cdot{} \lambda, h \rangle + \langle \lambda, g \rangle \\ & \quad \text{ with } \langle \mu, h \rangle - \langle M^T \cdot{} \lambda, h \rangle = 0 \quad \forall h \in H \\ & \quad \text{ or } M^T \cdot{} \lambda = \mu \\ &= \langle A^* \mu, g \rangle \end{align*} (alternatively) \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}^n} = \langle \mu, M^{-1} g \rangle_{\mathbb{R}^n} = \langle M^{-T} \mu, g \rangle_{\mathbb{R}^n} = \langle A^*\mu, g \rangle_{\mathbb{R}^n} \end{equation*}
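Numerically, the adjoint of a linear solve is simply a solve with the transposed matrix (a minimal check, sizes illustrative):

using LinearAlgebra

n = 50
M = 2I + 0.1 * randn(n, n)     # (almost surely) invertible
g = randn(n); μ = randn(n)

h = M \ g                       # forward solve   M h = g
λ = M' \ μ                      # adjoint solve   Mᵀ λ = μ

dot(μ, h) ≈ dot(λ, g)           # ⟨μ, M⁻¹g⟩ = ⟨M⁻ᵀμ, g⟩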

Examples of (real) adjoint operators

Weak form

  • $G, H$ (real Hilbert spaces)
  • $A: G \to H, g \mapsto \{h \in H | a(h, v) + \langle g, v \rangle_G = 0 \quad \forall v \in G\}$, $\quad a(\cdot{}, \cdot{}) \text{ coercive bilinear form}$
  • $A^*: H \to G, \mu \mapsto \{\lambda \in G | a(v, \lambda) + \langle \mu,v \rangle_H = 0 \quad \forall v \in H\}$

Derivation: \begin{align*} \langle \mu, Ag \rangle_H = f_\mu(h) &= f_\mu(h) + \underbrace{a(h, \lambda) + f_g(\lambda)}_{=0\quad \forall \lambda \in G} \\ &= \underbrace{f_\mu(h) + a(h, \lambda)}_{\overset{!}{=} 0 \quad \forall h \in H} + f_g(\lambda) = f_g(\lambda) = \langle \lambda, g \rangle_G \end{align*}

Derivative Notation (from Algorithmic Differentiation)

  • given $f_{(\cdot{})}: X \to Y, x \mapsto f_x$ ($X, Y$ Hilbert) non-linear, Fréchet-differentiable

we define:

  • the (continuous, linear) tangent operator $\dot{f}_x \in L(X, Y)$ (directional derivative) \begin{align*} \dot{f}_x(\dot{x}) = \frac{\partial f_x}{\partial x}(\dot{x}) = \lim_{h \to 0} \frac{f_{x + h\dot{x}} - f_{x}}{h} \end{align*}
  • the (continuous, linear) adjoint operator $\bar{f}_x \in L(Y, X)$ \begin{align*} \langle \bar{y} , \dot{f}_x(\dot{x}) \rangle_Y = \langle \bar{f}_x (\bar{y}), \dot{x} \rangle_X \quad \forall \bar{y} \in Y, \dot{x} \in X \end{align*}

Also we define ($y = f_x$):

  • tangent variables $\dot{x} \in X, \dot{y} \in Y$: given $\dot{x} \in X$, $\dot{y} = \dot{f}_x(\dot{x})$
  • adjoint variables $\bar{x} \in X, \bar{y} \in Y$: given $\bar{y} \in Y$, $\bar{x} = \bar{f}_x(\bar{y})$

Applying this to a composition $ f_x = \varphi^{(N)}_{v^{(N-1)}} \circ \, ...\, \circ\, \varphi^{(1)}_x$ where $\varphi^{(n)}$ are single assignments (chain rule!) is tangent/adjoint mode automatic differentiation (neglecting all implementation details).

Automatic/Algorithmic Differentiation

Example

\begin{equation} y = f_x = g \circ h_x \quad v = h_x \end{equation} by the chain rule, the tangent operator $\dot{f}_x$ is \begin{equation} \dot{y} = \dot{f}_x(\dot{x}) = \dot{g}_v(\dot{h}_x(\dot{x})) \end{equation} for the adjoint operator $\bar{f}_x$ we use \begin{align*} \langle \bar{y} , \dot{f}_x(\dot{x}) \rangle_Y = \langle \bar{y}, \dot{g}_v(\dot{h}_x(\dot{x})) \rangle_Y = \langle \bar{g}_v (\bar{y}), \dot{h}_x(\dot{x}) \rangle_V = \langle \bar{h}_x(\bar{g}_v(\bar{y})), \dot{x} \rangle_X \quad \forall \bar{y} \in Y, \dot{x} \in X \end{align*} and find \begin{equation} \bar{f}_x = \bar{h}_x(\bar{g}_v(\bar{y})) \end{equation}

Tangent mode

\begin{align*} &\text{foreach } j = 1, ... J \\ &\qquad \dot{x} \leftarrow e_j \\ &\qquad \dot{y} \leftarrow \dot{g}_v(\dot{h}_x(\dot{x}))\\ &\qquad \text{foreach } i = 1, ... I \\ &\qquad \qquad (Df)^{(i, j)} = \langle e_j, \dot{y} \rangle\\ &\qquad \text{end}\\ &\text{end} \end{align*}

Adjoint mode

\begin{align*} &\text{foreach } i = 1, ... I \\ &\qquad \bar{y} \leftarrow e_i \\ &\qquad \bar{x} \leftarrow \bar{h}_x(\bar{g}_v(\bar{y}))\\ &\qquad \text{foreach } j = 1, ... J \\ &\qquad \qquad (Df)^{(i, j)} = \langle \bar{x}, e_j \rangle\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • AD is even more systematic; a small Jacobian sketch follows below.
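A small check of both assembly orders, using ForwardDiff for tangent mode and Zygote for adjoint mode (both packages assumed available; $f$ is an illustrative map $\mathbb{R}^2 \to \mathbb{R}^3$):

using ForwardDiff, Zygote, LinearAlgebra

f(x) = [x[1]^2, x[1] * x[2], sin(x[2])]
x = [1.0, 2.0]

# tangent mode: Jacobian assembled column-by-column (one pass per input)
Jt = ForwardDiff.jacobian(f, x)

# adjoint mode: Jacobian assembled row-by-row via the pullback (one pass per output)
y, back = Zygote.pullback(f, x)
Ja = reduce(vcat, [back(e)[1]' for e in eachcol(Matrix(1.0I, 3, 3))])

Jt ≈ Ja   # same Jacobian, columns (tangent) vs. rows (adjoint)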

Automatic/Algorithmic Differentiation

Every numerical program $f: \mathbb{R}^{n+1} \to \mathbb{R}^{m+1}$ can be decomposed into single assignments $(\varphi^{(1)}, ... \varphi^{(N)})$. \begin{align*} &(v^{(0)}, v^{(-1)}, ..., v^{(-n)})^T = x \\ &v^{(n)} = \varphi^{(n)}_{(v^{(j)})_{j \prec n}} \quad \forall n = 1, ... N \\ &y = (v^{(N)}, v^{(N-1)}, ..., v^{(N-m)})^T \end{align*}

For every single assignment $\varphi^{(n)}_{(v^{(j)})_{j \prec n}}$ we know

  • tangent operator $\dot{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\dot{v}^{(j)})_{j \prec n})$ and
  • adjoint operator $\bar{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\bar{v}^{(k)})_{k \succ n})$

Tangent mode

\begin{align*} &(v^{(0)}, v^{(-1)}, ...)^T = x \\ &(\dot{v}^{(0)}, \dot{v}^{(-1)}, ...)^T = \dot{x} \\ &\hspace{-30px}\begin{cases} v^{(n)} = \varphi^{(n)}_{(v^{(j)})_{j \prec n}} \\ \dot{v}^{(n)} = \dot{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\dot{v})_{j \prec n}) \end{cases} \small{\quad \forall n = 1, ... N}\\ &y = (v^{(N)}, v^{(N-1)}, ...)^T\\ &\dot{y} = (\dot{v}^{(N)}, \dot{v}^{(N-1)}, ...)^T \end{align*}

Adjoint mode

\begin{alignat*}{2} &(v^{(0)}, v^{(-1)}, ...)^T = x \\ &v^{(n)} = \varphi^{(n)}_{(v^{(j)})_{j \prec n}} &&\small{\quad \forall n = 1, ..., N}\\ &y = (v^{(N)}, v^{(N-1)}, ...)^T\\ &(\bar{v}^{(N)}, \bar{v}^{(N-1)}, ...)^T = \bar{y} \\ &\bar{v}^{(n)} = \bar{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\bar{v}^{(k)})_{k \succ n}) &&\small{\quad \forall n = N-m-1, ..., -n}\\ &\bar{x} = (\bar{v}^{(0)}, \bar{v}^{(-1)}, ...)^T \end{alignat*}

Adjoint² Ingredients: Algorithmic Differentiation (AD)

Given a (Fréchet-)differentiable operator $f: \mathcal{X} \to Y, x \mapsto f(x) =: y$ (open $\mathcal{X} \subseteq X$)
(typically in AD-literature: $X, Y$ finite dimensional Hilbert spaces)
  • Definition of the (Fréchet-)derivative (tangent operator) $\dot{f}_x: T_xX \to T_yY$ by
    \begin{equation} \lim_{||\dot{x}|| \to 0} \frac{||f(x + \dot{x}) - f(x) - \dot{f}_x(\dot{x})||}{||\dot{x}||} = 0 \quad \text{with } \dot{x} \in T_xX \end{equation}
  • Definition of the adjoint (tangent) operator ($T_xX$, $T_yY$ Hilbert spaces, $\dot{f}_x$ continuous, linear)
    \begin{equation} \langle \bar{y}, \dot{f}_x(\dot{x}) \rangle_{T_y Y} = \langle \bar{f}_x(\bar{y}), \dot{x} \rangle_{T_x X} \quad \forall \dot{x} \in T_xX, \bar{y} \in T_yY \end{equation}
    (notation adopted from AD)
  • Example: compute the Jacobian $Jf_x$ of $f: \mathbb{R}^I \to \mathbb{R}^J$
    \begin{equation} [Jf_x]_{ji} = \underbrace{\langle \color{green}{e_j}, \color{orange}{\dot{f}_x(e_i)} \rangle}_{\small \text{tangent mode (AD)}} = \underbrace{\langle \color{green}{\bar{f}_x(e_j)}, \color{orange}{e_i} \rangle}_{\small\text{adjoint mode (AD)}} \end{equation}

Algorithmic Differentation Tools

automatic composition of "elemental" tangent/adjoint operators $\dot{f}_x(\cdot{})$/$\bar{f}_x(\cdot{})$ based on the chain rule to compute
  • tangent vectors: $\dot{y}, \dot{x}$
  • adjoint vectors: $\bar{x}, \bar{y}$

where $\langle \bar{y}, \dot{y} \rangle = \langle \bar{x}, \dot{x}\rangle$.

using ChainRulesCore, Zygote

# primal function
function f(a, b)
    y = 2*a*a + b
    return y
end

# custom reverse rule: returns the primal value y together with the
# pullback f̄, which maps the adjoint ȳ to the adjoints (ā, b̄)
function ChainRulesCore.rrule(::typeof(f), a, b)
    y = 2*a*a + b
    function f̄(ȳ)
        ā = 4*a * ȳ   # ∂y/∂a = 4a
        b̄ = ȳ         # ∂y/∂b = 1
        return ZeroTangent(), ā, b̄
    end
    return y, f̄
end

Zygote.withgradient(f, 1.0, 2.0) # (val = 4.0, grad = (4.0, 1.0))

Adjoint² Ingredients: Example: linear PDE (weak form)

Given a solution operator of a linear pde $A: G \to H$ ($U, V, G, H$ Hilbert spaces $U\subseteq H$, $V\subseteq G$) \begin{equation} \color{orange}{A(g) = (u \in U | a(u, v) + \langle v, g\rangle_G = 0\quad \forall v \in V)} \end{equation}
Derivation: we introduce $\lambda \in V$ \begin{align*} \Sigma = \langle h, A(g) \rangle_H &= \langle h, u \rangle_H \\ &=\langle h, u \rangle_H + \color{orange}{\underbrace{a(u, \lambda) + \langle \lambda, g\rangle_G}_{=0 \, \forall \lambda \in V}} \\ &=\color{green}{\underbrace{\langle h, u \rangle_H + a(u, \lambda)}_{=0 \, \forall u \in U}} + \langle \lambda, g\rangle_G \\ & \hphantom{\, = \langle h, u \rangle_H + a(u, \lambda)} = \langle \lambda, g \rangle_G = \langle A^*(h), g \rangle_G = \Sigma \end{align*}
The adjoint operator $A^*: H \to G$ is \begin{equation} \color{green}{A^*(h) = (\lambda \in V | a(u, \lambda) + \langle h, u \rangle_H = 0 \quad \forall u \in U)} \end{equation}

Adjoint² Ingredients: Dual/Adjoint Consistency

A discretization is dual consistent, if:
"The adjoint of the discretized problem is a consistent discretization of the adjoint of the continuous problem."
Example (neglecting BC):
continuous primal problem
\begin{align} L u + g &= 0 \quad \forall x \in \mathcal{R} \\ \Sigma &= \int_{\mathcal{R}} h u \, dx \end{align}
$\leftrightarrow$ continuous dual problem
\begin{align} L^* \lambda + h &= 0 \quad \forall x \in \mathcal{R} \\ \Sigma &= \int_{\mathcal{R}} g \lambda \, dx \end{align}
$\updownarrow$ discretization (primal) $\qquad$ $\updownarrow$ ? (dual)
discrete primal problem
\begin{align} L_\Delta u_\Delta + g_\Delta &= 0 \\ \Sigma_\Delta &= \langle h_\Delta, u_\Delta \rangle \end{align}
$\leftrightarrow$ discrete dual problem
\begin{align} L^T_\Delta \lambda_\Delta + h_\Delta &= 0 \\ \Sigma_\Delta &= \langle \lambda_\Delta, g_\Delta \rangle \end{align}
  • interpretability: $\lambda_\Delta$ is the approximation of the continuous adjoint equation
  • "superconvergence" of linear functionals $|\Sigma - \Sigma_\Delta|$ (convergence order doubling)
  • no ambiguity between:
    • "optimize-then-discretize" vs. "discretize-then-optimize"
    • "continuous adjoint method" vs. "discrete adjoint method"
Examples:
  • ODE/FD-discretizations: explicit/implicit Euler (not dual consistent); implicit midpoint/trapezoidal rule (dual consistent with suitable quadrature + staggered variables)
  • FE-discretization: (with weak boundary conditions + suitable/no stabilization dual consistent)
  • DG/SBP (see literature)
  • Hicken, J. E., & Zingg, D. W. (2014). Dual consistency and functional accuracy: a finite-difference perspective. doi:10.1016/j.jcp.2013.08.014
  • Oliver, T. A., & Darmofal, D. L. (2009). Analysis of Dual Consistency for Discontinuous Galerkin Discretizations of Source Terms. doi:10.1137/080721467
  • Hartmann, R. (2007). Adjoint Consistency Analysis of Discontinuous Galerkin Discretizations. doi:10.1137/060665117
  • Giles, M. B., & Pierce, N. A. (2003). Adjoint Error Correction for Integral Outputs. doi:10.1007/978-3-662-05189-4_2

Applying Adjoints Twice

  • parameters $p \in P$
  • parameter-dependent bilinear form $a_p: U \times V (\times P) \to \mathbb{R}$
  • linear excitations $b^{(i)} : V \to \mathbb{R}$
  • linear extractions $c^{(j)}: U \to \mathbb{R}$
  • observables $\Sigma^{(ji)} \in \mathbb{R}$
  • tangent of $a_p$ $\dot{a}_p: U \times V \times P \to \mathbb{R}$
  • corres. adjoint $\bar{a}_p: U \times V \times \mathbb{R} \to P$
  • tangent spaces are identified with the underlying (linear) spaces
  • given tangent/adjoint vectors $\dot{p}$ and $\bar{\Sigma}^{(ji)}$ resp.

Model ($u^{(i)} \in U)$:

\begin{align*} a_p(u^{(i)}, v) + b^{(i)}(v) &= 0 \quad \forall v \in V \\ \Sigma^{(ji)} &= c^{(j)}(u^{(i)}) \end{align*}

1st "adjoint method" ($\lambda^{(j)} \in V$):

\begin{align*} a_p(v, \lambda^{(j)}) + c^{(j)}(v) &= 0 \quad \forall v \in U \\ \Sigma^{(ji)} &= b^{(i)}(\lambda^{(j)}) \\ \end{align*}

Tangent/sensitivity model ($\dot{\lambda}^{(j)} \in V)$):

\begin{align*} a_p(v, \dot{\lambda}^{(j)}) + \dot{a}_p(v, \lambda^{(j)}, \dot{p}) &= 0 \quad \forall v \in U\\ \dot{\Sigma}^{(ji)} &= b^{(i)}(\dot{\lambda}^{(j)}) \end{align*}

2nd "adjoint method" ($\bar{\lambda}^{(j)} \in U$):

\begin{align} a_p(\bar{\lambda}^{(j)}, v) + \bar{\Sigma}^{(ji)} b^{(i)}(v) &= 0 \quad \forall v \in V \\ \bar{\Sigma}^{(ji)} \dot{\Sigma}^{(ji)} &= \dot{a}_p(\bar{\lambda}^{(j)}, \lambda^{(j)}, \dot{p}) = \langle \bar{a}_p(\bar{\lambda}^{(j)}, \lambda^{(j)}, 1), \dot{p} \rangle \end{align}

from AD ($\langle \bar{\Sigma}, \dot{\Sigma} \rangle = \langle \bar{p}, \dot{p} \rangle$):

\begin{align} a_p(\underbrace{\bar{\Sigma}^{(ji)} v}_{=\bar{\lambda}^{(j)} \small(\text{with fixed } v)}, \dot{\lambda}^{(j)}) + \dot{a}_p(\bar{\Sigma}^{(ji)} v, \lambda^{(j)}, \dot{p}) &= 0 \quad \forall v \in U \\ \bar{\Sigma}^{(ji)} \dot{\Sigma}^{(ji)} &= \bar{\Sigma}^{(ji)} b^{(i)}(\dot{\lambda}^{(j)}) \end{align}
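A discrete sketch of the whole construction, with $A(p)u + b = 0$ for a matrix $A(p) = A_0 + \sum_k p_k \, dA_k$ (all names, sizes, and scalings are illustrative); $2J$ solves give all measurements and the full gradient, and a finite-difference check mirrors the Taylor-remainder plots later:

using LinearAlgebra

n, nI, nJ, nP = 40, 10, 2, 3
A₀ = 10.0 * Matrix(1.0I, n, n) + randn(n, n) / sqrt(n)
dA = [randn(n, n) / sqrt(n) for _ in 1:nP]        # ∂A/∂pₖ (illustrative)
Ap(p) = A₀ + sum(p .* dA)

b = [randn(n) for _ in 1:nI];  c = [randn(n) for _ in 1:nJ]
Σ̃ = randn(nJ, nI);  p = randn(nP);  A = Ap(p)

# forward model via the 1st adjoint: nJ solves instead of nI
λ = [-(A' \ c[j]) for j in 1:nJ]
Σ = [dot(b[i], λ[j]) for j in 1:nJ, i in 1:nI]
C = 0.5 * sum(abs2, Σ .- Σ̃)

# gradient via the 2nd adjoint: nJ more solves (2⋅nJ in total)
Σ̄ = Σ .- Σ̃                                        # with C̄ = 1
λ̄ = [-(A \ sum(Σ̄[j, i] * b[i] for i in 1:nI)) for j in 1:nJ]
∇C = [sum(dot(λ[j], dA[k] * λ̄[j]) for j in 1:nJ) for k in 1:nP]

# finite-difference (Taylor-remainder) check of one component
ε = 1e-6;  pε = copy(p);  pε[1] += ε
Σε = [dot(b[i], -(Ap(pε)' \ c[j])) for j in 1:nJ, i in 1:nI]
isapprox((0.5 * sum(abs2, Σε .- Σ̃) - C) / ε, ∇C[1]; rtol = 1e-4)  # agrees to O(ε)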

Applying Adjoints Twice: Summary

with objective function $g: \mathbb{R}^{J\times I} \to \mathbb{R}, \Sigma \mapsto g(\Sigma) = C$ and parameters $p \in P = \mathbb{R}^N$

non-adjoint forward + non-adjoint derivative

\begin{align*} a_p(u^{(i)}, v) + b^{(i)}(v) &= 0 \quad \forall v \in V \\ \Sigma^{(ji)} &= c^ {(j)}(u^{(i)}) \\ C &= g(\Sigma^{(\cdot{}\cdot{})}) \\ a_p(\dot{u}^{(in)}, v) + \dot{a}_p(u^{(i)}, v, e^{(n)}) &= 0 \quad \forall v \in V\\ \dot{\Sigma}^{(jin)} &= c^{(j)}(\dot{u}^{(in)}) \\ (\nabla_p C)^{(n)} &= \dot{g}_{\Sigma^{(\cdot{}\cdot{})}}(\dot{\Sigma}^{(\cdot{}\cdot{}n)}) \end{align*}
  • cost: $(I + IN)\times \mathcal{C}(a_p(\cdot{}, \cdot{})) + ... $

adjoint forward + adjoint derivative

\begin{align} a_p(v, \lambda^{(j)}) + c^{(j)}(v) &= 0 \quad \forall v \in U\\ \Sigma^{(ji)} &= b^{(i)}(\lambda^{(j)}) \\ C &= g(\Sigma^{(\cdot{}\cdot{})}) \\ \bar{\Sigma}^{(\cdot{}\cdot{})} &= \bar{g}_{\Sigma^{(\cdot{}\cdot{})}}(\bar{C})\\ a_p(\bar{\lambda}^{(j)}, v) + \bar{\Sigma}^{(ji)} b^{(i)}(v) &= 0 \quad \forall v \in V \\ (\nabla_p C)^{(n)} &= \langle \bar{a}_p(\bar{\lambda}^{(j)}, \lambda^{(j)}, 1), e^{(n)}\rangle \end{align}
  • cost: $2J \times \mathcal{C}(a_p(\cdot{}, \cdot{})) + ... $

Generalization:

  • parameter-dependent $b_p^{(i)}(v)$ and $c_p^{(j)}(u)$ \begin{equation} \bar{p} = \bar{a}_p(\bar{\lambda}^{(j)}, \lambda^{(j)}, 1) + \bar{c}_p^{(j)}(\bar{\lambda}^{(j)}, 1) + \bar{b}^{(i)}_p(\lambda^{(j)}, \bar{\Sigma}^{(ji)}) \end{equation}

Assumptions:

  • $\mathcal{C}(a_p(\cdot{}, \cdot{})) \gg \mathcal{C}(b^{(i)}), \mathcal{C}(c^{(j)}), ...$
  • the linear systems arising from $a_p(\cdot{}, \cdot{})$ cannot be solved directly (e.g. by LU factorization)
  • $I > J$, $N > J$

Example: An inverse problem based on the Poisson equation

model

\begin{align*} &\begin{cases} -\nabla \cdot{} m \nabla u^{(i)} = 0 \quad &\forall x \in \mathcal{R}\\ u^{(i)} = g^{(i)} \quad &\forall x \in \partial\mathcal{R} \end{cases}\\ &\Sigma^{(i, j)} = \int_{\mathcal{R}} h^{(j)} u^{(i)} dx\\ &C = \frac{1}{2 IJ}\sum_{i, j=1}^{I,J} (\Sigma^{(i, j)} - \tilde{\Sigma}^{(i, j)})^2 \end{align*}
  • $m, h^{(j)} \in L^2(\mathcal{R})$, $g^{(i)} \in L^2(\partial \mathcal{R})$
  • weak enforcement of boundary conditions $u, v \in H^1_h(\mathcal{R})$
  • Nitsche's method: stable bilinear form \[a(u^{(i)}, v) + b^{(i)}(v) = 0 \quad \forall v \in H^1_h(\mathcal{R})\]
  • $I, N \gg J$ (many bc $g^{(i)}$, many parameters $p \in \mathbb{R}^N$, some extractions $h^{(j)}$)
  • here: $I=200$, $N=2972$, $J=7$
  • FE framework: Gridap.jl; linear solver: GMRES + ILU

solutions $u^{(i)}$

measurements $\Sigma^{(i, j)}$

Derivation: An inverse problem based on the Poisson equation

weak forms

\begin{align*} a_p(u, v) &= \int_{\mathcal{R}} m_p \nabla u \cdot{} \nabla v \, dx - \int_{\partial \mathcal{R}} m_p (\nabla_n u \, v + u \nabla_n v) \, d\Gamma + \int_{\partial \mathcal{R}} \alpha_h u v \, d \Gamma \\ b^{(i)}(v) &= \int_{\partial \mathcal{R}} \nabla_n v g^{(i)} \, d\Gamma - \int_{\partial \mathcal{R}} \alpha_h g^{(i)} v \, d\Gamma\\ c^{(j)}(u) &= \int_{\mathcal{R}} h^{(j)} u \, dx \end{align*}
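A minimal Gridap.jl sketch of assembling and solving this forward model (hedged: constant $m \equiv 0.5$, a single excitation $g$, and illustrative mesh size, quadrature degree, and penalty $\alpha_h$; API details may vary between Gridap versions):

using Gridap

model = CartesianDiscreteModel((0, 1, 0, 1), (40, 40))
Ω = Triangulation(model);          dΩ = Measure(Ω, 2)
Γ = BoundaryTriangulation(model);  dΓ = Measure(Γ, 2)
nΓ = get_normal_vector(Γ)

V = TestFESpace(model, ReferenceFE(lagrangian, Float64, 1))  # no strong BCs
U = TrialFESpace(V)

m = 0.5                      # constant conductivity
αₕ = 10.0 * 40               # Nitsche penalty ∼ 1/h (illustrative)
g(x) = sin(2π * x[1])        # one boundary excitation g⁽ⁱ⁾

a(u, v) = ∫( m * (∇(u) ⋅ ∇(v)) )*dΩ -
          ∫( m * ((∇(u) ⋅ nΓ) * v + u * (∇(v) ⋅ nΓ)) )*dΓ +
          ∫( αₕ * u * v )*dΓ
l(v) = ∫( αₕ * (v * g) )*dΓ - ∫( (∇(v) ⋅ nΓ) * g )*dΓ   # l(v) = -b⁽ⁱ⁾(v)

uh = solve(AffineFEOperator(a, l, U, V))
Σ = sum(∫( uh )*dΩ)          # one extraction c⁽ʲ⁾(u), here with h⁽ʲ⁾ ≡ 1

The adjoint solve reuses the same assembled operator, transposed, with the extraction as right-hand side.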

tangent and adjoint operators

\begin{align*} \dot{a}_p(u, v, \dot{p}) &= \int_{\mathcal{R}} \dot{m}_p(\dot{p}) \nabla u \cdot{} \nabla v \,d x - \int_{\partial \mathcal{R}} \dot{m}_p(\dot{p}) \nabla_n u v \,d \Gamma \\ \bar{a}_p(u, v, \bar{C}) &= \bar{C}\int_{\mathcal{R}} \partial_p m_p \nabla u \cdot{} \nabla v \,d x - \bar{C}\int_{\partial \mathcal{R}} \partial_p m_p \nabla_n u v \,d \Gamma \end{align*} where $\partial_p m_p$ is the parameter-gradient of the field $m$ defined by \begin{equation} \dot{m}_p(\dot{p})(x) = \langle \partial_p m_p(x), \dot{p} \rangle \end{equation}

Example: Derivation

adjoint derivative

\begin{align*} &a_m(\bar{\lambda}^{(j)}, \dot{\lambda}) + \bar{\boldsymbol \Sigma}^{(j)T} \boldsymbol{b}(\dot{\lambda}) = 0 \quad \forall \dot{\lambda} \\ &\int_{\mathcal{R}} m \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \dot{\lambda} dx - \int_{\partial \mathcal{R}} m \nabla_n \bar{\lambda}^{(j)} \dot{\lambda} + \bar{\lambda}^{(j)} \nabla_n \dot{\lambda} \, d\Gamma + \int_{\partial \mathcal{R}} \alpha \bar{\lambda}^{(j)} \dot{\lambda} \, d \Gamma \\ &+ \int_{\partial \mathcal{R}} \nabla_n \dot{\lambda} \sum_{i=1}^{I}(\bar{\Sigma}^{(i, j)} g^{(i)}) \, d\Gamma - \int_{\partial \mathcal{R}} \alpha \sum_{i=1}^{I} (\bar{\Sigma}^{(i, j)} g^{(i)}) \dot{\lambda} \, d\Gamma = 0 \quad \forall \dot{\lambda} \end{align*}

adjoint derivative (strong form)

\begin{align*} \begin{cases} - \nabla \cdot{} m \nabla \bar{\lambda}^{(j)} = 0 \quad &\forall x \in \mathcal{R} \\ \bar{\lambda}^{(j)} = \sum_{i=1}^{I} \bar{\Sigma}^{(i, j)} g^{(i)} \quad &\forall x \in \partial \mathcal{R} \end{cases} \end{align*}

tangent model

\begin{align*} \dot{C} = \sum_{j=1}^{J} \dot{a}_m(\bar{\lambda}^{(j)}, \lambda^{(j)}, \dot{m}) = \sum_{j=1}^{J} \int_{\mathcal{R}} \dot{m} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \, dx - \int_{\partial \mathcal{R}} \dot{m} \nabla_n \bar{\lambda}^{(j)} \underbrace{\lambda^{(j)}}_{=0} \, d \Gamma \end{align*}

adjoint model

\begin{align*} \langle \bar{C}, \sum_{j=1}^{J} \int_{\mathcal{R}} \dot{m} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \, dx \rangle = \langle \bar{C} \sum_{j=1}^{J} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)}, \dot{m} \rangle_{L^2(\mathcal{R})} \\ \bar{a}_m(\bar{\lambda}^{(:)}, \lambda^{(:)}, \bar{C}) = \sum_{j=1}^{J} \bar{C} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \end{align*}

Example: Derivation

adjoint parametrization

\begin{align*} \langle \bar{m}, \dot{m} \rangle_{L^2(\mathcal{R})} = \langle \bar{m}, \dot{\gamma}_p(\dot{p}) \rangle_{L^2(\mathcal{R})} = \langle \int_{\mathcal{R}} \frac{\partial \gamma_p}{\partial p} \bar{m} \, dx, \dot{p} \rangle_{\mathbb{R}^N} \end{align*}
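In code, this last step is a single AD pullback through the parametrization; a sketch with Zygote ($\gamma$ and all sizes are illustrative stand-ins):

using Zygote

γ(p) = tanh.(p)            # stand-in for the parametrization p ↦ m
p = randn(5)
m̄ = randn(5)               # adjoint field m̄ from the PDE part

m, back = Zygote.pullback(γ, p)
p̄ = back(m̄)[1]             # ⟨m̄, γ̇ₚ(ṗ)⟩ = ⟨p̄, ṗ⟩ for all ṗ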

Example: An inverse problem based on the Poisson equation

measurements $\Sigma^{(ji)}$

non-adjoint model

\begin{align*} &\begin{cases} -\nabla \cdot{} m \nabla u^{(i)} = 0 \quad &\forall x \in \mathcal{R}\\ u^{(i)} = g^{(i)} \quad &\forall x \in \partial\mathcal{R} \end{cases}\\ &\Sigma^{(ji)} = \int_{\mathcal{R}} h^{(j)} u^{(i)} dx\\ \end{align*}
  • $I = 200$ excitations
  • $J = 7$ extractions
  • $N = 2972$ grid cells

solution $u^{(i)}$

  • non-adjoint: $\approx 100ms (\approx 200\times 0.5ms + \text{...})$
  • adjoint: $\approx 5.5ms (\approx 7\times 0.5ms + \text{...})$

adjoint model

\begin{align*} &\begin{cases} -\nabla \cdot{} m \nabla \lambda^{(j)} = -h^{(j)} &\forall x \in \mathcal{R} \\ \lambda^{(j)} = 0 \quad &\forall x \in \partial \mathcal{R} \end{cases}\\ &\Sigma^{(ji)} = \int_{\partial \mathcal{R}} \nabla_n \lambda^{(j)} g^{(i)} \, d \Gamma\\ \end{align*}
  • strong form not required for implementation
  • the Riesz representation $\nabla_n \lambda^{(j)}$ contains all information needed to compute measurements for arbitrary $g^{(i)}$
adjoint solution $\lambda^{(j)}$
Riesz representation $\nabla_n \lambda^{(j)}$

Example: An inverse problem based on the Poisson equation

gradient $\nabla_m C$

non-adjoint derivative

\begin{align*} &\begin{cases} -\nabla \cdot{} m \nabla \dot{\lambda}^{(j)} = \nabla \cdot{} \dot{m} \nabla \lambda^{(j)} &\forall x \in \mathcal{R} \\ \dot{\lambda}^{(j)} = 0 \quad &\forall x \in \partial \mathcal{R} \end{cases}\\ &\dot{\Sigma}^{(ji)} = \int_{\partial \mathcal{R}} \nabla_n \dot{\lambda}^{(j)} g^{(i)} \, d \Gamma\\ &\dot{C} = \frac{1}{IJ} \sum_{i, j=1}^{I,J} (\Sigma^{(ji)} - \tilde{\Sigma}^{(ji)})\dot{\Sigma}^{(ji)} \end{align*}
  • evaluated at $m(x) = 0.5$
tangent solution $\dot{\lambda}^{(j)}$
  • tangent derivative via the adjoint model: $\approx 16.6s (\approx 2972 \times 5.5ms + \text{...})$
  • tangent derivative via the non-adjoint model: $\approx 4\text{min }30s (\approx 2972 \times 200 \times 0.5ms + \text{...})$
  • adjoint derivative: $\approx 14.6ms (\approx 2\times 5.5ms + \text{...})$

adjoint derivative

\begin{align*} &\bar{\Sigma}^{(ji)} = \frac{1}{I J} (\Sigma^{(ji)} - \tilde{\Sigma}^{(ji)}) \bar{C} \\ &\begin{cases} -\nabla \cdot{} m \nabla \bar{\lambda}^{(j)} = 0 &\forall x \in \mathcal{R} \\ \bar{\lambda}^{(j)} = \sum_{i=1}^{I} \bar{\Sigma}^{(ji)} g^{(i)} \quad &\forall x \in \partial \mathcal{R} \end{cases}\\ &\bar{m} = \sum_{j=1}^{J} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \\ \end{align*}
adjoint solution $\bar{\lambda}^{(j)}$
Taylor remainder

Example: An inverse problem based on the Poisson equation

  • runtime: $5s$
  • without adjoint sensitivity: $\approx 1h$
  • without adjoint$^2$: $\approx 11\text{h } 45\text{min}$
$m$-parametrization
\begin{align} v^{[0]} &= (x, y)^T \\ v^{[1]} &= \text{relu}(W^{[1]} \cdot{} v^{[0]} + b^{[1]}) \\ v^{[2]} &= \text{relu}(W^{[2]} \cdot{} v^{[1]} + b^{[2]}) \\ v^{[3]} &= \text{softmax}(W^{[3]} \cdot{} v^{[2]} + b^{[3]}) \\ m &= (0.1, 0.4, 0.9) \cdot{} v^{[3]} \end{align}
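A hedged Lux.jl sketch of this pointwise parametrization, $(x, y) \mapsto$ phase weights $\mapsto$ material value (the layer widths here are illustrative; the actual network has 1903 parameters):

using Lux, Random

phases = [0.1, 0.4, 0.9]
net = Chain(Dense(2 => 32, relu),
            Dense(32 => 32, relu),
            Dense(32 => 3),
            softmax)

ps, st = Lux.setup(Random.default_rng(), net)

x = [0.3, 0.7]             # a point (x, y) in the domain
v³, _ = net(x, ps, st)     # phase weights v⁽³⁾ (a point on the simplex)
m = phases' * v³           # material value m(x, y) = (0.1, 0.4, 0.9)⋅v⁽³⁾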
possible application
  • transformation to "phase classification problem"
  • from SciML: neural network as parametrization $m(p)$
  • the parametrization $m(p)$ "regularizes"
  • number of parameters: $1903$
  • no "inverse crime":
    • $2972$ vs $72965$ grid cells
    • additive noise $\mathcal{N}(\mu=0, \sigma=1\times 10^{-5})$
  • L-BFGS (Optim.jl), NN (Lux.jl), AD (Zygote.jl) + custom adjoint
measurements $\Sigma^{(i, j)}$ and $\tilde{\Sigma}^{(i, j)}$
optimized material $m$
true material
  • "neural networks bias towards sparse solutions"
  • convergence depends on the initial guess (does not always converge)
  • tailor the parametrization to specific problems (layers, other geometries, phase values, etc.)

Material Reconstruction in EPMA

radiative transfer equation
($\psi^{(i)}$: electron fluence)
\begin{align*} \Omega \cdot{} \nabla_x \psi + \Sigma^{\text{tot}} \psi &= \int_{\mathbb{R}^+} \int_{S^2} \Sigma^{\text{scat}} \psi \, d\Omega' \, d\epsilon' \quad \forall (x, \epsilon, \Omega) \in \mathcal{R} \times \mathcal{E} \times S^2 \\ \psi &= g_{\text{beam}}^{(i)} \quad \forall n\cdot{}\Omega < 0 \end{align*}
observations
$\Sigma^{(ji)} = \int_{\mathcal{R}} \Sigma^{\text{x-ray}, (j)}\psi^{(i)} \, d x$
objective
$C = \frac{1}{IJ} \sum_{i,j=1}^{I, J} (\Sigma^{(ji)} - \tilde{\Sigma}^{(ji)})^2$
additivity approximation for scattering kernels $\Sigma^\star$
\begin{equation} \Sigma^{\star}(x, \epsilon, \epsilon', \Omega, \Omega') = \sum_{e=1}^{n_e} \mathcal{N}_e(x) \sigma_e^{\star}(\epsilon, \epsilon', \Omega, \Omega') \end{equation}
continuous slowing down (csd)-approximation
\begin{align} \Sigma \psi - \int_{\mathcal{E}} \Sigma \psi \, d \epsilon \approx -\partial_{\epsilon}( s \psi ) \\ \psi(\epsilon=\epsilon_{\text{max}}) = 0 \end{align}
discretization
  • even/odd splitting \begin{equation} \psi^\pm(\cdot{}, \cdot{}, \Omega) = \frac{\psi(\cdot{}, \cdot{}, \Omega) \pm \psi(\cdot{}, \cdot{}, -\Omega)}{2} \end{equation}
  • weak form of radiative transfer \begin{align} \begin{split}\label{eq:linear_boltzmann_csd_biliner_form} a_p(\psi, \phi) = \textstyle -\langle\partial_\epsilon (s_p \psi) \phi \rangle &+ \langle \Omega \cdot{} \nabla_x \psi^+ \phi^- - \psi^- \Omega \cdot{} \nabla_x \phi^+ \rangle \\ &+ \textstyle[|n \cdot{} \Omega| \psi^+ \phi^+]_{\partial_x} + \langle \Sigma^{\text{el,tot}}_p \psi \phi - \int_{S^2} \Sigma^{\text{el}}_p \psi \, d \Omega' \phi \rangle \end{split} \end{align} \begin{align} b^{(i)}(\phi) &= 2 [n \cdot{} \Omega g^{(i)} \phi^+ ]_{\partial_x^-}. \end{align} \begin{align} c^{(j)}_p(\psi) &= \langle \mathcal{N}_j \sigma^{\text{x-ray}}_j \psi^+ \rangle. \end{align}
  • $\Omega$: spherical harmonics basis ($P_N$)
  • $x$: mixed Lagrange basis ($L^2_h, H^1_h$)
  • $\epsilon$: staggered implicit trapezoidal/midpoint rule
  • Schur complement (eliminates odd $^-$ variables)
  • MINRES: solves remaining symmetric system
  • "matrix-free" method
  • "Kronecker trick" $(B^T \otimes A)\,\text{vec}(X) = \text{vec}(AXB)$ (see the sketch below)
  • dual consistent discretization
implementation in Julia using: CUDA.jl, Gridap.jl, Krylov.jl, Optim.jl, Lux.jl, ...
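A minimal check of the Kronecker identity behind the matrix-free application (sizes illustrative and kept small, since the dense Kronecker product is exactly what one wants to avoid):

using LinearAlgebra

nx, nΩ = 100, 20
A = randn(nx, nx)            # spatial factor (illustrative)
B = randn(nΩ, nΩ)            # angular factor (illustrative)
X = randn(nx, nΩ)

y₁ = kron(B', A) * vec(X)    # dense: a (nx⋅nΩ)² operator
y₂ = vec(A * X * B)          # matrix-free: two small matrix products

y₁ ≈ y₂                      # true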
  • Egger, H., & Schlottbom, M. (2012). A Mixed Variational Framework For The Radiative Transfer Equation. doi:10.1142/s021820251150014x
  • Egger, H., & Schlottbom, M. (2014). Numerical methods for parameter identification in stationary radiative transfer. doi:10.1007/s10589-014-9657-9
  • Egger, H., & Schlottbom, M. (2016). A class of Galerkin Schemes for Time-Dependent Radiative Transfer. doi:10.1137/15M1051336

Derivation: Material Reconstruction in EPMA

\begin{equation} \psi(x, \epsilon_i, \Omega) \approx \underbrace{\begin{pmatrix}\mathcal{X}^+(x)^T & \mathcal{X}^-(x)^T\end{pmatrix}}_{=\mathcal{X}(x)^T} \begin{pmatrix} \Psi^+_i & 0 \\ 0 & \Psi^-_i\end{pmatrix} \underbrace{\begin{pmatrix} \Upsilon^+(\Omega) \\ \Upsilon^-(\Omega) \end{pmatrix}}_{=\Upsilon(\Omega)}, \end{equation}
\begin{align} \boldsymbol{N}_e^\pm &= \int_{\mathcal{R}} \mathcal{X}^\pm \mathcal{N}_e (\mathcal{X}^\pm)^T \, d x \in \mathbb{R}^{n_x^\pm \times n_x^\pm}\\ \boldsymbol{\nabla}_d &= \int_{\mathcal{R}} \nabla_{x_d} \mathcal{X}^+ (\mathcal{X}^-)^T \, d x \in \mathbb{R}^{n_x^+ \times n_x^-}\\ \boldsymbol{\partial}_d &= \int_{\partial \mathcal{R}} |n_d| \mathcal{X}^+ (\mathcal{X}^+)^T \, d \Gamma \in \mathbb{R}^{n_x^+ \times n_x^+} \end{align}
\begin{align} \boldsymbol{I}^\pm &= \int_{S^2} \Upsilon^\pm (\Upsilon^\pm)^T \, d \Omega = I \in \mathbb{R}^{n_\Omega^\pm \times n_\Omega^\pm} \\ \boldsymbol{\Omega}_d &= \int_{S^2} \Omega_d \Upsilon^- (\Upsilon^+)^T \, d \Omega \in \mathbb{R}^{n_\Omega^- \times n_\Omega^+}\\ \boldsymbol{|\Omega|}_d &= \int_{S^2} |\Omega_d| \Upsilon^+ (\Upsilon^+)^T \, d \Omega \in \mathbb{R}^{n_\Omega^+ \times n_\Omega^+} \end{align}
\begin{equation} \boldsymbol{k}_e^\pm = \int_{S^2} \int_{S^2} k_e(\Omega \cdot{} \Omega') \Upsilon^\pm \, d \Omega' (\Upsilon^\pm)^T \, d \Omega \in \mathbb{R}^{n_\Omega^\pm \times n_\Omega^\pm} \end{equation}
\begin{align} \begin{split} \boldsymbol{a}(\Psi, \Phi) = \sum_{i=0}^{N-1} \,& [-\boldsymbol{N}_e^+ \frac{s_e^{i+1} \Psi^+_{i+1} - s_e^{i}\Psi^+_{i}}{\Delta \epsilon} \boldsymbol{I}^+ -\boldsymbol{\nabla}_d \frac{\Psi^-_{i} + \Psi^-_{i+1}}{2} \boldsymbol{\Omega}_d +\boldsymbol{\partial}_d \frac{\Psi^+_{i} + \Psi^+_{i+1}}{2} \boldsymbol{|\Omega|}_d \\ &\quad +\boldsymbol{N}_e^+ \frac{\tau_e^{i}\Psi^+_{i} + \tau_e^{i+1}\Psi^+_{i+1}}{2} (\boldsymbol{I}^+ - \boldsymbol{k}_e^+)]\cdot{} \Phi^+_{i + \frac{1}{2}}\Delta \epsilon\\ +\,&[-\boldsymbol{N}_e^- \frac{s_e^{i+1} \Psi^-_{i+1} - s_e^{i}\Psi^-_{i}}{\Delta \epsilon} \boldsymbol{I}^- +\boldsymbol{\nabla}^T_d \frac{\Psi^+_{i} + \Psi^+_{i+1}}{2} \boldsymbol{\Omega}^T_d \\ &\quad +\boldsymbol{N}_e^- \frac{\tau_e^{i}\Psi^-_{i} + \tau_e^{i+1}\Psi^-_{i+1}}{2}(\boldsymbol{I}^- - \boldsymbol{k}_e^-)] \cdot \Phi^-_{i+\frac{1}{2}}\Delta \epsilon \end{split} \end{align} \begin{align} \boldsymbol{b}^{(j)}(\Phi) \, = \sum_{i=0}^{N-1}\frac{\boldsymbol{g}^+_{i} + \boldsymbol{g}^+_{i+1}}{2} \cdot{} \Phi_{i+\frac{1}{2}}^+\Delta \epsilon \text{ with } \boldsymbol{g}^+_{i} = 2 \int_{\partial \mathcal{R}} \int_{n \cdot{} \Omega < 0} n \cdot{} \Omega \, g^{(j)}(x, \epsilon_i, \Omega) \Upsilon^+(\Omega) \mathcal{X}^+(x)^T \, d \Omega \, d \Gamma \end{align} \begin{align} \boldsymbol{c}^{(j)}(\Psi) \, = \sum_{i=0}^{N-1} \frac{ \boldsymbol{h}^+_{i} \cdot{} \Psi^+_{i} + \boldsymbol{h}^+_{i+1}\cdot{}\Psi^+_{i+1}}{2} \Delta \epsilon \text{ with } \boldsymbol{h}^+_{i} = \int_{\mathcal{R}} \int_{S^2} \mathcal{N}_j(x) \sigma^{\text{x-ray}}(\epsilon_i) \Upsilon^+(\Omega) \mathcal{X}^+(x)^T \, d \Omega \, d x. \end{align}

Material Reconstruction in EPMA

measurements $\Sigma^{(ji)}$
non-adjoint model
\begin{align*} a_p(u^{(i)}, v) + b^{(i)}(v) &= 0 \, \forall v \in V \\ \Sigma^{(ji)} &= c_p^{(j)}(u^{(i)}) \end{align*}
  • $P_{21}$ moments ($253$ moments)
  • $40 \times 120$ spatial grid
  • $20$ energy steps
  • $2(\epsilon) \times 71(x) \times 3(\Omega) = 426 \text{ beams}$
  • $2$ elements
solution $\int_{S^2}\int_{\mathcal{E}} u^{(i)} \, d\epsilon \,d\Omega$
  • non-adjoint: $\approx 7\text{min } 6\text{s} (\approx 426\times 1\text{s} + \text{...})$
  • adjoint: $\approx 2s (\approx 2\times 1\text{s} + \text{...})$
adjoint model
\begin{align*} a_p(v, \lambda^{(j)}) + c_p^{(j)}(v) = 0 \, \forall v \in U \\ \Sigma^{(ji)} = b^{(i)}(\lambda^{(j)}) \end{align*}
adjoint solution $\int_{S^2}\int_{\mathcal{E}} \lambda^{(j)} \, d\epsilon \,d\Omega$
Riesz representation $\int_{S^2} \lambda^{(j)} \,d\Omega$

Material Reconstruction in EPMA

gradient $\nabla_p C$
non-adjoint derivative
\begin{align*} a_p(v, \dot{\lambda}^{(j)}) +& \dot{a}_p(v, \lambda^{(j)}, \dot{p}) \\ &+ \dot{c}_p(v, \dot{p}) = 0 \quad \forall v \in U \\ \dot{\Sigma}^{(ji)} &= b^{(i)}(\dot{\lambda}^{(j)}) \end{align*}
  • parameter-dependent $c_p^{(j)}$
  • evaluated at $\mathcal{N}_e(x) = 0.5$
tangent $\int_{S^2}\int_{\mathcal{E}} \dot{\lambda}^{(j)} \, d\epsilon \,d\Omega$
  • tangent derivative via the adjoint model: $\approx 2\text{h } 40\text{min} (\approx 4800\times 2\text{s} + \text{...})$
  • tangent derivative via the non-adjoint model: $\approx 23\text{d } 16\text{h} (\approx 4800\times 7\text{min } 6\text{s} + \text{...})$
  • adjoint derivative: $\approx 5s (\approx 2\times 2\text{s} + \text{...})$
adjoint derivative
\begin{align*} a_p(\bar{\lambda}^{(j)}, v) + \bar{\Sigma}^{(ji)}b^{(i)}(v) = 0 \quad \forall v \in U \\ \bar{p} = \bar{a}_p(\bar{\lambda}^{(j)}, \lambda^{(j)}, 1) + \bar{c}_p(\bar{\lambda}^{(j)}, 1) \end{align*}
adjoint solution $\int_{S^2}\int_{\mathcal{E}} \bar{\lambda}^{(j)} \, d\epsilon \,d\Omega$
Taylor remainder

Material Reconstruction in EPMA

measurements $\Sigma^{(ji)}$
optimized material $\mathcal{N}_e(x)$
EPMA Outlook
  • more realistic physics (scattering cross sections, x-ray absorption, secondary fluorescence)
  • more realistic test-cases (multiple phases)
  • uncertainty quantification of the inversion result (Hessian)
  • improve radiative transfer solver (dynamical low-rank)
  • 3D?

References

Slides + Pluto.jl notebook (Poisson example)

github.com/tam724