Applying Adjoints Twice: Efficient Gradients for Inverse Modeling with Radiative Transfer in EPMA

Tamme Claus1, Gaurav Achuda2, Silvia Richter2, Manuel Torrilhon1

1 ACoM, Applied and Computational Mathematics, RWTH Aachen University

2 GFE, Central Facility for Electron Microscopy, RWTH Aachen University

IGPM Seminar 2025, RWTH Aachen

23.01.2025

Motivation: Electron Probe Microanalysis (EPMA)

"Material imaging based on characteristic x-ray emission"

  • Material Science, Quality control, Mineralogy, Semiconductors
  • Ionization of material sample using focused electron beam
  • Wavelength-dispersive spectrometers measure characteristic x-radiation

Microprobe at GFE (source: gfe.rwth-aachen.de)

Material
Interaction Volume
  • Example Material: two-phase (A, B)
  • Electron propagation affected by scattering (animated: different energies)
  • Ionization and emission of characteristic x-rays (wavelength of A-rays and B-rays differ)
  • Spatial information gained via different beam positions (linescan), beam angles and beam energies
  • inverse problem of material reconstruction
    \begin{equation} p^* = \underset{p \in P}{\text{arg min}} \underbrace{|| \Sigma(p) - \tilde{\Sigma} ||^2}_{=C(p)} \end{equation}
  • gradient-based iterative methods (Steepest Descent, BFGS, or HMC)
  • while (∇ₚC != 0) foreach δp foreach beam solve_pde() end end end
  • we focus on computing $\nabla_p C(p)$ efficiently via adjoints

Gradients in Inverse Problems

"Determine the model parameters $p$ such that the model $\Sigma(p)$ produces the observed data $\tilde{\Sigma}$."

classical approach

\begin{align*} p^* = \underset{p \in P}{\text{arg min}} || \Sigma(p) - \tilde{\Sigma} ||^2 \end{align*}
  • iterative approximation of the minimum based on the gradient of the objective function $p \mapsto ||\Sigma(p) - \tilde{\Sigma}||^2$
  • Gradient Descent, CG, BFGS, ...
  • regularization: addition of a penalty term $p \mapsto ||\Sigma(p) - \tilde{\Sigma}||^2 + \mathcal{R}(p)$
  • here: regularization via a parametrization $\gamma$ of the material, i.e. $\Sigma = \Sigma \circ \gamma$

Bayesian approach

  • approximate the posterior $\pi(p|\tilde{\Sigma})$ (typically: iterative sampling, MCMC)
  • Hamiltonian Monte-Carlo (based on the gradient of the target density)

Efficient evaluation of the objective function and its gradient is crucial!
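A minimal sketch of such an iterative loop in Julia (illustrative only: grad_C stands for the adjoint-based gradient developed in the rest of the talk, and the linear toy model $\Sigma(p) = M p$ is purely hypothetical):

using LinearAlgebra

# steepest descent on C(p) = ½||Σ(p) - Σ̃||²; runs until maxiter or tolerance
function invert(grad_C, p₀; α = 1e-2, maxiter = 5000, tol = 1e-8)
    p = copy(p₀)
    for _ in 1:maxiter
        g = grad_C(p)              # the expensive part of every iteration
        norm(g) < tol && break     # stationarity test: ∇ₚC ≈ 0
        p .-= α .* g               # steepest-descent update
    end
    return p
end

# illustrative linear toy model Σ(p) = M⋅p
M = randn(6, 4); Σ̃ = randn(6)
grad_C(p) = M' * (M * p - Σ̃)       # ∇ₚ ½||Σ(p) - Σ̃||²
p̂ = invert(grad_C, zeros(4))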

Motivation: Matrix Product Bracketing

Consider Matrix Multiplication ($N \gg I, J$):
\[ \underbrace{\begin{pmatrix} & & \\ & \Sigma^{(ji)} & \\ & & \end{pmatrix}}_{J \times I} = \underbrace{\begin{pmatrix} — & h^{(1)} & — \\ — & h^{(j)} & — \\ — & h^{(J)} & — \end{pmatrix}}_{J \times N} \cdot{} \underbrace{\boldsymbol{A}}_{N \times N} \cdot{} \underbrace{\begin{pmatrix} | & | & | \\ g^{(1)} & g^{(i)} & g^{(I)} \\ | & | & | \end{pmatrix}}_{N \times I} \]
with the two possible bracketings
\[ \Sigma = \color{green}{(H \boldsymbol{A})} \, G \quad \text{or} \quad \Sigma = H \, \color{orange}{(\boldsymbol{A} G)} \]

Where to bracket?

computational costs: $\color{orange}{I\times N^2 + IJ\times N} \text{ or } \color{green}{J\times N^2 + IJ\times N}$
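A quick numerical check of the two bracketings (sizes are illustrative; the result $\Sigma$ is identical either way, only the cost differs):

# N ≫ I, J
N, I, J = 2000, 100, 5
H = randn(J, N); A = randn(N, N); G = randn(N, I)

Σ_orange = H * (A * G)   # I applications of A:  ≈ I⋅N² + I⋅J⋅N flops
Σ_green  = (H * A) * G   # J applications of Aᵀ: ≈ J⋅N² + I⋅J⋅N flops

Σ_orange ≈ Σ_green       # true: same matrix, different cost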

Generalizations:

  • $\boldsymbol A$ is solution of a linear system / linear (partial) differential equation? \[ \boldsymbol A g^{(i)} = (u \in U \, | \, a(u, v) + \langle g^{(i)}, v \rangle = 0 \quad \forall v \in V) \]
  • $\boldsymbol A$ is a (Fréchet-) derivative (structure: chain rule, product rule, ...)? \[ \boldsymbol A g^{(i)} = \frac{\partial f(y(x))}{\partial x} g^{(i)} = \frac{\partial f(y(x))}{\partial y}\frac{\partial y(x)}{\partial x}g^{(i)} \]

Two equivalent formulations

using the adjoint operator $\boldsymbol A^*$.

An Abstract Adjoint Method in a Computational Context

Definition of the adjoint operator ($H, G$ Hilbert spaces, $A: G \to H$ continuous, linear operator) \[ \langle h, A(g) \rangle_H = \langle A^*(h), g \rangle_G \quad \forall h \in H \,, g \in G \]

  • given multiple $g^{(1)}, ... g^{(I)} \in G$ and $h^{(1)}, ... h^{(J)} \in H$ we want to compute
\[ \Sigma^{(ji)} = \langle h^{(j)}, A(g^{(i)}) \rangle_H = \langle A^*(h^{(j)}), g^{(i)} \rangle_G \quad \forall i = 1, \dots, I, \; j = 1, \dots, J \]

Non-adjoint Implementation

\begin{align*} &\text{foreach } \color{orange}{i = 1, ..., I} \text{ do} \\ &\qquad \color{orange}{v^{(i)} \leftarrow A(g^{(i)})}\\ &\qquad \text{foreach } \color{green}{j = 1, ..., J} \text{ do}\\ &\qquad \qquad \Sigma^{(ji)} \leftarrow \langle \color{green}{h^{(j)}}, \color{orange}{v^{(i)}} \rangle_H\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • cost: $\color{orange}{I} \times \mathcal{C}(A) + \color{orange}{I}\color{green}{J} \times \mathcal{C}(\langle \cdot{}, \cdot{} \rangle_H)$

Adjoint Implementation

\begin{align*} &\text{foreach } \color{green}{j = 1, ..., J} \text{ do} \\ &\qquad \color{green}{\lambda^{(j)} \leftarrow A^*(h^{(j)})}\\ &\qquad \text{foreach } \color{orange}{i = 1, ..., I} \text{ do}\\ &\qquad \qquad \Sigma^{(ji)} \leftarrow \langle \color{green}{\lambda^{(j)}}, \color{orange}{g^{(i)}} \rangle_G\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • cost: $\color{green}{J} \times \mathcal{C}(A^*) + \color{orange}{I}\color{green}{J} \times \mathcal{C}(\langle \cdot{}, \cdot{} \rangle_G)$
  • typically: $\mathcal{C}(A) \approx \mathcal{C}(A^*)$ and $\mathcal{C}(\langle \cdot{}, \cdot{} \rangle_G) \approx \mathcal{C}(\langle \cdot{}, \cdot{} \rangle_H)$ and $\mathcal{C}(A) \gg \mathcal{C}(\langle \cdot{}, \cdot{} \rangle_G)$
    • $\color{green}{J} > \color{orange}{I}$: non-adjoint implementation
    • $\color{orange}{I} > \color{green}{J}$: adjoint implementation
Instead of computing/approximating the "solution" $v^{(i)}= A(g^{(i)})$, we approximate the Riesz representation $\lambda^{(j)} \in G$ of the linear functional $\cdot \mapsto \langle h^{(j)}, A(\cdot{}) \rangle_H$
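The same two loop orders in Julia, with a matrix $A$ standing in for the expensive operator (sizes illustrative, $I \gg J$):

using LinearAlgebra
N, nI, nJ = 1000, 50, 3
A = randn(N, N)
g = [randn(N) for _ in 1:nI];  h = [randn(N) for _ in 1:nJ]

# non-adjoint: nI expensive applications of A
v  = [A * gᵢ for gᵢ in g]
Σ₁ = [dot(h[j], v[i]) for j in 1:nJ, i in 1:nI]

# adjoint: nJ expensive applications of A* = Aᵀ
λ  = [A' * hⱼ for hⱼ in h]
Σ₂ = [dot(λ[j], g[i]) for j in 1:nJ, i in 1:nI]

Σ₁ ≈ Σ₂   # identical up to round-off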

Riesz Representation Theorem/Definition of the Adjoint

Riesz Representation Theorem

Let $(H, \langle \cdot{}, \cdot{} \rangle_H)$ be a (real) Hilbert space with inner product $\langle \cdot{}, \cdot{} \rangle_H$. For every continuous linear functional $f_h \in H^*$ (the dual of $H$), there exists a unique vector $h \in H$, called the Riesz Representation of $f_h$, such that \[ f_h(x) = \langle h, x \rangle_H \quad \forall x \in H. \]

Definition: Adjoint Operator

Let $(G, \langle \cdot{}, \cdot{} \rangle_G)$ be another (real) Hilbert space, $A:G\to H$ a continuous linear operator between $G$ and $H$ and $f_h \in H^*$ a continuous linear functional. \[ f_h(A(g)) = \langle h, A(g) \rangle_H \quad \forall g \in G \] But $f_h(A(g))$ is also a continuous linear functional $f_\lambda(g)$ in $G$ with a Riesz Representation $\lambda \in G$ \[ f_h(A(g)) = f_\lambda(g) = \langle \lambda, g \rangle_G := \langle A^*(h), g \rangle_G. \]

Examples of (real) adjoint operators

Scalar Multiplication

  • $G, H = \mathbb{R}$
  • $A: \mathbb{R} \to \mathbb{R}, g \mapsto Ag = a \cdot{} g,\quad a \in \mathbb{R}$
  • $A^*: \mathbb{R} \to \mathbb{R}, \mu \mapsto A^* \mu = a \cdot{} \mu$

Derivation: \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}} = \langle \mu, a \cdot{} g \rangle_{\mathbb{R}} = \langle a \cdot{} \mu, g \rangle_{\mathbb{R}} = \langle A^*\mu , g \rangle_{\mathbb{R}} \quad \forall g \in \mathbb{R}\, \forall \mu \in \mathbb{R} \end{equation*}

Matrix Multiplication

  • $G = \mathbb{R}^n, H = \mathbb{R}^m$
  • $A: \mathbb{R}^n \to \mathbb{R}^m, g \mapsto Ag = M \cdot{} g, \quad M \in \mathbb{R}^{m \times n}$
  • $A^*: \mathbb{R}^m \to \mathbb{R}^n, \mu \mapsto A^*\mu = M^T \cdot{} \mu$

Derivation: \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}^m} = \langle \mu, M \cdot{} g \rangle_{\mathbb{R}^m} = \langle M^T \cdot{} \mu, g \rangle_{\mathbb{R}^n} = \langle A^*\mu, g \rangle_{\mathbb{R}^n} \quad \forall g \in \mathbb{R}^n \, \forall \mu \in \mathbb{R}^m \end{equation*}

Integration

  • $G = L^2(\Omega)$ (with $\langle \cdot{}, \cdot{} \rangle_{L^2(\Omega)} = \int_{\Omega} \cdot{} \cdot{} d x$), $H = \mathbb{R}$
  • $A: L^2(\Omega) \to \mathbb{R}, g(\cdot{}) \mapsto Ag = \int_{\Omega} g(x) d x$
  • $A^*: \mathbb{R} \to L^2(\Omega), \mu \mapsto (A^* \mu)(\cdot{}) = (x \mapsto \mu \cdot{} 1_{\Omega}(x))$

Derivation: \begin{equation*} \langle \mu, Ag \rangle_\mathbb{R} = \mu \cdot{} \int_\Omega g(x) dx = \int_\Omega \mu \cdot{} 1_\Omega(x) g(x) dx = \langle (A^*\mu)(\cdot{}), g(\cdot{}) \rangle_{L^2(\Omega)} \quad \forall g \in L^2(\Omega) \, \forall \mu \in \mathbb{R} \end{equation*}
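These adjoint pairs can be checked numerically; a small sketch for the matrix and integration examples (the $L^2$ inner product is discretized by a midpoint rule, all values are illustrative):

using LinearAlgebra

# matrix multiplication: A = M, A* = Mᵀ
M = randn(3, 2); g = randn(2); μ = randn(3)
dot(μ, M * g) ≈ dot(M' * μ, g)          # ⟨μ, Ag⟩ = ⟨A*μ, g⟩

# integration on Ω = (0, 1): Ag = ∫g dx, (A*μ)(x) = μ⋅1_Ω(x)
n = 100; Δx = 1 / n
gv = rand(n); μ₀ = randn()
λ = fill(μ₀, n)                          # Riesz representation of g ↦ μ₀⋅∫g dx
μ₀ * (Δx * sum(gv)) ≈ Δx * dot(λ, gv)    # ⟨μ, Ag⟩_ℝ = ⟨A*μ, g⟩_L²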

Examples of (real) adjoint operators

Linear Solve

  • $G, H = \mathbb{R}^n$ and $M \in \mathbb{R}^{n\times n}$ invertible
  • $A: \mathbb{R}^n \to \mathbb{R}^n, g \mapsto (h \in \mathbb{R}^n | M \cdot{} h = g)$, (or $A = M^{-1}$)
  • $A^*: \mathbb{R}^n \to \mathbb{R}^n, \mu \mapsto (\lambda \in \mathbb{R}^n | M^T \cdot{} \lambda = \mu)$, (or $A^* = M^{-T}$)

Derivation (intentionally complicated): We reinterpret \begin{equation*} M \cdot{} h = g \quad \Leftrightarrow \quad \langle v, M \cdot{} h \rangle = \langle v, g \rangle \quad \forall v \in G \end{equation*} in particular (for a fixed but unspecified $\lambda \in G$) \begin{equation} 0 = \langle \lambda, g \rangle - \langle \lambda, M \cdot{} h \rangle \quad \Leftrightarrow \quad 0 = \langle \lambda, g \rangle - \langle M^T \cdot{} \lambda, h \rangle \end{equation} \begin{align*} \langle \mu, Ag \rangle &= \langle \mu, h \rangle\\ &= \langle \mu, h \rangle - \langle M^T \cdot{} \lambda, h \rangle + \langle \lambda, g \rangle \\ & \quad \text{ with } \langle \mu, h \rangle - \langle M^T \cdot{} \lambda, h \rangle = 0 \quad \forall h \in H \\ & \quad \text{ or } M^T \cdot{} \lambda = \mu \\ &= \langle A^* \mu, g \rangle \end{align*} (alternatively) \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}^n} = \langle \mu, M^{-1} g \rangle_{\mathbb{R}^n} = \langle M^{-T} \mu, g \rangle_{\mathbb{R}^n} = \langle A^*\mu, g \rangle_{\mathbb{R}^n} \end{equation*}
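Numerically, the adjoint of a linear solve is simply a solve with the transposed matrix (a minimal check, sizes illustrative):

using LinearAlgebra

n = 50
M = 2I + 0.1 * randn(n, n)     # (almost surely) invertible
g = randn(n); μ = randn(n)

h = M \ g                       # forward solve   M h = g
λ = M' \ μ                      # adjoint solve   Mᵀ λ = μ

dot(μ, h) ≈ dot(λ, g)           # ⟨μ, M⁻¹g⟩ = ⟨M⁻ᵀμ, g⟩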

Examples of (real) adjoint operators

Weak form

  • $G, H$ (real Hilbert spaces)
  • $A: G \to H, g \mapsto \{h \in H | a(h, v) + \langle g, v \rangle_G = 0 \quad \forall v \in G\}$, $\quad a(\cdot{}, \cdot{}) \text{ coercive bilinear form}$
  • $A^*: H \to G, \mu \mapsto \{\lambda \in G | a(v, \lambda) + \langle \mu,v \rangle_H = 0 \quad \forall v \in H\}$

Derivation: \begin{align*} \langle \mu, Ag \rangle_H = f_\mu(h) &= f_\mu(h) + \underbrace{a(h, \lambda) + f_g(\lambda)}_{=0\quad \forall \lambda \in G} \\ &= \underbrace{f_\mu(h) + a(h, \lambda)}_{\overset{!}{=} 0 \quad \forall h \in H} + f_g(\lambda) = f_g(\lambda) = \langle \lambda, g \rangle_G \end{align*}

Derivative Notation (from Algorithmic Differentiation)

  • given $f_{(\cdot{})}: X \to Y, x \mapsto f_x$ ($X, Y$ Hilbert) non-linear, Fréchet-differentiable

we define:

  • the (continuous, linear) tangent operator $\dot{f}_x \in L(X, Y)$ (directional derivative) \begin{align*} \dot{f}_x(\dot{x}) = \frac{\partial f_x}{\partial x}(\dot{x}) = \lim_{h \to 0} \frac{f_{x + h\dot{x}} - f_{x}}{h} \end{align*}
  • the (continuous, linear) adjoint operator $\bar{f}_x \in L(Y, X)$ \begin{align*} \langle \bar{y} , \dot{f}_x(\dot{x}) \rangle_Y = \langle \bar{f}_x (\bar{y}), \dot{x} \rangle_X \quad \forall \bar{y} \in Y, \dot{x} \in X \end{align*}

Also we define ($y = f_x$):

  • tangent variables $\dot{x} \in X, \dot{y} \in Y$: given $\dot{x} \in X$, $\dot{y} = \dot{f}_x(\dot{x})$
  • adjoint variables $\bar{x} \in X, \bar{y} \in Y$: given $\bar{y} \in Y$, $\bar{x} = \bar{f}_x(\bar{y})$

Applying this to a composition $ f_x = \varphi^{(N)}_{v^{(N-1)}} \circ \, ...\, \circ\, \varphi^{(1)}_x$ where $\varphi^{(n)}$ are single assignments (chain rule!) is tangent/adjoint mode automatic differentiation (neglecting all implementation details).

Automatic/Algorithmic Differentiation

Example

\begin{equation} y = f_x = g \circ h_x \quad v = h_x \end{equation} by the chain rule, the tangent operator $\dot{f}_x$ is \begin{equation} \dot{y} = \dot{f}_x(\dot{x}) = \dot{g}_v(\dot{h}_x(\dot{x})) \end{equation} for the adjoint operator $\bar{f}_x$ we use \begin{align*} \langle \bar{y} , \dot{f}_x(\dot{x}) \rangle_Y = \langle \bar{y}, \dot{g}_v(\dot{h}_x(\dot{x})) \rangle_Y = \langle \bar{g}_v (\bar{y}), \dot{h}_x(\dot{x}) \rangle_V = \langle \bar{h}_x(\bar{g}_v(\bar{y})), \dot{x} \rangle_X \quad \forall \bar{y} \in Y, \dot{x} \in X \end{align*} and find \begin{equation} \bar{f}_x = \bar{h}_x(\bar{g}_v(\bar{y})) \end{equation}

Tangent mode

\begin{align*} &\text{foreach } j = 1, ... J \\ &\qquad \dot{x} \leftarrow e_j \\ &\qquad \dot{y} \leftarrow \dot{g}_v(\dot{h}_x(\dot{x}))\\ &\qquad \text{foreach } i = 1, ... I \\ &\qquad \qquad (Df)^{(i, j)} = \langle e_j, \dot{y} \rangle\\ &\qquad \text{end}\\ &\text{end} \end{align*}

Adjoint mode

\begin{align*} &\text{foreach } i = 1, ... I \\ &\qquad \bar{y} \leftarrow e_i \\ &\qquad \bar{x} \leftarrow \bar{h}_x(\bar{g}_v(\bar{y}))\\ &\qquad \text{foreach } j = 1, ... J \\ &\qquad \qquad (Df)^{(i, j)} = \langle \bar{x}, e_j \rangle\\ &\qquad \text{end}\\ &\text{end} \end{align*}
  • AD is even more systematic; a small Jacobian sketch follows below.
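A small check of both assembly orders, using ForwardDiff for tangent mode and Zygote for adjoint mode (both packages assumed available; $f$ is an illustrative map $\mathbb{R}^2 \to \mathbb{R}^3$):

using ForwardDiff, Zygote, LinearAlgebra

f(x) = [x[1]^2, x[1] * x[2], sin(x[2])]
x = [1.0, 2.0]

# tangent mode: Jacobian assembled column-by-column (one pass per input)
Jt = ForwardDiff.jacobian(f, x)

# adjoint mode: Jacobian assembled row-by-row via the pullback (one pass per output)
y, back = Zygote.pullback(f, x)
Ja = reduce(vcat, [back(e)[1]' for e in eachcol(Matrix(1.0I, 3, 3))])

Jt ≈ Ja   # same Jacobian, columns (tangent) vs. rows (adjoint)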

Automatic/Algorithmic Differentiation

Every numerical program $f: \mathbb{R}^{n+1} \to \mathbb{R}^{m+1}$ can be decomposed into single assignments $(\varphi^{(1)}, ... \varphi^{(N)})$. \begin{align*} &(v^{(0)}, v^{(-1)}, ..., v^{(-n)})^T = x \\ &v^{(n)} = \varphi^{(n)}_{(v^{(j)})_{j \prec n}} \quad \forall n = 1, ... N \\ &y = (v^{(N)}, v^{(N-1)}, ..., v^{(N-m)})^T \end{align*}

For every single assignment $\varphi^{(n)}_{(v^{(j)})_{j \prec n}}$ we know

  • tangent operator $\dot{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\dot{v}^{(j)})_{j \prec n})$ and
  • adjoint operator $\bar{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\bar{v}^{(k)})_{k \succ n})$

Tangent mode

\begin{align*} &(v^{(0)}, v^{(-1)}, ...)^T = x \\ &(\dot{v}^{(0)}, \dot{v}^{(-1)}, ...)^T = \dot{x} \\ &\hspace{-30px}\begin{cases} v^{(n)} = \varphi^{(n)}_{(v^{(j)})_{j \prec n}} \\ \dot{v}^{(n)} = \dot{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\dot{v})_{j \prec n}) \end{cases} \small{\quad \forall n = 1, ... N}\\ &y = (v^{(N)}, v^{(N-1)}, ...)^T\\ &\dot{y} = (\dot{v}^{(N)}, \dot{v}^{(N-1)}, ...)^T \end{align*}

Adjoint mode

\begin{alignat*}{2} &(v^{(0)}, v^{(-1)}, ...)^T = x \\ &v^{(n)} = \varphi^{(n)}_{(v^{(j)})_{j \prec n}} &&\small{\quad \forall n = 1, ..., N}\\ &y = (v^{(N)}, v^{(N-1)}, ...)^T\\ &(\bar{v}^{(N)}, \bar{v}^{(N-1)}, ...)^T = \bar{y} \\ &\bar{v}^{(n)} = \bar{\varphi}^{(n)}_{(v^{(j)})_{j \prec n}}((\bar{v}^{(k)})_{k \succ n}) &&\small{\quad \forall n = N-m-1, ..., -n}\\ &\bar{x} = (\bar{v}^{(0)}, \bar{v}^{(-1)}, ...)^T \end{alignat*}

Adjoint² Ingredients: Algorithmic Differentiation (AD)

Given a (Fréchet-)differentiable operator $f: \mathcal{X} \to Y, x \mapsto f(x) =: y$ (open $\mathcal{X} \subseteq X$)
(typically in AD-literature: $X, Y$ finite dimensional Hilbert spaces)
  • Definition of the (Fréchet-)derivative (tangent operator) $\dot{f}_x: T_xX \to T_yY$ by
    \begin{equation} \lim_{||\dot{x}|| \to 0} \frac{||f(x + \dot{x}) - f(x) - \dot{f}_x(\dot{x})||}{||\dot{x}||} = 0 \quad \text{with } \dot{x} \in T_xX \end{equation}
  • Definition of the adjoint (tangent) operator ($T_xX$, $T_yY$ Hilbert spaces, $\dot{f}_x$ continuous, linear)
    \begin{equation} \langle \bar{y}, \dot{f}_x(\dot{x}) \rangle_{T_y Y} = \langle \bar{f}_x(\bar{y}), \dot{x} \rangle_{T_x X} \quad \forall \dot{x} \in T_xX, \bar{y} \in T_yY \end{equation}
    (notation adopted from AD)
  • Example: compute the Jacobian $Jf_x$ of $f: \mathbb{R}^I \to \mathbb{R}^J$
    \begin{equation} [Jf_x]_{ji} = \underbrace{\langle \color{green}{e_j}, \color{orange}{\dot{f}_x(e_i)} \rangle}_{\small \text{tangent mode (AD)}} = \underbrace{\langle \color{green}{\bar{f}_x(e_j)}, \color{orange}{e_i} \rangle}_{\small\text{adjoint mode (AD)}} \end{equation}

Algorithmic Differentation Tools

automatic composition of "elemental" tangent/adjoint operators $\dot{f}_x(\cdot{})$/$\bar{f}_x(\cdot{})$ based on the chain rule to compute
  • tangent vectors: $\dot{y}, \dot{x}$
  • adjoint vectors: $\bar{x}, \bar{y}$

where $\langle \bar{y}, \dot{y} \rangle = \langle \bar{x}, \dot{x}\rangle$.

using ChainRulesCore, Zygote

# primal function
function f(a, b)
    y = 2*a*a + b
    return y
end

# custom reverse rule: returns the primal value y together with the
# pullback f̄, which maps the adjoint ȳ to the adjoints (ā, b̄)
function ChainRulesCore.rrule(::typeof(f), a, b)
    y = 2*a*a + b
    function f̄(ȳ)
        ā = 4*a * ȳ   # ∂y/∂a = 4a
        b̄ = ȳ         # ∂y/∂b = 1
        return ZeroTangent(), ā, b̄
    end
    return y, f̄
end

Zygote.withgradient(f, 1.0, 2.0) # (val = 4.0, grad = (4.0, 1.0))

Adjoint² Ingredients: Example: linear PDE (weak form)

Given a solution operator of a linear pde $A: G \to H$ ($U, V, G, H$ Hilbert spaces $U\subseteq H$, $V\subseteq G$) \begin{equation} \color{orange}{A(g) = (u \in U | a(u, v) + \langle v, g\rangle_G = 0\quad \forall v \in V)} \end{equation}
Derivation: we introduce $\lambda \in V$ \begin{align*} \Sigma = \langle h, A(g) \rangle_H &= \langle h, u \rangle_H \\ &=\langle h, u \rangle_H + \color{orange}{\underbrace{a(u, \lambda) + \langle \lambda, g\rangle_G}_{=0 \, \forall \lambda \in V}} \\ &=\color{green}{\underbrace{\langle h, u \rangle_H + a(u, \lambda)}_{=0 \, \forall u \in U}} + \langle \lambda, g\rangle_G \\ & \hphantom{\, = \langle h, u \rangle_H + a(u, \lambda)} = \langle \lambda, g \rangle_G = \langle A^*(h), g \rangle_G = \Sigma \end{align*}
The adjoint operator $A^*: H \to G$ is \begin{equation} \color{green}{A^*(h) = (\lambda \in V | a(u, \lambda) + \langle h, u \rangle_H = 0 \quad \forall u \in U)} \end{equation}

Adjoint² Ingredients: Dual/Adjoint Consistency

A discretization is dual consistent, if:
"The adjoint of the discretized problem is a consistent discretization of the adjoint of the continuous problem."
Example (neglecting BC):
continuous primal problem
\begin{align} L u + g &= 0 \quad \forall x \in \mathcal{R} \\ \Sigma &= \int_{\mathcal{R}} h u \, dx \end{align}
$\leftrightarrow$ continuous dual problem
\begin{align} L^* \lambda + h &= 0 \quad \forall x \in \mathcal{R} \\ \Sigma &= \int_{\mathcal{R}} g \lambda \, dx \end{align}
$\updownarrow$ discretization (primal) $\qquad$ $\updownarrow$ ? (dual)
discrete primal problem
\begin{align} L_\Delta u_\Delta + g_\Delta &= 0 \\ \Sigma_\Delta &= \langle h_\Delta, u_\Delta \rangle \end{align}
$\leftrightarrow$ discrete dual problem
\begin{align} L^T_\Delta \lambda_\Delta + h_\Delta &= 0 \\ \Sigma_\Delta &= \langle \lambda_\Delta, g_\Delta \rangle \end{align}
  • interpretability: $\lambda_\Delta$ is the approximation of the continuous adjoint equation
  • "superconvergence" of linear functionals $|\Sigma - \Sigma_\Delta|$ (convergence order doubling)
  • no ambiguity between:
    • "optimize-then-discretize" vs. "discretize-then-optimize"
    • "continuous adjoint method" vs. "discrete adjoint method"
Examples:
  • ODE/FD-discretizations: explicit/implicit Euler (not dual consistent); implicit midpoint/trapezoidal rule (dual consistent with suitable quadrature + staggered variables)
  • FE-discretization: (with weak boundary conditions + suitable/no stabilization dual consistent)
  • DG/SBP (see literature)
  • Hicken, J. E., & Zingg, D. W. (2014). Dual consistency and functional accuracy: a finite-difference perspective. doi:10.1016/j.jcp.2013.08.014
  • Oliver, T. A., & Darmofal, D. L. (2009). Analysis of Dual Consistency for Discontinuous Galerkin Discretizations of Source Terms. doi:10.1137/080721467
  • Hartmann, R. (2007). Adjoint Consistency Analysis of Discontinuous Galerkin Discretizations. doi:10.1137/060665117
  • Giles, M. B., & Pierce, N. A. (2003). Adjoint Error Correction for Integral Outputs. doi:10.1007/978-3-662-05189-4_2

Applying Adjoints Twice

  • parameters $p \in P$
  • parameter-dependent bilinear form $a_p: U \times V (\times P) \to \mathbb{R}$
  • linear excitations $b^{(i)} : V \to \mathbb{R}$
  • linear extractions $c^{(j)}: U \to \mathbb{R}$
  • observables $\Sigma^{(ji)} \in \mathbb{R}$
  • tangent of $a_p$ $\dot{a}_p: U \times V \times P \to \mathbb{R}$
  • corres. adjoint $\bar{a}_p: U \times V \times \mathbb{R} \to P$
  • tangent spaces are identified with the underlying (linear) spaces
  • given tangent/adjoint vectors $\dot{p}$ and $\bar{\Sigma}^{(ji)}$ resp.

Model ($u^{(i)} \in U)$:

\begin{align*} a_p(u^{(i)}, v) + b^{(i)}(v) &= 0 \quad \forall v \in V \\ \Sigma^{(ji)} &= c^{(j)}(u^{(i)}) \end{align*}

1st "adjoint method" ($\lambda^{(j)} \in V$):

\begin{align*} a_p(v, \lambda^{(j)}) + c^{(j)}(v) &= 0 \quad \forall v \in U \\ \Sigma^{(ji)} &= b^{(i)}(\lambda^{(j)}) \\ \end{align*}

Tangent/sensitivity model ($\dot{\lambda}^{(j)} \in V)$):

\begin{align*} a_p(v, \dot{\lambda}^{(j)}) + \dot{a}_p(v, \lambda^{(j)}, \dot{p}) &= 0 \quad \forall v \in U\\ \dot{\Sigma}^{(ji)} &= b^{(i)}(\dot{\lambda}^{(j)}) \end{align*}

2nd "adjoint method" ($\bar{\lambda}^{(j)} \in U$):

\begin{align} a_p(\bar{\lambda}^{(j)}, v) + \bar{\Sigma}^{(ji)} b^{(i)}(v) &= 0 \quad \forall v \in V \\ \bar{\Sigma}^{(ji)} \dot{\Sigma}^{(ji)} &= \dot{a}_p(\bar{\lambda}^{(j)}, \lambda^{(j)}, \dot{p}) = \langle \bar{a}_p(\bar{\lambda}^{(j)}, \lambda^{(j)}, 1), \dot{p} \rangle \end{align}

from AD ($\langle \bar{\Sigma}, \dot{\Sigma} \rangle = \langle \bar{p}, \dot{p} \rangle$):

\begin{align} a_p(\underbrace{\bar{\Sigma}^{(ji)} v}_{=\bar{\lambda}^{(j)} \small(\text{with fixed } v)}, \dot{\lambda}^{(j)}) + \dot{a}_p(\bar{\Sigma}^{(ji)} v, \lambda^{(j)}, \dot{p}) &= 0 \quad \forall v \in U \\ \bar{\Sigma}^{(ji)} \dot{\Sigma}^{(ji)} &= \bar{\Sigma}^{(ji)} b^{(i)}(\dot{\lambda}^{(j)}) \end{align}
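A discrete sketch of the whole construction, with $A(p)u + b = 0$ for a matrix $A(p) = A_0 + \sum_k p_k \, dA_k$ (all names, sizes, and scalings are illustrative); $2J$ solves give all measurements and the full gradient, and a finite-difference check mirrors the Taylor-remainder plots later:

using LinearAlgebra

n, nI, nJ, nP = 40, 10, 2, 3
A₀ = 10.0 * Matrix(1.0I, n, n) + randn(n, n) / sqrt(n)
dA = [randn(n, n) / sqrt(n) for _ in 1:nP]        # ∂A/∂pₖ (illustrative)
Ap(p) = A₀ + sum(p .* dA)

b = [randn(n) for _ in 1:nI];  c = [randn(n) for _ in 1:nJ]
Σ̃ = randn(nJ, nI);  p = randn(nP);  A = Ap(p)

# forward model via the 1st adjoint: nJ solves instead of nI
λ = [-(A' \ c[j]) for j in 1:nJ]
Σ = [dot(b[i], λ[j]) for j in 1:nJ, i in 1:nI]
C = 0.5 * sum(abs2, Σ .- Σ̃)

# gradient via the 2nd adjoint: nJ more solves (2⋅nJ in total)
Σ̄ = Σ .- Σ̃                                        # with C̄ = 1
λ̄ = [-(A \ sum(Σ̄[j, i] * b[i] for i in 1:nI)) for j in 1:nJ]
∇C = [sum(dot(λ[j], dA[k] * λ̄[j]) for j in 1:nJ) for k in 1:nP]

# finite-difference (Taylor-remainder) check of one component
ε = 1e-6;  pε = copy(p);  pε[1] += ε
Σε = [dot(b[i], -(Ap(pε)' \ c[j])) for j in 1:nJ, i in 1:nI]
isapprox((0.5 * sum(abs2, Σε .- Σ̃) - C) / ε, ∇C[1]; rtol = 1e-4)  # agrees to O(ε)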

Applying Adjoints Twice: Summary

with objective function $g: \mathbb{R}^{J\times I} \to \mathbb{R}, \Sigma \mapsto g(\Sigma) = C$ and parameters $p \in P = \mathbb{R}^N$

non-adjoint forward + non-adjoint derivative

\begin{align*} a_p(u^{(i)}, v) + b^{(i)}(v) &= 0 \quad \forall v \in V \\ \Sigma^{(ji)} &= c^ {(j)}(u^{(i)}) \\ C &= g(\Sigma^{(\cdot{}\cdot{})}) \\ a_p(\dot{u}^{(in)}, v) + \dot{a}_p(u^{(i)}, v, e^{(n)}) &= 0 \quad \forall v \in V\\ \dot{\Sigma}^{(jin)} &= c^{(j)}(\dot{u}^{(in)}) \\ (\nabla_p C)^{(n)} &= \dot{g}_{\Sigma^{(\cdot{}\cdot{})}}(\dot{\Sigma}^{(\cdot{}\cdot{}n)}) \end{align*}
  • cost: $(I + IN)\times \mathcal{C}(a_p(\cdot{}, \cdot{})) + ... $

adjoint forward + adjoint derivative

\begin{align} a_p(v, \lambda^{(j)}) + c^{(j)}(v) &= 0 \quad \forall v \in U\\ \Sigma^{(ji)} &= b^{(i)}(\lambda^{(j)}) \\ C &= g(\Sigma^{(\cdot{}\cdot{})}) \\ \bar{\Sigma}^{(\cdot{}\cdot{})} &= \bar{g}_{\Sigma^{(\cdot{}\cdot{})}}(\bar{C})\\ a_p(\bar{\lambda}^{(j)}, v) + \bar{\Sigma}^{(ji)} b^{(i)}(v) &= 0 \quad \forall v \in V \\ (\nabla_p C)^{(n)} &= \langle \bar{a}_p(\bar{\lambda}^{(j)}, \lambda^{(j)}, 1), e^{(n)}\rangle \end{align}
  • cost: $2J \times \mathcal{C}(a_p(\cdot{}, \cdot{})) + ... $

Generalization:

  • parameter-dependent $b_p^{(i)}(v)$ and $c_p^{(j)}(u)$ \begin{equation} \bar{p} = \bar{a}_p(\bar{\lambda}^{(j)}, \lambda^{(j)}, 1) + \bar{c}_p^{(j)}(\bar{\lambda}^{(j)}, 1) + \bar{b}^{(i)}_p(\lambda^{(j)}, \bar{\Sigma}^{(ji)}) \end{equation}

Assumptions:

  • $\mathcal{C}(a_p(\cdot{}, \cdot{})) \gg \mathcal{C}(b^{(i)}), \mathcal{C}(c^{(j)}), ...$
  • the linear systems arising from $a_p(\cdot{}, \cdot{})$ cannot be solved directly (e.g. by LU factorization)
  • $I > J$, $N > J$

Example: An inverse problem based on the Poisson equation

model

\begin{align*} &\begin{cases} -\nabla \cdot{} m \nabla u^{(i)} = 0 \quad &\forall x \in \mathcal{R}\\ u^{(i)} = g^{(i)} \quad &\forall x \in \partial\mathcal{R} \end{cases}\\ &\Sigma^{(i, j)} = \int_{\mathcal{R}} h^{(j)} u^{(i)} dx\\ &C = \frac{1}{2 IJ}\sum_{i, j=1}^{I,J} (\Sigma^{(i, j)} - \tilde{\Sigma}^{(i, j)})^2 \end{align*}
  • $m, h^{(j)} \in L^2(\mathcal{R})$, $g^{(i)} \in L^2(\partial \mathcal{R})$
  • weak enforcement of boundary conditions $u, v \in H^1_h(\mathcal{R})$
  • Nitsche's method: stable bilinear form \[a(u^{(i)}, v) + b^{(i)}(v) = 0 \quad \forall v \in H^1_h(\mathcal{R})\]
  • $I, N \gg J$ (many bc $g^{(i)}$, many parameters $p \in \mathbb{R}^N$, some extractions $h^{(j)}$)
  • here: $I=200$, $N=2972$, $J=7$
  • FE framework: Gridap.jl; linear solver: GMRES + ILU

solutions $u^{(i)}$

measurements $\Sigma^{(i, j)}$

Derivation: An inverse problem based on the Poisson equation

weak forms

\begin{align*} a_p(u, v) &= \int_{\mathcal{R}} m_p \nabla u \cdot{} \nabla v \, dx - \int_{\partial \mathcal{R}} m_p (\nabla_n u \, v + u \nabla_n v) \, d\Gamma + \int_{\partial \mathcal{R}} \alpha_h u v \, d \Gamma \\ b^{(i)}(v) &= \int_{\partial \mathcal{R}} \nabla_n v g^{(i)} \, d\Gamma - \int_{\partial \mathcal{R}} \alpha_h g^{(i)} v \, d\Gamma\\ c^{(j)}(u) &= \int_{\mathcal{R}} h^{(j)} u \, dx \end{align*}
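A minimal Gridap.jl sketch of assembling and solving this forward model (hedged: constant $m \equiv 0.5$, a single excitation $g$, and illustrative mesh size, quadrature degree, and penalty $\alpha_h$; API details may vary between Gridap versions):

using Gridap

model = CartesianDiscreteModel((0, 1, 0, 1), (40, 40))
Ω = Triangulation(model);          dΩ = Measure(Ω, 2)
Γ = BoundaryTriangulation(model);  dΓ = Measure(Γ, 2)
nΓ = get_normal_vector(Γ)

V = TestFESpace(model, ReferenceFE(lagrangian, Float64, 1))  # no strong BCs
U = TrialFESpace(V)

m = 0.5                      # constant conductivity
αₕ = 10.0 * 40               # Nitsche penalty ∼ 1/h (illustrative)
g(x) = sin(2π * x[1])        # one boundary excitation g⁽ⁱ⁾

a(u, v) = ∫( m * (∇(u) ⋅ ∇(v)) )*dΩ -
          ∫( m * ((∇(u) ⋅ nΓ) * v + u * (∇(v) ⋅ nΓ)) )*dΓ +
          ∫( αₕ * u * v )*dΓ
l(v) = ∫( αₕ * (v * g) )*dΓ - ∫( (∇(v) ⋅ nΓ) * g )*dΓ   # l(v) = -b⁽ⁱ⁾(v)

uh = solve(AffineFEOperator(a, l, U, V))
Σ = sum(∫( uh )*dΩ)          # one extraction c⁽ʲ⁾(u), here with h⁽ʲ⁾ ≡ 1

The adjoint solve reuses the same assembled operator, transposed, with the extraction as right-hand side.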

tangent and adjoint operators

\begin{align*} \dot{a}_p(u, v, \dot{p}) &= \int_{\mathcal{R}} \dot{m}_p(\dot{p}) \nabla u \cdot{} \nabla v \,d x - \int_{\partial \mathcal{R}} \dot{m}_p(\dot{p}) \nabla_n u v \,d \Gamma \\ \bar{a}_p(u, v, \bar{C}) &= \bar{C}\int_{\mathcal{R}} \partial_p m_p \nabla u \cdot{} \nabla v \,d x - \bar{C}\int_{\partial \mathcal{R}} \partial_p m_p \nabla_n u v \,d \Gamma \end{align*} where $\partial_p m_p$ is the parameter-gradient of the field $m$ defined by \begin{equation} \dot{m}_p(\dot{p})(x) = \langle \partial_p m_p(x), \dot{p} \rangle \end{equation}

Example: Derivation

adjoint derivative

\begin{align*} &a_m(\bar{\lambda}^{(j)}, \dot{\lambda}) + \bar{\boldsymbol \Sigma}^{(j)T} \boldsymbol{b}(\dot{\lambda}) = 0 \quad \forall \dot{\lambda} \\ &\int_{\mathcal{R}} m \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \dot{\lambda} dx - \int_{\partial \mathcal{R}} m \nabla_n \bar{\lambda}^{(j)} \dot{\lambda} + \bar{\lambda}^{(j)} \nabla_n \dot{\lambda} \, d\Gamma + \int_{\partial \mathcal{R}} \alpha \bar{\lambda}^{(j)} \dot{\lambda} \, d \Gamma \\ &+ \int_{\partial \mathcal{R}} \nabla_n \dot{\lambda} \sum_{i=1}^{I}(\bar{\Sigma}^{(i, j)} g^{(i)}) \, d\Gamma - \int_{\partial \mathcal{R}} \alpha \sum_{i=1}^{I} (\bar{\Sigma}^{(i, j)} g^{(i)}) \dot{\lambda} \, d\Gamma = 0 \quad \forall \dot{\lambda} \end{align*}

adjoint derivative (strong form)

\begin{align*} \begin{cases} - \nabla \cdot{} m \nabla \bar{\lambda}^{(j)} = 0 \quad &\forall x \in \mathcal{R} \\ \bar{\lambda}^{(j)} = \sum_{i=1}^{I} \bar{\Sigma}^{(i, j)} g^{(i)} \quad &\forall x \in \partial \mathcal{R} \end{cases} \end{align*}

tangent model

\begin{align*} \dot{C} = \sum_{j=1}^{J} \dot{a}_m(\bar{\lambda}^{(j)}, \lambda^{(j)}, \dot{m}) = \sum_{j=1}^{J} \int_{\mathcal{R}} \dot{m} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \, dx - \int_{\partial \mathcal{R}} \dot{m} \nabla_n \bar{\lambda}^{(j)} \underbrace{\lambda^{(j)}}_{=0} \, d \Gamma \end{align*}

adjoint model

\begin{align*} \langle \bar{C}, \sum_{j=1}^{J} \int_{\mathcal{R}} \dot{m} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \, dx \rangle = \langle \bar{C} \sum_{j=1}^{J} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)}, \dot{m} \rangle_{L^2(\mathcal{R})} \\ \bar{a}_m(\bar{\lambda}^{(:)}, \lambda^{(:)}, \bar{C}) = \sum_{j=1}^{J} \bar{C} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \end{align*}

Example: Derivation

adjoint parametrization

\begin{align*} \langle \bar{m}, \dot{m} \rangle_{L^2(\mathcal{R})} = \langle \bar{m}, \dot{\gamma}_p(\dot{p}) \rangle_{L^2(\mathcal{R})} = \langle \int_{\mathcal{R}} \frac{\partial \gamma_p}{\partial p} \bar{m} \, dx, \dot{p} \rangle_{\mathbb{R}^N} \end{align*}
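In code, this last step is a single AD pullback through the parametrization; a sketch with Zygote ($\gamma$ and all sizes are illustrative stand-ins):

using Zygote

γ(p) = tanh.(p)            # stand-in for the parametrization p ↦ m
p = randn(5)
m̄ = randn(5)               # adjoint field m̄ from the PDE part

m, back = Zygote.pullback(γ, p)
p̄ = back(m̄)[1]             # ⟨m̄, γ̇ₚ(ṗ)⟩ = ⟨p̄, ṗ⟩ for all ṗ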

Example: An inverse problem based on the Poisson equation

measurements $\Sigma^{(ji)}$

non-adjoint model

\begin{align*} &\begin{cases} -\nabla \cdot{} m \nabla u^{(i)} = 0 \quad &\forall x \in \mathcal{R}\\ u^{(i)} = g^{(i)} \quad &\forall x \in \partial\mathcal{R} \end{cases}\\ &\Sigma^{(ji)} = \int_{\mathcal{R}} h^{(j)} u^{(i)} dx\\ \end{align*}
  • $I = 200$ excitations
  • $J = 7$ extractions
  • $N = 2972$ grid cells

solution $u^{(i)}$

  • non-adjoint: $\approx 100ms (\approx 200\times 0.5ms + \text{...})$
  • adjoint: $\approx 5.5ms (\approx 7\times 0.5ms + \text{...})$

adjoint model

\begin{align*} &\begin{cases} -\nabla \cdot{} m \nabla \lambda^{(j)} = -h^{(j)} &\forall x \in \mathcal{R} \\ \lambda^{(j)} = 0 \quad &\forall x \in \partial \mathcal{R} \end{cases}\\ &\Sigma^{(ji)} = \int_{\partial \mathcal{R}} \nabla_n \lambda^{(j)} g^{(i)} \, d \Gamma\\ \end{align*}
  • strong form not required for implementation
  • the Riesz representation $\nabla_n \lambda^{(j)}$ contains all information needed to compute measurements for arbitrary $g^{(i)}$
adjoint solution $\lambda^{(j)}$
Riesz representation $\nabla_n \lambda^{(j)}$

Example: An inverse problem based on the Poisson equation

gradient $\nabla_m C$

non-adjoint derivative

\begin{align*} &\begin{cases} -\nabla \cdot{} m \nabla \dot{\lambda}^{(j)} = \nabla \cdot{} \dot{m} \nabla \lambda^{(j)} &\forall x \in \mathcal{R} \\ \dot{\lambda}^{(j)} = 0 \quad &\forall x \in \partial \mathcal{R} \end{cases}\\ &\dot{\Sigma}^{(ji)} = \int_{\partial \mathcal{R}} \nabla_n \dot{\lambda}^{(j)} g^{(i)} \, d \Gamma\\ &\dot{C} = \frac{1}{IJ} \sum_{i, j=1}^{I,J} (\Sigma^{(ji)} - \tilde{\Sigma}^{(ji)})\dot{\Sigma}^{(ji)} \end{align*}
  • evaluated at $m(x) = 0.5$
tangent solution $\dot{\lambda}^{(j)}$
  • tangent derivative via the adjoint model: $\approx 16.6s (\approx 2972 \times 5.5ms + \text{...})$
  • tangent derivative via the non-adjoint model: $\approx 4\text{min }30s (\approx 2972 \times 200 \times 0.5ms + \text{...})$
  • adjoint derivative: $\approx 14.6ms (\approx 2\times 5.5ms + \text{...})$

adjoint derivative

\begin{align*} &\bar{\Sigma}^{(ji)} = \frac{1}{I J} (\Sigma^{(ji)} - \tilde{\Sigma}^{(ji)}) \bar{C} \\ &\begin{cases} -\nabla \cdot{} m \nabla \bar{\lambda}^{(j)} = 0 &\forall x \in \mathcal{R} \\ \bar{\lambda}^{(j)} = \sum_{i=1}^{I} \bar{\Sigma}^{(ji)} g^{(i)} \quad &\forall x \in \partial \mathcal{R} \end{cases}\\ &\bar{m} = \sum_{j=1}^{J} \nabla \bar{\lambda}^{(j)} \cdot{} \nabla \lambda^{(j)} \\ \end{align*}
adjoint solution $\bar{\lambda}^{(j)}$
Taylor remainder

Example: An inverse problem based on the Poisson equation

  • runtime: $5s$
  • without adjoint sensitivity: $\approx 1h$
  • without adjoint$^2$: $\approx 11\text{h } 45\text{min}$
$m$-parametrization
\begin{align} v^{[0]} &= (x, y)^T \\ v^{[1]} &= \text{relu}(W^{[1]} \cdot{} v^{[0]} + b^{[1]}) \\ v^{[2]} &= \text{relu}(W^{[2]} \cdot{} v^{[1]} + b^{[2]}) \\ v^{[3]} &= \text{softmax}(W^{[3]} \cdot{} v^{[2]} + b^{[3]}) \\ m &= (0.1, 0.4, 0.9) \cdot{} v^{[3]} \end{align}
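A hedged Lux.jl sketch of this pointwise parametrization, $(x, y) \mapsto$ phase weights $\mapsto$ material value (the layer widths here are illustrative; the actual network has 1903 parameters):

using Lux, Random

phases = [0.1, 0.4, 0.9]
net = Chain(Dense(2 => 32, relu),
            Dense(32 => 32, relu),
            Dense(32 => 3),
            softmax)

ps, st = Lux.setup(Random.default_rng(), net)

x = [0.3, 0.7]             # a point (x, y) in the domain
v³, _ = net(x, ps, st)     # phase weights v⁽³⁾ (a point on the simplex)
m = phases' * v³           # material value m(x, y) = (0.1, 0.4, 0.9)⋅v⁽³⁾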
possible application
  • transformation to "phase classification problem"
  • from SciML: neural network as parametrization $m(p)$
  • the parametrization $m(p)$ "regularizes"
  • number of parameters: $1903$
  • no "inverse crime":
    • $2972$ vs $72965$ grid cells
    • additive noise $\mathcal{N}(\mu=0, \sigma=1\times 10^{-5})$
  • L-BFGS (Optim.jl), NN (Lux.jl), AD (Zygote.jl) + custom adjoint
measurements $\Sigma^{(i, j)}$ and $\tilde{\Sigma}^{(i, j)}$
optimized material $m$
true material
  • "neural networks bias towards sparse solutions"
  • convergence depends on the initial guess (does not always converge)
  • tailor the parametrization to specific problems (layers, other geometries, phase values, etc.)

Material Reconstruction in EPMA

radiative transfer equation
($\psi^{(i)}$: electron fluence)
\begin{align*} \Omega \cdot{} \nabla_x \psi + \Sigma^{\text{tot}} \psi &= \int_{\mathbb{R}^+} \int_{S^2} \Sigma^{\text{scat}} \psi \, d\Omega' \, d\epsilon' \quad \forall (x, \epsilon, \Omega) \in \mathcal{R} \times \mathcal{E} \times S^2 \\ \psi &= g_{\text{beam}}^{(i)} \quad \forall n\cdot{}\Omega < 0 \end{align*}
observations
$\Sigma^{(ji)} = \int_{\mathcal{R}} \Sigma^{\text{x-ray}, (j)}\psi^{(i)} \, d x$
objective
$C = \frac{1}{IJ} \sum_{i,j=1}^{I, J} (\Sigma^{(ji)} - \tilde{\Sigma}^{(ji)})^2$
additivity approximation for scattering kernels $\Sigma^\star$
\begin{equation} \Sigma^{\star}(x, \epsilon, \epsilon', \Omega, \Omega') = \sum_{e=1}^{n_e} \mathcal{N}_e(x) \sigma_e^{\star}(\epsilon, \epsilon', \Omega, \Omega') \end{equation}
continuous slowing down (csd)-approximation
\begin{align} \Sigma \psi - \int_{\mathcal{E}} \Sigma \psi \, d \epsilon \approx -\partial_{\epsilon}( s \psi ) \\ \psi(\epsilon=\epsilon_{\text{max}}) = 0 \end{align}
discretization
  • even/odd splitting \begin{equation} \psi^\pm(\cdot{}, \cdot{}, \Omega) = \frac{\psi(\cdot{}, \cdot{}, \Omega) \pm \psi(\cdot{}, \cdot{}, -\Omega)}{2} \end{equation}
  • weak form of radiative transfer \begin{align} \begin{split}\label{eq:linear_boltzmann_csd_biliner_form} a_p(\psi, \phi) = \textstyle -\langle\partial_\epsilon (s_p \psi) \phi \rangle &+ \langle \Omega \cdot{} \nabla_x \psi^+ \phi^- - \psi^- \Omega \cdot{} \nabla_x \phi^+ \rangle \\ &+ \textstyle[|n \cdot{} \Omega| \psi^+ \phi^+]_{\partial_x} + \langle \Sigma^{\text{el,tot}}_p \psi \phi - \int_{S^2} \Sigma^{\text{el}}_p \psi \, d \Omega' \phi \rangle \end{split} \end{align} \begin{align} b^{(i)}(\phi) &= 2 [n \cdot{} \Omega g^{(i)} \phi^+ ]_{\partial_x^-}. \end{align} \begin{align} c^{(j)}_p(\psi) &= \langle \mathcal{N}_j \sigma^{\text{x-ray}}_j \psi^+ \rangle. \end{align}
  • $\Omega$: spherical harmonics basis ($P_N$)
  • $x$: mixed Lagrange basis ($L^2_h, H^1_h$)
  • $\epsilon$: staggered implicit trapezoidal/midpoint rule
  • Schur complement (eliminates odd $^-$ variables)
  • MINRES: solves remaining symmetric system
  • "matrix-free" method
  • "Kronecker trick" $(B^T \otimes A)\,\text{vec}(X) = \text{vec}(AXB)$ (see the sketch below)
  • dual consistent discretization
implementation in Julia using: CUDA.jl, Gridap.jl, Krylov.jl, Optim.jl, Lux.jl, ...
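A minimal check of the Kronecker identity behind the matrix-free application (sizes illustrative and kept small, since the dense Kronecker product is exactly what one wants to avoid):

using LinearAlgebra

nx, nΩ = 100, 20
A = randn(nx, nx)            # spatial factor (illustrative)
B = randn(nΩ, nΩ)            # angular factor (illustrative)
X = randn(nx, nΩ)

y₁ = kron(B', A) * vec(X)    # dense: a (nx⋅nΩ)² operator
y₂ = vec(A * X * B)          # matrix-free: two small matrix products

y₁ ≈ y₂                      # true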
  • Egger, H., & Schlottbom, M. (2012). A Mixed Variational Framework For The Radiative Transfer Equation. doi:10.1142/s021820251150014x
  • Egger, H., & Schlottbom, M. (2014). Numerical methods for parameter identification in stationary radiative transfer. doi:10.1007/s10589-014-9657-9
  • Egger, H., & Schlottbom, M. (2016). A class of Galerkin Schemes for Time-Dependent Radiative Transfer. doi:10.1137/15M1051336

Derivation: Material Reconstruction in EPMA

\begin{equation} \psi(x, \epsilon_i, \Omega) \approx \underbrace{\begin{pmatrix}\mathcal{X}^+(x)^T & \mathcal{X}^-(x)^T\end{pmatrix}}_{=\mathcal{X}(x)^T} \begin{pmatrix} \Psi^+_i & 0 \\ 0 & \Psi^-_i\end{pmatrix} \underbrace{\begin{pmatrix} \Upsilon^+(\Omega) \\ \Upsilon^-(\Omega) \end{pmatrix}}_{=\Upsilon(\Omega)}, \end{equation}
\begin{align} \boldsymbol{N}_e^\pm &= \int_{\mathcal{R}} \mathcal{X}^\pm \mathcal{N}_e (\mathcal{X}^\pm)^T \, d x \in \mathbb{R}^{n_x^\pm \times n_x^\pm}\\ \boldsymbol{\nabla}_d &= \int_{\mathcal{R}} \nabla_{x_d} \mathcal{X}^+ (\mathcal{X}^-)^T \, d x \in \mathbb{R}^{n_x^+ \times n_x^-}\\ \boldsymbol{\partial}_d &= \int_{\partial \mathcal{R}} |n_d| \mathcal{X}^+ (\mathcal{X}^+)^T \, d \Gamma \in \mathbb{R}^{n_x^+ \times n_x^+} \end{align}
\begin{align} \boldsymbol{I}^\pm &= \int_{S^2} \Upsilon^\pm (\Upsilon^\pm)^T \, d \Omega = I \in \mathbb{R}^{n_\Omega^\pm \times n_\Omega^\pm} \\ \boldsymbol{\Omega}_d &= \int_{S^2} \Omega_d \Upsilon^- (\Upsilon^+)^T \, d \Omega \in \mathbb{R}^{n_\Omega^- \times n_\Omega^+}\\ \boldsymbol{|\Omega|}_d &= \int_{S^2} |\Omega_d| \Upsilon^+ (\Upsilon^+)^T \, d \Omega \in \mathbb{R}^{n_\Omega^+ \times n_\Omega^+} \end{align}
\begin{equation} \boldsymbol{k}_e^\pm = \int_{S^2} \int_{S^2} k_e(\Omega \cdot{} \Omega') \Upsilon^\pm \, d \Omega' (\Upsilon^\pm)^T \, d \Omega \in \mathbb{R}^{n_\Omega^\pm \times n_\Omega^\pm} \end{equation}
\begin{align} \begin{split} \boldsymbol{a}(\Psi, \Phi) = \sum_{i=0}^{N-1} \,& [-\boldsymbol{N}_e^+ \frac{s_e^{i+1} \Psi^+_{i+1} - s_e^{i}\Psi^+_{i}}{\Delta \epsilon} \boldsymbol{I}^+ -\boldsymbol{\nabla}_d \frac{\Psi^-_{i} + \Psi^-_{i+1}}{2} \boldsymbol{\Omega}_d +\boldsymbol{\partial}_d \frac{\Psi^+_{i} + \Psi^+_{i+1}}{2} \boldsymbol{|\Omega|}_d \\ &\quad +\boldsymbol{N}_e^+ \frac{\tau_e^{i}\Psi^+_{i} + \tau_e^{i+1}\Psi^+_{i+1}}{2} (\boldsymbol{I}^+ - \boldsymbol{k}_e^+)]\cdot{} \Phi^+_{i + \frac{1}{2}}\Delta \epsilon\\ +\,&[-\boldsymbol{N}_e^- \frac{s_e^{i+1} \Psi^-_{i+1} - s_e^{i}\Psi^-_{i}}{\Delta \epsilon} \boldsymbol{I}^- +\boldsymbol{\nabla}^T_d \frac{\Psi^+_{i} + \Psi^+_{i+1}}{2} \boldsymbol{\Omega}^T_d \\ &\quad +\boldsymbol{N}_e^- \frac{\tau_e^{i}\Psi^-_{i} + \tau_e^{i+1}\Psi^-_{i+1}}{2}(\boldsymbol{I}^- - \boldsymbol{k}_e^-)] \cdot \Phi^-_{i+\frac{1}{2}}\Delta \epsilon \end{split} \end{align} \begin{align} \boldsymbol{b}^{(j)}(\Phi) \, = \sum_{i=0}^{N-1}\frac{\boldsymbol{g}^+_{i} + \boldsymbol{g}^+_{i+1}}{2} \cdot{} \Phi_{i+\frac{1}{2}}^+\Delta \epsilon \text{ with } \boldsymbol{g}^+_{i} = 2 \int_{\partial \mathcal{R}} \int_{n \cdot{} \Omega < 0} n \cdot{} \Omega \, g^{(j)}(x, \epsilon_i, \Omega) \Upsilon^+(\Omega) \mathcal{X}^+(x)^T \, d \Omega \, d \Gamma \end{align} \begin{align} \boldsymbol{c}^{(j)}(\Psi) \, = \sum_{i=0}^{N-1} \frac{ \boldsymbol{h}^+_{i} \cdot{} \Psi^+_{i} + \boldsymbol{h}^+_{i+1}\cdot{}\Psi^+_{i+1}}{2} \Delta \epsilon \text{ with } \boldsymbol{h}^+_{i} = \int_{\mathcal{R}} \int_{S^2} \mathcal{N}_j(x) \sigma^{\text{x-ray}}(\epsilon_i) \Upsilon^+(\Omega) \mathcal{X}^+(x)^T \, d \Omega \, d x. \end{align}

Material Reconstruction in EPMA

measurements $\Sigma^{(ji)}$
non-adjoint model
\begin{align*} a_p(u^{(i)}, v) + b^{(i)}(v) &= 0 \, \forall v \in V \\ \Sigma^{(ji)} &= c_p^{(j)}(u^{(i)}) \end{align*}
  • $P_{21}$ moments ($253$ moments)
  • $40 \times 120$ spatial grid
  • $20$ energy steps
  • $2(\epsilon) \times 71(x) \times 3(\Omega) = 426 \text{ beams}$
  • $2$ elements
solution $\int_{S^2}\int_{\mathcal{E}} u^{(i)} \, d\epsilon \,d\Omega$
  • non-adjoint: $\approx 7\text{min } 6\text{s} (\approx 426\times 1\text{s} + \text{...})$
  • adjoint: $\approx 2s (\approx 2\times 1\text{s} + \text{...})$
adjoint model
\begin{align*} a_p(v, \lambda^{(j)}) + c_p^{(j)}(v) = 0 \, \forall v \in U \\ \Sigma^{(ji)} = b^{(i)}(\lambda^{(j)}) \end{align*}
adjoint solution $\int_{S^2}\int_{\mathcal{E}} \lambda^{(j)} \, d\epsilon \,d\Omega$
Riesz representation $\int_{S^2} \lambda^{(j)} \,d\Omega$

Material Reconstruction in EPMA

gradient $\nabla_p C$
non-adjoint derivative
\begin{align*} a_p(v, \dot{\lambda}^{(j)}) +& \dot{a}_p(v, \lambda^{(j)}, \dot{p}) \\ &+ \dot{c}_p(v, \dot{p}) = 0 \quad \forall v \in U \\ \dot{\Sigma}^{(ji)} &= b^{(i)}(\dot{\lambda}^{(j)}) \end{align*}
  • parameter-dependent $c_p^{(j)}$
  • evaluated at $\mathcal{N}_e(x) = 0.5$
tangent $\int_{S^2}\int_{\mathcal{E}} \dot{\lambda}^{(j)} \, d\epsilon \,d\Omega$
  • tangent derivative via the adjoint model: $\approx 2\text{h } 40\text{min} (\approx 4800\times 2\text{s} + \text{...})$
  • tangent derivative via the non-adjoint model: $\approx 23\text{d } 16\text{h} (\approx 4800\times 7\text{min } 6\text{s} + \text{...})$
  • adjoint derivative: $\approx 5s (\approx 2\times 2\text{s} + \text{...})$
adjoint derivative
\begin{align*} a_p(\bar{\lambda}^{(j)}, v) + \bar{\Sigma}^{(ji)}b^{(i)}(v) = 0 \quad \forall v \in U \\ \bar{p} = \bar{a}_p(\bar{\lambda}^{(j)}, \lambda^{(j)}, 1) + \bar{c}_p(\bar{\lambda}^{(j)}, 1) \end{align*}
adjoint solution $\int_{S^2}\int_{\mathcal{E}} \bar{\lambda}^{(j)} \, d\epsilon \,d\Omega$
Taylor remainder

Material Reconstruction in EPMA

measurements $\Sigma^{(ji)}$
optimized material $\mathcal{N}_e(x)$
EPMA Outlook
  • more realistic physics (scattering cross sections, x-ray absorption, secondary fluorescence)
  • more realistic test-cases (multiple phases)
  • uncertainty quantification of the inversion result (Hessian)
  • improve radiative transfer solver (dynamical low-rank)
  • 3D?

References

Slides + Pluto.jl notebook (Poisson example)

github.com/tam724