Tamme Claus1, Gaurav Achuda2, Silvia Richter2, Manuel Torrilhon1
1 ACoM, Applied and Computational Mathematics, RWTH Aachen University
2 GFE, Central Facility for Electron Microscopy, RWTH Aachen University
IGPM Seminar 2025, RWTH Aachen
23.01.2025
"Material imaging based on characteristic x-ray emission"
Microprobe at GFE (source: gfe.rwth-aachen.de)
while (∇ₚC != 0)            # optimization loop over the material parameters
    foreach δp              # one perturbation per parameter entry ...
        foreach beam        # ... and one PDE solve per beam
            solve_pde()
        end
    end
end
Efficient evaluation of the objective function and its gradient is crucial!
Definition of the adjoint operator ($H, G$ Hilbert spaces, $A: G \to H$ continuous, linear operator) \[ \langle h, A(g) \rangle_H = \langle A^*(h), g \rangle_G \quad \forall h \in H \,, g \in G \]
Let $(H, \langle \cdot{}, \cdot{} \rangle_H)$ be a (real) Hilbert space with inner product $\langle \cdot{}, \cdot{} \rangle_H$. For every continuous linear functional $f_h \in H^*$ (the dual of $H$), there exists a unique vector $h \in H$, called the Riesz Representation of $f_h$, such that \[ f_h(x) = \langle h, x \rangle_H \quad \forall x \in H. \]
Let $(G, \langle \cdot{}, \cdot{} \rangle_G)$ be another (real) Hilbert space, $A:G\to H$ a continuous linear operator between $G$ and $H$ and $f_h \in H^*$ a continuous linear functional. \[ f_h(A(g)) = \langle h, A(g) \rangle_H \quad \forall g \in G \] But $f_h(A(g))$ is also a continuous linear functional $f_\lambda(g)$ in $G$ with a Riesz Representation $\lambda \in G$ \[ f_h(A(g)) = f_\lambda(g) = \langle \lambda, g \rangle_G := \langle A^*(h), g \rangle_G. \]
Derivation: \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}} = \langle \mu, a \cdot{} g \rangle_{\mathbb{R}} = \langle a \cdot{} \mu, g \rangle_{\mathbb{R}} = \langle A^*\mu , g \rangle_{\mathbb{R}} \quad \forall g \in \mathbb{R}\, \forall \mu \in \mathbb{R} \end{equation*}
Derivation: \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}^m} = \langle \mu, M \cdot{} g \rangle_{\mathbb{R}^m} = \langle M^T \cdot{} \mu, g \rangle_{\mathbb{R}^n} = \langle A^*\mu, g \rangle_{\mathbb{R}^n} \quad \forall g \in \mathbb{R}^n \, \forall \mu \in \mathbb{R}^m \end{equation*}
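The matrix case can be checked numerically in a couple of lines; a minimal Julia sketch (sizes and random data chosen arbitrarily for illustration):
using LinearAlgebra
# A(g) = M g maps ℝⁿ → ℝᵐ; its adjoint w.r.t. the Euclidean inner products is Mᵀ.
m, n = 3, 5
M = randn(m, n)
g = randn(n)
μ = randn(m)
dot(μ, M * g) ≈ dot(M' * μ, g)   # ⟨μ, A g⟩_ℝᵐ = ⟨A*μ, g⟩_ℝⁿ, true up to round-off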
Derivation: \begin{equation*} \langle \mu, Ag \rangle_\mathbb{R} = \mu \cdot{} \int_\Omega g(x) dx = \int_\Omega \mu \cdot{} 1_\Omega(x) g(x) dx = \langle (A^*\mu)(\cdot{}), g(\cdot{}) \rangle_{L^2(\Omega)} \quad \forall g \in L^2(\Omega) \, \forall \mu \in \mathbb{R} \end{equation*}
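Discretizing the $L^2$ inner product with a simple quadrature rule makes the Riesz representer visible as the constant function $\mu \cdot 1_\Omega$; a sketch (grid, rectangle-rule quadrature, and test function are arbitrary choices):
# Discrete L²((0,1)) inner product via the rectangle rule: ⟨f, g⟩ ≈ Δx Σᵢ fᵢ gᵢ
xs = range(0, 1; length = 1000)
Δx = step(xs)
g  = sin.(π .* xs)                          # arbitrary test function
μ  = 2.3
lhs = μ * (Δx * sum(g))                     # ⟨μ, A g⟩_ℝ with A g = ∫ g dx
rhs = Δx * sum(μ .* ones(length(xs)) .* g)  # ⟨A*μ, g⟩_{L²}, A*μ = μ·1_Ω
lhs ≈ rhs                                   # true (same quadrature on both sides)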
Derivation (intentionally complicated): We reinterpret \begin{equation*} M \cdot{} h = g \quad \Leftrightarrow \quad \langle v, M \cdot{} h \rangle = \langle v, g \rangle \quad \forall v \in G \end{equation*} in particular (for a fixed but unspecified $\lambda \in G$) \begin{equation*} 0 = \langle \lambda, g \rangle - \langle \lambda, M \cdot{} h \rangle \quad \Leftrightarrow \quad 0 = \langle \lambda, g \rangle - \langle M^T \cdot{} \lambda, h \rangle \end{equation*} \begin{align*} \langle \mu, Ag \rangle &= \langle \mu, h \rangle\\ &= \langle \mu, h \rangle - \langle M^T \cdot{} \lambda, h \rangle + \langle \lambda, g \rangle \\ & \quad \text{ with } \langle \mu, h \rangle - \langle M^T \cdot{} \lambda, h \rangle = 0 \quad \forall h \in H \\ & \quad \text{ or } M^T \cdot{} \lambda = \mu \\ &= \langle A^* \mu, g \rangle \end{align*} (alternatively) \begin{equation*} \langle \mu, Ag \rangle_{\mathbb{R}^n} = \langle \mu, M^{-1} g \rangle_{\mathbb{R}^n} = \langle M^{-T} \mu, g \rangle_{\mathbb{R}^n} = \langle A^*\mu, g \rangle_{\mathbb{R}^n} \end{equation*}
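The practical content of this derivation is that applying $A^*$ amounts to one solve with the transposed matrix; a minimal numerical check in Julia (random, arbitrarily sized data):
using LinearAlgebra
# A(g) = h with M h = g, i.e. A = M⁻¹; the adjoint is applied by a transposed solve.
n = 4
M = randn(n, n) + n * I        # arbitrary well-conditioned square matrix
g = randn(n)
μ = randn(n)
h = M \ g                      # forward solve:  M h = g
λ = M' \ μ                     # adjoint solve:  Mᵀ λ = μ  (so A*μ = λ)
dot(μ, h) ≈ dot(λ, g)          # ⟨μ, A g⟩ = ⟨A*μ, g⟩, true up to round-off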
Derivation: \begin{align*} \langle \mu, Ag \rangle_H = f_\mu(h) &= f_\mu(h) + \underbrace{a(h, \lambda) + f_g(\lambda)}_{=0\quad \forall \lambda \in G} \\ &= \underbrace{f_\mu(h) + a(h, \lambda)}_{\overset{!}{=} 0 \quad \forall h \in H} + f_g(\lambda) = f_g(\lambda) = \langle \lambda, g \rangle_G \end{align*}
We define:
We also define ($y = f_x$):
For every single assignment $\varphi^{(n)}_{(v^{(j)})_{j \prec n}}$ we know
using ChainRulesCore, Zygote

# Example function: f(a, b) = 2a² + b
function f(a, b)
    y = 2*a*a + b
    return y
end

# Custom reverse-mode rule: Zygote picks up this rrule instead of
# differentiating through the body of f.
function ChainRulesCore.rrule(::typeof(f), a, b)
    y = 2*a*a + b                      # primal value
    function f̄(ȳ)                      # pullback: output cotangent ȳ → input cotangents
        ā = 4*a * ȳ                    # ∂f/∂a = 4a
        b̄ = ȳ                          # ∂f/∂b = 1
        return ZeroTangent(), ā, b̄     # first slot: tangent w.r.t. the function object itself
    end
    return y, f̄
end

Zygote.withgradient(f, 1.0, 2.0)       # (val = 4.0, grad = (4.0, 1.0))
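A wrong rrule silently corrupts every gradient that uses it, so a quick consistency check is worthwhile; a sketch against central finite differences (step size and evaluation point chosen ad hoc; ChainRulesTestUtils.jl provides test_rrule for more systematic checks):
# Central finite differences as an independent check of the hand-written rule
ϵ = 1e-6
(f(1.0 + ϵ, 2.0) - f(1.0 - ϵ, 2.0)) / (2ϵ)   # ≈ 4.0 = ∂f/∂a
(f(1.0, 2.0 + ϵ) - f(1.0, 2.0 - ϵ)) / (2ϵ)   # ≈ 1.0 = ∂f/∂b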
Model ($u^{(i)} \in U$):\begin{align*} a_p(u^{(i)}, v) + b^{(i)}(v) &= 0 \quad \forall v \in V \\ \Sigma^{(ji)} &= c^{(j)}(u^{(i)}) \end{align*}
1st "adjoint method" ($\lambda^{(j)} \in V$):\begin{align*} a_p(v, \lambda^{(j)}) + c^{(j)}(v) &= 0 \quad \forall v \in U \\ \Sigma^{(ji)} &= b^{(i)}(\lambda^{(j)}) \\ \end{align*} |
Tangent/sensitivity model ($\dot{\lambda}^{(j)} \in V$):
\begin{align*}
a_p(v, \dot{\lambda}^{(j)}) + \dot{a}_p(v, \lambda^{(j)}, \dot{p}) &= 0 \quad \forall v \in U\\
\dot{\Sigma}^{(ji)} &= b^{(i)}(\dot{\lambda}^{(j)})
\end{align*}
2nd "adjoint method" ($\bar{\lambda}^{(j)} \in U$):
\begin{align*}
a_p(\bar{\lambda}^{(j)}, v) + \bar{\Sigma}^{(ji)} b^{(i)}(v) &= 0 \quad \forall v \in V \\
\bar{\Sigma}^{(ji)} \dot{\Sigma}^{(ji)} &= \dot{a}_p(\bar{\lambda}^{(j)}, \lambda^{(j)}, \dot{p})
\end{align*}
or, writing the dependence on $\dot{p}$ as an inner product with the gradient $\bar{a}_p$:
\begin{align*}
a_p(\bar{\lambda}^{(j)}, v) + \bar{\Sigma}^{(ji)} b^{(i)}(v) &= 0 \quad \forall v \in V \\
\bar{\Sigma}^{(ji)} \dot{\Sigma}^{(ji)} &= \langle \bar{a}_p(\bar{\lambda}^{(j)}, \lambda^{(j)}, 1), \dot{p} \rangle
\end{align*}
From AD ($\langle \bar{\Sigma}, \dot{\Sigma} \rangle = \langle \bar{p}, \dot{p} \rangle$):\begin{align*} a_p(\underbrace{\bar{\Sigma}^{(ji)} v}_{=\bar{\lambda}^{(j)} \text{ (with fixed } v)}, \dot{\lambda}^{(j)}) + \dot{a}_p(\bar{\Sigma}^{(ji)} v, \lambda^{(j)}, \dot{p}) &= 0 \quad \forall v \in U \\ \bar{\Sigma}^{(ji)} \dot{\Sigma}^{(ji)} &= \bar{\Sigma}^{(ji)} b^{(i)}(\dot{\lambda}^{(j)}) \end{align*}
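In a finite-dimensional setting the equivalence of the model and the 1st "adjoint method" is a two-line computation; a Julia sketch with dense random matrices standing in for $a_p$, $b^{(i)}$, $c^{(j)}$ (all names, sizes, and data below are made up), trading forward solves against adjoint solves:
using LinearAlgebra

# Discrete stand-in:  a_p(u, v) = vᵀ K u,   b⁽ⁱ⁾(v) = vᵀ b[:, i],   c⁽ʲ⁾(u) = c[:, j]ᵀ u
n, nbeam, ndet = 30, 3, 5
K = randn(n, n) + n * I          # plays the role of the parameter-dependent operator
b = randn(n, nbeam)              # source functionals b⁽ⁱ⁾, one per beam i
c = randn(n, ndet)               # measurement functionals c⁽ʲ⁾, one per detector j

# Model: K u⁽ⁱ⁾ + b⁽ⁱ⁾ = 0,  Σ⁽ʲⁱ⁾ = c⁽ʲ⁾(u⁽ⁱ⁾)               (nbeam forward solves)
U = K \ (-b)
Σ_model = c' * U                 # ndet × nbeam

# 1st adjoint method: Kᵀ λ⁽ʲ⁾ + c⁽ʲ⁾ = 0,  Σ⁽ʲⁱ⁾ = b⁽ⁱ⁾(λ⁽ʲ⁾)   (ndet adjoint solves)
Λ = K' \ (-c)
Σ_adjoint = Λ' * b               # ndet × nbeam

Σ_model ≈ Σ_adjoint              # true up to round-off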
Model:
\begin{align*}
&\begin{cases}
-\nabla \cdot{} m \nabla u^{(i)} = 0 \quad &\forall x \in \mathcal{R}\\
u^{(i)} = g^{(i)} \quad &\forall x \in \partial\mathcal{R}
\end{cases}\\
&\Sigma^{(i, j)} = \int_{\mathcal{R}} h^{(j)} u^{(i)} dx\\
&C = \frac{1}{2 IJ}\sum_{i, j=1}^{I,J} (\Sigma^{(i, j)} - \tilde{\Sigma}^{(i, j)})^2
\end{align*}
[Figures: example solutions $u^{(i)}$ and corresponding measurements $\Sigma^{(i, j)}$]
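A minimal 1D finite-difference sketch of this model, computing the measurements $\Sigma^{(i,j)}$ and the objective $C$ with one forward solve per excitation. Grid size, material profile m, excitations g^{(i)}, detector functions h^{(j)}, and the "measured" data Σ̃ are all invented for illustration:
using LinearAlgebra

# 1D stand-in on R = (0, 1): -(m u⁽ⁱ⁾')' = 0, u⁽ⁱ⁾ = g⁽ⁱ⁾ on ∂R,
# discretized with central differences and a piecewise-constant material m.
N  = 100                               # number of cells
Δx = 1 / N
x  = (1:N-1) .* Δx                     # interior nodes
xc = ((1:N) .- 0.5) .* Δx              # cell centres carrying the material
m  = 1.0 .+ 0.5 .* (xc .> 0.5)         # hypothetical two-material profile

# Finite-difference operator for -(m u')' with the Dirichlet data eliminated
A = Tridiagonal(-m[2:N-1], m[1:N-1] .+ m[2:N], -m[2:N-1]) ./ Δx^2

g = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]      # boundary excitations g⁽ⁱ⁾ = (u(0), u(1))
h(j) = x -> exp(-((x - j / 5)^2) / 0.01)      # detector functions h⁽ʲ⁾, j = 1..4

Σ = [begin
         rhs = [m[1] * g0; zeros(N - 3); m[N] * g1] ./ Δx^2
         u   = A \ rhs                        # one forward solve per excitation i
         Δx * sum(h(j).(x) .* u)              # quadrature of ∫ h⁽ʲ⁾ u⁽ⁱ⁾ dx
     end
     for (g0, g1) in g, j in 1:4]             # Σ[i, j]

Σ̃ = zero(Σ)                                   # placeholder "measured" data
C = sum(abs2, Σ .- Σ̃) / (2 * length(Σ))       # objective C = 1/(2IJ) Σᵢⱼ (Σ⁽ⁱʲ⁾ - Σ̃⁽ⁱʲ⁾)²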