YIC2025

Reducing Generalization and Training Errors of DeepONets for Approximating Parametric PDEs

  • Gonzalez-Sieiro, Jesus (Basque Center for Applied Mathematics (BCAM))
  • Pardo, David (University of the Basque Country (UPV/EHU))
  • Calo, Victor (Curtin University)


Solving parametric partial differential equations (PDEs) accurately and promptly is essential in many applications, such as real-time prediction, optimal control, and inverse problems. Deep Learning (DL) has recently emerged as a promising alternative to classical methods for addressing this problem. In this context, Neural Operators are among the most relevant approaches, with DeepONets being their most popular architecture. Although the DeepONet approach was initially conceived for learning operators, it can also approximate parametric PDEs. This method expresses the solution as a separated (or low-rank) representation composed of the dot product of a set of coefficients with a set of basis functions. The coefficients depend on the parameters $p$ of the PDE and are generated by a neural network called the branch $br(p)$, while the basis functions depend on the spatial coordinates $x$ and are computed by another neural network called the trunk $tr(x)$. The main strength of DeepONets is that, after training these networks, solving the parametric PDE for a specific parameter value becomes computationally inexpensive, since it only requires a forward evaluation of the model. However, although the universal approximation theorem (UAT) guarantees a small approximation error, generalization and training errors often penalize the method's performance.

We propose an alternative formulation to the conventional separated representation that adds an extra linear layer $\alpha$. The weights of $\alpha$ are computed by Least Squares, while the remaining weights of the neural networks are obtained by Gradient Descent, resulting in a hybrid Gradient Descent/Least Squares (GD-LS) optimization. This addition aims to enhance the model's training convergence and overall performance [6]. Moreover, we define a loss function with two components: a physical part and a derivative part. The physical part is related to the fulfillment of the equation, while the derivative part is based on the derivatives of the solution w.r.t. the parameters $p$ of the equation. This extra term is inspired by Hermite interpolation [7] and aims to enhance the interpolation capabilities of the model.

We train the model with data generated by an automatically differentiable version of OpenFOAM, so that we can compute not only the gradients of the solution but also the desired derivatives w.r.t. the parameters $p$. In particular, we solve the convection-diffusion equation, where the velocity and the diffusivity are the PDE parameters.
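
As an illustration only, the following PyTorch sketch shows one way to realize such a separated representation with the extra linear layer $\alpha$ fitted by Least Squares while the branch and trunk weights are left for Gradient Descent; the network sizes, the number of modes K, and all function names are assumptions, not the authors' implementation.

    # Hypothetical sketch, not the authors' code: branch/trunk networks plus a
    # Least-Squares-fitted linear layer alpha on top of the separated representation.
    import torch

    class MLP(torch.nn.Module):
        def __init__(self, d_in, d_out, width=64, depth=3):
            super().__init__()
            layers, d = [], d_in
            for _ in range(depth - 1):
                layers += [torch.nn.Linear(d, width), torch.nn.Tanh()]
                d = width
            layers.append(torch.nn.Linear(d, d_out))
            self.net = torch.nn.Sequential(*layers)

        def forward(self, z):
            return self.net(z)

    K = 32                           # number of modes (illustrative choice)
    branch = MLP(d_in=2, d_out=K)    # br(p): coefficients; p = (velocity, diffusivity)
    trunk  = MLP(d_in=1, d_out=K)    # tr(x): basis functions evaluated at x

    def features(p, x):
        # elementwise products br_k(p) * tr_k(x): an (N, K) feature matrix
        return branch(p) * trunk(x)

    def fit_alpha(p, x, u_target):
        # LS step of the hybrid GD-LS optimization: fit alpha with branch/trunk frozen
        with torch.no_grad():
            Phi = features(p, x)                                  # (N, K)
            return torch.linalg.lstsq(Phi, u_target).solution     # (K, 1)

    def predict(p, x, alpha):
        # u(x; p) = sum_k alpha_k * br_k(p) * tr_k(x)
        return features(p, x) @ alpha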
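The two-component loss could then be assembled as in the sketch below. Here the "physical" part is replaced by a plain misfit against the reference solution for brevity (the abstract describes it as enforcing the equation itself), the target du_dp_target stands for the parameter derivatives supplied by the differentiable OpenFOAM solver, and the weighting lam is an assumed hyperparameter.

    # Hypothetical sketch of the two-part loss: a misfit term plus a
    # Hermite-inspired term matching derivatives w.r.t. the PDE parameters p.
    def loss_fn(p, x, u_target, du_dp_target, alpha, lam=1.0):
        p = p.clone().requires_grad_(True)
        u = predict(p, x, alpha)                                        # (N, 1)
        du_dp = torch.autograd.grad(u.sum(), p, create_graph=True)[0]   # (N, 2)
        physical   = torch.mean((u - u_target) ** 2)          # stand-in for the equation term
        derivative = torch.mean((du_dp - du_dp_target) ** 2)  # derivative-matching term
        return physical + lam * derivative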