A Mean-field Limit for Two-layer Neural Networks Trained with Consensus-based Optimization
In recent decades, neural networks have become central to image recognition, speech recognition, autonomous driving, and large language models. Their success is tied to their ability to approximate arbitrary functions, a property established by universal approximation theorems. In particular, it is shown in [2] that two-layer neural networks can approximate functions to arbitrary accuracy. In practice, however, a network has to be trained to find the best approximation, and training is difficult because the optimization landscape is high-dimensional and non-convex. In this work, we consider two-layer neural networks and train them with the consensus-based optimization (CBO) method [3]. CBO is a particle-based global optimization method that admits a mean-field limit as the number of optimization particles tends to infinity. We couple this with the mean-field limit of two-layer neural networks, in which the number of hidden neurons tends to infinity and yields a continuous neural network. We demonstrate that the classical CBO formulation does not hold for continuous neural networks. Instead, following [1], we reformulate the CBO dynamics as an optimal transport formulation in the space of probability measures, and we illustrate how to incorporate noise into this formulation. Lastly, we show that taking the number of optimization particles to infinity yields a gradient flow on the space of probability measures.

[1] Borghi, M. Herty, and A. Stavitskiy. Dynamics of measure-valued agents in the space of probabilities. Preprint arXiv:2407.06389, July 2024.
[2] W. E, C. Ma, and L. Wu. The Barron Space and the Flow-Induced Function Spaces for Neural Network Models. Constructive Approximation, 55(1):369–406, Feb. 2022.
[3] R. Pinnau, C. Totzeck, O. Tse, and S. Martin. A consensus-based model for global optimization and its mean-field limit. Mathematical Models and Methods in Applied Sciences, 27(1):183–204, Jan. 2017.
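
For readers unfamiliar with CBO, the following is a minimal sketch of the finite-particle, finite-neuron setting that the abstract takes as its starting point, not the mean-field or optimal transport formulation developed in the work. It assumes a commonly used simplified variant of the dynamics in [3] (drift toward a Gibbs-weighted consensus point plus isotropic noise, discretized with Euler-Maruyama). The function names `two_layer`, `consensus_point`, and `cbo_step`, the tanh activation, the 1/m output scaling, the toy regression target, and all parameter values are illustrative assumptions rather than the authors' choices.

```python
import numpy as np


def two_layer(theta, x, m):
    """Two-layer network with m hidden neurons and 1/m (mean-field) output scaling."""
    # theta stacks output weights a, input weights w, and biases b (each of length m).
    a, w, b = theta[:m], theta[m:2 * m], theta[2 * m:]
    return np.tanh(np.outer(x, w) + b) @ a / m


def consensus_point(X, losses, alpha):
    """Weighted average of the particles with Gibbs-type weights exp(-alpha * loss)."""
    # Shifting by the minimum loss leaves the average unchanged but avoids underflow.
    weights = np.exp(-alpha * (losses - losses.min()))
    return weights @ X / weights.sum()


def cbo_step(X, loss_fn, rng, lam=1.0, sigma=0.3, alpha=30.0, dt=0.05):
    """One Euler-Maruyama step: drift toward the consensus point plus scaled noise."""
    losses = np.array([loss_fn(x) for x in X])
    v = consensus_point(X, losses, alpha)
    diff = X - v
    # Isotropic noise whose magnitude shrinks as a particle approaches consensus.
    dist = np.linalg.norm(diff, axis=1, keepdims=True)
    noise = rng.standard_normal(X.shape)
    return X - lam * dt * diff + sigma * np.sqrt(dt) * dist * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n_particles = 5, 50
    x_train = np.linspace(-1.0, 1.0, 20)
    y_train = np.sin(np.pi * x_train)  # toy regression target (assumption)

    def loss(theta):
        return float(np.mean((two_layer(theta, x_train, m) - y_train) ** 2))

    # Each row of X is one optimization particle, i.e. one full set of network parameters.
    X = rng.standard_normal((n_particles, 3 * m))
    for _ in range(200):
        X = cbo_step(X, loss, rng)
    losses = np.array([loss(x) for x in X])
    print("loss at consensus point:", loss(consensus_point(X, losses, 30.0)))
```

Each particle here is a complete parameter vector, so the method needs only loss evaluations and no gradients. The two limits discussed in the abstract correspond to letting `n_particles` and `m` grow; this sketch keeps both finite.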