Session Speaker: Trusted AI
Fast Learning of Graph Neural Networks with Guaranteed Generalizability
Dr. Meng Wang (汪孟)
Associate Professor
Department of Electrical, Computer, and Systems Engineering
Rensselaer Polytechnic Institute
Troy, NY 12180
Biography:
Dr. Meng Wang is an Associate Professor in the Department of Electrical, Computer, and Systems Engineering at Rensselaer Polytechnic Institute. She received her B.S. and M.S. degrees from Tsinghua University, China, in 2005 and 2007, respectively, and her Ph.D. from Cornell University, Ithaca, NY, USA, in 2012. Prior to joining RPI in 2012, she was a postdoctoral research scholar at Duke University. Her research interests include machine learning and data analytics, energy systems, signal processing, and optimization. She is a recipient of Young Investigator Program (YIP) awards from the Air Force Office of Scientific Research (AFOSR) in 2019 and the Army Research Office (ARO) in 2017, and she received the School of Engineering Research Excellence Award from Rensselaer. She has been an Associate Editor of IEEE Transactions on Smart Grid since 2020 and was a guest editor of the IEEE Journal of Selected Topics in Signal Processing Special Issue on Signal and Information Processing for Critical Infrastructures in 2018.
Abstract:
Compared with the empirical success of neural networks, the theoretical guarantees for two main questions are much less understood. The first is whether the training algorithm can return a model that minimizes the nonconvex training loss. The second is whether the learned model generalizes, i.e., achieves a small error on unseen test data. The theoretical understanding remains elusive especially for structured data, such as that processed by convolutional neural networks (CNNs) and graph neural networks (GNNs). Our work provides theoretical guarantees of achieving both zero training error and zero test error for one-hidden-layer CNNs and GNNs. Specifically, we provide sample complexity and convergence analyses of mini-batch gradient descent combined with tensor initialization, and we directly characterize how the learning performance depends on properties of the graph structure. Moreover, we extend the analysis to pruned networks and prove that training a well-pruned network enjoys a faster convergence rate and returns a better model than training the original dense network, providing a theoretical validation of the Lottery Ticket Hypothesis.
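To make the setting concrete for readers unfamiliar with it, below is a minimal sketch (not the speaker's implementation) of the kind of model and training procedure the abstract refers to: a one-hidden-layer GNN for node-level regression trained with mini-batch gradient descent. All names, hyperparameters, and the synthetic data are illustrative assumptions, and the tensor-initialization step used in the analysis is replaced here by standard random initialization.

```python
# Illustrative sketch only: one-hidden-layer GNN trained with mini-batch
# (stochastic) gradient descent over labeled nodes. Names, hyperparameters,
# and data are assumptions; the analyzed algorithm also uses a tensor
# initialization, which is replaced by random initialization here.
import torch


class OneHiddenLayerGNN(torch.nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.W = torch.nn.Linear(in_dim, hidden_dim, bias=False)  # hidden layer
        self.v = torch.nn.Linear(hidden_dim, 1, bias=False)       # linear readout

    def forward(self, x, a_norm):
        # One round of neighbor aggregation, then a ReLU hidden layer.
        agg = a_norm @ x                                   # aggregate over neighbors
        return self.v(torch.relu(self.W(agg))).squeeze(-1)


def train(x, a_norm, y, hidden_dim=16, batch_size=32, lr=0.1, epochs=200):
    """Mini-batch gradient descent on the squared loss over sampled nodes."""
    model = OneHiddenLayerGNN(x.shape[1], hidden_dim)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    n = x.shape[0]
    for _ in range(epochs):
        perm = torch.randperm(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]           # mini-batch of nodes
            opt.zero_grad()
            pred = model(x, a_norm)                        # full-graph forward pass
            loss = ((pred[idx] - y[idx]) ** 2).mean()      # loss on the mini-batch
            loss.backward()
            opt.step()
    return model


if __name__ == "__main__":
    # Tiny synthetic example: random graph, random features, noisy labels.
    torch.manual_seed(0)
    n, d = 100, 8
    adj = (torch.rand(n, n) < 0.1).float()
    adj = ((adj + adj.T) > 0).float() + torch.eye(n)       # symmetrize, add self-loops
    a_norm = torch.diag(1.0 / adj.sum(dim=1)) @ adj        # row-normalized adjacency
    x = torch.randn(n, d)
    y = (a_norm @ x) @ torch.randn(d) + 0.01 * torch.randn(n)
    model = train(x, a_norm, y)
```

In this sketch the loss at each step is evaluated on a sampled mini-batch of nodes while aggregation still uses the full normalized adjacency; the exact sampling and normalization conventions in the analyzed algorithm may differ.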