This month we are highlighting Dr. Liping Liu and his National Science Foundation-funded project, “CRII: RI: Self-Attention through the Bayesian Lens.” Please see the abstract for this proposal below.
PI: Liping Liu
Funder: National Science Foundation
Title: CRII: RI: Self-Attention through the Bayesian Lens
Abstract: Self-attention, a recently introduced augmentation to neural network architectures, has greatly improved neural network performance in a range of applications, particularly natural language processing and computer vision. Conceptually based on the way the human brain processes complex visual information by learning to selectively focus on the most salient elements, self-attention improves a network's ability to capture long-range relations in data. However, there has been little study of quantifying the uncertainties of the outputs of self-attention networks, even though uncertainty quantification is critically important for reliable learning models. The uncertainty of a self-attention network depends heavily on where the network directs its attention. This project will model the uncertainties associated with attention placement, and thereby better quantify the uncertainty in network outputs. The research will also convert part of the architecture design into standard computational procedures by utilizing statistical methods, thus facilitating the design of new network architectures. This project has a secondary aim of applying the self-attention mechanism to statistical inference for computational efficiency. Ultimately, this project will produce new network architectures that are more reliable and more broadly applicable. This research will also support the development of a deep learning course for both graduate and undergraduate students at Tufts University.
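For readers unfamiliar with the mechanism the abstract describes, the following is a minimal sketch of standard scaled dot-product self-attention, the building block the project studies. The projection matrices and toy data here are illustrative placeholders, not part of the proposal:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X (n_tokens x d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise token-to-token similarity scores
    # Softmax over each row yields the attention weights: how much each
    # token draws information from every other token in the sequence.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights  # weighted sum of values, plus the weights

# Toy example: 5 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
```

Note that each row of `w` sums to one, so attention weights already resemble a probability distribution over tokens, which is what makes the Bayesian treatment described below natural.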
This project examines self-attention networks through a Bayesian approach and proposes a new modification: Bayesian Self-Attention Networks (BSANs). While self-attention networks use "attention weights" to take information from a specific range of the data, BSANs assign probabilities to attention weights. By modeling uncertainties in the attention, BSANs naturally inherit desirable properties of Bayesian methods, such as better uncertainty estimates and less overfitting of the data. BSANs will automate the computation of attention probabilities as statistical inference procedures, simplifying the design of new attention-based neural networks, which will only need to determine where to place the attention structure. The study of BSANs will result in new network architectures, with the potential to improve reliability over a wide span of tasks in both natural language processing and graph data analysis. In addition to BSANs, this project will also use the self-attention mechanism to construct probability distributions over large numbers of variables. Being flexible and computationally efficient, the constructed distributions will be suitable for distribution approximation in large-scale statistical inference. This project will produce computationally efficient inference methods for Gaussian processes, a widely used model in machine learning and related areas.
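To give a flavor of what assigning probabilities to attention weights could look like, here is an illustrative sketch, not the project's actual method: attention weights for each token are drawn from a Dirichlet distribution whose mean is the usual softmax, and outputs are averaged over samples so that their spread reflects uncertainty in where attention was placed. The `concentration` parameter and the Dirichlet choice are assumptions made for this toy example:

```python
import numpy as np

def bayesian_attention(X, Wq, Wk, Wv, n_samples=100, concentration=5.0, rng=None):
    """Toy 'Bayesian' attention: rather than one softmax weight vector per token,
    sample attention weights from a Dirichlet centered on the softmax, then
    average outputs over samples to expose uncertainty from attention placement."""
    rng = rng or np.random.default_rng()
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    mean_w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    mean_w /= mean_w.sum(axis=-1, keepdims=True)
    outs = []
    for _ in range(n_samples):
        # Dirichlet(concentration * mean) keeps the softmax as the expected
        # weights; larger concentration means less uncertainty in attention.
        w = np.stack([rng.dirichlet(concentration * row + 1e-6) for row in mean_w])
        outs.append(w @ V)
    outs = np.stack(outs)  # (n_samples, n_tokens, d)
    return outs.mean(axis=0), outs.std(axis=0)  # predictive mean and spread

# Toy usage: the per-element std quantifies output uncertainty.
rng = np.random.default_rng(1)
n, d = 4, 6
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
mean_out, std_out = bayesian_attention(X, Wq, Wk, Wv, n_samples=50, rng=rng)
```

The point of the sketch is the shift in interpretation: the attention weights become random variables with a distribution, so downstream outputs come with an uncertainty estimate rather than a single point value.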