The renaissance of deep learning is rooted in the tremendous success of GPU-based object recognition in 2012, when deep networks substantially outperformed all previous methods. Those previous methods were grounded in sound statistical and computational models, whereas deep learning methods remain largely heuristic.
The mathematical concepts used to reason about deep neural networks, such as "manifolds", "mutual information", "KL divergence", "optimization", and "regularization", serve as useful heuristics and generators of ideas, but they have little predictive power over the error rates, architectures, or generalization of deep neural networks. The problem is compounded by the fact that many of the problems tackled with deep neural networks are themselves complex and not mathematically well defined.
The heuristic nature of deep neural networks has practical consequences: we cannot explain the existence of adversarial samples, cannot provide performance guarantees, and cannot explain how a network arrives at a particular decision; we also have difficulty predicting the relationship between network architecture and performance, and must resort to extensive hyperparameter searches.
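To make the adversarial-sample phenomenon concrete, here is a minimal sketch of a fast-gradient-sign-style perturbation on a toy linear classifier. All weights and inputs are randomly generated for illustration; nothing here comes from a trained network, and the linear model stands in for a deep network only to show the mechanism.

```python
import numpy as np

# Toy setup: a linear "classifier" with random weights and a random
# input. These are illustrative stand-ins, not a trained model.
rng = np.random.default_rng(0)
w = rng.normal(size=20)              # stand-in for trained weights
x = rng.normal(size=20)              # an input the model classifies
label = 1.0 if w @ x > 0 else -1.0  # take the model's own decision as the label

# Fast-gradient-sign-style attack: perturb each coordinate by a small
# amount eps in the direction that decreases the classification margin.
# For a linear score label * (w @ x), that direction is sign(-label * w).
grad = -label * w                    # gradient of the (negative) margin w.r.t. x
eps = 0.5
x_adv = x + eps * np.sign(grad)      # small per-coordinate perturbation

margin_before = label * (w @ x)      # positive: consistent with the decision
margin_after = label * (w @ x_adv)   # strictly smaller: the margin shrinks
print(margin_before, margin_after)
```

Each coordinate moves by at most `eps`, yet the margin drops by `eps * sum(|w|)`, which grows with the input dimension; in high-dimensional inputs such as images, this is why visually imperceptible perturbations can flip a network's decision.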
In my work, I try to understand the mathematical foundations of deep learning and relate deep learning to optimality results from statistics and algorithmic learning theory.