Literature

1951

Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400-407.

2001

Schölkopf, B., Herbrich, R., & Smola, A. J. (2001, July). A generalized representer theorem. In International conference on computational learning theory (pp. 416-426). Springer, Berlin, Heidelberg. (Referenced in Ye, 2022, p. 195, for showing how both regression and classification can be viewed as optimization problems.)

2011

Heaton, J. (2011). Introduction to the Math of Neural Networks (Beta-1). Heaton Research Inc. (From the author: “… sometimes you really do want to know what is going on behind the scenes. You do want to know that math that is involved. In this book you will learn what happens, behind the scenes, with a neural network. You will also be exposed to the math. I will present the material in mathematical terms.”)

2012

Bottou, L. (2012). Stochastic gradient descent tricks. In Neural networks: Tricks of the trade (pp. 421-436). Springer, Berlin, Heidelberg. (Provides background material, explains why stochastic gradient descent is a good learning algorithm when the training set is large, and provides useful recommendations.)

LeCun, Y. A., Bottou, L., Orr, G. B., & Müller, K. R. (2012). Efficient backprop. In Neural networks: Tricks of the trade (pp. 9-48). Springer, Berlin, Heidelberg.

2014

Chen, X. W., & Lin, X. (2014). Big data deep learning: challenges and perspectives. IEEE access, 2, 514-525.

Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

2015

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.

Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.

2016

Cohen, T., & Welling, M. (2016, June). Group equivariant convolutional networks. In International conference on machine learning (pp. 2990-2999). PMLR.

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.

Hardt, M., Recht, B., & Singer, Y. (2016, June). Train faster, generalize better: Stability of stochastic gradient descent. In International conference on machine learning (pp. 1225-1234). PMLR.

Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.

2017

Bourely, A., Boueri, J. P., & Choromanski, K. (2017). Sparse neural networks topologies. arXiv preprint arXiv:1706.05683.

Buduma, N., & Locascio, N. (2017). Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms. O’Reilly Media, Inc.

Lin, H. W., Tegmark, M., & Rolnick, D. (2017). Why does deep and cheap learning work so well? Journal of Statistical Physics, 168(6), 1223-1247.

Neyshabur, B., Bhojanapalli, S., McAllester, D., & Srebro, N. (2017). Exploring generalization in deep learning. Advances in neural information processing systems, 30.

Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., & Liao, Q. (2017). Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. International Journal of Automation and Computing, 14(5), 503-519.

Polson, N. G., & Sokolov, V. (2017). Deep learning: A Bayesian perspective. Bayesian Analysis, 12(4), 1275-1304.

Shwartz-Ziv, R., & Tishby, N. (2017). Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810.

Taylor, M. (2017). The Math of Neural Networks. Blue Windmill Media.

Vargas, R., Mosavi, A., & Ruiz, R. (2017). Deep learning: a review. Advances in Intelligent Systems and Computing.

Vidal, R., Bruna, J., Giryes, R., & Soatto, S. (2017). Mathematics of deep learning. arXiv preprint arXiv:1712.04741. (Article reviews recent work that aims to provide a mathematical justification for several properties of deep networks, such as global optimality, geometric stability, and invariance of the learned representations.)

Wiatowski, T., & Bölcskei, H. (2017). A mathematical theory of deep convolutional neural networks for feature extraction. IEEE Transactions on Information Theory, 64(3), 1845-1866.

Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2017). Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530.

2018

Balestriero, R. (2018, July). A spline theory of deep learning. In International Conference on Machine Learning (pp. 374-383). PMLR.

Baydin, A. G., Pearlmutter, B. A., Radul, A. A., & Siskind, J. M. (2018). Automatic differentiation in machine learning: A survey. Journal of Machine Learning Research, 18, 1-43.

Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631.

Rawle, W. (2018). The Mathematical Foundations of Artificial Intelligence. Deep Conversations on Deep Learning. IEEE.

Xu, J., Zhang, Z., Friedman, T., Liang, Y., & Van den Broeck, G. (2018, July). A semantic loss function for deep learning with symbolic knowledge. In International conference on machine learning (pp. 5502-5511). PMLR.

2019

Alom, M. Z., Taha, T. M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M. S., Hasan, M., Van Essen, B. C., Awwal, A. A., & Asari, V. K. (2019). A state-of-the-art survey on deep learning theory and architectures. Electronics, 8(3), 292.

Balestriero, R., Cosentino, R., Aazhang, B., & Baraniuk, R. (2019). The geometry of deep networks: Power diagram subdivision. Advances in Neural Information Processing Systems, 32. (Understanding the geometry of the layer partition regions – and how the layer partition regions combine into the deep neural network input partition – is key to understanding the operation of deep neural networks.)

Bolcskei, H., Grohs, P., Kutyniok, G., & Petersen, P. (2019). Optimal approximation with sparsely connected deep neural networks. SIAM Journal on Mathematics of Data Science, 1(1), 8-45.

Elbrächter, D., Perekrestenko, D., Grohs, P., & Bölcskei, H. (2019). Deep neural network approximation theory. arXiv preprint arXiv:1901.02220.

Hanin, B., & Rolnick, D. (2019, May). Complexity of linear regions in deep networks. In International Conference on Machine Learning (pp. 2596-2604). PMLR.

Hanin, B., & Rolnick, D. (2019). Deep ReLU networks have surprisingly few activation patterns. Advances in neural information processing systems, 32.

Higham, C. F., & Higham, D. J. (2019). Deep learning: An introduction for applied mathematicians. SIAM Review, 61(4), 860-891. (Article provides a very brief introduction to basic ideas that underlie deep learning, from an applied mathematics perspective.)

Montanelli, H., & Du, Q. (2019). New error bounds for deep ReLU networks using sparse grids. SIAM Journal on Mathematics of Data Science, 1(1), 78-92.

Willett, R. (2019). Mathematical Foundations of Machine Learning. University of Chicago. (17 video lectures)

2020

Balestriero, R., & Baraniuk, R. G. (2020). Mad max: Affine spline insights into deep learning. Proceedings of the IEEE, 109(5), 704-727.

Calin, O. (2020). Deep learning architectures. Springer International Publishing.

Dawani, J. (2020). Hands-On Mathematics for Deep Learning: Build a solid mathematical foundation for training efficient deep neural networks. Packt Publishing Ltd.

Gühring, I., Raslan, M., & Kutyniok, G. (2020). Expressivity of deep neural networks. arXiv preprint arXiv:2007.04759.

Miconi, T., Rawal, A., Clune, J., & Stanley, K. O. (2020). Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity. arXiv preprint arXiv:2002.10585.

Mukherjee, A. (2020). A study of the mathematics of deep learning (Doctoral dissertation, The Johns Hopkins University). (Author takes several steps towards building strong theoretical foundations for new paradigms of deep-learning, including understanding: neural function spaces, deep learning algorithms, and the risk of stochastic neural nets)

Petersen, P. C. (2020). Neural network theory. University of Vienna.

Telgarsky, M. (2020). Deep learning theory lecture notes. (Available in PDF and web versions.)

Vershynin, R. (2020). Memory capacity of neural networks with threshold and rectified linear unit activations. SIAM Journal on Mathematics of Data Science, 2(4), 1004-1033.

Wang, H., & Yeung, D. Y. (2020). A survey on Bayesian deep learning. ACM Computing Surveys (CSUR), 53(5), 1-37.

Wilson, A. G., & Izmailov, P. (2020). Bayesian deep learning and a probabilistic perspective of generalization. Advances in neural information processing systems, 33, 4697-4708.

Zhang, X., & Wu, D. (2020). Empirical studies on the properties of linear regions in deep neural networks. arXiv preprint arXiv:2001.01072.

Zhou, D. X. (2020). Theory of deep convolutional neural networks: Downsampling. Neural Networks, 124, 319-327.

Zhou, D. X. (2020). Universality of deep convolutional neural networks. Applied and Computational Harmonic Analysis, 48(2), 787-794. (Author shows a deep convolutional neural network is universal, meaning it can be used to approximate any continuous function to an arbitrary accuracy when the depth of the neural network is large enough. This answers an open question in learning theory.)

Zou, D., Cao, Y., Zhou, D., & Gu, Q. (2020). Gradient descent optimizes over-parameterized deep ReLU networks. Machine learning, 109(3), 467-492. (Authors show that with proper random weight initialization, gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under certain assumption on the training data.)

2021

Alzubaidi, L., Zhang, J., Humaidi, A. J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., Santamaría, J., Fadhel, M. A., Al-Amidie, M., & Farhan, L. (2021). Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data, 8(1), 1-74.

Bartlett, P. L., Montanari, A., & Rakhlin, A. (2021). Deep learning: a statistical viewpoint. Acta numerica, 30, 87-201.

Berner, J., Grohs, P., Kutyniok, G., & Petersen, P. (2021). The modern mathematics of deep learning. arXiv preprint arXiv:2105.04026. (Authors describe the new field of mathematical analysis of deep learning.)

Bronstein, M. M., Bruna, J., Cohen, T., & Veličković, P. (2021). Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478.

Cohen, T. S. (2021). Equivariant convolutional networks. PhD thesis, University of Amsterdam.

Dar, Y., Muthukumar, V., & Baraniuk, R. G. (2021). A farewell to the bias-variance tradeoff? an overview of the theory of overparameterized machine learning. arXiv preprint arXiv:2109.02355.

Dash, T., Chitlangia, S., Ahuja, A., & Srinivasan, A. (2021). How to tell deep neural networks what we know. arXiv preprint arXiv:2107.10295.

Finzi, M., Welling, M., & Wilson, A. G. (2021, July). A practical method for constructing equivariant multilayer perceptrons for arbitrary matrix groups. In International Conference on Machine Learning (pp. 3318-3328). PMLR.

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning: With applications in R (2nd ed.). Springer. (Chapter 10: Deep learning)

Hospedales, T., Antoniou, A., Micaelli, P., & Storkey, A. (2021). Meta-learning in neural networks: A survey. IEEE transactions on pattern analysis and machine intelligence, 44(9), 5149-5169.

Hu, X., Chu, L., Pei, J., Liu, W., & Bian, J. (2021). Model complexity of deep learning: A survey. Knowledge and Information Systems, 63(10), 2585-2619.

Huisman, M., Van Rijn, J. N., & Plaat, A. (2021). A survey of deep meta-learning. Artificial Intelligence Review, 54(6), 4483-4541.

Lu, J., Shen, Z., Yang, H., & Zhang, S. (2021). Deep network approximation for smooth functions. SIAM Journal on Mathematical Analysis, 53(5), 5465-5506.

Ruthotto, L., & Haber, E. (2021). An introduction to deep generative modeling. GAMM-Mitteilungen, 44(2), e202100008.

Satorras, V. G., Hoogeboom, E., & Welling, M. (2021, July). E(n) equivariant graph neural networks. In International conference on machine learning (pp. 9323-9332). PMLR.

Schmidt-Hieber, J. (2021). The Kolmogorov–Arnold representation theorem revisited. Neural networks, 137, 119-126.

Shen, Z., Yang, H., & Zhang, S. (2021). Neural network approximation: Three hidden layers are enough. Neural Networks, 141, 160-173. (Authors show a combination of simple activation functions can create super approximation power.)

Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2021). Dive into deep learning. arXiv preprint arXiv:2106.11342.

Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3), 107-115.

2022

Bohn, B., Griebel, M., & Kannan, D. (2022). Deep Neural Networks and PIDE Discretizations. SIAM Journal on Mathematics of Data Science, 4(3), 1145-1170.

Buduma, N., Buduma, N., & Papa, J. (2022). Fundamentals of deep learning. O’Reilly Media, Inc.

Dash, T., Chitlangia, S., Ahuja, A., & Srinivasan, A. (2022). A review of some techniques for inclusion of domain-knowledge into deep neural networks. Scientific Reports, 12(1), 1-15.

Huang, S., Feng, W., Tang, C., & Lv, J. (2022). Partial Differential Equations Meet Deep Neural Networks: A Survey. arXiv preprint arXiv:2211.05567.

Jospin, L. V., Laga, H., Boussaid, F., Buntine, W., & Bennamoun, M. (2022). Hands-on Bayesian neural networks—A tutorial for deep learning users. IEEE Computational Intelligence Magazine, 17(2), 29-48.

Kutyniok, G. (2022). The Mathematics of Artificial Intelligence. arXiv preprint arXiv:2203.08890. (This article is a good introduction to deep learning and to several areas of research in the mathematics of deep learning.)

Kutyniok, G. (2022). The Mathematics of Artificial Intelligence. International Congress of Mathematicians 2022. Section “Numerical Analysis and Scientific Computing”. Virtual Event July 6-14, 2022. (YouTube video)

Kutyniok, G., Petersen, P., Raslan, M., & Schneider, R. (2022). A theoretical analysis of deep neural networks and parametric PDEs. Constructive Approximation, 55(1), 73-125.

Lim, L. H., & Nelson, B. J. (2022). What is an equivariant neural network? arXiv preprint arXiv:2205.07362.

Lu, J. (2022). Gradient Descent, Stochastic Optimization, and Other Tales. arXiv preprint arXiv:2205.00832.

Parhi, R., & Nowak, R. D. (2022). What kinds of functions do deep neural networks learn? Insights from variational spline theory. SIAM Journal on Mathematics of Data Science, 4(2), 464-489.

Pearce-Crump, E. (2022). Brauer’s Group Equivariant Neural Networks. arXiv preprint arXiv:2212.08630.

Xie, Z., Tang, Q. Y., He, Z., Sun, M., & Li, P. (2022). Rethinking the Structure of Stochastic Gradients: Empirical and Statistical Evidence. arXiv preprint arXiv:2212.02083.

Ye, A. (2022). Modern Deep Learning Design and Application Development. Springer.

Ye, J. C. (2022). Geometry of Deep Learning: A Signal Processing Perspective (Vol. 37). Springer Nature.