Data Driven Method: A Mathematics Tour

Informal Courses

School of Mathematical Sciences
Peking University
Director: Yiping Lu
Email: luyiping9712 at pku dot edu dot cn
Homepage: http://about.2prime.cn


Course Information

A machine learning and deep learning tutorial from both the statistics and the computational mathematics perspective. The introduction is split into two parts: basic machine learning knowledge (unbiased) and advanced topics (biased).

Objective:

  • PAC-learning theory
  • Inverse Problem
  • Optimization
  • Bayesian Filtering
  • Control Theory
  • Computer Vision And Graphics

Outline: pdf

Textbooks and Related Material

  • Offconvex: link
  • Principled Approaches to Deep Learning Workshop (ICML 2017): link
  • Theories of Deep Learning (STATS 385): homepage
  • Stats 212b: Topics on Deep Learning (UC Berkeley, Spring 2016): GitHub
  • Modern Trends in Nonconvex Optimization for Machine Learning (ICML 2018 Workshop): homepage
  • Theoretical Foundations and Applications of Deep Generative Models (ICML 2018 Workshop): homepage
  • Deep Learning for Physical Sciences (NIPS 2017 Workshop): homepage
  • OPTML (NIPS Workshop): homepage
  • Bayesian Deep Learning (NIPS Workshop): homepage
  • Optimal Transport for Machine Learning (NIPS 2017 Workshop): homepage
  • MIT 6.245: homepage
  • UCB ME223: lecture notes
  • Optimal Control Blog: link

Syllabus and Course Schedule

Week 1

Overview and Fundamental Learning Theory

Objective
PAC Learning theory, Information theory.
Material
  • Slide: link
  • Concentration Inequality: pdf
Reference
Homework
  • Gregor, Karol, and Yann LeCun. "Learning fast approximations of sparse coding." Proceedings of the 27th International Conference on International Conference on Machine Learning. Omnipress, 2010.
  • Zhang, Chiyuan, et al. "Understanding deep learning requires rethinking generalization." arXiv preprint arXiv:1611.03530 (2016).
  • Keskar, Nitish Shirish, et al. "On large-batch training for deep learning: Generalization gap and sharp minima." arXiv preprint arXiv:1609.04836 (2016).
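The concentration note above leans on one workhorse inequality; for quick reference, here is a standard statement of Hoeffding's inequality for independent bounded variables, and the finite-class PAC bound it gives via a union bound (loss values assumed in [0,1]):

\[
\mathbb{P}\!\left(\left|\frac{1}{n}\sum_{i=1}^{n}\bigl(X_i-\mathbb{E}X_i\bigr)\right|\ge t\right)
\le 2\exp\!\left(-\frac{2n^2t^2}{\sum_{i=1}^{n}(b_i-a_i)^2}\right),
\qquad a_i\le X_i\le b_i,
\]

so for a finite hypothesis class \(\mathcal{H}\), with probability at least \(1-\delta\),

\[
R(h)\;\le\;\widehat{R}_n(h)+\sqrt{\frac{\log|\mathcal{H}|+\log(2/\delta)}{2n}}
\qquad\text{for all } h\in\mathcal{H}.
\]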

Week 2

Convex Analysis

Objective
Convex analysis: Duality, the Geometric View from Optimal Transport, Basic Optimization.
Slide
  • Convex Analysis: pdf
  • Optimal Transport: pdf
Reference
  • Prof. Zaiwen Wen's Course: link
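Two definitions worth keeping at hand this week (standard statements, recorded here for convenience): the Fenchel conjugate behind the duality part, and the Kantorovich–Rubinstein dual of the 1-Wasserstein distance, which is the optimal-transport instance of duality:

\[
f^{*}(y)=\sup_{x}\;\bigl(\langle x,y\rangle-f(x)\bigr),
\qquad
W_1(\mu,\nu)=\sup_{\operatorname{Lip}(\varphi)\le 1}\int \varphi\,d\mu-\int \varphi\,d\nu.
\]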

Week 3

Optimization 1: Differential Equation View

Objective
Differential Equation Modeling For Optimization Methods.
Material
  • Lecture Notes: pdf
Reference
  • Deep Relaxation: partial differential equations for optimizing deep neural networks. arXiv:1704.04932v2
  • Stochastic modified equations and adaptive stochastic gradient algorithms. arXiv:1511.06251
  • A general analysis of the convergence of ADMM. arXiv:1507.06970
  • A Lyapunov Analysis of Momentum Methods in Optimization. arXiv:1611.02635
  • A Variational Perspective on Accelerated Methods in Optimization. arXiv:1603.04245
  • Asynchronous Coordinate Descent under More Realistic Assumptions. arXiv:1705.08494
  • Zhang J, Mokhtari A, Sra S, et al. Direct Runge-Kutta Discretization Achieves Acceleration. arXiv:1805.00521
  • B. Hu and L. Lessard. Dissipativity theory for Nesterov's accelerated method. arXiv preprint arXiv:1706.04381, 2017.
  • Convergence of the ADAM algorithm from a Dynamical System Viewpoint arXiv:1810.02263
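A minimal numerical sketch of the differential-equation view (my own toy example, not taken from the papers above): gradient descent with step size lr is exactly the forward-Euler discretization of the gradient flow dx/dt = -∇f(x), so the two trajectories below coincide when lr = dt.

import numpy as np

# Toy quadratic objective f(x) = 0.5 * x^T A x (illustrative choice of A).
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
grad = lambda x: A @ x

def gradient_descent(x0, lr, steps):
    # x_{k+1} = x_k - lr * grad f(x_k)
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def gradient_flow_euler(x0, dt, T):
    # Forward-Euler discretization of dx/dt = -grad f(x)
    x = x0.copy()
    for _ in range(int(round(T / dt))):
        x = x - dt * grad(x)
    return x

x0 = np.array([2.0, -1.0])
print(gradient_descent(x0, lr=0.1, steps=50))   # identical to the line below
print(gradient_flow_euler(x0, dt=0.1, T=5.0))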

Week 4

Optimization 2: Different Views of Acceleration Methods

Objective
Understanding Nesterov's acceleration methods.
Material
Reference
  • A Lyapunov Analysis of Momentum Methods in Optimization. arXiv:1611.02635
  • A Variational Perspective on Accelerated Methods in Optimization. arXiv:1603.04245
  • Zhang J, Mokhtari A, Sra S, et al. Direct Runge-Kutta Discretization Achieves Acceleration. arXiv:1805.00521
  • S. Bubeck, Y. T. Lee, and M. Singh. A geometric alternative to Nesterov's accelerated gradient descent. arXiv preprint arXiv:1506.08187, 2015.
  • L. Lessard, B. Recht, and A. Packard. Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization, 26(1):57–95, 2016.
  • B. Hu and L. Lessard. Dissipativity theory for Nesterov's accelerated method. arXiv preprint arXiv:1706.04381, 2017.
  • Allen-Zhu, Zeyuan, and Lorenzo Orecchia. "Linear coupling: An ultimate unification of gradient and mirror descent." arXiv preprint arXiv:1407.1537 (2014).
  • Chambolle, Antonin, and Thomas Pock. "On the ergodic convergence rates of a first-order primal–dual algorithm." Mathematical Programming 159.1-2 (2016): 253-287.
  • Allen-Zhu, Zeyuan. "Katyusha: The first direct acceleration of stochastic gradient methods." Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. ACM, 2017.
  • Dissipativity theory for accelerating stochastic variance reduction: A unified analysis of SVRG and Katyusha using semidefinite programs (arXiv) B. Hu, S. Wright, and L. Lessard. International Conference on Machine Learning, 2038–2047, Jul 2018. (Stockholm, Sweden)
  • Wangpeng An, Haoqian Wang, Qingyun Sun, Jun Xu, Qionghai Dai, and Lei Zhang. A pid controller approach for stochastic optimization of deep networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • Analysis of Optimization Algorithms via Integral Quadratic Constraints: Nonstrongly Convex Problems. arXiv:1705.03615
  • Reading: The Best Things in Life Are Model Free: link
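A small sketch of what acceleration buys on an ill-conditioned quadratic (my own illustration, not taken from the references; the continuous-time counterpart studied above is the ODE \ddot{X} + (3/t)\dot{X} + ∇f(X) = 0): Nesterov's method with the usual t_k momentum schedule versus plain gradient descent at the same step size 1/L.

import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 * x^T A x; L = largest eigenvalue of A.
A = np.diag([100.0, 1.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
L = 100.0

def gd(x0, steps):
    x = x0.copy()
    for _ in range(steps):
        x = x - grad(x) / L
    return x

def nesterov(x0, steps):
    # Accelerated gradient with momentum weight (t_k - 1) / t_{k+1}
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(steps):
        x_next = y - grad(y) / L
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x

x0 = np.array([1.0, 1.0])
print("gradient descent:", f(gd(x0, 100)))        # O(1/k) decay
print("Nesterov:        ", f(nesterov(x0, 100)))  # O(1/k^2) decay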

Week 5

PAC-Learning Theory

Objective
VC-dimension, Rademacher Complexity, PAC-Bayesian, Covering Number, Uniform Stability.
Material
  • Foundations of Machine Learning
  • Basic Course Note: link
Reference
  • On the Discrimination-Generalization Tradeoff in GANs. arXiv:1711.02771 (ICLR 2018)
  • Approximability of Discriminators Implies Diversity in GANs. Y Bai, T Ma, A Risteski. arXiv:1806.10586
  • Pan Zhou, Jiashi Feng. "Understanding generalization and optimization performance of deep CNNs." ICML 2018
  • On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond. Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, and Tuo Zhao. arXiv:1806.05159
  • Neyshabur, Behnam, et al. "Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks." arXiv:1805.12076
  • Dropout Training, Data-dependent Regularization, and Generalization Bounds. Wenlong Mou, Yuchen Zhou, Jun Gao, Liwei Wang. PMLR 80:3642-3650 (ICML 2018)
  • PAC-Bayesian theory and flat minima (Neyshabur, Behnam, et al. "Exploring generalization in deep learning." NIPS 2017)
  • Stronger generalization bounds for deep nets via a compression approach. S Arora, R Ge, B Neyshabur, Y Zhang. arXiv:1802.05296, 2018
  • Data-dependent PAC-Bayes priors via differential privacy (arXiv:1802.09583, TBD)
Optimization And Generalization
  • Stability and Convergence Trade-off of Iterative Optimization Algorithms. Y Chen, C Jin, B Yu. arXiv:1804.01619
  • Train faster, generalize better: Stability of stochastic gradient descent. M Hardt, B Recht, Y Singer. arXiv:1509.01240 (ICML 2016)
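For orientation, the definition and bound that most entries in this list refine (standard statements, e.g. as in Foundations of Machine Learning, for a loss taking values in [0,1]):

\[
\mathfrak{R}_n(\mathcal{H})=\mathbb{E}_{S,\sigma}\!\left[\sup_{h\in\mathcal{H}}\frac{1}{n}\sum_{i=1}^{n}\sigma_i\,h(z_i)\right],
\qquad \sigma_i \text{ i.i.d. uniform on } \{\pm 1\},
\]

and with probability at least \(1-\delta\) over an i.i.d. sample of size \(n\),

\[
R(h)\;\le\;\widehat{R}_n(h)+2\,\mathfrak{R}_n(\mathcal{H})+\sqrt{\frac{\log(1/\delta)}{2n}}
\qquad\text{for all } h\in\mathcal{H}.
\]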

Week 6

Optimization 3: Optimization Methods for Deep Learning

Objective
Optimization and generalization, gradient vanishing.
Reference
  • Generalization from the Learning Algorithm
    • Train Faster, Generalize Better: Stability of Stochastic Gradient Descent. arXiv:1509.01240
    • Entropy-SGD: Biasing Gradient Descent Into Wide Valleys. arXiv:1611.01838
    • Deep Relaxation: partial differential equations for optimizing deep neural networks. arXiv:1704.04932v2
    • Towards Deep Learning Models Resistant to Adversarial Attacks. link
    • Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints arXiv:1707.05947
  • Make The Optimization Easier
    • Improving training of deep neural networks via singular value bounding. arXiv:1611.06013
    • All you need is beyond a good init: Exploring better solution for training extremely deep convolutional neural networks with orthonormality and modulation. arXiv:1703.01827
    • Riemannian approach to batch normalization. arXiv:1709.09603
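A minimal sketch of the singular value bounding idea from arXiv:1611.06013, as I read it: periodically project each weight matrix so its singular values stay in a narrow band around one, which keeps layerwise Jacobians well conditioned. The band width and the random matrix below are illustrative choices, not the paper's exact recipe.

import numpy as np

def singular_value_bounding(W, eps=0.05):
    # Project W so its singular values lie in [1/(1+eps), 1+eps].
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = np.clip(s, 1.0 / (1.0 + eps), 1.0 + eps)
    return U @ np.diag(s) @ Vt

# In training this projection would be applied to every layer's weight matrix
# once every few hundred SGD steps; here it is shown on a random matrix.
W = 0.5 * np.random.randn(64, 32)
print(np.linalg.svd(W, compute_uv=False)[:3])                            # spread-out spectrum
print(np.linalg.svd(singular_value_bounding(W), compute_uv=False)[:3])   # clipped near 1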

Week 7

Bayesian Filtering

Objective
Kalman Filter, Particle Filter, Sampling.
Reference
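A minimal self-contained sketch of the first topic, the linear-Gaussian Kalman filter, on a 1-D constant-velocity tracking model; the model matrices and noise levels below are illustrative choices.

import numpy as np

# State = [position, velocity]; we observe a noisy position at each step.
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])               # observation matrix
Q = 0.01 * np.eye(2)                     # process noise covariance
R = np.array([[1.0]])                    # observation noise covariance

def kalman_step(x, P, z):
    # One predict/update cycle of the standard Kalman filter.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

rng = np.random.default_rng(0)
x, P = np.zeros(2), np.eye(2)
true_pos = 0.0
for _ in range(20):
    true_pos += 1.0                       # true velocity is 1
    z = np.array([true_pos + rng.normal(scale=1.0)])
    x, P = kalman_step(x, P, z)
print("estimated [position, velocity]:", x)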

Week 8

Sparsity

Objective
RIP Condition, Sparse Coding Neural Network.
Reference
Compressed Sensing
  • ELEN 6886 Sparse Representation and High-Dimensional Geometry: link
Bridging Compressed Sensing with Neural Networks
  • On the Convergence of Learning-based Iterative Methods for Nonconvex Inverse Problems arXiv:1808.05331
  • Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds. Xiaohan Chen, Jialin Liu, Zhangyang Wang, Wotao Yin.
  • Hao He, Bo Xin, Satoshi Ikehata, and David Wipf, "From Bayesian Sparsity to Gated Recurrent Nets," Advances in Neural Information Processing Systems (NIPS), 2017.
  • Bo Xin, Yizhou Wang, Wen Gao, Baoyuan Wang, and David Wipf, "Maximal Sparsity with Deep Networks?," Advances in Neural Information Processing Systems (NIPS), 2016.
  • Bora, Ashish, et al. "Compressed sensing using generative models." arXiv preprint arXiv:1703.03208 (2017).
  • Kabkab, Maya, Pouya Samangouei, and Rama Chellappa. "Task-Aware Compressed Sensing with Generative Adversarial Networks." arXiv preprint arXiv:1802.01284 (2018).
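The starting point for the unfolding papers above is plain ISTA for the LASSO problem; the sketch below (problem sizes and hyperparameters are illustrative) shows the iteration that LISTA in Gregor & LeCun (2010) truncates to a fixed depth and whose matrices and thresholds it then learns.

import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(A, b, lam, steps=500):
    # ISTA for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))
x_true = np.zeros(200)
x_true[[3, 17, 90]] = [1.0, -2.0, 0.5]     # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = ista(A, b, lam=0.1)
print("recovered support:", np.nonzero(np.abs(x_hat) > 0.1)[0])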

Class Project

  • TBD


© Yiping Lu | Last updated: 08/10/2017