Current Courses

Bayesian Methods for Surrogate Modeling and Dimensionality Reduction

University of Notre Dame, Spring 2019

Tuesday & Thursday: 12:30 - 1:45 PM (Lectures, DeBartolo 129)

Friday: 11:30 AM - 12:20 PM (Recitation, DeBartolo 102)

Professor Nicholas Zabaras

 


Lecture Notes and Videos

  1. Introduction to Machine Learning
    • Supervised and unsupervised learning, reinforcement learning; Regression and classification, Probabilistic predictions and point estimates; Examples, Document classification, Iris flower dataset, Image classification, Face detection and recognition; Supervised versus unsupervised learning; Unsupervised learning, Hidden/Latent variables, Discovering clusters, Dimensionality Reduction, Discovering graph structure, Matrix Completion; Parametric vs. non-parametric models; K-Nearest Neighbor Classifiers; Linear and Logistic Regression; Model selection, cross-validation and overfitting; The curse of dimensionality; No free lunch theorem.
      [Video-Lecture] [Lecture Notes]
  2. Generative Bayesian Models for Discrete Data
    • Generative Models; Bayesian concept learning, Likelihood, Prior, Posterior, Posterior predictive distribution, Plug-in Approximation; The beta-binomial model, Likelihood, Prior, Posterior, Posterior predictive distribution, Black swan paradoxes and Plug-in approximations, Outcome of multiple future trials, Beta-Binomial Distribution; The Dirichlet-multinomial model, Likelihood, Prior, Posterior, Posterior predictive, Language Model using Bag of Words. (A short beta-binomial code sketch is included after this lecture list.)
      [Video-Lecture] [Lecture Notes]
  3. Generative Bayesian Models for Discrete Data (continued)
    • The Dirichlet-multinomial model, Likelihood, Prior, Posterior, Posterior predictive, Language Model using Bag of Words; Bayesian Analysis of the Uniform Distribution; Naive Bayes classifiers, Examples, MLE for Naïve Bayes Classifier, Example for bag-of-words binary class model, Summary of the Algorithm, Bayesian Naïve Bayes, Using the model for prediction, The log-sum-exp trick, Feature selection using mutual information; Classifying documents using bag of words. 
      [Video-Lecture] [Lecture Notes]
  4. Generative Bayesian Models for Discrete Data (continued) and Summarizing Posterior Distributions and Bayesian Model Selection
    • Feature selection using mutual information; Classifying documents using bag of words; Summarizing Posterior Distributions, MAP Estimation, Reparametrization, Credible Intervals, HPD Intervals; Bayesian Inference for a Difference in Proportions; Model Selection and Cross Validation, Bayesian Model Selection, Bayesian Occam’s Razor, Marginal Likelihood, Evidence Approximation; Bayes Factors and Jeffreys Scale of Evidence, Jeffreys-Lindley Paradox.
      [Video-Lecture] [Lecture Notes]
  5. Bayesian Model Selection (continued) and Prior Models, Hierarchical Bayes, Empirical Bayes
    • Laplace Approximation to the Posterior and Model Evidence approximation, Bayesian Information Criterion, Akaike Information Criterion; Effect of the Prior, Empirical Bayes, Prior modeling; Conjugate priors, Exponential families, Mixture of conjugate priors, Non-informative priors; Translation and Scale invariance, Jeffreys non-informative prior, Robust Priors.
      [Video-Lecture] [Lecture Notes]
  6. Prior Models, Hierarchical Bayes, Empirical Bayes (continued)
    • Hierarchical Bayesian Models, Modeling Cancer Rates Example; Empirical Bayes, Evidence Approximation, James-Stein Estimator.
      [Video-Lecture] [Lecture Notes]
  7. Prior Models, Hierarchical Bayes, Empirical Bayes (continued) and Introduction to Decision Theory
    • Hierarchical Bayesian Models, Modeling Cancer Rates Example; Empirical Bayes, Evidence Approximation, James-Stein Estimator; Introduction to Bayesian Decision Theory, Bayes Estimator, MAP Estimate and 0-1 Loss, Posterior Mean and Quadratic Loss, L1 Loss, MAP Estimator; Decision Theory for Regression, the Squared Loss Function, Alternate Approaches to Regression, The Minkowski Loss Function; Decision Theory in the Context of Classification, Minimizing the Misclassification Rate, Minimizing the Expected Loss, Reject Option.
      [Video-Lecture] [Lecture Notes]
  8. Introduction to Decision Theory (continued)
    • Minimizing the Misclassification Rate, Reject Option, Inference and Decision (Generative and Discriminative Models), Unbalanced Class Priors, Combining Models, Naïve Bayes Model; False Positive vs. False Negative, ROC Curves, Precision Recall Curves, F-Scores, False Discovery Rates, Contextual Bandits. 
      [Video-Lecture] [Lecture Notes]
  9. Introduction to Linear Regression Models
    • Overfitting and MLE, Effect of Data Size, Training and Test Errors, Regularization and Model Complexity; Linear basis function models, MLE and Least Squares, Geometry of least squares, Sequential Learning, Robust Linear Regression, Ridge Regression, Lasso Regularizer and Sparse Solutions, Multi-output Regression; The Bias-Variance Decomposition.
      [Video-Lecture] [Lecture Notes]
  10. Introduction to Linear Regression Models (continued)
    • The Bias-Variance Decomposition; MAP Estimate and Regularized Least Squares, Posterior Distribution, Predictive Distribution, Pointwise Uncertainty, Plug-in Approximation, Covariance between the Predictions, Equivalent Kernel Representation; Computing the Bias Parameter, Centered Data; Bayesian inference in linear regression when σ² is unknown; Zellner’s g-Prior, Uninformative (Semi-Conjugate) Prior; Evidence Approximation for Regression; Bayesian model selection. (A short Bayesian linear regression code sketch is included after this lecture list.)
      [Video-Lecture] [Lecture Notes]
  11. Bayesian Linear Regression Models
    • Bayesian inference in linear regression when σ² is unknown; Zellner’s g-Prior, Uninformative (Semi-Conjugate) Prior; Evidence Approximation for Regression; Bayesian model selection.
      [Video-Lecture] [Lecture Notes]
  12. Linear Models for Classification
    • Linear models for classification, Generalized Linear Models, Discriminant Functions; Least Squares approach to Classification; Generative vs. Discriminative Classifiers; Fisher’s linear discriminant, Probabilistic Interpretation; Online Learning and Stochastic Optimization; The Perceptron algorithm, Bayesian perspective.
      [Video-Lecture] [Lecture Notes]
  13. Generative Models, Stochastic Optimization, The Perceptron Algorithm
     
    • Probabilistic Generative Models for two Classes, Logistic Sigmoid, Models with K>2, Gaussian Class Conditionals; Maximum Likelihood Solution, Multiclass Case, Discrete Features; Exponential Family, Generalization to Exponential Family Class Conditionals; Online Learning and Stochastic Optimization, Robbins-Monro Algorithm, LMS Algorithm; The Perceptron algorithm, Bayesian perspective. 
      [Video-Lecture] [Lecture Notes]
  14. Discriminative Models, Logistic Regression
    • Probabilistic Discriminative Models; Nonlinear Basis in Classification Models; Logistic Regression and Generalized Linear Models; Cross Entropy Error, Sequential Update, Steepest Descent Method, Newton’s Method, BFGS, Regularization, Linearly Separable Data, Iterative Reweighted Least Squares; Multiclass Logistic Regression. 
      [Video-Lecture] [Lecture Notes]
  15. Probit Regression (Continued) and Generalized Linear Models and the Exponential Family
    • Probit Regression, IRLS for Probit Regression; The exponential family, Log partition function, MLE for the exponential family, Bayes for the exponential family, Maximum entropy derivation of the exponential family; Generalized linear models (GLMs), ML and MAP estimation, Bayesian inference; Probit regression, ML/MAP estimation using gradient-based optimization, Latent variable interpretation, Ordinal probit regression, Multinomial probit models; Multi-task learning, Hierarchical Bayes, Domain adaptation; Generalized linear mixed models; Learning to rank, Loss functions for ranking.  
      [Video-Lecture] [Lecture Notes]
  16. Generalized Linear Models and the Exponential Family (continued)
    • Generalized linear models (GLMs), ML and MAP estimation, Bayesian inference; Probit regression, ML/MAP estimation using gradient-based optimization, Latent variable interpretation, Ordinal probit regression, Multinomial probit models; Multi-task learning. 
      [Video-Lecture] [Lecture Notes]
  17. Expectation-Maximization, Gaussian Mixture Models
    • Latent Variable Models, Gaussian Mixtures, K-Means Clustering and EM; Image Compression and Vector Quantization; Latent Variable View of Gaussian Mixture Models (GMM), EM for GMMs; Generalization, Maximizing the Expected Complete Data Log-Likelihood; EM Algorithms and K-Means. (A short EM-for-GMM code sketch is included after this lecture list.)
      [Video-Lecture] [Lecture Notes]

  18. Expectation-Maximization (Continued)
    • Latent Variable View of Gaussian Mixture Models (GMM), EM for GMMs; Generalization, Maximizing the Expected Complete Data Log-Likelihood; Initialization of EM; EM Algorithms and K-Means; Mixture of Bernoulli Distributions, MAP Estimation; EM for Bayesian Linear Regression; EM for Relevance Determination in Regression Models. 
      [Video-Lecture] [Lecture Notes]
  19. Expectation-Maximization (Continued)
    • Lower bound on evidence, Generalization of the EM Algorithm, EM in the Space of Parameters; Factorization of the Posterior P(Z|X,θ); EM for MAP Estimation; Computing MLE/MAP and Non-Convexity; Fitting Models with Missing Data; EM for the Student's t-Distribution, Mixture of Student's t-Distributions.
      [Video-Lecture] [Lecture Notes]
  20. Continuous Latent Variables
    • Continuous Latent Variables; Factor Analysis, Principal Component Analysis, Maximum variance formulation, Minimum-error formulation, Applications of PCA, Introduction to Probabilistic PCA. (A short PCA code sketch is included after this lecture list.)
      [Video-Lecture] [Lecture Notes]
  21. Continuous Latent Variables (Continued)
    • PCA for high-dimensional data; Probabilistic PCA, Maximum likelihood PCA, EM algorithm for PCA, Bayesian PCA; Nonlinear Latent Variable Models, Independent Component Analysis, Autoassociative Neural Networks; Modeling Nonlinear Manifolds, Multidimensional Scaling (MDS), ISOMAP, GTM, Self-organizing Maps, Locally Linear Embedding, Latent Trait Models, Density Network. 
      [Video-Lecture] [Lecture Notes]
  22. Kernel Methods
    • Keeping or discarding the training data, Kernel Function, Kernel Methods and Kernel Trick, Dual Representation, Constructing Kernels from Basis Functions, Constructing Kernels Directly; Combining Kernels, Gaussian Kernel, Kernels for Comparing Documents, Matern Kernel, String Kernels, Symbolic Input, Pyramid Match Kernels, Probabilistic Generative Models, Probabilistic Product Kernels, Mercer Kernels, Fisher Kernel, Sigmoidal Kernel, Radial Basis Functions; Kernel Machines, L1VM and RVM, Smoothing Kernels for Generative Modeling, Kernel Density Estimation, KNN Classifiers; Interpolation with Noisy Inputs, Nadaraya-Watson model, Kernel Regression, Locally Weighted Regression; The Kernel Trick, K-Medoids Clustering; Kernel PCA. 
      [Video-Lecture] [Lecture Notes]
  23. Sparse Kernel Machines
    • Sparse Kernel Machines, Maximum Margin Classifiers, Overlapping class distributions, Relation to logistic regression, Multiclass SVMs, SVMs for regression; Computational learning theory; Relevance Vector Machines, RVM for regression, Analysis of sparsity, RVM for classification, RVM for Uncertainty quantification and Surrogate Modeling. 
      [Video-Lecture] [Lecture Notes]
  24. Support Vector Machines
    • Sparse Kernel Machines, Maximum Margin Classifiers, Overlapping class distributions, Relation to logistic regression, Multiclass SVMs, SVMs for regression; Computational learning theory. 
      [Video-Lecture] [Lecture Notes]
  25. Support Vector Machines (Continued)
    • Support Vector Machine, Overlapping class distributions, Chunking, Decomposition Methods; Probabilistic predictions with SVM, Relation to Logistic Regression, Hinge Error Function, Choosing the SVM parameters; SVM for Regression, Dual problem and KKT conditions; Computational learning theory.
      [Video-Lecture] [Lecture Notes]
  26. Sparse Linear Models
    • Bayesian Variable Selection, The Spike and Slab Model, L0 regularization, Algorithms, Greedy Search, Orthogonal Least Squares, Matching Pursuits and Backwards Selection, EM and Variational Inference; L1 regularization, Sparse Solutions, Optimality Conditions for Lasso. 
      [Video-Lecture] [Lecture Notes]
  27. Sparse Linear Models (Continued)
    • Comparison of Least Squares, Lasso, Ridge & Subset Selection, Regularization Path, Model Selection, Bolasso; L1 Regularization Algorithms, Coordinate Descent, LARS, Proximal and Gradient Projection Algorithms, Proximal Operators, Proximal/Gradient Method, Nesterov’s Method, EM for Lasso, Group Lasso, Fused Lasso, Elastic Net; Non-convex regularization, Bridge regression, Hierarchical adaptive Lasso (HAL), EM for HAL, Other Hierarchical Priors; Automatic Relevance Determination, ARD for Linear Regression, Sparsity, Connection to MAP Estimation, EM for ARD, Fixed-point Algorithm, Iteratively Reweighted L1 Algorithm, ARD for Logistic Regression; Sparse Coding, Learning a Sparse Coding Dictionary, Application to Natural Image Patches, Compressed Sensing, Image Inpainting and Denoising. (A short Lasso coordinate descent code sketch is included after this lecture list.)
      [Video-Lecture] [Lecture Notes]
  28. Introduction to Gaussian Processes
    • What is a Gaussian Process? Covariance and Mean Functions; Taking Samples from a Gaussian Process; GP Prior and Posterior; Computing the Hyperparameters, Relevance Determination; Gaussian Process Regression; Kernel Functions and Kernel Selection; Gaussian Process Classification; GPs and Neural Networks. (A short GP regression code sketch is included after this lecture list.)
      [Video-Lecture] [Lecture Notes]
  29. Introduction to Variational Inference
    • Outline of variational inference, Factorized distributions, Mean Field Approximation, Approximating a Gaussian using a factorized Gaussian; Alternative forms of KL divergence, Gaussian Approximation of p(x) by KL(p||q) minimization, Alpha family of divergences; Variational optimization and the EM algorithm, Mean Field for the Ising Model, Structured Mean Field, Variational Inference for the univariate Gaussian, Variational optimization and model selection; Variational Linear Regression, Predictive Distribution, Lower Bound, Selection of the order of the polynomial, Mixture of Gaussians, Variational Message Passing, Variational Lower Bound, Re-estimation equations using the variational lower bound, Predictive distribution, Case of large data set, Determining the number of mixture components, MAP Estimate versus MLE, Induced factorizations. (A short mean-field variational inference code sketch is included after this lecture list.)
      [Video-Lecture] [Lecture Notes]
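
Illustrative Code Sketches

The short Python sketches below illustrate, in heavily simplified form, a few of the algorithms listed in the lectures above. They are minimal sketches using illustrative synthetic data and hyperparameters, not material from the lecture notes.

Lecture 2 (the beta-binomial model): a minimal sketch of the conjugate beta-binomial update with SciPy. The Beta(2, 2) prior and the counts of 17 heads and 3 tails are illustrative choices; the posterior is Beta(a + N1, b + N0) and the posterior predictive probability of heads is (a + N1)/(a + b + N1 + N0).

    from scipy import stats

    # Beta(a, b) prior on the probability of heads (a = b = 2 is an illustrative choice).
    a, b = 2.0, 2.0

    # Observed data: N1 heads and N0 tails (illustrative counts).
    N1, N0 = 17, 3

    # Conjugate update: the posterior is again a Beta distribution.
    posterior = stats.beta(a + N1, b + N0)
    print("posterior mean:", posterior.mean())
    print("95% credible interval:", posterior.interval(0.95))

    # Posterior predictive probability that the next trial is heads.
    print("P(next = heads | data):", (a + N1) / (a + b + N1 + N0))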
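
Lectures 10-11 (Bayesian linear regression): a sketch of the weight posterior and pointwise predictive uncertainty for a linear-in-the-parameters model with Gaussian basis functions, assuming the noise precision β is known and an isotropic Gaussian prior N(0, α⁻¹I) on the weights. The sine data, the basis centers, and the values α = 2 and β = 25 are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative 1-D data: noisy samples of a sine function.
    N, beta_noise = 25, 25.0            # beta_noise: assumed known noise precision
    x = rng.uniform(0.0, 1.0, N)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, 1.0 / np.sqrt(beta_noise), N)

    def phi(x, M=9):
        """Gaussian basis functions with fixed, evenly spaced centers (illustrative)."""
        centers = np.linspace(0.0, 1.0, M)
        return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / 0.1) ** 2)

    alpha = 2.0                         # prior precision on the weights (illustrative)
    Phi = phi(x)

    # Posterior over the weights: N(m_N, S_N) with
    #   S_N^{-1} = alpha*I + beta*Phi^T Phi,   m_N = beta*S_N Phi^T t
    S_N = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta_noise * Phi.T @ Phi)
    m_N = beta_noise * S_N @ Phi.T @ t

    # Predictive mean and pointwise standard deviation at new inputs.
    x_star = np.linspace(0.0, 1.0, 5)
    Phi_star = phi(x_star)
    mean_star = Phi_star @ m_N
    var_star = 1.0 / beta_noise + np.einsum("ij,jk,ik->i", Phi_star, S_N, Phi_star)
    print(np.c_[x_star, mean_star, np.sqrt(var_star)])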
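
Lectures 17-18 (EM for Gaussian mixtures): a sketch of the EM iteration for a two-component, one-dimensional Gaussian mixture. The synthetic data, the choice K = 2, the initialization, and the fixed number of iterations are illustrative; a practical implementation would also monitor the log-likelihood for convergence.

    import numpy as np

    rng = np.random.default_rng(1)

    # Illustrative 1-D data drawn from two well-separated Gaussians.
    x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 1.0, 300)])

    # K = 2 mixture components, with an illustrative initialization.
    K = 2
    pi = np.full(K, 1.0 / K)            # mixing weights
    mu = np.array([-1.0, 1.0])          # component means
    var = np.array([1.0, 1.0])          # component variances

    def log_gauss(x, mu, var):
        return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

    for _ in range(100):
        # E-step: responsibilities r[n, k] = p(z_n = k | x_n, current parameters).
        log_r = np.log(pi)[None, :] + log_gauss(x[:, None], mu[None, :], var[None, :])
        log_r -= log_r.max(axis=1, keepdims=True)        # log-sum-exp trick
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and variances from the responsibilities.
        Nk = r.sum(axis=0)
        pi = Nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / Nk
        var = (r * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / Nk

    print("weights:", pi, "means:", mu, "std devs:", np.sqrt(var))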
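
Lectures 20-21 (principal component analysis): a sketch of PCA via the SVD of the centered data matrix, reporting the variance explained and the reconstruction error when keeping M = 2 components (the minimum-error view of PCA). The synthetic 10-dimensional data with two dominant directions are an illustrative assumption.

    import numpy as np

    rng = np.random.default_rng(2)

    # Illustrative data: 500 points in 10 dimensions, most variance in 2 directions.
    Z = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))
    X = Z + 0.05 * rng.normal(size=(500, 10))

    # Center the data, then take the SVD; right singular vectors are the principal axes.
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Fraction of variance explained by each principal component.
    explained = s ** 2 / np.sum(s ** 2)
    print("fraction of variance in the first two PCs:", explained[:2].sum())

    # Project onto the first M components and reconstruct (minimum-error formulation).
    M = 2
    X_proj = Xc @ Vt[:M].T                        # low-dimensional representation
    X_rec = X_proj @ Vt[:M] + X.mean(axis=0)      # reconstruction in the original space
    print("mean squared reconstruction error:", np.mean((X - X_rec) ** 2))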
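
Lecture 27 (L1 regularization algorithms): a sketch of coordinate descent for the Lasso objective 0.5·||y − Xw||² + λ·||w||₁, using the soft-thresholding (proximal) operator for the coordinate-wise updates. The synthetic sparse problem and the value λ = 5 are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(3)

    # Illustrative sparse regression problem: only 3 of 20 coefficients are nonzero.
    n, d = 100, 20
    X = rng.normal(size=(n, d))
    w_true = np.zeros(d)
    w_true[[0, 5, 12]] = [3.0, -2.0, 1.5]
    y = X @ w_true + 0.1 * rng.normal(size=n)

    def soft_threshold(rho, lam):
        """Proximal operator of the L1 penalty."""
        return np.sign(rho) * np.maximum(np.abs(rho) - lam, 0.0)

    def lasso_cd(X, y, lam, n_iters=200):
        """Coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1."""
        w = np.zeros(X.shape[1])
        col_sq = (X ** 2).sum(axis=0)
        for _ in range(n_iters):
            for j in range(X.shape[1]):
                # Partial residual that excludes feature j's current contribution.
                r_j = y - X @ w + X[:, j] * w[j]
                w[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
        return w

    w_hat = lasso_cd(X, y, lam=5.0)
    print("indices of nonzero coefficients:", np.flatnonzero(w_hat))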
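
Lecture 28 (Gaussian process regression): a sketch of the GP posterior mean and pointwise standard deviation under a squared-exponential covariance function, computed with a Cholesky factorization for numerical stability. The hyperparameters (length-scale, signal variance, noise level) are fixed to illustrative values here rather than learned from the marginal likelihood as discussed in the lecture.

    import numpy as np

    rng = np.random.default_rng(4)

    # Illustrative 1-D training data.
    X = rng.uniform(-3.0, 3.0, 12)
    y = np.sin(X) + 0.1 * rng.normal(size=X.shape)

    def rbf_kernel(a, b, ell=1.0, sigma_f=1.0):
        """Squared-exponential covariance function k(x, x')."""
        return sigma_f ** 2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

    sigma_n = 0.1                               # noise standard deviation (assumed known)
    K = rbf_kernel(X, X) + sigma_n ** 2 * np.eye(len(X))
    L = np.linalg.cholesky(K)                   # numerically stable solve via Cholesky

    X_star = np.linspace(-3.0, 3.0, 7)
    K_s = rbf_kernel(X, X_star)
    K_ss = rbf_kernel(X_star, X_star)

    # GP posterior mean and covariance at the test inputs.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    print(np.c_[X_star, mean, np.sqrt(np.diag(cov))])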
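
Lecture 29 (variational inference for the univariate Gaussian): a sketch of the factorized (mean-field) approximation q(μ, τ) = q(μ)q(τ) for a Gaussian with unknown mean and precision under a conjugate Normal-Gamma prior, iterating the coupled updates for q(μ) (Gaussian) and q(τ) (Gamma). The data and the prior hyperparameters μ₀ = 0, λ₀ = a₀ = b₀ = 1 are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(5)

    # Illustrative data from a Gaussian with unknown mean and precision.
    x = rng.normal(2.0, 1.5, 50)
    N, xbar = len(x), x.mean()

    # Prior hyperparameters (illustrative, weakly informative).
    mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

    # Factorized approximation q(mu, tau) = q(mu) q(tau); initialize E[tau].
    E_tau = 1.0
    for _ in range(50):
        # Update q(mu) = N(mu_N, 1/lam_N).
        mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
        lam_N = (lam0 + N) * E_tau
        # Update q(tau) = Gamma(a_N, b_N) using E_q[(x_n - mu)^2] and E_q[(mu - mu0)^2].
        a_N = a0 + 0.5 * (N + 1)
        E_sq = np.sum((x - mu_N) ** 2) + N / lam_N
        b_N = b0 + 0.5 * (E_sq + lam0 * ((mu_N - mu0) ** 2 + 1.0 / lam_N))
        E_tau = a_N / b_N

    print("q(mu) mean:", mu_N, " q(mu) std:", np.sqrt(1.0 / lam_N))
    print("E[tau]:", E_tau, " implied std of the data:", np.sqrt(1.0 / E_tau))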

Homework

  • January 31, Homework 1
  • February 14, Homework 2
    • Mixture of Conjugate Priors, Decision Theory, Bayesian Linear Regression, Model Evidence, Bayes factors 
      [Homework] [Solution] [Software]

Course Info and References

Credit: 4 Units

Lectures: Tuesdays and Thursdays 12:30 -- 1:45 pm, DeBartolo Hall 129.

Recitation: Fridays, 11:30 -- 12:15 pm, DeBartolo Hall 102.

Professor: Nicholas Zabaras, 311 I Cushing Hall, nzabaras@gmail.com

Teaching Fellow (Volunteer): Dr. Souvik Chakraborty, csouvik41@gmail.com

Office hours: Souvik Chakraborty, Tues. & Thur. 4:30 -- 5:30 pm, 301 Riley Hall; N. Zabaras, Fridays 4 -- 5:30 pm, 311 I Cushing Hall, and by appointment.

Course description: The course covers selected topics on Bayesian scientific computing relevant to high-dimensional data-driven engineering and scientific applications. An overview of Bayesian computational statistics methods will be provided, including Monte Carlo methods, exploration of posterior distributions, model selection and validation, MCMC and Sequential MC methods, and inference in probabilistic graphical models. Bayesian techniques for building surrogate models of expensive computer codes will be introduced, including regression methods for uncertainty quantification, Gaussian process modeling and others. The course will demonstrate these techniques with a variety of scientific and engineering applications including, among others, inverse problems, dynamical system identification, tracking and control, uncertainty quantification of complex multiscale systems, physical modeling in random media, and optimization/design in the presence of uncertainties. Students will be encouraged to integrate the course tools with their own research topics.

Intended audience: Graduate Students in Mathematics/Statistics, Computer Science, Engineering, Physical/Chemical/Biological/Life Sciences.

References of General Interest: The course lectures will become available on the course web site. For in-depth study, a list of articles and book chapters from the current literature will also be provided to enhance the material of the lectures. There is no required text for this course. Some important books that can be used for general background reading in the subject areas of the course include the following:

References on (Bayesian) Machine Learning:

Homework: assigned every three to four lectures. Most of the homework will require implementation and application of algorithms discussed in class. We anticipate between five and seven homework sets. All homework solutions and affiliated computer programs should be emailed by midnight of the due date to this Email address. All attachments should arrive in an appropriately named zipped directory (e.g. HW1_Submission_YourName.rar). We would prefer typed homework (include in your submission all original files, e.g. LaTeX, and a Readme file for compiling and testing your software).

Term project: A project is required on the statistical and computational aspects of the course. Students are encouraged to investigate aspects of Machine Learning that extend topics covered in the lectures. We would like to see both methodological developments as well as implementation of algorithms (new or available in the literature). Topics that fall well outside the material covered in the course will not be acceptable even if they are related to the big picture of statistical learning (e.g. topics on Deep Learning will not be appropriate). Please submit a one-page abstract by February 15th that includes a description of your project plans with appropriate references (provide links to them). A short written report (in the format of NIPS papers) is required as well as a presentation. Project presentations will be given at the end of the semester as part of a one- or two-day symposium. Appropriate projects are those that educate the class in Machine Learning topics not explored during the lectures and clearly demonstrate the depth of statistical/machine learning knowledge acquired by each student.

Grading: Homework 60% and Project 40%.

Prerequisites: Linear Algebra, Probability theory, Introduction to Statistics and Programming (any language). The course will require significant effort, especially from those not familiar with computational statistics. It is intended for those who value the role of Bayesian inference and machine learning in their research.


Syllabus

  1. Introduction to Machine Learning
    • Supervised and unsupervised machine learning
    • Parametric vs. Non-Parametric Models
    • Examples, Linear regression, Logistic regression, K-nearest neighbors
    • Overfitting, Model Selection, No free lunch Theorem
    • Curse of Dimensionality
  2. Generative Models for Discrete Data
    • Bayesian concept learning
    • The Beta-Binomial model, the Dirichlet-Multinomial model
    • Naive Bayes classifiers
  3. Introduction to Bayesian Statistics
    • Bayes' rule, estimators and loss functions
    • Bayes' factors, prior/likelihood & posterior distributions
    • Priors, uninformative, Jeffreys & robust priors, Mixture of conjugate priors
    • Density estimation methods
    • Bayesian model validation
    • Empirical Bayes
    • Bayesian Decision Theory
  4. Linear Regression Models
    • MLE Estimation
    • Robust linear regression
    • Ridge regression
    • Bayesian linear regression
  5. Logistic Regression
    • Model specification and fitting
    • Bayesian logistic regression
    • Online learning, Stochastic optimization
    • Generative and discriminative classifiers
  6. Generalized Linear Models and the Exponential Family
    • The exponential family
    • Generalized linear models
    • Probit regression
    • Multi-task learning, Learning to rank
  7. Mixture Models and the EM Algorithm
    • Latent variable models
    • Mixture models, Parameter estimation
    • The EM Algorithm
    • Model Selection
    • Models with missing data
  8. Latent linear models
    • Factor analysis
    • Principal component analysis
    • PCA for Categorical and Multiview Data
    • Independent component analysis
  9. Sparse Linear Models
    • Bayesian variable selection
    • L1 regularization, Lasso
    • Group and fused Lasso, Elastic Net
    • Non-convex regularization
    • Automatic relevance determination
    • Sparse coding
  10. Kernel Methods
    • Kernel functions
    • Kernel functions and GLMs
    • The kernel trick
    • Kernel PCA
    • Support vector machines
    • Automatic relevance determination (RVM)
    • Kernels for Generative models, Kernel density estimation, kernel regression, locally weighted regression
  11. Variational Inference Methods
    • Forward and reverse KL Divergence
    • The mean field approximation, Structured Mean Field
    • Variational Bayes
    • Variational Bayes EM
    • Local variational bounds
  12. Gaussian Processes as Surrogate Models
    • GPs for regression
    • GPs for classification
    • Connection with other methods (linear models, linear smoothers, SVMs, Neural nets, RKHS, etc.)
    • Gaussian Process latent variable model
    • Sparse Gaussian processes
    • Deep Gaussian Processes
  13. Adaptive Basis Function Models
    • Classification and regression trees
    • Generalized additive models
    • Boosting
    • Feedforward neural networks
    • Ensemble Learning