Bayesian Methods for Surrogate Modeling and Dimensionality Reduction
University of Notre Dame, Spring 2019
Professor Nicholas Zabaras
Lecture Notes and Videos
 Introduction to Machine Learning
 Supervised and unsupervised learning, reinforcement learning; Regression and classification, Probabilistic predictions and point estimates; Examples, Document classification, Iris flower dataset, Image classification, Face detection and recognition; Supervised versus unsupervised learning; Unsupervised learning, Hidden/Latent variables, Discovering clusters, Dimensionality Reduction, Discovering graph structure, Matrix Completion; Parametric vs. non-parametric models; K-Nearest Neighbor Classifiers; Linear and Logistic Regression; Model selection, cross-validation and overfitting; The curse of dimensionality; No free lunch theorem.
[VideoLecture] [Lecture Notes]
 Generative Bayesian Models for Discrete Data
 Generative Models; Bayesian concept learning, Likelihood, Prior, Posterior, Posterior predictive distribution, Plug-in Approximation; The beta-binomial model, Likelihood, Prior, Posterior, Posterior predictive distribution, Black-swan paradoxes and Plug-in approximations, Outcome of multiple future trials, Beta-Binomial Distribution; The Dirichlet-multinomial model, Likelihood, Prior, Posterior, Posterior predictive, Language Model using Bag of Words.
[VideoLecture] [Lecture Notes]
 Generative Bayesian Models for Discrete Data (continued)
 The Dirichlet-multinomial model, Likelihood, Prior, Posterior, Posterior predictive, Language Model using Bag of Words; Bayesian Analysis of the Uniform Distribution; Naïve Bayes classifiers, Examples, MLE for Naïve Bayes Classifier, Example for bag-of-words binary class model, Summary of the Algorithm, Bayesian Naïve Bayes, Using the model for prediction, The log-sum-exp trick, Feature selection using mutual information; Classifying documents using bag of words.
[VideoLecture] [Lecture Notes]
 Generative Bayesian Models for Discrete Data (continued) and Summarizing Posterior Distributions and Bayesian Model Selection
 Feature selection using mutual information; Classifying documents using bag of words; Summarizing Posterior Distributions, MAP Estimation, Reparametrization, Credible Intervals, HPD Intervals; Bayesian Inference for a Difference in Proportions; Model Selection and Cross Validation, Bayesian Model Selection, Bayesian Occam’s Razor, Marginal Likelihood, Evidence Approximation; Bayes Factors and Jeffreys Scale of Evidence, Jeffreys-Lindley Paradox.
[VideoLecture] [Lecture Notes]
 Bayesian Model Selection (continued) and Prior Models, Hierarchical Bayes, Empirical Bayes
 Laplace Approximation to the Posterior and Model Evidence approximation, Bayesian Information Criterion, Akaike Information Criterion; Effect of the Prior, Empirical Bayes, Prior modeling; Conjugate priors, Exponential families, Mixture of conjugate priors, Noninformative priors; Translation and Scale invariance, Jeffreys’ noninformative prior, Robust Priors.
[VideoLecture] [Lecture Notes]
 Prior Models, Hierarchical Bayes, Empirical Bayes (continued)
 Hierarchical Bayesian Models, Modeling Cancer Rates Example; Empirical Bayes, Evidence Approximation, James-Stein Estimator.
[VideoLecture] [Lecture Notes]
 Prior Models, Hierarchical Bayes, Empirical Bayes (continued) and Introduction to Decision Theory
 Hierarchical Bayesian Models, Modeling Cancer Rates Example; Empirical Bayes, Evidence Approximation, James-Stein Estimator; Introduction to Bayesian Decision Theory, Bayes Estimator, MAP Estimate and 0-1 Loss, Posterior Mean and Quadratic Loss, L_{1} Loss, MAP Estimator; Decision Theory for Regression, the Squared Loss Function, Alternate Approaches to Regression, The Minkowski Loss Function; Decision Theory in the Context of Classification, Minimizing the Misclassification Rate, Minimizing the Expected Loss, Reject Option.
[VideoLecture] [Lecture Notes]
 Introduction to Decision Theory (continued)
 Minimizing the Misclassification Rate, Reject Option, Inference and Decision (Generative and Discriminative Models), Unbalanced Class Priors, Combining Models, Naïve Bayes Model; False Positive vs. False Negative, ROC Curves, Precision-Recall Curves, F-Scores, False Discovery Rates, Contextual Bandits.
[VideoLecture] [Lecture Notes]
 Introduction to Linear Regression Models
 Overfitting and MLE, Effect of Data Size, Training and Test Errors, Regularization and Model Complexity; Linear basis function models, MLE and Least Squares, Geometry of least squares, Sequential Learning, Robust Linear Regression, Ridge Regression, Lasso Regularizer and Sparse Solutions, Multi-output Regression; The Bias-Variance Decomposition.
[VideoLecture] [Lecture Notes]
 Introduction to Linear Regression Models (continued)
 The Bias-Variance Decomposition; MAP Estimate and Regularized Least Squares, Posterior Distribution, Predictive Distribution, Pointwise Uncertainty, Plug-in Approximation, Covariance between the Predictions, Equivalent Kernel Representation; Computing the Bias Parameter, Centered Data; Bayesian inference in linear regression when σ^{2} is unknown; Zellner’s g-Prior, Uninformative (Semi-Conjugate) Prior; Evidence Approximation for Regression; Bayesian model selection.
[VideoLecture] [Lecture Notes]
 Bayesian Linear Regression Models
 Bayesian inference in linear regression when σ^{2} is unknown; Zellner’s g-Prior, Uninformative (Semi-Conjugate) Prior; Evidence Approximation for Regression; Bayesian model selection.
[VideoLecture] [Lecture Notes]
 Linear Models for Classification
 Linear models for classification, Generalized Linear Models, Discriminant Functions; Least Squares approach to Classification; Generative vs. Discriminative Classifiers; Fisher’s linear discriminant, Probabilistic Interpretation; Online Learning and Stochastic Optimization; The Perceptron algorithm, Bayesian perspective.
[VideoLecture] [Lecture Notes]
 Generative Models, Stochastic Optimization, The Perceptron Algorithm
 Probabilistic Generative Models for two Classes, Logistic Sigmoid, Models with K>2, Gaussian Class Conditionals; Maximum Likelihood Solution, Multiclass Case, Discrete Features; Exponential Family, Generalization to Exponential Family Class Conditionals; Online Learning and Stochastic Optimization, Robbins-Monro Algorithm, LMS Algorithm; The Perceptron algorithm, Bayesian perspective.
[VideoLecture] [Lecture Notes]
 Discriminative Models, Logistic Regression
 Probabilistic Discriminative Models; Nonlinear Basis in Classification Models; Logistic Regression and Generalized Linear Models; Cross Entropy Error, Sequential Update, Steepest Descent Method, Newton’s Method, BFGS, Regularization, Linearly Separable Data, Iteratively Reweighted Least Squares; Multiclass Logistic Regression.
[VideoLecture] [Lecture Notes]
 Probit Regression (Continued) and Generalized Linear Models and the Exponential Family
 Probit Regression, IRLS for Probit Regression; The exponential family, Log partition function, MLE for the exponential family, Bayes for the exponential family, Maximum entropy derivation of the exponential family; Generalized linear models (GLMs), ML and MAP estimation, Bayesian inference; Probit regression, ML/MAP estimation using gradient-based optimization, Latent variable interpretation, Ordinal probit regression, Multinomial probit models; Multi-task learning, Hierarchical Bayes, Domain adaptation; Generalized linear mixed models; Learning to rank, Loss functions for ranking.
[VideoLecture] [Lecture Notes]
 Generalized Linear Models and the Exponential Family (continued)
 Generalized linear models (GLMs), ML and MAP estimation, Bayesian inference; Probit regression, ML/MAP estimation using gradient-based optimization, Latent variable interpretation, Ordinal probit regression, Multinomial probit models; Multi-task learning.
[VideoLecture] [Lecture Notes]
 Expectation-Maximization, Gaussian Mixture Models
 Latent Variable Models, Gaussian Mixtures, K-Means Clustering and EM; Image Compression and Vector Quantization; Latent Variable View of Gaussian Mixture Models (GMM), EM for GMMs; Generalization, Maximizing the Expected Complete Data Log-Likelihood; EM Algorithms and K-Means.
[VideoLecture] [Lecture Notes]

 Expectation-Maximization (Continued)
 Latent Variable View of Gaussian Mixture Models (GMM), EM for GMMs; Generalization, Maximizing the Expected Complete Data Log-Likelihood; Initialization of EM; EM Algorithms and K-Means; Mixture of Bernoulli Distributions, MAP Estimation; EM for Bayesian Linear Regression; EM for Relevance Determination in Regression Models.
[VideoLecture] [Lecture Notes]
 Expectation-Maximization (Continued)
 Lower bound on evidence, Generalization of the EM Algorithm, EM in the Space of Parameters; Factorization of the Posterior P(Z|X,θ); EM for MAP Estimation; Computing MLE/MAP and Non-Convexity; Fitting Models with Missing Data; EM for the Student's Distribution, Mixture of Student's.
[VideoLecture] [Lecture Notes]
 Continuous Latent Variables
 Continuous Latent Variables; Factor Analysis, Principal Component Analysis, Maximum variance formulation, Minimum-error formulation, Applications of PCA, Introduction to Probabilistic PCA.
[VideoLecture] [Lecture Notes]
 Continuous Latent Variables (Continued)
 PCA for high-dimensional data; Probabilistic PCA, Maximum likelihood PCA, EM algorithm for PCA, Bayesian PCA; Nonlinear Latent Variable Models, Independent Component Analysis, Auto-associative Neural Networks; Modeling Nonlinear Manifolds, Multidimensional Scaling (MDS), ISOMAP, GTM, Self-organizing Maps, Locally Linear Embedding, Latent Trait Models, Density Network.
[VideoLecture] [Lecture Notes]
 Kernel Methods
 Keeping or discarding the training data, Kernel Function, Kernel Methods and Kernel Trick, Dual Representation, Constructing Kernels from Basis Functions, Constructing Kernels Directly; Combining Kernels, Gaussian Kernel, Kernels for Comparing Documents, Matérn Kernel, String Kernels, Symbolic Input, Pyramid Match Kernels, Probabilistic Generative Models, Probabilistic Product Kernels, Mercer Kernels, Fisher Kernel, Sigmoidal Kernel, Radial Basis Functions; Kernel Machines, L1VM and RVM, Smoothing Kernels for Generative Modeling, Kernel Density Estimation, KNN Classifiers; Interpolation with Noisy Inputs, Nadaraya-Watson model, Kernel Regression, Locally Weighted Regression; The Kernel Trick, K-Medoids Clustering; Kernel PCA.
[VideoLecture] [Lecture Notes]
 Sparse Kernel Machines
 Sparse Kernel Machines, Maximum Margin Classifiers, Overlapping class distributions, Relation to logistic regression, Multiclass SVMs, SVMs for regression; Computational learning theory; Relevance Vector Machines, RVM for regression, Analysis of sparsity, RVM for classification, RVM for Uncertainty quantification and Surrogate Modeling.
[VideoLecture] [Lecture Notes]
 Support Vector Machines
 Sparse Kernel Machines, Maximum Margin Classifiers, Overlapping class distributions, Relation to logistic regression, Multiclass SVMs, SVMs for regression; Computational learning theory.
[VideoLecture] [Lecture Notes]
 Support Vector Machines (Continued)
 Support Vector Machine, Overlapping class distributions, Chunking, Decomposition Methods; Probabilistic predictions with SVM, Relation to Logistic Regression, Hinge Error Function, Choosing the SVM parameters; SVM for Regression, Dual problem and KKT conditions; Computational learning theory.
[VideoLecture] [Lecture Notes]
 Sparse Linear Models
 Bayesian Variable Selection, The Spike and Slab Model, L_{0} regularization, Algorithms, Greedy Search, Orthogonal Least Squares, Matching Pursuits and Backwards Selection, EM and Variational Inference; L_{1} regularization, Sparse Solutions, Optimality Conditions for Lasso.
[VideoLecture] [Lecture Notes]
 Sparse Linear Models (Continued)
 Comparison of Least Squares, Lasso, Ridge & Subset Selection, Regularization Path, Model Selection, Bolasso; L_{1} Regularization Algorithms, Coordinate Descent, LARS, Proximal and Gradient Projection Algorithms, Proximal Operators, Proximal/Gradient Method, Nesterov’s Method, EM for Lasso, Group Lasso, Fused Lasso, Elastic Net; Nonconvex regularization, Bridge regression, Hierarchical adaptive Lasso (HAL), EM for HAL, Other Hierarchical Priors; Automatic Relevance Determination, ARD for Linear Regression, Sparsity, Connection to MAP Estimation, EM for ARD, Fixed-point Algorithm, Iteratively Reweighted L_{1} Algorithm, ARD for Logistic Regression; Sparse Coding, Learning a Sparse Coding Dictionary, Application to Natural Image Patches, Compressed Sensing, Image Inpainting and Denoising.
[VideoLecture] [Lecture Notes]
 Introduction to Gaussian Processes
 What is a Gaussian Process? Covariance and Mean Functions; Taking Samples from a Gaussian Process; GP Prior and Posterior; Computing the Hyperparameters, Relevance Determination; Gaussian Process Regression; Kernel Functions and Kernel Selection; Gaussian Process Classification; GPs and Neural Networks.
[VideoLecture] [Lecture Notes]
 Introduction to Variational Inference
 Outline of variational inference, Factorized distributions, Mean Field Approximation, Approximating a Gaussian using a factorized Gaussian; Alternative forms of KL divergence, Gaussian Approximation of p(x) by KL(p||q) minimization, Alpha family of divergences; Variational optimization and the EM algorithm, Mean Field for the Ising Model, Structured Mean Field, Variational Inference for the univariate Gaussian, Variational optimization and model selection; Variational Linear Regression, Predictive Distribution, Lower Bound, Selection of the order of the polynomial, Mixture of Gaussians, Variational Message Passing, Variational Lower Bound, Re-estimation equations using the variational lower bound, Predictive distribution, Case of large data set, Determining the number of mixture components, MAP Estimate versus MLE, Induced factorizations.
[VideoLecture] [Lecture Notes]
Homework
 January 31, Homework 1
 February 14, Homework 2
Course Info and References
Credit: 4 Units
Lectures: Tuesdays and Thursdays, 12:30 - 1:45 pm, DeBartolo Hall 129.
Recitation: Fridays, 11:30 - 12:15 pm, DeBartolo Hall 102.
Professor: Nicholas Zabaras, 311 I Cushing Hall, nzabaras@gmail.com
Teaching Fellow (Volunteer): Dr. Souvik Chakraborty, csouvik41@gmail.com
Office hours: Souvik Chakraborty, Tues. & Thur. 4:30 - 5:30 p.m., 301 Riley Hall; N. Zabaras, Fridays 4:5:30 pm, 311 I Cushing and by appointment.
Course description: The course covers selected topics in Bayesian scientific computing relevant to high-dimensional, data-driven engineering and scientific applications. An overview of Bayesian computational statistics methods will be provided, including Monte Carlo methods, exploration of posterior distributions, model selection and validation, MCMC and Sequential MC methods, and inference in probabilistic graphical models. Bayesian techniques for building surrogate models of expensive computer codes will be introduced, including regression methods for uncertainty quantification, Gaussian process modeling and others. The course will demonstrate these techniques with a variety of scientific and engineering applications including, among others, inverse problems, dynamical system identification, tracking and control, uncertainty quantification of complex multiscale systems, physical modeling in random media, and optimization/design in the presence of uncertainties. Students will be encouraged to integrate the course tools with their own research topics.
Intended audience: Graduate Students in Mathematics/Statistics, Computer Science, Engineering, Physical/Chemical/Biological/Life Sciences.
References of General Interest: The course lectures will become available on the course web site. For in-depth study, a list of articles and book chapters from the current literature will also be provided to supplement the material of the lectures. There is no required text for this course. Some important books that can be used for general background reading in the subject areas of the course include the following:
References on (Bayesian) Machine Learning:

 C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2007.
 Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012 (a free ebook is also available from the author's web site).
 C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006 (a free ebook is also available from the Gaussian Processes web site).
Homework: assigned every three to four lectures. Most of the homework will require implementation and application of algorithms discussed in class. We anticipate between five and seven homework sets. All homework solutions and affiliated computer programs should be mailed by midnight of the due date to this Email address. All attachments should arrive in an appropriately named zipped directory (e.g. HW1_Submission_YourName.rar). We would prefer typed homework (include in your submission all original files, e.g. LaTeX, and a Readme file for compiling and testing your software).
Term project: A project is required in statistical and computational aspects of the course. Students are encouraged to investigate aspects of Machine Learning that extend topics covered in the lectures. We would like to see both methodological developments as well as implementation of algorithms (new or available in the literature). Topics that fall well outside the material covered in the course will not be acceptable even if they are related to the big picture of statistical learning (e.g. topics on Deep Learning will not be appropriate). Please submit an abstract (one page) by February 15th that includes a description of your project plans with appropriate references (provide links to them). A short written report (in the format of NIPS papers) is required as well as a presentation. Project presentations will be given at the end of the semester as part of a one- or two-day symposium. Appropriate projects will be those that educate the class in Machine Learning topics not explored during the lectures and clearly demonstrate the depth of knowledge of statistical/machine learning acquired by each student.
Grading: Homework 60% and Project 40%.
Prerequisites: Linear Algebra, Probability theory, Introduction to Statistics and Programming (any language). The course will require significant effort, especially from those not familiar with computational statistics. It is a course intended for those who value the role of Bayesian inference and machine learning in their research.
Syllabus
 Introduction to Machine Learning
 Supervised and unsupervised machine learning
 Parametric vs. Non-Parametric Models
 Examples, Linear regression, Logistic regression, K-nearest neighbors
 Overfitting, Model Selection, No free lunch Theorem
 Curse of Dimensionality
 Generative Models for Discrete Data
 Bayesian concept learning
 The Beta-Binomial model, the Dirichlet-Multinomial model
 Naïve Bayes classifiers
 Introduction to Bayesian Statistics
 Bayes' rule, estimators and loss functions
 Bayes' factors, prior/likelihood & posterior distributions
 Priors, uninformative, Jeffreys & robust priors, Mixture of conjugate priors
 Density estimation methods
 Bayesian model validation
 Empirical Bayes
 Bayesian Decision Theory
 Linear Regression Models
 MLE Estimation
 Robust linear regression
 Ridge regression
 Bayesian linear regression
 Logistic Regression
 Model specification and fitting
 Bayesian logistic regression
 Online learning, Stochastic optimization
 Generative and discriminative classifiers
 Generalized Linear Models and the Exponential Family
 The exponential family
 Generalized linear models
 Probit regression
 Multitask learning, Learning to rank
 Mixture Models and the EM Algorithm
 Latent variable models
 Mixture models, Parameter estimation
 The EM Algorithm
 Model Selection
 Models with missing data
 Latent linear models
 Factor analysis
 Principal component analysis
 PCA for Categorical and Multiview Data
 Independent component analysis
 Sparse Linear Models
 Bayesian variable selection
 L_{1} regularization, Lasso
 Group and fused Lasso, Elastic Net
 Nonconvex regularization
 Automatic relevance determination
 Sparse coding
 Kernel Methods
 Kernel functions
 Kernel functions and GLMs
 The kernel trick
 Kernel PCA
 Support vector machines
 Automatic relevance determination (RVM)
 Kernels for Generative models, Kernel density estimation, kernel regression, locally weighted regression
 Variational Inference Methods
 Forward and reverse KL Divergence
 The mean field approximation, Structured Mean Field
 Variational Bayes
 Variational Bayes EM
 Local variational bounds
 Gaussian Processes as Surrogate Models
 GPs for regression
 GPs for classification
 Connection with other methods (linear models, linear smoothers, SVMs, Neural nets, RKHS, etc.)
 Gaussian Process latent variable model
 Sparse Gaussian processes
 Deep Gaussian Processes
 Adaptive Basis Function Models
 Classification and regression trees
 Generalized additive models
 Boosting
 Feedforward neural networks
 Ensemble Learning