# Statistical Computing Methods for Scientists and Engineers

#### [Fall 2018]

#### Professor Nicholas Zabaras

#### Lecture Notes and Videos

1. Introduction to Statistical Computing and Probability and Statistics

- Introduction to the course, books and references, objectives, organization; Fundamentals of probability and statistics, laws of probability, independency, covariance, correlation; The sum and product rules, marginal and conditional distributions; Random variables, moments, discrete and continuous distributions; The univariate Gaussian distribution.

[Video-Lecture] [Lecture Notes]

2. Introduction to Probability and Statistics (Continued)

- Binomial, Bernoulli, Multinomial, Multinoulli, Poisson, Student's-T, Laplace, Gamma, Beta, Pareto, multivariate-Gaussian and Dirichlet distributions; Joint probability distributions; Transformation of random variables; Central limit theorem and basic Monte Carlo approximations; Probability inequalities; Information theory review, KL Divergence, Entropy, Mutual Information, Jensen's inequality.

[Video-Lecture] [Lecture Notes]

3. Information Theory, Multivariate Gaussian, MLE Estimation, Robbins-Monro algorithm

- Information theory, KL divergence, entropy, mutual information, Jensen's inequality (continued); Central limit theorem examples, checking the Gaussian nature of a data set; Multivariate Gaussian, Mahalanobis distance, geometric interpretation; Maximum Likelihood Estimation (MLE) for the univariate and multivariate Gaussian, sequential MLE; Robbins-Monro algorithm for sequential MLE.

[Video-Lecture] [Lecture Notes]

4. Variance/Covariance, Correlation/Independence, Transformation of Densities, Multivariate Gaussian, Dirichlet and Student's T

- Variance and covariance, correlation, center normalized random variables, uncorrelated random variables; Multivariate random variables, covariance matrix; Independent Vs. Uncorrelated, mutual information; Marginal and conditional densities; Conditional expectations; Multivariate Gaussian; Transformation of probability densities; Multivariate Student's-T distribution; Dirichlet distribution.

[Video-Lecture] [Lecture Notes]

5. Central Limit Theorem, Inequalities, and Introduction to MC Methods

- Markov and Chebyshev inequalities; The law of large numbers; Empirical mean and empirical covariance; Central limit theorem; Introduction to Monte Carlo methods; Introduction to Information Theory.

[Video-Lecture] [Lecture Notes]

6. Introduction to Information Theory

- Entropy, Noiseless Shannon Coding Theorem; Statistical Mechanics Definition of Entropy; Maximum Entropy; Differential Entropy; KL Divergence, Jensen's Inequality; Conditional Entropy; Mutual Information; Pointwise Mutual Information, Maximal Information Coefficient.

[Video-Lecture] [Lecture Notes]

7. Introduction to Bayesian Statistics

- Parametric modeling, Sufficiency principle, MLE and the Likelihood principles; MLE for a Univariate Gaussian, MLE for the Multivariate Gaussian; Bayesian Statistics, Bayes rule, Prior, Likelihood and Posterior, Posterior Point Estimates, Predictive Distribution, Univariate Gaussian with Unknown Mean, Inference of Precision with Known Mean, Inference on Mean and Precision; Inference for the Multivariate Gaussian, Unknown Mean, Unknown Precision, Unknown Mean and Precision, MAP Estimation and Shrinkage, Marginal Likelihood, Non-informative Prior, Wishart Distribution; Exponential Family, Bernoulli, Beta, Gamma, Gaussian, Conjugate Priors, Posterior Predictive Distribution.

[Video-Lecture] [Lecture Notes]

8. Bayesian Inference for the Mean and Precision for the Univariate and Multivariate Gaussian

- Bayesian Inference, Univariate Gaussian with Unknown Mean, Inference of Precision with Known Mean, Inference on Mean and Precision; Inference for the Multivariate Gaussian, Unknown Mean, Unknown Precision, Unknown Mean and Precision, MAP Estimation and Shrinkage, Marginal Likelihood, Non-informative Prior, Wishart Distribution; Exponential Family, Bernoulli, Beta, Gamma, Gaussian, Conjugate Priors, Posterior Predictive Distribution.

[Video-Lecture] [Lecture Notes]

9. Exponential Family of Distributions

- Exponential Family, Bernoulli, Beta, Gamma, Gaussian; Conjugate Priors, Posterior Predictive Distribution; Maximum Entropy and the Exponential Family; Generalized Linear Models and the Exponential Family.

[Video-Lecture] [Lecture Notes]

10. Generalized Linear Models and the Exponential Family, Conditional Gaussian Systems, Information Form of the Gaussian

- Generalized Linear Models and the Exponential Family; Iteratively Reweighted Least Squares, Sequential Estimation; Conditional Gaussian Systems, Matrix Inversion Lemma; The Marginal Distribution; Interpolating noise-free data; Data imputation.

[Video-Lecture] [Lecture Notes]

11. Prior Hierarchical Models

- Prior modeling, Conjugate priors, Exponential families, Linearity of the Posterior Mean, Example: Gaussian with unknown mean and variance, Extension to Multivariate Gaussians, Poisson with unknown mean; Mixture of conjugate priors, Limitations of Conjugate Priors; MaxEnt priors, Non-informative priors; Translation and Scale invariance; Improper priors, Jeffreys' prior, Pros and Cons of improper priors, Lack of robustness of the normal prior; Hierarchical Bayesian Models, Empirical Bayes.

[Video-Lecture] [Lecture Notes]

12. Introduction to Bayesian Linear Regression

- Motivation for Bayesian inference via a regression example, Overfitting, Effect of Data Size, Model Selection, Overfitting and MLE, Regularization and Model Complexity; Bayesian Inference and Prediction, Frequentist vs. Bayesian Paradigm, Bias in MLE (Gaussian Example); A Probabilistic View of Regression, MAP Estimate and Regularized Least Squares, Posterior Distribution, Predictive Distribution; Model Selection and Cross Validation, Akaike Information Criterion (AIC), Bayesian Model Selection, Bayesian Occam's Razor.

[Video-Lecture] [Lecture Notes]

13. Bayesian Model Selection

- Model Selection and Cross Validation, AIC Information Criterion, Bayesian Model Selection, Bayesian Occam’s Razor, Marginal Likelihood, Evidence Approximation, Examples; Laplace Approximation, Bayesian Information Criterion, Akaike Information Criterion; Effect of the Prior, Empirical Bayes, Bayes Factors and Jeffreys Scale of Evidence, Examples, Jeffreys-Lindley Paradox.

[Video-Lecture] [Lecture Notes]

14. Bayesian Linear Regression (Continued)

- Linear basis function models, sequential learning, multiple outputs, data centering, Bayesian inference when σ² is unknown, Zellner's g-prior, uninformative semi-conjugate priors, introduction to relevance determination for Bayesian regression.

[Video-Lecture] [Lecture Notes]

15. Implementation of Bayesian Regression and Variable Selection

- The caterpillar regression problem; Conjugate priors, conditional and marginal posteriors, predictive distribution, influence of the conjugate prior; Zellner's G prior, marginal posterior mean and variance, credible intervals; Jeffreys' non-informative prior, Zellner's non-informative G prior, point null hypothesis and calculation of Bayes factors for the selection of explanatory input variables; Variable selection, model comparison, variable selection prior, sampling search for the most probable model, Gibbs sampling for variable selection; Implementation details.

[Video-Lecture] [Lecture Notes]

16. Implementation of Bayesian Regression and Variable Selection (Continued)

- Jeffreys' non-informative prior, Zellner's non-informative G prior, point null hypothesis and calculation of Bayes factors for the selection of explanatory input variables; Variable selection, model comparison, variable selection prior, sampling search for the most probable model, Gibbs sampling for variable selection; Implementation details.

[Video-Lecture] [Lecture Notes]

17. The evidence approximation, Variable and (Regression) model selection

- The evidence approximation, Limitations of fixed basis functions, equivalent kernel approach to regression, Gibbs sampling for variable selection, variable and model selection.

[Video-Lecture] [Lecture Notes]

18. Introduction to Monte Carlo Methods, Sampling from Discrete and Continuous Distributions

- Review of the Central Limit Theorem, Law of Large Numbers; Calculation of π, Indicator functions and Monte Carlo error estimates; Monte Carlo estimators, properties, coefficient of variation, convergence, MC and the curse of dimensionality; MC Integration in high dimensions, optimal number of MC samples; Sample representation of the MC estimator; Bayes factors estimation with Monte Carlo; Sampling from discrete distributions; Reverse sampling from continuous distributions; Transformation methods, the Box-Muller algorithm, sampling from the multivariate Gaussian.

[Video-Lecture] [Lecture Notes]

19. Reverse Sampling, Transformation Methods, Composition Methods, Accept-Reject Methods, Stratified/Systematic Sampling

- Sampling from a discrete distribution; Reverse sampling for continuous distributions; Transformation methods, Box-Muller algorithm, sampling from the multivariate Gaussian; Simulation by composition, accept-reject sampling; Conditional Monte Carlo; Stratified sampling and systematic sampling.

[Video-Lecture] [Lecture Notes]

20. Accept-Reject Methods, Stratified/Systematic Sampling and Introduction to Importance Sampling

- Sampling from a discrete distribution; Reverse sampling for continuous distributions; Transformation methods, Box-Muller algorithm, sampling from the multivariate Gaussian; Simulation by composition, accept-reject sampling; Conditional Monte Carlo; Stratified sampling and systematic sampling; Importance sampling methods, sampling from a Gaussian mixture; Optimal importance sampling distribution, normalized importance sampling; Asymptotic variance/Delta method, asymptotic bias; Applications to Bayesian inference; Importance sampling in high dimensions.

[Video-Lecture] [Lecture Notes]

21. Importance Sampling

- Importance sampling methods, sampling from a Gaussian mixture; Optimal importance sampling distribution, normalized importance sampling; Asymptotic variance/Delta method, asymptotic bias; Applications to Bayesian inference; Importance sampling in high dimensions, importance sampling vs rejection sampling; Solving Ax=b with importance sampling, computing integrals with singularities, other examples.

[Video-Lecture] [Lecture Notes]

22. Gibbs Sampling

- Review of Importance sampling, Solving Ax=b with importance sampling, Sampling Importance Resampling (Continued); Gibbs Sampling, Systematic and Random scans, Block and Metropolized Gibbs, Application to Variable Selection in Bayesian Regression; MCMC, Metropolis-Hastings, Examples.

[Video-Lecture] [Lecture Notes]

23. Markov Chain Monte Carlo and Metropolis-Hastings Algorithm

- MCMC, Averaging along the chain, Ergodic Markov chains; Metropolis algorithm, Metropolis-Hastings, Examples; Random Walk Metropolis-Hastings, Independent Metropolis-Hastings; Metropolis-adjusted Langevin algorithm; Combinations of Transition Kernels, Simulated Annealing.

[Video-Lecture] [Lecture Notes]

24. Introduction to State Space Models and Sequential Importance Sampling

- The state space model; Examples, Tracking problem, Speech enhancement, volatility model; The state space model with observations, examples; Bayesian inference in state space models, forward filtering, forward-backward filtering; Online parameter estimation; Monte Carlo for the state space model, optimal importance distribution, sequential importance sampling.

[Video-Lecture] [Lecture Notes]

25. Sequential Importance Sampling (continued)

- Bayesian inference in state space models, forward filtering, forward-backward filtering; Online parameter estimation; Monte Carlo for the state space model, optimal importance distribution, sequential importance sampling.

[Video-Lecture] [Lecture Notes]

26. Sequential Importance Sampling with Resampling

- Sequential importance sampling (Continued); Optimal importance distribution, locally optimal importance distribution, suboptimal importance distribution; Importance distribution by linearization; Examples, Robot localization, Tracking, Stochastic volatility; Resampling, Effective sample size, multinomial resampling, sequential importance sampling with resampling, Various examples; Rao-Blackwellized particle filter, mixture of Kalman filters, switching LG-SSMs, FastSLAM; Error estimates, degeneracy, convergence.

[Video-Lecture] [Lecture Notes]

27. Sequential Importance Sampling with Resampling (Continued)

- General framework for Sequential Importance Sampling Resampling; Growing a polymer in two dimensions; Sequential Monte Carlo for Static Problems; Online parameter estimation; SMC for Smoothing.

[Video-Lecture] [Lecture Notes]

28. Sequential Monte Carlo (Continued) and Conditional Linear Gaussian Models

- Online parameter estimation; SMC for Smoothing; Kalman filter review for linear Gaussian models; Sequential Monte Carlo for conditional linear Gaussian models, Rao-Blackwellized particle filter, applications; Time Series models; Partially observed linear Gaussian models; Dynamic Tobit and Dynamic Probit models.

[Video-Lecture] [Lecture Notes]

#### Homework

- Sept. 11, Homework 1
- Posterior for (μ, σ²) for a Gaussian likelihood with conjugate prior, the Wishart distribution and its moments, exponential family distributions, MLE for the Gaussian and Gamma distributions.

[Homework] [Software/Data Resource] [Solution] [Software Solution]

- Sept. 21, Homework 2

- Hierarchical Bayesian models, Jeffreys' prior and maximum entropy prior, Laplace approximation, Monte Carlo integration, Bayesian information criterion, MLE and MAP estimates.

[Homework] [Software/Data Resource] [Solution] [Software Solution]

- Oct. 8, Homework 3
- Bayesian linear regression, Variable and model selection, Gibbs sampling for variable selection, Informative Zellner's G Prior, Jeffreys' non-informative Prior, Zellner's non-informative G Prior.

[Homework] [Software/Data Resource] [Solution] [Software Solution]

- Oct. 24, Homework 4
- Monte Carlo Methods, Accept-Reject Sampling, Sampling from the Gamma distribution with Cauchy distribution as proposal, Metropolis-Hastings and Gibbs sampling, Hamiltonian MC methods, applications to Bayesian Regression.

[Homework] [Software/Data Resource] [Solution] [Software Solution]


#### Course Info and References

**Credit:** 3 Units

**Lectures:** Tuesdays and Thursdays 12:30 -- 1:45 pm, DeBartolo Hall 126

**Recitation:** Fridays, 12:30 -- 1:45 pm, DeBartolo Hall 126.

**Professor:** Nicholas Zabaras, 311 I Cushing Hall, nzabaras@gmail.com

**Teaching Associate:** Nicholas Geneva: ngeneva@nd.edu, Govinda Anantha-Padmanabha: ganantha@nd.edu, Navid Shervani-Tabar: nshervan@nd.edu

**Office hours:** (NZ) Mondays and Fridays, 1-2 pm (also by appointment), 311 I Cushing; (TAs) Mondays 5:00 -- 7:00 p.m., 125 DeBartolo Hall.

**Course description:** The course covers selected topics in Bayesian scientific computing relevant to high-dimensional data-driven engineering and scientific applications. An overview of Bayesian computational statistics methods will be provided, including Monte Carlo methods, exploration of posterior distributions, model selection and validation, MCMC and Sequential MC methods, and inference in probabilistic graphical models. Bayesian techniques for building surrogate models of expensive computer codes will be introduced, including regression methods for uncertainty quantification, Gaussian process modeling, and others. The course will demonstrate these techniques with a variety of scientific and engineering applications including, among others, inverse problems, dynamical system identification, tracking and control, uncertainty quantification of complex multiscale systems, physical modeling in random media, and optimization/design in the presence of uncertainties. Students will be encouraged to integrate the course tools with their own research topics.

**Intended audience:** Graduate Students in Mathematics/Statistics, Computer Science, Engineering, Physical/Chemical/Biological/Life Sciences.

**References of General Interest:** The course lectures will become available on the course web site. For in-depth study, a list of articles and book chapters from the current literature will also be provided to supplement the lecture material. There is no required text for this course. Some important books that can be used for general background reading in the subject areas of the course include the following:

**References on Bayesian fundamentals:**

- C. P. Robert, *The Bayesian Choice: From Decision-Theoretic Motivations to Computational Implementation*, Springer-Verlag, New York, 2001 (also available as ebook; a complete set of slides based on the book is also available).
- A. Gelman, J. B. Carlin, H. S. Stern and D. B. Rubin, *Bayesian Data Analysis*, Chapman & Hall/CRC, 2nd edition, 2003.

**References on Bayesian computation:**

- J. S. Liu, *Monte Carlo Strategies in Scientific Computing*, Springer Series in Statistics, 2001.
- C. P. Robert and G. Casella, *Monte Carlo Statistical Methods*, Springer Series in Statistics, 2nd edition, 2004 (a complete set of slides based on the book is also available).
- W. R. Gilks, et al., *Markov Chain Monte Carlo in Practice*, Chapman & Hall/CRC, 1995.
- J. Kaipio and E. Somersalo, *Statistical and Computational Inverse Problems*, Springer, 2004 (an ebook is available for download to Notre Dame students).

**References on Probability Theory and Information Science:**

- E. T. Jaynes, *Probability Theory: The Logic of Science*, Cambridge University Press, 2003 (a free ebook is also available).
- David J. C. MacKay, *Information Theory, Inference, and Learning Algorithms*, Cambridge University Press, 2003 (a free ebook is also available from the author's web site).

**References on (Bayesian) Machine Learning:**

- C. M. Bishop, *Pattern Recognition and Machine Learning*, Springer, 2007.
- Kevin P. Murphy, *Machine Learning: A Probabilistic Perspective*, MIT Press, 2012 (a free ebook is also available from the author's web site).
- David Barber, *Bayesian Reasoning and Machine Learning*, Cambridge University Press, 2012 (a free ebook is also available from the author's web site).
- C. E. Rasmussen & C. K. I. Williams, *Gaussian Processes for Machine Learning*, MIT Press, 2006 (a free ebook is also available from the Gaussian Processes web site).
- Michael I. Jordan, *Learning in Graphical Models*, MIT Press, 1998.
- Daphne Koller and Nir Friedman, *Probabilistic Graphical Models*, MIT Press, 2009.

**Homework:** Assigned every three to four lectures. Most of the homework will require implementation and application of algorithms discussed in class. We anticipate between five and seven homework sets. All homework solutions and associated computer programs should be mailed by midnight of the due date to this email address. All attachments should arrive in an appropriately named compressed directory (e.g. HW1_Submission_YourName.rar). We prefer typed homework (include in your submission all original files, e.g. LaTeX sources, and a Readme file for compiling and testing your software).

**Term project:** A project is required on mathematical or computational aspects of the course. Students are encouraged to investigate aspects of Bayesian computing relevant to their own research. A short written report (in the format of NIPS papers) is required, as well as a presentation. Project presentations will be given at the end of the semester as part of a one- or two-day symposium.

**Grading:** Homework 60% and Project 40%.

**Prerequisites:** Linear Algebra, Probability theory, Introduction to Statistics and Programming (any language). The course will require significant effort, especially from those not familiar with computational statistics. It is intended for those who value the role of Bayesian inference and machine learning in their research.

#### Syllabus

- Review of probability and statistics
  - Laws of probability, Bayes' Theorem, Independency, Covariance, Correlation, Conditional probability, Random variables, Moments
  - Markov and Chebyshev Inequalities, transformation of PDFs, Central Limit Theorem, Law of Large Numbers
  - Parametric and non-parametric estimation
  - Operations on Multivariate Gaussians, computing marginals and conditional distributions, curse of dimensionality

- Introduction to Bayesian Statistics
  - Bayes' rule, estimators and loss functions
  - Bayes' factors, prior/likelihood & posterior distributions
  - Density estimation methods
  - Bayesian model validation

- Introduction to Monte Carlo Methods
  - Importance sampling
  - Variance reduction techniques

- Markov Chain Monte Carlo Methods
  - Metropolis-Hastings
  - Gibbs sampling
  - Hybrid algorithms
  - Trans-dimensional sampling

- Sequential Monte Carlo Methods and applications
  - Target tracking/recognition
  - Estimation of signals under uncertainty
  - Inverse problems
  - Optimization (simulated annealing)

- Uncertainty Quantification Methods
  - Regression models in high dimensions
  - Gaussian process modeling
  - Forward uncertainty propagation
  - Uncertainty propagation in time dependent problems
  - Bayesian surrogate models
  - Inverse/Design uncertainty characterization
  - Optimization and optimal control problems under uncertainty

- Uncertainty Quantification using Graph Theoretic Approaches
  - Data-driven multiscale modeling
  - Nonparametric Bayesian formulations