OS4140 Predictive Modeling

This course develops the predictive modeling skills needed to answer operationally relevant questions, such as forecasting personnel attrition, detecting fraud, or anticipating emerging risks. Students build models for continuous and categorical outcomes and learn how to construct, interpret, and evaluate multivariate linear regression, logistic regression, penalized regression methods, tree-based models, and ensemble approaches, with emphasis on model validation, performance assessment, and responsible use in real analytical settings. The course provides a rigorous foundation for thesis work and equips students with practical tools they can apply throughout their careers.

Prerequisite

This is a second course in statistics and data analysis. Students are expected to have prior knowledge of foundational concepts at the level of OS3160, OS3170, or OS3180. The course also involves coding, so a basic programming background is required (e.g., OA2801, CS2020).

Lecture Hours

4

Lab Hours

1

Course Learning Outcomes

By the end of this course, students will be able to:

  1. Analyze raw, real-world datasets to diagnose data quality issues.
  2. Apply appropriate preprocessing techniques (e.g., imputation, outlier handling, transformations) to prepare data for modeling.
  3. Select and compute appropriate performance metrics for a given task (e.g., RMSE, MAE, AUC, accuracy).
  4. Evaluate models using cross-validation.
  5. Build and interpret multiple linear regression and logistic regression models.
  6. Explain coefficients, p-values, and confidence intervals in an applied context.
  7. Implement and tune penalized regression models (ridge, LASSO).
  8. Analyze how regularization influences model complexity and feature selection.
  9. Construct and evaluate non-linear and ensemble models (e.g., decision trees, random forests, gradient boosting, basic neural networks) using a reproducible workflow.
  10. Compare and justify a final model choice by assessing trade-offs among interpretability, predictive performance, robustness, and operational constraints.