Statistical methods for risk prediction and prognostic models - Short Course
Program start date | Application deadline |
2024-05-21 | - |
2024-05-22 | - |
2024-05-23 | - |
Program Overview
This 3-day online course in Statistical Methods for Risk Prediction and Prognostic Models provides the foundation to develop and validate statistical models for individual risk prediction, with a focus on binary and time-to-event outcomes. Participants will gain a deep understanding of model development and validation techniques, including variable selection, overfitting control, performance measures, external validation, and sample size calculations. The course emphasizes the practical application of methods in R or Stata, with hands-on computer practicals and opportunities for faculty interaction.
Program Outline
Course Overview:
This online course provides a thorough foundation in statistical methods for developing and validating risk prediction and prognostic models in healthcare research. It is delivered over 3 days and focuses on key principles for model development, internal validation, and external validation. The course mainly focuses on binary and time-to-event outcomes, though continuous outcomes are also covered in special topics.
Objectives:
By the end of the course, participants will:
- Understand the phases of prediction model research
- Know the core statistical methods for developing a prediction model, and be able to apply them in R or Stata
- Understand the differences between models for binary and time-to-event outcomes
- Understand the use of logistic regression, Cox regression, and flexible parametric survival models in the context of prediction modelling
- Understand how to model non-linear relationships for continuous variables using splines or fractional polynomials
- Understand the problem of overfitting and how to examine and limit it
- Know the role of penalisation and shrinkage methods, including uniform shrinkage, the lasso and elastic net
- Know how to internally validate a prediction model after model development, using bootstrapping or cross-validation in R or Stata
- Understand how to produce optimism-adjusted estimates of model performance
- Know the importance and role of discrimination, calibration and clinical utility measures, and how to derive them in R or Stata
- Understand how to undertake an external validation study
- Appreciate different approaches to variable selection, including lasso and elastic net, and the instability of these approaches
- Recognise the importance of the TRIPOD reporting guideline and different formats for presentation of a model
- Appreciate methods for handling missing data, competing risks, pseudo-observations and continuous outcomes
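As a rough illustration of the discrimination and calibration objectives above, the sketch below computes a C-statistic for a binary outcome, plus an observed/expected ratio as a simple summary of overall calibration. It is written in Python purely for illustration (the course itself uses R or Stata). The function names and the toy data are hypothetical and are not taken from the course material.

```python
def c_statistic(outcomes, risks):
    """C-statistic for a binary outcome.

    Proportion of (event, non-event) pairs in which the participant with
    the event has the higher predicted risk; tied risks count as half.
    """
    events = [r for y, r in zip(outcomes, risks) if y == 1]
    non_events = [r for y, r in zip(outcomes, risks) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(outcomes, risks):
    """Observed number of events divided by the sum of predicted risks."""
    return sum(outcomes) / sum(risks)


# Made-up toy data: 8 participants, 4 events (illustration only).
outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
risks = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.20, 0.90]

print(f"C-statistic: {c_statistic(outcomes, risks):.4f}")
print(f"O/E ratio:   {observed_expected_ratio(outcomes, risks):.4f}")
```

In this toy data, 15 of the 16 event/non-event pairs are concordant, so the C-statistic is 0.9375. The observed/expected ratio is above 1 because the 4 observed events exceed the 3.55 expected from the predicted risks.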
Outline:
Day 1:
- Overview of the rationale and phases of prediction model research
- Model development topics:
  - Identifying candidate predictors
  - Handling of missing data
  - Modelling continuous predictors using fractional polynomials or restricted cubic splines for non-linear functions
  - Variable selection procedures
Day 2:
- Overfitting, and why overfitted models often do not generalise to other datasets
- Internal validation strategies to identify and adjust for overfitting:
  - Cross-validation
  - Bootstrapping
  - Estimating optimism and shrinking model coefficients
  - LASSO and elastic net
- Statistical measures of model performance:
  - Discrimination (C-statistic and D-statistic)
  - Calibration (calibration-in-the-large, calibration plots, calibration slope, calibration curve)
- Sample size considerations for model development and validation
- New software to implement sample size calculations
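For readers unfamiliar with the internal validation step listed under Day 2, the following is a minimal sketch of bootstrap optimism correction. It is written in Python for illustration only (the course practicals use R or Stata). It makes several assumptions that are not from the course: a one-predictor logistic model, the Brier score as the performance measure, simulated data, and hypothetical function names.

```python
import math
import random


def fit_logistic(xs, ys, iterations=25):
    """Fit intercept and slope of a one-predictor logistic model
    by Newton-Raphson (no safeguards; fine for this toy setting)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def brier(model, xs, ys):
    """Mean squared difference between outcome and predicted risk."""
    b0, b1 = model
    return sum((y - 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def optimism_adjusted_brier(xs, ys, n_boot=100, seed=1):
    """Return (apparent, optimism-adjusted) Brier score."""
    rng = random.Random(seed)
    n = len(xs)
    apparent = brier(fit_logistic(xs, ys), xs, ys)
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        boot_model = fit_logistic(bx, by)
        # Lower Brier is better, so optimism is the bootstrap model's score
        # in the original data minus its score in its own bootstrap sample.
        optimism += brier(boot_model, xs, ys) - brier(boot_model, bx, by)
    return apparent, apparent + optimism / n_boot


# Simulated data (illustration only): true intercept 0, true slope 1.
sim = random.Random(2024)
xs = [sim.gauss(0.0, 1.0) for _ in range(200)]
ys = [1 if sim.random() < 1.0 / (1.0 + math.exp(-x)) else 0 for x in xs]

apparent, adjusted = optimism_adjusted_brier(xs, ys)
print(f"Apparent Brier score:          {apparent:.4f}")
print(f"Optimism-adjusted Brier score: {adjusted:.4f}")
```

The same loop applies to other performance measures, such as the C-statistic or calibration slope, by swapping the scoring function and taking care over the direction of the optimism. In a real analysis, every model-building step (for example variable selection) would be repeated inside each bootstrap sample.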
Day 3:
- External validation to assess model generalisability
- Framework for different types of external validation studies
- Model updating strategies (re-calibration techniques)
- Novel topics:
  - Pseudo-values for calibration curves in a survival model setting
  - Model development and validation using large datasets (e-health records or multiple studies)
  - Meta-analysis methods for summarising model performance across multiple studies or clusters
- Practical guidance on presenting prediction and prognostic models
Teaching:
- Teaching is via a combination of recorded lectures, live computer practicals, and live question and answer sessions following each lecture/session.
- Opportunities to meet with faculty to ask specific questions about personal research queries.
- Previous experience of using R or Stata for data analysis is highly recommended, though the computer code for the practicals is already written.
- The course is not accredited.
- Participants receive a Certificate of completion confirming hours of completed study.
- All course material (e.g. lecture videos and computer practicals) will be made available a week in advance of the course and for 2 weeks afterwards, giving participants plenty of time and flexibility to work through the material in their own time.