Program Overview
Lancaster University's Data Science PgCert provides a solid foundation in data analysis, statistical modeling, and programming. With both full-time and part-time options, this program offers core modules in data mining, data science fundamentals, and programming for data scientists, while also allowing for specialization through optional modules such as clinical trials, forecasting, and epidemiology. Graduates are equipped for careers in data-related fields and enjoy high earning potential.
Program Outline
Degree Overview:
The Data Science PgCert provides a foundation in statistical modelling and data analysis, covering the theory underpinning statistical modelling, programming, data mining, and data science as a process for gaining insight from data. You can study either full-time or part-time. Upon successful completion of the modules, you may wish to progress to a PGDip or an MSc in Data Science. Students learn about data primer, data processing, classification, dynamic data space partitioning, evolving clustering, data clouds, and quality monitoring of self-learning systems. They also gain skills in developing software scripts for advanced data representation and processing, analyzing trade-offs in performance and complexity, and designing practical solutions for data representation and processing challenges.
Core Modules:
- Data Science Fundamentals: Explores the data science role, responsibilities, and daily tasks within an organization. It covers research methodology, including hypothesis formulation, research strategies, data processing, preparation, integration, and applying data science solutions in industrial settings. Communication of research findings to stakeholders is also emphasized.
- Programming for Data Scientists: Designed for both beginners and experienced programmers, this module aims to equip students with high-level programming skills for handling complex data science problems. Beginners learn programming fundamentals, while experienced programmers refine their skills. Students are taught data-processing techniques, including visualization and statistical data analysis, problem-solving, and development of graphical applications. Both R and Python programming languages are used. It introduces basic statistical modelling terminology and compares statistical and machine learning approaches. Topics covered include sampling uncertainty, statistical inference, model fitting, linear regression, and generalized linear models. The statistical software package R is used for implementation. Students completing this module and pursuing the Environmental, Health, or Societal Pathways in Term II must also complete the "Statistical Foundations II" module. It builds on undergraduate-level mathematics, statistics (hypothesis testing and linear regression), and probability (univariate discrete and continuous distributions, expectations, variances, covariances, and the multivariate normal distribution). GLMs are introduced as an extension of the linear regression model.
Optional Modules:
- Applied Data Mining: Expands on the "Fundamentals of Data" module and provides students with knowledge about current applications of data in industry and research. It delves into data processing and application on a large scale across various domains. Students learn about different areas of science and their relation to big data, addressing large-scale challenges with current techniques. The module explores the Social Web, social network theory and analysis, user-generated content, crowd-sourced data, and recommendation systems (collaborative filtering, content recommendation challenges, and friend recommendation/link prediction). Upon completion, students can create scalable solutions for problems involving data from the semantic, social, and scientific web, process networks, and perform network analysis to identify key factors in information flow.
- Building Big Data Systems: Examines the architectural approaches, techniques, and technologies that underlie Big Data system infrastructure, particularly large-scale enterprise systems. It is one of two modules in the Systems stream of the Computer Science MSc, providing a comprehensive understanding of systems architecture.
- Clinical Trials: Focuses on planned experiments on human beings designed to assess the effectiveness of treatments. It covers the advantages and disadvantages of different medical study types, defines and estimates treatment effects, explores cross-over trials, sample size determination, equivalence trials, flexible trial designs, meta-analysis, and accommodating confounding factors. Students learn the basics of clinical trials, principles of good study design, analysis and interpretation of study results, and making accurate scientific inferences.
- Distributed Artificial Intelligence: Explores the fundamental concepts of distributed artificial intelligence in contemporary data analysis. It highlights the role of distributed approaches for fault tolerance and robustness in systems involving multiple software agents, humans, or robots. Students learn to manage distributed systems, either fully or partially, and make decisions about component behavior for achieving desired goals.
- Forecasting: Addresses the vital role of forecasting in enhancing managerial decision-making. It introduces forecasting in organizations, explores time series patterns, and covers simple forecasting methods (naïve forecasts and moving averages). Extrapolative methods such as exponential smoothing and ARIMA models are examined, followed by a detailed analysis of causal modelling. Assessment includes a report focusing on causal modelling and time series analysis.
- Methods for Missing Data: Addresses the issue of missing data, which is common in various datasets. It covers different ways missing data can arise and how to handle it to minimize its impact on data analysis. Topics include single imputation methods, Bayesian imputation, multiple imputation (Rubin's rules, chained equations, multivariate methods, and diagnostics), and modelling dropout in longitudinal modelling.
- Optimisation and Heuristics: Focuses on optimization techniques, also known as mathematical programming, and their applications in fields like operational research, computer science, statistics, finance, engineering, and physical sciences. The module emphasizes the use of optimization techniques for solving business problems, introducing different problem formulations and algorithmic methods to guide decision-making in businesses and other organizations.
- Principles of Epidemiology: Introduces epidemiology as the study of the distribution and determinants of disease in human populations, presenting its principles and statistical methods. The module covers fundamental measures of disease, such as incidence, prevalence, risk, rates, and indices of morbidity and mortality. Students learn about epidemiologic study design, including ecological studies, surveys, cohort and case-control studies, diagnostic test studies, bias and confounding, matching and stratification, calculation of rates, standardisation and adjustment, and issues in screening. This module provides an overview of epidemiology, study design strategies, methods for analyzing rates and disease risk, and skills in critically appraising literature. It equips students to understand key statistical issues in ecological studies, surveys, case-control studies, cohort studies, and randomized controlled trials (RCTs), recognizing their advantages and disadvantages.
- Survival and Event History Analysis: Addresses a range of topics related to survival data, covering censoring, hazard functions, Kaplan-Meier plots, parametric models, and likelihood construction. Students explore the Cox proportional hazard model, partial likelihood, Nelson-Aalen estimation, survival time prediction, counting processes, diagnostic methods, frailty models, and effects. This module equips students with an understanding of survival and event history data analysis challenges, non-parametric methods for identifying modelling strategies, and the range of survival techniques that can be implemented in statistical software. Students also develop skills in expressing scientific problems mathematically, improving scientific writing, and enhancing computing skills for data manipulation and analysis. Upon completion, students can apply statistical techniques to survival and event history data using statistical software, interpret outputs from survival models, construct likelihood functions for censored data, and identify appropriate models through diagnostics and model building strategies.
Assessment:
Assessment varies across modules and may include laboratory reports, essays, exercises, literature reviews, short tests, poster sessions, oral presentations, and formal examinations.
Teaching:
The program offers various learning environments, including traditional lectures, computer laboratories, and workshops. There is a focus on timely feedback for submitted work and projects.
Careers:
Graduates from the program are prepared for a range of data-related positions, including data scientist, statistician, and data analyst roles. Starting salaries are highly competitive, ranging from £36,000 to £55,000. Our programme opens the door to many possible careers, including data scientist or data science consultant, financial modeller, clinical and pharmaceutical analyst, and data technologies specialist. Lancaster University offers lifetime support to all students, including one-to-one career advice, work experience guidance, and employability skills development.
Additional costs
There may be extra costs related to your course for items such as books, stationery, printing, photocopying, binding and general subsistence on trips and visits. Following graduation, you may need to pay a subscription to a professional body for some chosen careers. Specific additional costs for studying at Lancaster are listed below.
College fees
Lancaster is proud to be one of only a handful of UK universities to have a collegiate system. Every student belongs to a college, and all students pay a small College Membership Fee which supports the running of college events and activities. Students on some distance-learning courses are not liable to pay a college fee. For students starting in 2023 and 2024, the fee is £40 for undergraduates and research students and £15 for students on one-year courses. Fees for students starting in 2025 have not yet been set.
Computer equipment and internet access
To support your studies, you will also require access to a computer, along with reliable internet access. You will be able to access a range of software and services from a Windows, Mac, Chromebook or Linux device. For certain degree programmes, you may need a specific device, or we may provide you with a laptop and appropriate software - details of which will be available on relevant programme pages. A dedicated IT support helpdesk is available in the event of any problems. The University provides limited financial support to assist students who do not have the required IT equipment or broadband support in place.
Application fees and tuition fee deposits
For most taught postgraduate applications there is a non-refundable application fee of £40. We will let you know in your offer letter if a deposit is required and you will be given a deadline date when this is due to be paid. The fee that you pay will depend on whether you are considered to be a home or international student.