The University of Sheffield
Department of Computer Science

COM6018 Data Science with Python

 
Summary This module starts with a rapid review of basic background mathematics and statistics, and an introduction to Python. The module will then introduce students to a range of statistical and programming techniques and give practice in their implementation and interpretation using Python. It aims to help students develop the knowledge and experience to select and use appropriate techniques for a variety of problems. The emphasis will be on practical application of techniques and knowledge of their scope rather than development of theoretical underpinnings. Areas to be covered include: exploratory data analysis, simple checks on data, statistical data modelling, programming and optimization. Students will also learn the fundamentals of robust data management and reproducible scientific analysis. 
Session Autumn 2023/24
Credits 15
Assessment
  • Jupyter Notebook 
  • Analysis Project 
Lecturer(s) Prof Jon Barker
Resources
Aims

This unit aims to:

  • Introduce students to a range of data analysis techniques using the latest Python modules and tools. 
  • Help students develop the knowledge and experience needed to select appropriate techniques for a variety of problems. 
  • Develop an understanding of how to present data science clearly, rigorously and reproducibly, both when presenting new analyses and when evaluating published work. 
Learning Outcomes

By the end of the module, the student will be able to:

  • Understand introductory methods for statistical analysis 
  • Apply the statistical techniques dealt with in the module by using Python 
  • Analyse a dataset by applying appropriate statistical techniques 
  • Apply mainstream data visualisation techniques using Python 
  • Create clear and well-structured Python programs 
  • Create clear data analysis reports using Jupyter notebooks or similar technology 
Content
  • Getting started with Python and Jupyter 
  • Dealing with data with native Python  
  • Numerical computing with NumPy 
  • Data wrangling with Pandas 
  • Exploratory Data Analysis and Visualization 
  • Probability, Distributions and Sampling 
  • Statistical Testing for Data Science
  • Introduction to Machine Learning with sklearn
  • Machine learning techniques for Classification
  • Evaluating ML models
  • Machine learning techniques for Regression
  • Optimizing ML models 
Teaching Method  Lectures and lab sessions.
Feedback
  • Model solutions to lab classes will be made available 
  • Feedback will be provided for both assessments