21:219:105 Everyday Data (3) This project-based, hands-on course guides students through the six phases of the data science workflow (Hypothesize, Acquire, Explore, Deep Dive, Communicate, Implement). The course is divided into three data science projects. With the completion of each project, students learn more rigorous data science techniques and dive deeper into the data by expanding their Python coding, statistical, collaboration, and communication skills.

21:219:106 Everyday Data II (3) This project-based, hands-on course is the second in the Everyday Data series. It provides students with a deeper understanding of the data quality assurance (QA) and exploratory data analysis steps in the data science workflow (Hypothesize, Acquire, Explore, Deep Dive, Communicate, Implement) and explains why practicing data scientists spend 70 to 80 percent of their time on these two workflow steps. Students will leverage real-world datasets used in EDD1 and future data science classes to develop a robust understanding of research and the data science field.

21:219:200 Introduction to Ethical Issues in Data Science Through Digital Media (3) Modern life is heavily influenced by the collection of personal data and the algorithms trained on these data. As data scientists in training, students need to learn about the many ethical issues that arise from the manipulation of data. Issues of data privacy, transparency, and an individual's ability to consent (exert power over their data) are fundamental to the ethical use of data. Over the past few years, several books and movies have raised awareness of these issues, highlighting how the misuse of data, combined with persuasive technologies, poor decision-making, and a lack of regulation, is threatening individual health and livelihood, as well as society. The purpose of this course is to examine the ethical considerations a data scientist must make through real-life examples of the misuse of data. Students will be exposed to ethical issues in data science through a combination of books, digital media (e.g., documentaries), and select readings that span from the history of ethics in research to the new challenges brought forth by big tech and modern technologies. Lectures will focus on the consumption of these media and class discussions.

21:219:216 Measurement and Testing (3) This course introduces undergraduate students to the measurement and analysis of data, with an emphasis on the application of statistical concepts to data commonly encountered in data science. The course is divided into four sequential modules, each fundamental to a successful understanding and analysis of data: Exploratory Data Analysis, Data and Sampling Distributions, Experiments and Statistical Significance, and Association and Prediction. As a whole, this course aims to offer a well-rounded view of data measurement and testing, and to motivate students to investigate the statistical attributes of their variables and think critically about how best to evaluate and disseminate their findings.

21:219:220 Fundamentals of Data Visualization (3) This course introduces undergraduate students to data visualization. The course is intended to teach students how to create meaningful charts and figures that can simultaneously convey useful information and be pleasing to the eye. Students will learn to use the programming language R to develop graphics. The course is divided into three general themes: 1. Research Methods and Statistics; 2. Programming in R; 3. Generating Meaningful and Insightful Graphics. The course aims to offer an interactive environment where students feel comfortable generating and sharing ideas.

21:219:240 Mathematics for Data Science I (3) This course introduces algebra and basic linear algebra concepts relevant to data and computer science, and provides a basis for further study in statistical machine learning and data science.

21:219:300 Fundamentals of Popular Data Science Modeling Techniques (3) What do access to credit, insurance rates, prison recidivism and the types of digital advertisement you see on a Web platform have in common? They are all powered by algorithms. These algorithms allow organizations and companies to predict and possibly guide human behavior and policy. This hands-on Python course is designed to provide students with a guided introduction to popular algorithms used by data scientists to predict human actions and generate profiles that classify people, organizations, and objects for varied purposes.

21:219:328 Data Visualization in R (3) As the culminating stage of the analysis pipeline, effective data visualization is crucial to the successful communication of experimental and/or statistical findings. This data visualization course teaches students how to generate effective visualizations for data using the powerful graphic properties of the R programming language. Visualizations explored in this course include univariate, bivariate, and multivariate graphs; maps; time series; and statistical models. Effective data visualization requires mastery of all elements in the analysis pipeline (data import and wrangling, descriptive and inferential statistics). Prospective students should be familiar with the basics of statistics and programming prior to registering for this course.

21:219:329 Statistics and Machine Learning (3) This course introduces basic concepts in statistical learning and their implementation in Python or R. It covers linear regression, logistic regression, ensemble methods, optimization methods for model learning, and various advanced topics such as deep neural networks, kernel learning, and Gaussian processes.

21:219:330 Ethical Issues in Data Science (3) This course will prepare students to think critically so they may confront normative questions that will arise in their future work as practicing computer or data scientists. Students will gain a detailed understanding of some of the most important ethical issues relevant to the field of data science. The course will explore questions about the possibility of bias in automated decision-making systems; about what constitutes appropriate collection, aggregation, and use of personal information about the users of technology services; and about collective and individual responsibility for the social impacts of newly developed technologies. The course will use real-life examples to explore questions involving (1) bias in machine learning; (2) conflicts between preserving customer privacy and corporations collecting consumer data; and (3) corporate and individual responsibility for the harms caused by new technologies.

21:219:340 Mathematics for Data Science II (3) Data scientists extract statistical patterns from raw data such as customer transaction records, x-ray images of patients with certain diseases, customer review comments, aggregated data from sales of houses in some region, etc. With these patterns, they are able to build, for example, recommender systems that send valuable coupons to potential customers, AI doctors capable of giving preliminary diagnoses, etc. This class reviews the necessary knowledge of geometry, algebra, and calculus, which allows you to deal with large amounts of high-dimensional data and build simple statistical models, e.g., linear and logistic regression. If time permits, we will also learn how to reason about the decisions made by these models by quantifying the uncertainty and risk.

21:219:400 Deconstructing Machine Learning Bias (3) This course uses contemporary case studies in algorithmic bias to teach students to identify and deconstruct machine learning (ML) bias. Students will learn how to combine critical reasoning and their understanding of both the modeling process and ML techniques to identify different types of bias, to assess the impact of technical bias on the model (outcome), and to discuss the social and economic impact of deploying a biased model. In the second half of the semester, students will apply and critique a statistical-ML solution to mitigate algorithmic bias in a case study dataset.

21:219:410 Ethical Issues in Data Science (3) In performing his or her role in the workplace, the practicing computer or data scientist confronts a number of important ethical questions: questions about the possibility of bias in automated decision-making systems, about what constitutes appropriate collection, aggregation, and use of personal information about the users of technology services, and about collective and individual responsibility for the social impacts of newly developed technologies. The purpose of this course is to consider these questions directly in order to prepare students to think critically about the ethical questions which will arise in connection with their work later in their lives.

21:219:420 Agile iOS Design and Development (3) This unique, hands-on industry partnership course has been designed to expose students to the end-to-end product (application) design and Agile Scrum development process used by tech companies and entrepreneurs. Students will draft technical documents outlining their design and development process and write code to deliver a functioning application (product) with a data feedback loop for a final grade. All developed products will be presented.

21:219:430 Introduction to Statistical Models in Science and Engineering (3) This is an in-depth survey of simple but useful statistical models employed by scientists and engineers in various domains. We shall first review the essential knowledge of probability and statistics. The statistical models allow us to make data-driven decisions in the framework of Bayesian inference. The models we shall review include the Bayesian linear regression model, logistic regression model, Boltzmann machine, Ising model, Hopfield network, Bayesian neural network, and Gaussian process.

21:219:450 Independent Study in Data Science (3) The objectives and goals of each independent study will vary, but will typically consist of activities including reading, programming, modelling, experimentation, and writing. Students registered in an independent study course in data science will be prepared to 1. contribute to a rapidly evolving field by acquiring a thorough grounding in the core principles and foundations of data science, 2. acquire a deeper understanding of (elective) topics of more specialized interest, and be able to critically review, assess, and communicate recent developments in data science, and 3. progress to the next step in their career by having, for example, completed an exhaustive review of a critical area of research in data science or completed a research project.