Data 101: Data Engineering

UC Berkeley, Fall 2023

Professor Lisa Yan

She/Her/Hers

yanlisa@berkeley.edu

Course contact: data101@berkeley.edu

Schedule

Week 01

Week 02

Week 03

Mon 9/4
Labor Day (no discussion)
Tue 9/5
Lecture 4 Subqueries, Aggregation
Notes, code, code HTML
Thu 9/7
Lecture 5 Window functions, Sampling, String manipulation
Notes, code, code HTML
Fri 9/8
Project 1 SQL
Due Thu 9/21, 5pm

Week 04

Mon 9/11
Discussion 2 Relational Algebra, Views, Subqueries
Solution
Tue 9/12
Lecture 6 DML, DDL, Constraints
Notes, code, code HTML
Thu 9/14
Lecture 7 Foreign Keys, Index Selection
Notes, code, code HTML
Fri 9/15
MultiVitamin 1
Due Thu 9/28, 5pm

Week 05

Mon 9/18
Discussion 3 String Manipulation, DML, DDL, and Windowing
Solution
Tue 9/19
Lecture 8 Query processing, Optimization I
Notes, code, code HTML
Thu 9/21
Lecture 9 Query processing, Optimization II
Notes, code HTML

Project 1 Due, 5pm

Fri 9/22
Project 2 Query Performance
Due Thu 10/5, 5pm

Week 06

Mon 9/25
Discussion 4 Query Optimization, Logical Plans, Join Orders, and Indexes
Solution
Tue 9/26
Lecture 10 [Guest Lecture] Amy Wang, Databricks
Q&A Responses
Thu 9/28
Lecture 11 Data Models: Relations, Tensors, Dataframes
Notes, code, code HTML

MultiVitamin 1 Due, 5pm

Fri 9/29
MultiVitamin 2 Release
Due Thu 10/12, 5pm

Week 07

Mon 10/2
Discussion 5 PostgreSQL Exercises, Data Models
Solution, code
Tue 10/3
Lecture 12 Data Preparation I: Structural
Notes, code, code HTML
Thu 10/5
Lecture 13 Data Preparation II: Numerical, Granularity
Notes, code, code HTML

Project 2 Due, 5pm

Fri 10/6
MultiVitamin 3 Release
Due Thu 10/19, 5pm

Week 08

Mon 10/9
Discussion 6 Data Preparation, Pivoting, and MDL
Solution, code
Tue 10/10
Lecture 14 Data Preparation III: Outliers
Notes, code, code HTML
Thu 10/12
Lecture 15 Data Preparation IV: Imputation and Entity Resolution
Notes, code, code HTML
MultiVitamin 2 Due, 5pm
Fri 10/13
Project 3 Data Transformation
Due Thu 11/2, 5pm (extended from Thu 10/26)

Week 09

Mon 10/16
Discussion 7 Hampel X84, Entity Resolution, and Data Granularity
Solution, code
Tue 10/17
Lecture 16 ER Diagrams + Normalization
Mid-semester Survey, Notes
Thu 10/19
Lecture 17 Semistructured Data: NoSQL, JSON, XML
Notes
MultiVitamin 3 Due, 5pm
Fri 10/20
MultiVitamin 4 Release
Due Thu 11/9, 5pm (extended from Thu 11/2)

Week 10

Week 11

Mon 10/30
Discussion 9 MongoDB Operations; ERD Review
Solution
Tue 10/31
Lecture 20 MongoDB II
Notes, code, code HTML
Thu 11/2
Lecture 21 Transactions and TCL
Notes
Project 3 Due
Fri 11/3
MultiVitamin 5 Release
Due Thu 11/16, 5pm
Project 4 Mongo
Due Thu 11/30, 5pm

Week 12

Mon 11/6
Discussion 10 MQL; Transactions and Concurrency Control
Solution
Tue 11/7
Lecture 22 Transactions II, Parallel/Distributed Computing
Notes
Thu 11/9
Lecture 23 MapReduce, Data Pipelines
Notes
MultiVitamin 4 Due, 5pm

Week 13

Mon 11/13
Discussion 11 Transactions, Parallel Processing, and Data Pipelines
Solution
Tue 11/14
Lecture 24 BI, OLAP, and Summarization
Notes
Thu 11/16
Lecture 25 Sampling
Notes, code, code HTML, [extra] Sketches
MultiVitamin 5 Due, 5pm

Week 14

Mon 11/20
Discussion 12 MapReduce, Data Cubes, and OLAP Systems
Solution
Tue 11/21
Lecture 26 [Guest Lecture] Shane Knapp and Balaji Alwar, DataHub
Thu 11/23
Thanksgiving Day (no class)

Week 15

Mon 11/27
Discussion 13 Project 4 OH
Tue 11/28
Lecture 27 Spreadsheets
Notes, Drive
Thu 11/30
Lecture 28 Security/Privacy, Closing Thoughts
Notes
Project 4 Final Due, 5pm

RRR Week

Tue 12/5
RRR Week (no class)
Thu 12/7
RRR Week (no class)

Finals Week

Fri 12/15
Final Exam (8-11am)