| 1(4) |
Lecture 1: 2023-01-24
- Class introduction/organization
- Moore's Law
- Transistor density and power limit
- Parallel computing
- Flynn's taxonomy
- Overview of parallelism treated in class: DLP, ILP, TLP, shared memory and distributed memory
|
Lecture 2: 2023-01-26
- Computer architecture
- von Neumann architecture
- Memory pyramid
- Linux process anatomy
- Introduction to compute cluster: access, job submission
- Reading: Leiserson paper
|
Sign-up: Select one of the offered lab session days according to your schedule
|
Note: The "Reading" assignments are relevant to the lecture and due on the day of the lecture! Individual students may be asked questions.
- Lab section preferences submitted on my.harvard (2023-01-27)
|
| 2(5) |
Lecture 3: 2023-01-31
- Cache memories: why are they there, how they work
- Cache lines and the 3 C's
- What is temporal and spatial locality
- Cache associativity: fully, n-way, direct mapped
- Memory access patterns (differences row-major / column-major)
|
Lecture 4: 2023-02-02
- Shared memory introduction
- Examples of concurrency and concurrent memory access
- Why is shared memory programming hard: what is a race condition and why/how does it happen
- Quiz 1
|
Lab 1: Accessing the cluster, SLURM, Linux, compiler and C++ tutorials.
|
- HW1 release (2023-01-31)
|
|
3(6)
|
Lecture 5: 2023-02-07
- Memory model for shared memory programming and its implications for compilers
- Sequential consistency
- Mutual exclusion / critical sections / locks
- Overview of thread libraries
|
Lecture 6: 2023-02-09
- Introduction to OpenMP: why OpenMP and how to use it in new or existing codes
- OpenMP: fork/join parallel regions
- OpenMP: work sharing constructs
- Reading: OpenMP specification 5.1, Chap. 1 (up to and including 1.4)
|
|
- Lab 1 due (2023-02-10)
- Project team formation due (2023-02-07)
|
|
4(7)
|
Lecture 7: 2023-02-14
- OpenMP: data environment
- OpenMP: synchronization constructs
- OpenMP: library routines
- OpenMP: environment variables
|
Lecture 8: 2023-02-16
- OpenMP: data environment
- OpenMP: synchronization constructs
- OpenMP: library routines
- OpenMP: environment variables
- Quiz 2
|
Lab 2: OpenMP locks, critical sections and atomic clauses.
|
- HW1 due (2023-02-14)
- HW2 release (2023-02-14)
|
|
5(8)
|
Lecture 9: 2023-02-21
- UMA/NUMA memory architectures and processor affinity
- What is cache coherency and why is it required in shared memory programming
- Cache coherency protocols (focus on MESI)
- False sharing
|
Lecture 10: 2023-02-23
- Performance analysis (single node)
- Relationship of compute performance (flop) to memory bandwidth
- Roofline model
- Reading: Williams paper
|
Lab 3: False sharing and cache thrashing.
|
- Lab 2 due (2023-02-24)
- Project high-level description due (2023-02-21)
|
|
6(9)
|
Lecture 11: 2023-02-28
- Introduction to distributed programming (recap Flynn's taxonomy)
- What is the Message Passing Interface (MPI)
- Simple parallel MPI program example
|
Lecture 12: 2023-03-02
- MPI: blocking point-to-point
- MPI: blocking collective
- Reading: MPI 4.0 Standard 3.1, 3.2, 3.4, 3.5
|
|
|
|
7(10)
|
Lecture 13: 2023-03-07
- MPI: non-blocking point-to-point
- MPI: non-blocking collective
- Reading: MPI 4.0 Standard 3.7, 6.1
|
Lecture 14: 2023-03-09
- MPI: I/O file management
- MPI: I/O read and write routines
- Parallel I/O for data compression example
- Quiz 3
|
Lab 4: MPI reductions and scans.
|
- HW2 due (2023-03-07)
- HW3 release (2023-03-07)
- Lab 3 due (2023-03-10)
|
|
8(11)
|
Spring break: 2023-03-14
|
Spring break: 2023-03-16
|
|
|
|
9(12)
|
Presentations for project proposals: 2023-03-21
|
Presentations for project proposals: 2023-03-23
|
|
- Lab 4 due (2023-03-24)
- Project proposals due
|
|
10(13)
|
Lecture 15: 2023-03-28
- Parallel scaling analysis
- Strong scaling / Amdahl's law
- Weak scaling
|
Lecture 16: 2023-03-30
- Instruction set architecture (ISA) / RISC / CISC
- Processor pipelining (ILP)
- Reading: Hennessy and Patterson Turing lecture
|
Lab 5: Linking your code with third-party libraries. Examples for BLAS and LAPACK.
|
- HW3 due (2023-03-28)
- HW4 release (2023-03-28)
|
|
11(14)
|
Lecture 17: 2023-04-04
- Assembly language (x86-64)
- Recap Flynn's taxonomy: SIMD
- Instruction set architecture extensions
- What is vectorization and why is it important
- Floating-point operations in x86-64
|
Lecture 18: 2023-04-06
- Memory alignment and relation to cache lines
- Manual vectorization
- Intel intrinsics
|
|
- Lab 5 due (2023-04-07)
|
|
12(15)
|
Presentations for project designs: 2023-04-11
|
Presentations for project designs: 2023-04-13
|
|
- Project designs due
|
|
13(16)
|
Lecture 19: 2023-04-18
- Intel intrinsics
- Compiler auto-vectorization
- Examples for vectorization and performance impact (DLP in roofline)
|
Lecture 20: 2023-04-20
- SPMD programming model
- Intel ISPC compiler
- Reading: Pharr paper
- Quiz 4
|
Lab 6: Understanding machine instructions by learning how to debug code.
|
- HW4 due (2023-04-18)
- HW5 release (2023-04-18)
|
|
14(17)
|
Lecture 21: 2023-04-25
- GPU computing (outlook):
- Streaming processors
- Main difference between CPU and GPU architectures
- SIMD and SIMT
- Streaming multiprocessor and Little's Law
- Introduction to CUDA
- CUDA warps and threads
- Class summary
|
Reading period: 2023-04-27 |
|
- HW5 due (2023-04-30)
- Lab 6 due (2023-04-28)
|
|
15(18)
|
Reading period: 2023-05-02 |
Exam period: 2023-05-04
- Project final presentations
|
|
- Project deliverables due (2023-05-04 8:00AM)
- Project final presentations due (2023-05-04/05)
|
|
16(19)
|
Exam period: 2023-05-09
|
Exam period: 2023-05-11
|
|
|