Week 1 (calendar week 4)

Lecture 1: 2022-01-25
- Class introduction/organization
- Moore's Law
- Transistor density and power limit
- Parallel computing
- Flynn's taxonomy
- Overview of parallelism treated in class: DLP, ILP, TLP, shared memory and distributed memory

Lecture 2: 2022-01-27
- Computer architecture
- von Neumann architecture
- Memory pyramid
- Linux process anatomy
- Introduction to the compute cluster: access, job submission
- Reading: Leiserson paper

Sign-up: Select one of the offered lab session days according to your schedule.

Note: The "Reading" assignments are relevant for the lecture and due on the day of the lecture! Questions may be directed to individual students.

- Doodle for lab day selection due (2022-01-28)

Week 2 (calendar week 5)

Lecture 3: 2022-02-01
- Cache memories: why they exist and how they work
- Cache lines and the 3 C's
- Temporal and spatial locality
- Cache associativity: fully associative, n-way set associative, direct mapped
- Memory access patterns (differences between row-major and column-major)

Lecture 4: 2022-02-03
- Shared memory introduction
- Examples of concurrency and concurrent memory access
- Why shared memory programming is hard: what a race condition is and why/how it happens
- Quiz 1

Lab 1: Accessing the cluster; SLURM, Linux, compiler and C++ tutorials.

- HW1 release (2022-02-01)

Week 3 (calendar week 6)

Lecture 5: 2022-02-08
- Memory model for shared memory programming and its implications for compilers
- Sequential consistency
- Mutual exclusion / critical sections / locks
- Overview of thread libraries

Lecture 6: 2022-02-10
- Introduction to OpenMP: why OpenMP and how to use it in new or existing codes
- OpenMP: fork/join parallel regions
- OpenMP: work-sharing constructs
- Reading: OpenMP specification 5.1, Chap. 1 (up to and including 1.4)

- Lab 1 due (2022-02-11)
- Project team formation due (2022-02-08)

Week 4 (calendar week 7)

Lecture 7: 2022-02-15
- OpenMP: data environment
- OpenMP: synchronization constructs
- OpenMP: library routines
- OpenMP: environment variables

Lecture 8: 2022-02-17
- OpenMP: data environment
- OpenMP: synchronization constructs
- OpenMP: library routines
- OpenMP: environment variables
- Quiz 2

Lab 2: OpenMP locks, critical sections and atomic clauses.

- HW1 due (2022-02-15)
- HW2 release (2022-02-15)

Week 5 (calendar week 8)

Lecture 9: 2022-02-22
- UMA/NUMA memory architectures and processor affinity
- What cache coherency is and why it is required in shared memory programming
- Cache coherency protocols (focus on MESI)
- False sharing

Lecture 10: 2022-02-24
- Performance analysis (single node)
- Relationship of compute performance (flop/s) to memory bandwidth
- Roofline model
- Reading: Williams paper

Lab 3: False sharing and cache thrashing.

- Lab 2 due (2022-02-25)
- Project high-level description due (2022-02-22)

Week 6 (calendar week 9)

Lecture 11: 2022-03-01
- Introduction to distributed programming (recap of Flynn's taxonomy)
- What is the Message Passing Interface (MPI)
- Simple parallel MPI program example

Lecture 12: 2022-03-03
- MPI: blocking point-to-point
- MPI: blocking collective
- Reading: MPI 4.0 Standard, Sections 3.1, 3.2, 3.4, 3.5

- Lab 3 due (2022-03-04)

Week 7 (calendar week 10)

Lecture 13: 2022-03-08
- MPI: non-blocking point-to-point
- MPI: non-blocking collective
- Reading: MPI 4.0 Standard, Section 3.7

Lecture 14: 2022-03-10
- MPI I/O: file management
- MPI I/O: read and write routines
- Parallel I/O example: data compression
- Quiz 3

Lab 4: MPI reductions and scans.

- HW2 due (2022-03-08)
- HW3 release (2022-03-08)

Week 8 (calendar week 11)

Spring break: 2022-03-15

Spring break: 2022-03-17

Week 9 (calendar week 12)

Presentations for project proposals: 2022-03-22

Presentations for project proposals: 2022-03-24

- Lab 4 due (2022-03-25)
- Project proposals due

Week 10 (calendar week 13)

Lecture 15: 2022-03-29
- Parallel scaling analysis
- Strong scaling / Amdahl's law
- Weak scaling
- Hybrid MPI and OpenMP (tentative)
- Overhead associated with sending messages (tentative)
- Message packing (tentative)

Lecture 16: 2022-03-31
- Instruction set architecture (ISA) / RISC / CISC
- Processor pipelining (ILP)
- Reading: Hennessy and Patterson Turing lecture

Lab 5: Linking your code with third-party libraries. Examples for BLAS and LAPACK.

- HW3 due (2022-03-29)
- HW4 release (2022-03-29)

Week 11 (calendar week 14)

Lecture 17: 2022-04-05
- Assembly language (x86-64)
- Recap of Flynn's taxonomy: SIMD
- Instruction set architecture extensions
- What vectorization is and why it is important
- Floating-point operations in x86-64

Lecture 18: 2022-04-07
- Memory alignment and its relation to cache lines
- Manual vectorization
- Intel intrinsics

- Lab 5 due (2022-04-08)

Week 12 (calendar week 15)

Presentations for project designs: 2022-04-12

Presentations for project designs: 2022-04-14

- Project designs due

Week 13 (calendar week 16)

Lecture 19: 2022-04-19
- Intel intrinsics
- Compiler auto-vectorization
- Examples of vectorization and its performance impact (DLP in the roofline model)
- Quiz 4

Lecture 20: 2022-04-21
- SPMD programming model
- Intel ISPC compiler
- Reading: Pharr paper

Lab 6: Understanding machine instructions by learning how to debug code.

- HW4 due (2022-04-19)
- HW5 release (2022-04-19)

Week 14 (calendar week 17)

Lecture 21: 2022-04-26
- GPU computing:
- Streaming processors
- Main difference between CPU and GPU architectures
- SIMD and SIMT
- Streaming multiprocessor and Little's Law
- Introduction to CUDA
- CUDA warps and threads
- Class summary

Reading period: 2022-04-28

- HW5 due (2022-05-01)
- Lab 6 due (2022-04-29)

Week 15 (calendar week 18)

Reading period: 2022-05-03

Exam period: 2022-05-05
- Project final presentations

- Project deliverables due (2022-05-05)
- Project final presentations due (2022-05-05)

Week 16 (calendar week 19)

Exam period: 2022-05-10

Exam period: 2022-05-12