Schedule

Due events are indicated in red in the column on the right. Every due event with a given date is due at 11:59 pm on that day.

Week Tuesday Thursday Labs Events
1(4) Lecture 1: 2022-01-25
  • Class introduction/organization
  • Moore's Law
  • Transistor density and power limit
  • Parallel computing
  • Flynn's taxonomy
  • Overview of parallelism treated in class: DLP, ILP, TLP, shared memory and distributed memory
Lecture 2: 2022-01-27
  • Computer architecture
  • von Neumann architecture
  • Memory pyramid
  • Linux process anatomy
  • Introduction to the compute cluster: access, job submission
  • Reading: Leiserson paper
Sign-up:

Select one of the offered lab session days that fits your schedule.

Note:

The "Reading" assignments are relevant to the lecture under which they are listed and are due on the day of that lecture. Individual students may be asked questions about them.

  1. Doodle for lab day selection due
    (2022-01-28)
2(5) Lecture 3: 2022-02-01
  • Cache memories: why they are there, how they work
  • Cache lines and the 3 C's
  • What is temporal and spatial locality
  • Cache associativity: fully associative, n-way, direct mapped
  • Memory access patterns (differences row-major / column-major)
Lecture 4: 2022-02-03
  • Shared memory introduction
  • Examples of concurrency and concurrent memory access
  • Why is shared memory programming hard: what is a race condition and why/how does it happen
  • Quiz 1
Lab 1:

Accessing the cluster; SLURM, Linux, compiler and C++ tutorials.


  1. HW1 release
    (2022-02-01)
3(6) Lecture 5: 2022-02-08
  • Memory model for shared memory programming and its implications for compilers
  • Sequential consistency
  • Mutual exclusion / critical sections / locks
  • Overview of thread libraries
Lecture 6: 2022-02-10
  • Introduction to OpenMP: why OpenMP and how to use it in new or existing codes
  • OpenMP: fork/join parallel regions
  • OpenMP: work sharing constructs
  • Reading: OpenMP specification 5.1 Chap. 1 (until 1.4 inclusive)

  1. Lab 1 due
    (2022-02-11)
  2. Project team formation due
    (2022-02-08)
4(7) Lecture 7: 2022-02-15
  • OpenMP: data environment
  • OpenMP: synchronization constructs
  • OpenMP: library routines
  • OpenMP: environment variables
Lecture 8: 2022-02-17
  • OpenMP: data environment
  • OpenMP: synchronization constructs
  • OpenMP: library routines
  • OpenMP: environment variables
  • Quiz 2
Lab 2:

OpenMP locks, critical sections and atomic constructs.


  1. HW1 due
    (2022-02-15)
  2. HW2 release
    (2022-02-15)
5(8) Lecture 9: 2022-02-22
  • UMA/NUMA memory architectures and processor affinity
  • What is cache coherency and why is it required in shared memory programming
  • Cache coherency protocols (focus on MESI)
  • False sharing
Lecture 10: 2022-02-24
  • Performance analysis (single node)
  • Relationship of compute performance (Flop/s) to memory bandwidth
  • Roofline model
  • Reading: Williams paper
Lab 3:

False sharing and cache thrashing.


  1. Lab 2 due
    (2022-02-25)
  2. Project high-level description due
    (2022-02-22)
6(9) Lecture 11: 2022-03-01
  • Introduction to distributed programming (recap Flynn's taxonomy)
  • What is the Message Passing Interface (MPI)
  • Simple parallel MPI program example
Lecture 12: 2022-03-03
  • MPI: blocking point-to-point
  • MPI: blocking collective
  • Reading: MPI 4.0 Standard 3.1, 3.2, 3.4, 3.5

  1. Lab 3 due
    (2022-03-04)
7(10) Lecture 13: 2022-03-08
  • MPI: non-blocking point-to-point
  • MPI: non-blocking collective
  • Reading: MPI 4.0 Standard 3.7
Lecture 14: 2022-03-10
  • MPI: I/O file management
  • MPI: I/O read and write routines
  • Parallel I/O for data compression example
  • Quiz 3
Lab 4:

MPI reductions and scans.


  1. HW2 due
    (2022-03-08)
  2. HW3 release
    (2022-03-08)
8(11) Spring break: 2022-03-15
Spring break: 2022-03-17
9(12) Presentations for project proposals: 2022-03-22
Presentations for project proposals: 2022-03-24

  1. Lab 4 due
    (2022-03-25)
  2. Project proposals due
10(13) Lecture 15: 2022-03-29
  • Parallel scaling analysis
  • Strong scaling / Amdahl's law
  • Weak scaling
  • Hybrid MPI and OpenMP (tentative)
  • Overhead associated with sending messages (tentative)
  • Message packing (tentative)
Lecture 16: 2022-03-31
  • Instruction set architecture (ISA) / RISC / CISC
  • Processor pipelining (ILP)
  • Reading: Hennessy and Patterson Turing lecture
Lab 5:

Linking your code with third-party libraries. Examples using BLAS and LAPACK.


  1. HW3 due
    (2022-03-29)
  2. HW4 release
    (2022-03-29)
11(14) Lecture 17: 2022-04-05
  • Assembly language (x86-64)
  • Recap Flynn's taxonomy: SIMD
  • Instruction set architecture extensions
  • What is vectorization and why is it important
  • Floating-point operations in x86-64
Lecture 18: 2022-04-07
  • Memory alignment and relation to cache lines
  • Manual vectorization
  • Intel intrinsics

  1. Lab 5 due
    (2022-04-08)
12(15) Presentations for project designs: 2022-04-12
Presentations for project designs: 2022-04-14

  1. Project designs due
13(16) Lecture 19: 2022-04-19
  • Intel intrinsics
  • Compiler auto-vectorization
  • Examples of vectorization and its performance impact (DLP in roofline)
  • Quiz 4
Lecture 20: 2022-04-21
  • SPMD programming model
  • Intel ISPC compiler
  • Reading: Pharr paper
Lab 6:

Understanding machine instructions by learning how to debug code.


  1. HW4 due
    (2022-04-19)
  2. HW5 release
    (2022-04-19)
14(17) Lecture 21: 2022-04-26
  • GPU computing:
    • Streaming processors
    • Main difference between CPU and GPU architectures
    • SIMD and SIMT
    • Streaming multiprocessor and Little's Law
    • Introduction to CUDA
    • CUDA warps and threads
  • Class summary
Reading period: 2022-04-28
  1. HW5 due
    (2022-05-01)
  2. Lab 6 due
    (2022-04-29)
15(18) Reading period: 2022-05-03
Exam period: 2022-05-05
  • Project final presentations

  1. Project deliverables due
    (2022-05-05)
  2. Project final presentations due
    (2022-05-05)
16(19) Exam period: 2022-05-10
Exam period: 2022-05-12