Chapter 1

Why do we need parallelism?

Moore's Law kept transistor counts growing for 50+ years — but around 2000, clock speeds stopped improving. The transistors had to go somewhere. They went parallel.

30M× transistor growth 1975→2023

~1× clock speed growth since 2000 (essentially flat)

Figure: Moore's Law, transistor count (log scale) vs clock speed (MHz), 1975–2023

The wall: Clock speed hit ~3 GHz around 2000 and has barely moved since. Going faster requires more voltage → disproportionately more heat (power grows roughly with the cube of the clock frequency) → chips would melt. The "free lunch" of waiting for the next CPU to make your code faster was over.
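A common first-order model of CMOS switching power shows why the heat grows so quickly. This is a textbook approximation that ignores leakage current; α (activity factor) and C (switched capacitance) are symbols introduced here, not used elsewhere in this chapter. Reaching a higher frequency needs a roughly proportionally higher supply voltage, so:

```latex
P_{\text{dyn}} \approx \alpha\, C\, V^{2} f,
\qquad V \propto f
\;\Longrightarrow\;
P_{\text{dyn}} \propto f^{3}
```

Under this model, doubling the clock rate multiplies dynamic power by about 2³ = 8.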

Latency vs Throughput — the key distinction

Latency — time for one operation

Float multiply: ~2ns in 2000. Still ~2ns today. Not improving.

Cycle 0: start multiply
Cycle 4: result ready
// latency: 4 clock cycles
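Cycles and nanoseconds are linked by the clock frequency: 1 GHz means one cycle per nanosecond. A minimal helper for the conversion follows; the function name `latency_ns` is hypothetical, and the 2 GHz clock in the usage note is an assumption chosen to match the ~2ns figure, not a number from the original.

```cpp
#include <cassert>

// Convert a latency measured in clock cycles into nanoseconds.
// 1 GHz = 1 cycle per nanosecond, so ns = cycles / GHz.
double latency_ns(double cycles, double clock_ghz) {
    assert(clock_ghz > 0.0);  // a zero or negative clock makes no sense
    return cycles / clock_ghz;
}
```

For example, `latency_ns(4.0, 2.0)` is 2.0 ns and `latency_ns(4.0, 4.0)` is 1.0 ns. The nanosecond latency only improves if the cycle count drops or the clock rises, and the clock has stopped rising.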

Throughput — operations per second

Modern CPU: up to 64 ops/cycle (peak) across 4 cores. Massively improving.

Cycle 0: start op A
Cycle 1: start op B (A in progress)
Cycle 2: start op C (A,B in progress)
// once the pipeline is full: 1 result/cycle
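The two diagrams can be turned into a simple cycle count. This is a sketch under a simplifying assumption: one fully pipelined unit that can start one new operation every cycle, each taking `latency` cycles. The function names `serial_cycles` and `pipelined_cycles` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Dependent chain: each operation waits for the previous result,
// so nothing overlaps and the total is n * latency cycles.
std::int64_t serial_cycles(std::int64_t n, std::int64_t latency) {
    assert(n >= 0 && latency > 0);
    return n * latency;
}

// Independent operations: op i starts at cycle i and its result is ready
// at cycle i + latency, so the last of n ops is ready at (n - 1) + latency.
std::int64_t pipelined_cycles(std::int64_t n, std::int64_t latency) {
    assert(n >= 0 && latency > 0);
    if (n == 0) return 0;  // no work, no cycles
    return (n - 1) + latency;
}
```

With a 4-cycle latency, 100 dependent multiplies take `serial_cycles(100, 4)` = 400 cycles, while 100 independent ones take `pipelined_cycles(100, 4)` = 103 cycles, close to 1 result per cycle. A single operation takes 4 cycles either way, so the gain comes purely from overlap.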

The Aalto University analogy: A Master's degree takes 2 years (latency). But Aalto graduates 1,960 students per year (throughput) — because thousands of students are in the pipeline simultaneously. Modern CPUs work the same way.
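The analogy is an instance of Little's law: in a steady state, the average number of items in flight equals throughput times latency. A small helper makes the arithmetic explicit; the name `items_in_flight` is hypothetical, and treating every student as taking exactly 2 years is a simplification.

```cpp
#include <cassert>

// Little's law: items in flight = throughput (items per time unit)
//                                 * latency (time units per item).
double items_in_flight(double throughput, double latency) {
    assert(throughput >= 0.0 && latency >= 0.0);
    return throughput * latency;
}
```

`items_in_flight(1960.0, 2.0)` gives 3920 students, the "thousands" in the pipeline. For a multiplier with a 4-cycle latency that should deliver 1 result per cycle, `items_in_flight(1.0, 4.0)` gives 4: at least four independent multiplies must be in progress at once to keep it busy.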

The fundamental requirement — independent operations

Parallelism only works if operations don't depend on each other. This is the core concept the entire course builds on.

❌ Dependent — inherently serial

a1 *= a0;
a2 *= a1; // needs a1
a3 *= a2; // needs a2
a4 *= a3; // needs a3

Like following a linked list — each step blocks the next. Latency-bound.
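The same chain appears in an ordinary loop whenever one accumulator is reused. A minimal sketch (the function name `product_one_chain` is hypothetical):

```cpp
#include <cstddef>
#include <vector>

// One accumulator: iteration i needs the result of iteration i - 1,
// so the multiplies form a single dependency chain, like
// a1 *= a0; a2 *= a1; ...  Run time is roughly n * (multiply latency).
double product_one_chain(const std::vector<double>& x) {
    double acc = 1.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc *= x[i];  // must wait for the previous value of acc
    }
    return acc;
}
```

Under default floating-point rules, compilers will not split this chain by themselves, because reordering the multiplies could change the rounded result; GCC and Clang only allow that kind of reassociation with flags such as `-ffast-math`.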

✓ Independent — can parallelise

b1 *= a1; // independent
b2 *= a2; // independent
b3 *= a3; // independent
b4 *= a4; // independent

All can run simultaneously. Throughput-bound — uses full hardware.
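Applied to a loop, the independent pattern means keeping several accumulators and combining them at the end. A sketch with four chains follows; four is an illustrative choice rather than a tuned value, and `product_four_chains` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Four accumulators: the four multiplies in each iteration do not depend
// on one another (like b1 *= a1; ... b4 *= a4;), so a pipelined
// multiplier can overlap them.
double product_four_chains(const std::vector<double>& x) {
    double p0 = 1.0, p1 = 1.0, p2 = 1.0, p3 = 1.0;
    std::size_t i = 0;
    for (; i + 4 <= x.size(); i += 4) {
        p0 *= x[i];      // chain 0
        p1 *= x[i + 1];  // chain 1, independent of chain 0
        p2 *= x[i + 2];  // chain 2
        p3 *= x[i + 3];  // chain 3
    }
    for (; i < x.size(); ++i) {
        p0 *= x[i];  // leftover elements when the size is not a multiple of 4
    }
    return (p0 * p1) * (p2 * p3);  // combine the chains once at the end
}
```

Because the multiplies are grouped differently, the result can differ in the last bits from a single-accumulator loop for general inputs. On a CPU with a pipelined multiplier this version can run several times faster over a large array, but the actual speedup depends on the hardware and compiler, so it is worth measuring.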

