Chapter 1
Why do we need parallelism?
Moore's Law kept transistor counts growing for 50+ years, but by the mid-2000s clock speeds had stopped improving. The transistors had to go somewhere. They went parallel.
30M×: transistor-count growth, 1975→2023
Roughly flat: clock speed since the mid-2000s
[Figure: Moore's Law, transistor count (log scale) vs clock speed (MHz), 1975–2023]
The wall: Clock speeds reached roughly 3–4 GHz by the mid-2000s and have barely moved since. Going faster requires more voltage → far more heat (power grows much faster than frequency) → chips that can no longer be cooled. The "free lunch" of waiting for the next CPU to make your code faster was over.
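A first-order model makes the heat argument concrete. It uses the standard approximation for dynamic (switching) power in CMOS chips, where α is the fraction of the circuit that switches each cycle, C the switched capacitance, V the supply voltage and f the clock frequency. It also assumes, as a simplification, that V must rise roughly in proportion to f:

```latex
\begin{aligned}
P_{\text{dyn}} &\approx \alpha\, C\, V^{2} f \\
V \propto f \;&\Rightarrow\; P_{\text{dyn}} \propto f^{3}
\end{aligned}
```

Under these assumptions, doubling the clock frequency costs about 2³ = 8× the power, while doubling the number of cores at the same frequency costs roughly 2× the power for 2× the peak throughput.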
Latency vs Throughput — the key distinction
Latency — time for one operation
Float multiply: ~2 ns in 2000, still ~2 ns today. Not improving.
Cycle 0: start multiply
Cycle 4: result ready
// 4 clock cycles of latency
Throughput — operations per second
Modern CPU: 64 ops/cycle on 4 cores (16 per core, thanks to pipelining, vector instructions and multiple execution units). Massively improving.
Cycle 0: start op A
Cycle 1: start op B (A in progress)
Cycle 2: start op C (A,B in progress)
// once the pipeline is full: 1 result/cycle, even though each op still takes 4 cycles
The Aalto University analogy: A Master's degree takes 2 years (latency). But Aalto graduates 1,960 students per year (throughput) — because thousands of students are in the pipeline simultaneously. Modern CPUs work the same way.
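The analogy is an instance of Little's law: in steady state, the average number of items in flight equals throughput times latency. Applying it to the figures above gives the size of the student "pipeline" and the number of independent operations a 4-cycle multiplier needs to stay busy (a sketch assuming steady state and a single pipelined unit):

```latex
\begin{aligned}
\text{in flight} &= \text{throughput} \times \text{latency} \\
1960\,\tfrac{\text{students}}{\text{year}} \times 2\,\text{years} &\approx 3920\ \text{students} \\
1\,\tfrac{\text{result}}{\text{cycle}} \times 4\,\text{cycles} &= 4\ \text{operations}
\end{aligned}
```

So a single pipelined multiplier only reaches 1 result/cycle if at least 4 independent multiplications are always available; with several execution units and vector lanes per core, the required number of independent operations grows accordingly.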
The fundamental requirement — independent operations
Parallelism only works if operations don't depend on each other. This is the core concept the entire course builds on.
❌ Dependent — inherently serial
a1 *= a0;
a2 *= a1; // needs a1
a3 *= a2; // needs a2
a4 *= a3; // needs a3
Like following a linked list — each step blocks the next. Latency-bound.
✓ Independent — can parallelise
b1 *= a1; // independent
b2 *= a2; // independent
b3 *= a3; // independent
b4 *= a4; // independent
All four can be in flight at the same time (pipelined, vectorised, or split across cores). Throughput-bound, so the hardware can be kept fully busy.