Parallelism and Concurrency
By completing this module you will be able to:
- Understand the difference between parallelism and concurrency
- Use threading for I/O-bound tasks
- Use multiprocessing for CPU-bound tasks
- Implement asynchronous code with asyncio
Why Do Multiple Things at Once?
Your program works. It does what it’s supposed to do. But… could it go faster if it did several things at once?
Imagine a restaurant. A single waiter can serve 10 tables: takes an order, brings it to the kitchen, serves another table, bills another… It works, but if the restaurant fills with 100 customers, that waiter can’t keep up. You need more staff or a more efficient system.
In programming the same thing happens:
- Downloading 100 files one by one can take hours. Downloading them in parallel, minutes
- Processing thousands of images sequentially is slow. With multiple cores working, much faster
- An app that waits for the server to respond stays “frozen”. If it does other things while waiting, it keeps responding to the user
The good news is Python gives you three tools for this: threading, multiprocessing, and asyncio. The key is knowing which to use in each situation.
Concurrency vs Parallelism
Before looking at the tools, you need to understand the difference between these two concepts. They’re similar but they’re not the same.
The Chef Metaphor
Concurrency is like a chef preparing several dishes at once with only two hands. They put the rice on to boil and, while waiting, cut the vegetables. Then they go back to the rice, stir it, and take the opportunity to heat the pan. They alternate between tasks, but at any given moment they are doing only one thing.
Parallelism is having several chefs in the kitchen, each working on their dish at the same time. One does the rice, another the salad, another the dessert. Truly simultaneous work.
```mermaid
graph TD
    subgraph Concurrency["Concurrency: 1 chef, several dishes"]
        C1["Start rice"]
        C2["Cut vegetables"]
        C3["Stir rice"]
        C4["Dress salad"]
        C1 --> C2 --> C3 --> C4
    end
    subgraph Parallelism["Parallelism: several chefs"]
        P1["Chef 1: Rice"]
        P2["Chef 2: Salad"]
        P3["Chef 3: Dessert"]
    end
```

The GIL: Python's Kitchen
Here comes the important part. Python has something called the Global Interpreter Lock (GIL), which is like saying Python’s kitchen only has one powerful stove.
With threading, you can have several chefs, but only one can use the stove at a time. The others have to wait their turn. This means threading doesn’t speed up calculations (everyone needs the stove), but it works well when chefs spend a lot of time waiting (for water to boil, for an order to arrive…).
With multiprocessing, each chef has their own complete kitchen. They can work truly in parallel, but setting up separate kitchens has a cost.
- Does your code wait a lot? (downloads, files, APIs) → Use threading or asyncio
- Does your code calculate a lot? (math, processing) → Use multiprocessing
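You can see the GIL at work with a small timing sketch (the countdown size and the exact times are illustrative and will vary by machine): on a standard CPython build, a CPU-bound countdown split across two threads takes about as long as running it twice sequentially, because only one thread can execute Python bytecode at a time.

```python
import threading
import time

def count_down(n):
    # Pure CPU work: the GIL lets only one thread run this at a time
    while n > 0:
        n -= 1

N = 5_000_000

# Sequential: two calls, one after the other
start = time.perf_counter()
count_down(N)
count_down(N)
seq = time.perf_counter() - start

# Threaded: two threads, but the GIL serializes the bytecode
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
thr = time.perf_counter() - start

print(f"Sequential: {seq:.2f}s | Threaded: {thr:.2f}s")  # similar times
```

Swap `count_down` for `time.sleep` and the threaded version wins easily: sleeping releases the GIL, calculating does not.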
Threading: The Juggler
Threading is like a juggler keeping several balls in the air. They can’t catch them all at once, but they alternate so fast between them that it seems like they’re doing everything simultaneously.
When to use threading? When your program spends a lot of time waiting:
- Waiting for a server to respond
- Waiting for a file to download
- Waiting for a database to return data
While you wait for one thing, you can do another. It’s like when you cook: while the water boils (waiting), you cut vegetables (useful work).
Create Basic Threads
A “thread” is like an assistant who can do a task while you do another. Here’s how to create them:
```python
import threading
import time

def task(name, duration):
    print(f"[{name}] Starting...")
    time.sleep(duration)  # Simulates waiting (download, API, etc.)
    print(f"[{name}] Completed!")

# Create two threads (two assistants)
thread1 = threading.Thread(target=task, args=("Task1", 2))
thread2 = threading.Thread(target=task, args=("Task2", 1))

# Start both (they start working concurrently)
thread1.start()
thread2.start()

# Wait for both to finish
thread1.join()
thread2.join()

print("All tasks completed")
```

Without threading, this would take 3 seconds (2 + 1). With threading, it takes only 2 seconds because both tasks "wait" at the same time.
ThreadPoolExecutor: The Download Team
When you have many similar tasks (download 100 files, for example), you don’t want to create 100 threads manually. ThreadPoolExecutor is like having a download team ready to work:
```python
from concurrent.futures import ThreadPoolExecutor
import time

def download(url):
    print(f"Downloading {url}...")
    time.sleep(1)  # Simulate download
    return f"Content from {url}"

urls = ["url1.com", "url2.com", "url3.com", "url4.com"]

# Create a team of 4 workers
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(download, urls))

for result in results:
    print(result)
```

The 4 downloads happen "at once" (while one waits, another advances), so it takes ~1 second instead of ~4.
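If you want each result as soon as its task finishes, instead of waiting for `executor.map()` to return everything in input order, `submit()` plus `as_completed()` is a common variant. A minimal sketch with the same simulated download:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def download(url):
    time.sleep(0.1)  # Simulate network wait
    return f"Content from {url}"

urls = ["url1.com", "url2.com", "url3.com", "url4.com"]

with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() returns a Future per task; as_completed() yields
    # each Future the moment it finishes, in completion order
    futures = {executor.submit(download, url): url for url in urls}
    for future in as_completed(futures):
        print(f"{futures[future]} -> {future.result()}")
```

This matters when tasks take very different times: with `map()` a slow first task holds back all the results behind it.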
Synchronization with Lock: The Bathroom Latch
Here comes a problem. If two threads try to modify the same variable at the same time, they can step on each other. It's as if two people tried to write on the same sheet of paper at once: the result would be a mess.
The solution is a Lock. It works like a bathroom latch: only one person can enter at a time. Others wait their turn.
```python
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100000):
        with lock:  # "I lock the latch" - only I can touch the counter
            counter += 1
        # Here "I unlock the latch" - the next one can enter

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Final counter: {counter}")  # 400000 (correct!)
```

Without the Lock, the counter might end up with an incorrect value (for example, 387432 instead of 400000). This is called a race condition, and it's one of the hardest bugs to detect.
Multiprocessing: Hire More Workers
If threading is a juggler, multiprocessing is hiring more employees. Each process is an independent worker with their own “kitchen” (memory, resources). They can work truly in parallel, without sharing the stove.
When to use multiprocessing? When your program does heavy calculations:
- Process thousands of images
- Calculate large prime numbers
- Train machine learning models
- Compress/decompress files
```mermaid
graph LR
    subgraph Sequential["Sequential: 1 worker"]
        S1["Task 1"] --> S2["Task 2"] --> S3["Task 3"] --> S4["Task 4"]
    end
    subgraph Parallel["Parallel: 4 workers"]
        P1["Task 1"]
        P2["Task 2"]
        P3["Task 3"]
        P4["Task 4"]
    end
```

Basic Processes
Each process has its own ID (PID), like each employee has their ID number:
```python
from multiprocessing import Process
import os

def task(name):
    print(f"[{name}] PID: {os.getpid()}")
    # CPU-intensive work goes here

if __name__ == "__main__":
    processes = []
    for i in range(4):
        p = Process(target=task, args=(f"Process-{i}",))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()
```

The `if __name__ == "__main__":` guard is mandatory in multiprocessing. Without it, each process would try to create more processes, leading to an infinite loop. It's like telling your employees: "Only the boss can hire new people".
Process Pool: The Calculation Team
Just like with threading, you don’t want to create processes manually for hundreds of tasks. Pool gives you a team of ready workers:
```python
from multiprocessing import Pool
import time

def calculate_square(n):
    time.sleep(0.1)  # Simulate heavy calculation
    return n ** 2

if __name__ == "__main__":
    numbers = list(range(20))

    # Sequential: one worker does everything
    start = time.time()
    results_seq = [calculate_square(n) for n in numbers]
    print(f"Sequential: {time.time() - start:.2f}s")  # ~2 seconds

    # Parallel: 4 workers share the task
    start = time.time()
    with Pool(4) as pool:
        results_par = pool.map(calculate_square, numbers)
    print(f"Parallel: {time.time() - start:.2f}s")  # ~0.5 seconds
```

ProcessPoolExecutor: The Modern Interface
ProcessPoolExecutor works the same as ThreadPoolExecutor, making it easy to switch between threads and processes:
```python
from concurrent.futures import ProcessPoolExecutor
import math

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

if __name__ == "__main__":
    numbers = range(100000, 100020)

    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(is_prime, numbers))

    primes = [n for n, is_p in zip(numbers, results) if is_p]
    print(f"Primes found: {primes}")
```

A good rule of thumb is to use as many workers as your CPU has cores. You can get this number with `os.cpu_count()`. Using more workers than cores doesn't speed anything up; it just adds overhead.
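A small sketch of that rule (the `abs` workload is just a placeholder; note that `ProcessPoolExecutor` already defaults to the core count when `max_workers` is omitted, so passing it explicitly mainly documents the intent):

```python
from concurrent.futures import ProcessPoolExecutor
import os

if __name__ == "__main__":
    cores = os.cpu_count()  # may be None on unusual platforms
    print(f"Available cores: {cores}")

    # One worker per core: full parallelism, no extra overhead
    with ProcessPoolExecutor(max_workers=cores) as executor:
        results = list(executor.map(abs, [-1, -2, -3]))
    print(results)  # [1, 2, 3]
```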
Asyncio: The Efficient Waiter
Asyncio is like a super efficient waiter in a restaurant. They don’t stand around waiting for the kitchen to prepare the dish. As soon as they place an order, they go to another table to take notes. Then serve drinks, bill another table, and when the dish is ready, they pick it up and serve it. A single waiter can serve many tables because they never wait doing nothing.
```mermaid
graph LR
    A["Take order Table 1"] --> B["Take order Table 2"]
    B --> C["Serve drink Table 3"]
    C --> D["Pick up dish Table 1"]
    D --> E["Serve dish Table 1"]
```

When to use asyncio? When you have many network operations:
- Download hundreds of web pages
- Make requests to multiple APIs
- Handle many simultaneous connections (chat, websockets)
The Magic Words: async and await
- `async`: "This function can pause and continue later"
- `await`: "This is going to take a while, so while I wait, do something else"
```python
import asyncio

async def task(name, duration):
    print(f"[{name}] Starting...")
    await asyncio.sleep(duration)  # "I wait, but don't block"
    print(f"[{name}] Completed!")
    return name

async def main():
    # gather() runs all the tasks "at once"
    results = await asyncio.gather(
        task("A", 2),
        task("B", 1),
        task("C", 3)
    )
    print(f"Results: {results}")

# Entry point for async code
asyncio.run(main())
```

The three tasks take 2, 1, and 3 seconds. How long does the program take? Only 3 seconds (not 6), because while one waits, the others progress.
Asynchronous Downloads with aiohttp
For real downloads you need aiohttp (the async version of requests):
```python
import asyncio
import aiohttp  # pip install aiohttp

async def download_url(session, url):
    async with session.get(url) as response:
        return await response.text()

async def download_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [download_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        return results

urls = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/2",
    "https://httpbin.org/delay/1"
]

# Takes ~2 seconds instead of ~4
results = asyncio.run(download_all(urls))
```

Semaphores: Don't Crash the Server
If you download 1000 URLs at once, you might crash the server (or get blocked). A semaphore is like a maximum capacity sign: it only allows N simultaneous tasks.
```python
import asyncio

async def limited_task(semaphore, name):
    async with semaphore:  # "I wait my turn if it's crowded"
        print(f"[{name}] Executing...")
        await asyncio.sleep(1)
        print(f"[{name}] Completed")

async def main():
    semaphore = asyncio.Semaphore(3)  # Maximum 3 at once

    tasks = [
        limited_task(semaphore, f"Task-{i}")
        for i in range(10)
    ]

    await asyncio.gather(*tasks)

asyncio.run(main())
```

All 10 tasks execute, but never more than 3 simultaneously. It's like a fitting room: even if there's a queue, only 3 people can be trying on clothes at once.
When to Use Each Approach
| Situation | Solution | Real example |
|---|---|---|
| Download files | threading or asyncio | Download 50 images from the internet |
| Call APIs | asyncio + aiohttp | Query prices at 10 stores |
| Read/write files | threading | Process 100 logs simultaneously |
| Math calculations | multiprocessing | Calculate primes up to 10 million |
| Process images | multiprocessing.Pool | Resize 1000 photos |
| Web server | asyncio | Handle thousands of connections |
Quick Comparison
| Aspect | Threading | Multiprocessing | Asyncio |
|---|---|---|---|
| Memory | Shared | Separate | Shared |
| Complexity | Medium | High | High |
| Best for | I/O, few tasks | CPU, calculations | I/O, many tasks |
| Limitation | GIL | Process overhead | Only async code |
Common Mistakes
- Forgetting `if __name__ == "__main__":` in multiprocessing
  - Your program will enter an infinite loop creating processes
- Not using a Lock when sharing data between threads
  - Race conditions: incorrect and unpredictable results
- Using threading for heavy calculations
  - The GIL prevents it from being faster. Use multiprocessing
- Mixing synchronous code into asyncio
  - `time.sleep()` blocks everything. Use `await asyncio.sleep()`
  - `requests.get()` blocks. Use `aiohttp`
- Creating too many processes/threads
  - More processes than cores = overhead without benefit
  - Thousands of threads = excessive memory consumption
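On the "mixing synchronous code" mistake: when a blocking call is unavoidable (the library simply has no async version), `asyncio.to_thread()` (Python 3.9+) runs it in a worker thread so the event loop keeps moving. A minimal sketch with a simulated blocking call:

```python
import asyncio
import time

def blocking_io():
    time.sleep(0.5)  # Imagine requests.get() or a slow file read
    return "done"

async def main():
    # Each blocking call runs in its own worker thread,
    # so the event loop is never frozen
    results = await asyncio.gather(
        asyncio.to_thread(blocking_io),
        asyncio.to_thread(blocking_io),
    )
    print(results)  # ['done', 'done'], after ~0.5s instead of ~1s

asyncio.run(main())
```

Calling `blocking_io()` directly (without `to_thread`) inside a coroutine would freeze every other task for the full half second.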
Quick Decision Guide
Don’t know what to use? Follow this diagram:
```mermaid
graph TD
    A["What does your code do?"] --> B{"Waits a lot?<br/>(network, files, APIs)"}
    B -->|Yes| C{"How many tasks?"}
    B -->|No, calculates| D["multiprocessing"]
    C -->|Few| E["threading"]
    C -->|Many| F["asyncio"]
```
80% of cases are solved like this:
- Downloads/APIs: `asyncio` with `aiohttp`
- Data processing: `multiprocessing.Pool`
- Simple parallel tasks: `ThreadPoolExecutor`
Practical Exercises
1. Create a function that downloads multiple URLs in parallel using ThreadPoolExecutor.
2. Use multiprocessing to find all primes up to N in parallel.