Parallelism and Concurrency
By completing this module you will be able to:
- Understand the difference between parallelism and concurrency
- Use threading for I/O-bound tasks
- Use multiprocessing for CPU-bound tasks
- Implement asynchronous code with asyncio
Why Do Multiple Things at Once?
Your program works. It does what it’s supposed to do. But… could it go faster if it did several things at once?
Imagine a restaurant. A single waiter can serve 10 tables: takes an order, brings it to the kitchen, serves another table, bills another… It works, but if the restaurant fills with 100 customers, that waiter can’t keep up. You need more staff or a more efficient system.
In programming the same thing happens:
- Downloading 100 files one by one can take hours. Downloading them in parallel, minutes
- Processing thousands of images sequentially is slow. With multiple cores working, much faster
- An app that waits for the server to respond stays “frozen”. If it does other things while waiting, it keeps responding to the user
The good news is Python gives you three tools for this: threading, multiprocessing, and asyncio. The key is knowing which to use in each situation.
Concurrency vs Parallelism
Before looking at the tools, you need to understand the difference between these two concepts. They’re similar but they’re not the same.
The Chef Metaphor
Concurrency is like a chef preparing several dishes at once with only two hands. They put the rice on to boil and, while waiting, cut the vegetables. Then they go back to the rice, stir it, and take the opportunity to heat the pan. They alternate between tasks, but at any given moment they are doing only one thing.
Parallelism is having several chefs in the kitchen, each working on their dish at the same time. One does the rice, another the salad, another the dessert. Truly simultaneous work.
```mermaid
graph TD
    subgraph Concurrency["Concurrency: 1 chef, several dishes"]
        C1["Start rice"]
        C2["Cut vegetables"]
        C3["Stir rice"]
        C4["Dress salad"]
        C1 --> C2 --> C3 --> C4
    end
    subgraph Parallelism["Parallelism: several chefs"]
        P1["Chef 1: Rice"]
        P2["Chef 2: Salad"]
        P3["Chef 3: Dessert"]
    end
```

The GIL: Python's Kitchen
Here comes the important part. Python has something called the Global Interpreter Lock (GIL), which is like saying Python’s kitchen only has one powerful stove.
With threading, you can have several chefs, but only one can use the stove at a time. The others have to wait their turn. This means threading doesn’t speed up calculations (everyone needs the stove), but it works well when chefs spend a lot of time waiting (for water to boil, for an order to arrive…).
With multiprocessing, each chef has their own complete kitchen. They can work truly in parallel, but setting up separate kitchens has a cost.
- Does your code wait a lot? (downloads, files, APIs) → Use threading or asyncio
- Does your code calculate a lot? (math, processing) → Use multiprocessing
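You can see the GIL at work with a small timing sketch (the countdown size and the exact times are illustrative and will vary by machine): on a standard CPython build, a CPU-bound countdown split across two threads takes about as long as running it twice sequentially, because only one thread can execute Python bytecode at a time.

```python
import threading
import time

def count_down(n):
    # Pure CPU work: the GIL lets only one thread run this at a time
    while n > 0:
        n -= 1

N = 5_000_000

# Sequential: two calls, one after the other
start = time.perf_counter()
count_down(N)
count_down(N)
seq = time.perf_counter() - start

# Threaded: two threads, but the GIL serializes the bytecode
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
thr = time.perf_counter() - start

print(f"Sequential: {seq:.2f}s | Threaded: {thr:.2f}s")  # similar times
```

Swap `count_down` for `time.sleep` and the threaded version wins easily: sleeping releases the GIL, calculating does not.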
Threading: The Juggler
Threading is like a juggler keeping several balls in the air. They can’t catch them all at once, but they alternate so fast between them that it seems like they’re doing everything simultaneously.
When to use threading? When your program spends a lot of time waiting:
- Waiting for a server to respond
- Waiting for a file to download
- Waiting for a database to return data
While you wait for one thing, you can do another. It’s like when you cook: while the water boils (waiting), you cut vegetables (useful work).
Create Basic Threads
A “thread” is like an assistant who can do a task while you do another. Here’s how to create them:
```python
import threading
import time

def task(name, duration):
    print(f"[{name}] Starting...")
    time.sleep(duration)  # Simulates waiting (download, API, etc.)
    print(f"[{name}] Completed!")

# Create two threads (two assistants)
thread1 = threading.Thread(target=task, args=("Task1", 2))
thread2 = threading.Thread(target=task, args=("Task2", 1))

# Start both (they start working concurrently)
thread1.start()
thread2.start()

# Wait for both to finish
thread1.join()
thread2.join()

print("All tasks completed")
```

Without threading, this would take 3 seconds (2 + 1). With threading, it takes only 2 seconds because both tasks "wait" at the same time.
ThreadPoolExecutor: The Download Team
When you have many similar tasks (download 100 files, for example), you don’t want to create 100 threads manually. ThreadPoolExecutor is like having a download team ready to work:
```python
from concurrent.futures import ThreadPoolExecutor
import time

def download(url):
    print(f"Downloading {url}...")
    time.sleep(1)  # Simulate download
    return f"Content from {url}"

urls = ["url1.com", "url2.com", "url3.com", "url4.com"]

# Create a team of 4 workers
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(download, urls))

for result in results:
    print(result)
```

The 4 downloads happen "at once" (while one waits, another advances), so it takes ~1 second instead of ~4.
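If you want each result as soon as its task finishes, instead of waiting for `executor.map()` to return everything in input order, `submit()` plus `as_completed()` is a common variant. A minimal sketch with the same simulated download:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def download(url):
    time.sleep(0.1)  # Simulate network wait
    return f"Content from {url}"

urls = ["url1.com", "url2.com", "url3.com", "url4.com"]

with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() returns a Future per task; as_completed() yields
    # each Future the moment it finishes, in completion order
    futures = {executor.submit(download, url): url for url in urls}
    for future in as_completed(futures):
        print(f"{futures[future]} -> {future.result()}")
```

This matters when tasks take very different times: with `map()` a slow first task holds back all the results behind it.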
Synchronization with Lock: The Bathroom Latch
Here comes a problem. If two threads try to modify the same variable at the same time, they can step on each other. It's as if two people tried to write on the same sheet of paper at once: the result would be a mess.
The solution is a Lock. It works like a bathroom latch: only one person can enter at a time. Others wait their turn.
```python
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100000):
        with lock:  # "I lock the latch" - only I can touch the counter
            counter += 1
        # Here "I unlock the latch" - the next one can enter

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Final counter: {counter}")  # 400000 (correct!)
```

Without the Lock, the counter might end up with an incorrect value (for example, 387432 instead of 400000). This is called a race condition, and it's one of the hardest bugs to detect.
Multiprocessing: Hire More Workers
If threading is a juggler, multiprocessing is hiring more employees. Each process is an independent worker with their own “kitchen” (memory, resources). They can work truly in parallel, without sharing the stove.
When to use multiprocessing? When your program does heavy calculations:
- Process thousands of images
- Calculate large prime numbers
- Train machine learning models
- Compress/decompress files
```mermaid
graph LR
    subgraph Sequential["Sequential: 1 worker"]
        S1["Task 1"] --> S2["Task 2"] --> S3["Task 3"] --> S4["Task 4"]
    end
    subgraph Parallel["Parallel: 4 workers"]
        P1["Task 1"]
        P2["Task 2"]
        P3["Task 3"]
        P4["Task 4"]
    end
```

Basic Processes
Each process has its own ID (PID), like each employee has their ID number:
```python
from multiprocessing import Process
import os

def task(name):
    print(f"[{name}] PID: {os.getpid()}")
    # CPU-intensive work goes here

if __name__ == "__main__":
    processes = []
    for i in range(4):
        p = Process(target=task, args=(f"Process-{i}",))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()
```

The `if __name__ == "__main__":` guard is mandatory in multiprocessing. Without it, each process would try to create more processes, leading to an infinite loop. It's like telling your employees: "Only the boss can hire new people".
Process Pool: The Calculation Team
Just like with threading, you don’t want to create processes manually for hundreds of tasks. Pool gives you a team of ready workers:
```python
from multiprocessing import Pool
import time

def calculate_square(n):
    time.sleep(0.1)  # Simulate heavy calculation
    return n ** 2

if __name__ == "__main__":
    numbers = list(range(20))

    # Sequential: one worker does everything
    start = time.time()
    results_seq = [calculate_square(n) for n in numbers]
    print(f"Sequential: {time.time() - start:.2f}s")  # ~2 seconds

    # Parallel: 4 workers share the task
    start = time.time()
    with Pool(4) as pool:
        results_par = pool.map(calculate_square, numbers)
    print(f"Parallel: {time.time() - start:.2f}s")  # ~0.5 seconds
```

ProcessPoolExecutor: The Modern Interface
ProcessPoolExecutor works the same as ThreadPoolExecutor, making it easy to switch between threads and processes:
```python
from concurrent.futures import ProcessPoolExecutor
import math

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

if __name__ == "__main__":
    numbers = range(100000, 100020)

    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(is_prime, numbers))

    primes = [n for n, is_p in zip(numbers, results) if is_p]
    print(f"Primes found: {primes}")
```

A good rule of thumb is to use as many workers as your CPU has cores. You can get this number with `os.cpu_count()`. Using more workers than cores doesn't speed anything up; it just adds overhead.
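A small sketch of that rule (the `abs` workload is just a placeholder; note that `ProcessPoolExecutor` already defaults to the core count when `max_workers` is omitted, so passing it explicitly mainly documents the intent):

```python
from concurrent.futures import ProcessPoolExecutor
import os

if __name__ == "__main__":
    cores = os.cpu_count()  # may be None on unusual platforms
    print(f"Available cores: {cores}")

    # One worker per core: full parallelism, no extra overhead
    with ProcessPoolExecutor(max_workers=cores) as executor:
        results = list(executor.map(abs, [-1, -2, -3]))
    print(results)  # [1, 2, 3]
```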
Asyncio: The Efficient Waiter
Asyncio is like a super efficient waiter in a restaurant. They don’t stand around waiting for the kitchen to prepare the dish. As soon as they place an order, they go to another table to take notes. Then serve drinks, bill another table, and when the dish is ready, they pick it up and serve it. A single waiter can serve many tables because they never wait doing nothing.
```mermaid
graph LR
    A["Take order Table 1"] --> B["Take order Table 2"]
    B --> C["Serve drink Table 3"]
    C --> D["Pick up dish Table 1"]
    D --> E["Serve dish Table 1"]
```

When to use asyncio? When you have many network operations:
- Download hundreds of web pages
- Make requests to multiple APIs
- Handle many simultaneous connections (chat, websockets)
The Magic Words: async and await
- `async`: "This function can pause and continue later"
- `await`: "This is going to take a while, so while I wait, do something else"
```python
import asyncio

async def task(name, duration):
    print(f"[{name}] Starting...")
    await asyncio.sleep(duration)  # "I wait, but don't block"
    print(f"[{name}] Completed!")
    return name

async def main():
    # gather() runs all the tasks "at once"
    results = await asyncio.gather(
        task("A", 2),
        task("B", 1),
        task("C", 3)
    )
    print(f"Results: {results}")

# Entry point for async code
asyncio.run(main())
```

The three tasks take 2, 1, and 3 seconds. How long does the program take? Only 3 seconds (not 6), because while one waits, the others progress.
Asynchronous Downloads with aiohttp
For real downloads you need aiohttp (the async version of requests):
```python
import asyncio
import aiohttp  # pip install aiohttp

async def download_url(session, url):
    async with session.get(url) as response:
        return await response.text()

async def download_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [download_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        return results

urls = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/2",
    "https://httpbin.org/delay/1"
]

# Takes ~2 seconds instead of ~4
results = asyncio.run(download_all(urls))
```

Semaphores: Don't Crash the Server
If you download 1000 URLs at once, you might crash the server (or get blocked). A semaphore is like a maximum capacity sign: it only allows N simultaneous tasks.
```python
import asyncio

async def limited_task(semaphore, name):
    async with semaphore:  # "I wait my turn if it's crowded"
        print(f"[{name}] Executing...")
        await asyncio.sleep(1)
        print(f"[{name}] Completed")

async def main():
    semaphore = asyncio.Semaphore(3)  # Maximum 3 at once

    tasks = [
        limited_task(semaphore, f"Task-{i}")
        for i in range(10)
    ]

    await asyncio.gather(*tasks)

asyncio.run(main())
```

All 10 tasks execute, but never more than 3 simultaneously. It's like a fitting room: even if there's a queue, only 3 people can be trying on clothes at once.
When to Use Each Approach
| Situation | Solution | Real example |
|---|---|---|
| Download files | threading or asyncio | Download 50 images from the internet |
| Call APIs | asyncio + aiohttp | Query prices at 10 stores |
| Read/write files | threading | Process 100 logs simultaneously |
| Math calculations | multiprocessing | Calculate primes up to 10 million |
| Process images | multiprocessing.Pool | Resize 1000 photos |
| Web server | asyncio | Handle thousands of connections |
Quick Comparison
| Aspect | Threading | Multiprocessing | Asyncio |
|---|---|---|---|
| Memory | Shared | Separate | Shared |
| Complexity | Medium | High | High |
| Best for | I/O, few tasks | CPU, calculations | I/O, many tasks |
| Limitation | GIL | Process overhead | Only async code |
Common Mistakes
- Forgetting `if __name__ == "__main__":` in multiprocessing
  - Your program will enter an infinite loop creating processes
- Not using a Lock when sharing data between threads
  - Race conditions: incorrect and unpredictable results
- Using threading for heavy calculations
  - The GIL prevents it from being faster. Use multiprocessing
- Mixing synchronous code into asyncio
  - `time.sleep()` blocks everything. Use `await asyncio.sleep()`
  - `requests.get()` blocks. Use `aiohttp`
- Creating too many processes/threads
  - More processes than cores = overhead without benefit
  - Thousands of threads = excessive memory consumption
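On the "mixing synchronous code" mistake: when a blocking call is unavoidable (the library simply has no async version), `asyncio.to_thread()` (Python 3.9+) runs it in a worker thread so the event loop keeps moving. A minimal sketch with a simulated blocking call:

```python
import asyncio
import time

def blocking_io():
    time.sleep(0.5)  # Imagine requests.get() or a slow file read
    return "done"

async def main():
    # Each blocking call runs in its own worker thread,
    # so the event loop is never frozen
    results = await asyncio.gather(
        asyncio.to_thread(blocking_io),
        asyncio.to_thread(blocking_io),
    )
    print(results)  # ['done', 'done'], after ~0.5s instead of ~1s

asyncio.run(main())
```

Calling `blocking_io()` directly (without `to_thread`) inside a coroutine would freeze every other task for the full half second.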
Quick Decision Guide
Don’t know what to use? Follow this diagram:
```mermaid
graph TD
    A["What does your code do?"] --> B{"Waits a lot?<br/>(network, files, APIs)"}
    B -->|Yes| C{"How many tasks?"}
    B -->|No, calculates| D["multiprocessing"]
    C -->|Few| E["threading"]
    C -->|Many| F["asyncio"]
```
80% of cases are solved like this:
- Downloads/APIs: `asyncio` with `aiohttp`
- Data processing: `multiprocessing.Pool`
- Simple parallel tasks: `ThreadPoolExecutor`
Practical Exercises
1. Create a function that downloads multiple URLs in parallel using ThreadPoolExecutor.
2. Use multiprocessing to find all primes up to N in parallel.