🇫🇷 French version available here
Reading Note:
Python is a crappy language whose structure is based on whitespace (indentation), which makes it survive copy-paste very poorly. Once the whitespace is lost, it becomes very difficult to fix the code, especially since the code may still run, just incorrectly.
When converting this article to HTML, I realized that some indentations were lost in the code examples. If some snippets don't work for you, let me know and I'll fix them.
To understand why asyncio exists and how it works, we must start from the beginning: your machine's processor. Very roughly speaking, a processor can only do one thing at a time (to simplify, I'm ignoring the case of multiprocessors). It executes one instruction, then the next, then the one after, in a strictly sequential order.
This physical reality may seem obvious, but it has profound implications on how your programs execute. When you write:
print("Instruction 1")
print("Instruction 2")
print("Instruction 3")
The processor literally executes these three calls one after another. There's no parallelism, no simultaneous execution. It's a linear sequence of operations.
We can make an analogy with a chef working alone in his kitchen. He can be very fast and very efficient, but he can physically only do one action at a time: cut vegetables, or light the fire, or stir the sauce. Never two actions simultaneously.
This fundamental constraint brings us to the notion of a thread of execution. At the processor level, there exists only one thread: the one executing the current instruction. Everything else — multitasking, applications that seem to run in parallel, interfaces that remain responsive while a download is taking place — all of this is an illusion carefully orchestrated by the operating system.
This illusion relies on context switching. The operating system regularly interrupts the process currently executing, saves its state (registers, stack, program counter), then hands control to another process. This operation repeats thousands of times per second, giving the impression that several programs are executing simultaneously.
The OS maintains what's called the "execution context" of each process: all the information necessary to resume execution exactly where it had stopped. It's as if our chef had an assistant who, every second, took a photo of the kitchen's state, put everything away, set up the ingredients for another dish, then resumed cooking the first dish exactly in the photographed state.
This is where we understand why we shouldn't ignore the operating system when we want to understand asyncio. Python doesn't run in a vacuum: it executes on top of an OS that already manages multitasking, processes, threads, and all the complexity of running multiple programs on a single processor. Asyncio comes on top of this layer, with its own task management mechanisms.
This superposition of task management systems — the OS on one side, asyncio on the other — can create subtle and sometimes surprising interactions. Understanding that your asyncio code ultimately executes on a sequential processor, managed by an OS that does preemptive multitasking, is essential to avoid pitfalls and understand the limits of asynchronous programming.
A blocking operation is an instruction that suspends program execution while waiting for an external event to occur. The most common examples are:
Disk I/O: we passively wait for the storage device to provide the data, which is thousands of times slower than RAM access, even with SSD
Network operations: we wait for the remote server to process our request and send back its response
Timeouts: we simply wait for time to pass (time.sleep())
Let's take a concrete Python example that, among other things, makes a request to httpbin:
httpbin.org is a free HTTP service specifically designed for testing HTTP requests. It offers various endpoints useful for development: /delay/n to simulate latency, /json to receive JSON, /status/code to test error codes, etc. It's the perfect tool for examples because it faithfully reproduces real network behaviors without depending on unpredictable third-party services.
import time
import requests

def process_data():
    print("Starting processing")

    # Blocking operation: processor waits 2 seconds
    time.sleep(2)
    print("End of wait")

    # Blocking operation: HTTP request
    response = requests.get("https://httpbin.org/delay/1")
    print(f"Response received: {response.status_code}")

    # Blocking operation: file reading
    with open("/etc/passwd", "r") as f:
        content = f.read()
    print(f"File read: {len(content)} characters")

# Let's time the execution
start = time.time()
process_data()
end = time.time()
print(f"Total time: {end - start:.2f} seconds")
When executed, this program takes about 3 seconds (2s + 1s + disk reading time). During all this time, the processor remains largely idle.
To understand the magnitude of the resource waste problem, we must grasp the orders of magnitude involved: a modern core executes on the order of 3 billion instructions per second, an SSD read takes around 100 microseconds, and an HTTP request to a remote server takes tens of milliseconds.
A disk read therefore corresponds to about 300,000 CPU instructions. While an HTTP request executes, the processor could theoretically process 150 million instructions.
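To get a feel for these figures on your own machine, here is a minimal timing sketch; the exact numbers will vary wildly with your hardware and network, and the file path and URL are only examples:
import time
from pathlib import Path
import requests

def timed(label, fn):
    start = time.perf_counter()
    fn()
    print(f"{label}: {(time.perf_counter() - start) * 1000:.3f} ms")

# Pure CPU work: a million iterations inside the interpreter
timed("CPU loop (1M iterations)", lambda: sum(range(1_000_000)))

# Disk I/O: read a small local file (the path is just an example)
timed("Disk read", lambda: Path("/etc/passwd").read_bytes())

# Network I/O: one full HTTP round trip
timed("HTTP request", lambda: requests.get("https://httpbin.org/json"))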
Beyond the purely technical aspect, there's an economic issue: whether you rent a cloud server by the hour or bought your hardware, the cost remains the same whether your CPU is used at 1% or 100%. Letting a processor wait passively when it could process other tasks (like web requests in parallel) represents direct financial waste.
The fundamental problem is that our traditional, sequential execution model doesn't match the reality of modern applications: we have processors capable of executing billions of instructions per second, and programs that spend most of their time passively waiting for I/O.
It's exactly like having a highly efficient chef who, after putting a dish in the oven, stands planted in front of it waiting for it to cook instead of preparing other orders.
The problem of blocking tasks can be solved in several fundamentally different ways. We can create multiple threads or processes to execute tasks in parallel, or adopt an approach where tasks yield control, voluntarily or not. Each solution has its own implications in terms of complexity, performance and robustness.
Contrary to what one might think, there isn't a single way to manage multitasking. The operating system and applications can adopt radically different strategies. But all must coexist.
Preemptive multitasking is the operating mode that most developers know without even realizing it. In this model, it's the OS that decides when to interrupt a task to execute another. The processor has a hardware timer that regularly generates interrupts (typically every millisecond), and at each interrupt the OS takes back control, typically to hand execution to another thread.
import threading
import time

def long_task(name):
    for i in range(5):
        print(f"Task {name}: step {i}")
        time.sleep(1)  # Work simulation

# Creating two threads
thread1 = threading.Thread(target=long_task, args=("A",))
thread2 = threading.Thread(target=long_task, args=("B",))

thread1.start()
thread2.start()
Here, even though each task does a time.sleep(1), the two threads execute in parallel because the OS can interrupt and resume them at will. This is preemptive multitasking: tasks don't have a say in when they're interrupted.
This example works and gives the illusion of parallelism, but we must understand that in reality, Python's GIL (Global Interpreter Lock) prevents parallel execution of pure Python code in multiple threads. The visible alternation here comes from the fact that time.sleep() delegates the wait to the underlying C layer, temporarily releasing the GIL and allowing the other thread to execute. This example should therefore be seen as an illustration of the concept of preemptive multitasking, rather than as a faithful demonstration of Python parallelism. We won't delve deeper into this point because it's not the subject of this article, but this nuance partly explains why asyncio represents an interesting alternative approach.
We can convince ourselves by replacing the time.sleep()-based wait with CPU-intensive code. This code should theoretically use 100% of each available CPU core and execute truly in parallel on a multi-core machine. But the GIL, through a system of locks, guarantees that only one thread can execute Python code at a time, artificially limiting execution to a single core:
import threading
import time

def long_task(name):
    start = time.time()
    for i in range(5):
        for j in range(30000000):
            pass
        step = time.time()
        print(f"Task {name}: step {i}, duration: {step - start}")
        start = step

# Creating three threads (uncomment the start() calls below to run more of them)
thread1 = threading.Thread(target=long_task, args=("A",))
thread2 = threading.Thread(target=long_task, args=("B",))
thread3 = threading.Thread(target=long_task, args=("C",))

thread1.start()
# thread2.start()
# thread3.start()
The result clearly shows execution time differences proportional to the number of concurrent threads. Simply uncomment the threadX.start() lines to verify. Here are the results on my machine:
1 thread:
Task A: step 0, duration: 2.0162436962127686
Task A: step 1, duration: 2.0153567790985107
Task A: step 2, duration: 2.015307903289795
Task A: step 3, duration: 2.0152580738067627
Task A: step 4, duration: 2.0151071548461914
2 threads:
Task A: step 0, duration: 4.170198678970337
Task B: step 0, duration: 4.172067642211914
Task A: step 1, duration: 4.193373680114746
Task B: step 1, duration: 4.213354825973511
Task A: step 2, duration: 4.180951118469238
Task B: step 2, duration: 4.169952869415283
Task A: step 3, duration: 4.282079219818115
Task B: step 3, duration: 4.27399754524231
Task B: step 4, duration: 4.311825513839722
Task A: step 4, duration: 4.361750841140747
3 threads:
Task B: step 0, duration: 4.5046913623809814
Task A: step 0, duration: 7.235995769500732
Task C: step 0, duration: 7.3113625049591064
Task B: step 1, duration: 4.408862352371216
Task A: step 1, duration: 6.340449810028076
Task C: step 1, duration: 6.8771936893463135
Task B: step 2, duration: 6.176913261413574
Task C: step 2, duration: 4.872439384460449
Task B: step 3, duration: 4.644377708435059
Task A: step 2, duration: 7.488765239715576
Task C: step 3, duration: 6.49780011177063
Task A: step 3, duration: 5.1964192390441895
Task B: step 4, duration: 6.820744752883911
Task A: step 4, duration: 4.06696629524231
Task C: step 4, duration: 4.835111856460571
With 1 thread, each step takes ~2 seconds. With 2 threads, each step takes ~4 seconds (doubling of time). With 3 threads, times become even more erratic and long. This linear degradation shows that Python threads never truly execute in parallel for CPU-intensive code.
Under Linux, we can observe this difference by looking at context switch statistics. Each process has counters for voluntary and involuntary switches:
# Look at context switches of a process
$ cat /proc/<PID>/status | grep ctxt
voluntary_ctxt_switches: 1523
nonvoluntary_ctxt_switches: 892
The nonvoluntary_ctxt_switches represent forced interruptions by the OS - this is preemptive multitasking in action.
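If you prefer to read these counters from Python itself, here is a small sketch; it assumes a Linux machine, since it relies on the /proc filesystem:
# Linux-only: read our own context switch counters from /proc
with open("/proc/self/status") as f:
    for line in f:
        if line.startswith(("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")):
            print(line.strip())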
On the opposite end, cooperative programming relies on a simple principle: each task voluntarily yields control when it has nothing more to do. There's no forced interruption. If a task decides to never yield control, it can monopolize the processor indefinitely.
This approach may seem fragile, but it presents considerable advantages. Since tasks can only be interrupted at points where they agree to yield control, the most pernicious race conditions cannot exist. No need for complex synchronization mechanisms such as mutexes.
A race condition occurs when multiple threads simultaneously access a shared resource, and the final result depends on the unpredictable execution order of threads. For example, if two threads increment a counter at the same time, the result can be incorrect because the "read-modify-write" operations can interleave.
In asyncio, race conditions caused by arbitrary interruption are avoided: simple operations like counter += 1 are atomic because no suspension can occur in the middle of them. However, as soon as an await separates a read from a write, a race condition becomes possible again, as the sketch below shows.
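Here is a minimal sketch of that second case; the global counter and the asyncio.sleep(0) are purely illustrative, but they show how an await placed between the read and the write lets other tasks interleave:
import asyncio

counter = 0

async def unsafe_increment():
    global counter
    current = counter       # read
    await asyncio.sleep(0)  # suspension point: every other task can run here
    counter = current + 1   # write based on a stale value

async def main():
    await asyncio.gather(*(unsafe_increment() for _ in range(100)))
    print(f"Expected 100, got {counter}")  # typically far less than 100

asyncio.run(main())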
A mutex (mutual exclusion) is a lock that allows only one thread to access a critical resource at a time. They introduce their own problems: deadlocks when two threads wait for each other, and contention when multiple threads compete for the same lock, degrading performance.
The voluntary_ctxt_switches in Linux statistics correspond to this model: the task explicitly requests to be suspended, generally because it's waiting for a resource.
It should be noted that within a Linux kernel, both switching modes coexist and are not mutually exclusive. The same process can undergo preemptive switches (when its time quantum expires) and cooperative switches (when it makes a blocking syscall). This coexistence explains why we find both counters in system statistics.
These two modes of multitasking at the system level have their equivalents in programming models. Preemptive multitasking corresponds to thread programming, where the OS can interrupt any thread at any time. Cooperative mode corresponds to asynchronous programming, where tasks explicitly yield control at waiting points.
This correspondence is not trivial. A well-designed asynchronous program harmonizes naturally with kernel mechanisms. Instead of suffering forced interruptions, it voluntarily yields control when waiting for resources (I/O, network, etc.). The OS will then register many more voluntary_ctxt_switches and many fewer nonvoluntary_ctxt_switches.
This cooperation between the program and the OS significantly optimizes performance. Voluntary switches are less costly because they occur at predictable moments, allowing the kernel to better optimize resource management. The program avoids wasting unnecessary timer interruptions and reduces contention on the kernel scheduler.
To intuitively understand the difference, imagine a server in a restaurant who must manage several tables.
In the preemptive model, the server would have an authoritarian chef who shouts at him every 30 seconds: "Change tables!" Regardless of whether he's taking an order or explaining the menu, he must immediately abandon his current table and move to the next one. This system works, but it's chaotic and inefficient.
In the cooperative model, the server manages his own schedule. He takes the order at table 1, goes to transmit it to the kitchen, and while the cooks are preparing (a "blocking" operation), he naturally goes to see table 2. When he returns from table 2, he can check if the dishes from table 1 are ready. If they're not, he can serve table 3. The server never remains idle waiting for a single table.
This analogy reveals the essence of asynchronous programming: optimizing waiting times by doing something else. The server (our event loop) coordinates multiple tasks, but there is only one server (a single execution thread).
Why choose cooperation over preemption and threads? The advantages are substantial:
No race conditions: Since a task can only be interrupted at points where it agrees to yield control, there is no corruption of shared data.
No locks: More predictability and freedom from classic lock problems like deadlocks and contention. The code escapes the complexities of mutexes and semaphores.
Performance: Context switches are less costly because they are planned and correspond to moments when the program naturally waits for a resource.
Scalability: You can manage thousands of concurrent tasks with a single thread, whereas system threads are limited by memory and kernel resources.
But this approach also has its pitfalls. A single poorly designed task that never yields control can block the entire system. This is why understanding the underlying mechanisms is crucial for writing robust asynchronous code.
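A minimal illustration of this pitfall, with made-up task names: one coroutine that calls the blocking time.sleep() instead of awaiting freezes every other task on the loop for the whole duration:
import asyncio
import time

async def well_behaved():
    for i in range(3):
        print(f"well_behaved: tick {i}")
        await asyncio.sleep(1)  # yields control back to the event loop

async def selfish():
    print("selfish: start")
    time.sleep(3)  # blocking call: never yields, monopolizes the single thread
    print("selfish: end")

async def main():
    # While selfish() holds the loop, well_behaved() cannot tick
    await asyncio.gather(well_behaved(), selfish())

asyncio.run(main())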
The next step is to understand how the OS helps us implement this cooperation efficiently, notably through non-blocking polling mechanisms.
To understand how asyncio can function, you must first understand how the operating system manages input/output operations. This is where we find the fundamental mechanism that allows Python to avoid staying blocked while waiting for a file to load or a network request to complete.
By default, system calls such as connect() or read() block the calling process until the operation completes. To solve this problem, Unix operating systems offer an alternative: non-blocking system calls. Instead of waiting indefinitely, these calls immediately return a result, even if the operation is not finished.
import socket
import errno

# Creating a TCP socket to communicate over the network
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Enabling non-blocking mode: guarantees that no call will be blocking
sock.setblocking(False)

try:
    sock.connect(('httpbin.org', 80))
except Exception as e:
    print(e)  # [Errno 36] Operation now in progress
This code perfectly illustrates non-blocking behavior: sock.connect() returns immediately with an "Operation now in progress" error, which means the connection is being established in the background. The program doesn't wait for the connection to be established.
This is the "I'll come back and check later" principle - like our restaurant server, who doesn't stand planted in front of a table while the customer decides what to order. Here, the program launches the connection and can do something else while it's being established.
The problem with non-blocking calls is that you have to constantly check if operations are ready. This is where polling comes in: a mechanism to efficiently monitor multiple I/O operations simultaneously.
The oldest and most widespread of these mechanisms on POSIX systems (Unix, Linux, BSD, macOS) is select(). It allows monitoring multiple file descriptors (sockets, files, pipes) and knowing which ones are ready for reading or writing. select() is very fast with a few hundred file descriptors, but its performance degrades beyond that.
import select
import socket

# Creating a socket and launching the connection
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setblocking(False)

# Connection attempt
try:
    sock.connect(('httpbin.org', 80))
    is_connected = True  # Immediate connection (rare)
except BlockingIOError:
    is_connected = False  # Connection in progress
except Exception as e:
    print(f"Connection error: {e}")
    sock.close()
    exit(1)

# If the connection is not immediate, use select() to wait
while not is_connected:
    # Monitor the socket for writing: it becomes ready
    # when the connection succeeds
    _, ready_to_write, error_sockets = select.select([], [sock], [sock], 5.0)

    if error_sockets:
        print("Connection error")
        break
    elif ready_to_write:
        # Check if the connection succeeded
        try:
            error = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if error:
                print(f"Connection failed: {error}")
                break
            else:
                print("Connection established!")
                is_connected = True
        except Exception as e:
            print(f"Error during verification: {e}")
            break
    else:
        print("Connection still in progress...")
sock.connect(('httpbin.org', 80)) silently performs a DNS resolution that is blocking. To simplify this article, we gloss over this complexity, but in a real asynchronous application, DNS resolution must also be non-blocking.
The select() call blocks until at least one of the monitored file descriptors becomes ready, or until the timeout expires. This is different from blocking on a single operation: here, we block waiting for any of the monitored operations to become ready.
The contrast is striking: replacing a simple sock.connect() with its non-blocking version multiplies the code complexity by ten. This explosion of complexity for an efficiency gain perfectly illustrates asyncio's dilemma: performance versus simplicity.
For the needs of this article, we focus on select(), which perfectly illustrates the polling principle. Other more modern mechanisms exist (epoll on Linux, kqueue on BSD), but the principle remains identical.
To return to our restaurant server: imagine he has a "beeper" system - each table has a small device that sounds when it needs something. Instead of constantly going around all the tables, the server can sit and wait for one of the beepers to sound. As soon as a beeper goes off, he knows exactly which table demands his attention.
select() works exactly like this beeper system: it allows the program to efficiently wait for one of the monitored operations to become ready, without having to check them one by one in a loop.
These low-level polling mechanisms are the technical foundation on which modern asynchronous programming rests. They allow a single thread to efficiently monitor hundreds of simultaneous I/O operations, only dealing with those that are actually ready to progress.
Generators and yield
Before diving into the subtleties of asyncio, you must understand a fundamental Python mechanism that constitutes its technical base: generators. Without this understanding, asyncio's functioning will be very obscure. With it, everything becomes clear.
A generator in Python is a function that can suspend its execution and resume it later, exactly where it left off. This suspension/resumption capability is at the heart of asyncio's functioning.
Let's take a simple example:
def count():
    print("Start of generator")
    yield 1
    print("After the first yield")
    yield 2
    print("After the second yield")
    yield 3
    print("End of generator")

# Create the generator (doesn't execute it yet)
gen = count()
print(f"Generator type: {type(gen)}")
Generator type: <class 'generator'>
Already, we observe something unusual: calling count() doesn't trigger the function's execution. Python detects the presence of the yield keyword and automatically transforms the function into a generator. It's this detection that makes all the difference.
Yes, it's indeed the simple presence of the yield keyword that changes the fundamental nature of the function. It doesn't matter if this yield is in a never-executed branch or after a return - Python automatically transforms the function into a generator as soon as it parses the code. This implicit transformation based on keyword detection may seem confusing for developers coming from other languages, and that's quite normal, even desirable.
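A tiny sketch to convince yourself; the function name is arbitrary, and the yield below is unreachable, yet its mere presence turns the function into a generator:
def looks_like_a_function():
    return "value"
    yield  # never reached, but its mere presence changes the function's nature

obj = looks_like_a_function()
print(type(obj))           # <class 'generator'>
print(next(obj, "empty"))  # the return raises StopIteration immediately -> "empty"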
Now, let's use the generator:
print("First next():")
value1 = next(gen)
print(f"Received value: {value1}")
print("\nSecond next():")
value2 = next(gen)
print(f"Received value: {value2}")
print("\nThird next():")
value3 = next(gen)
print(f"Received value: {value3}")
print("\nFourth next() (triggers StopIteration):")
try:
    next(gen)
except StopIteration as e:
    print(f"StopIteration raised: {e}")
First next():
Start of generator
Received value: 1
Second next():
After the first yield
Received value: 2
Third next():
After the second yield
Received value: 3
Fourth next() (triggers StopIteration):
End of generator
StopIteration raised:
Execution stops at each yield and resumes at exactly the same place during the next next(). All local variables, the execution stack state, everything is preserved between calls.
Here's a fundamental technical detail that few Python developers know, but which is important for understanding asyncio. Let's observe what happens when a generator uses return:
def generator_with_return():
    yield 1
    yield 2
    return "return value"
    yield 3  # This yield will never be reached

gen = generator_with_return()

print("First yield:", next(gen))
print("Second yield:", next(gen))

try:
    next(gen)  # Here, the return will be executed
except StopIteration as e:
    print(f"StopIteration with value: {e.value}")
First yield: 1
Second yield: 2
StopIteration with value: return value
So, when a generator executes return value, Python raises a StopIteration exception with this value stored in the exception's value attribute. This is how Python transmits the "return value" of a generator.
This exception mechanism is the very foundation on which asyncio relies to make coroutines communicate with the event loop.
Generators preserve their complete state between suspensions. Demonstration:
def counter_with_state():
    x = 0
    while x < 3:
        x += 1
        yield f"Counter at {x}"
        print(f"Resuming, x now equals {x}")

gen = counter_with_state()

print(next(gen))
print("--- Pause in execution ---")
print(next(gen))
print("--- Another pause ---")
print(next(gen))
Counter at 1
--- Pause in execution ---
Resuming, x now equals 1
Counter at 2
--- Another pause ---
Resuming, x now equals 2
Counter at 3
The variable x keeps its value between each suspension. Python preserves the generator's complete state: local variables, position in the code, execution stack.
The fundamental difference between a generator and a normal function is that the normal function executes from beginning to end and returns a value. The generator becomes an object that can be "awakened" and can resume its execution.
This suspension and resumption capability is exactly what we need for asynchronous programming. Imagine that instead of yield, we had await for an I/O operation:
# Concept (not yet real asyncio)
def conceptual_task():
    print("Task start")
    # yield "I'm waiting for an I/O operation"
    print("Resume after I/O")
    # yield "I'm waiting for another operation"
    print("Task end")
The generator can suspend while an I/O operation is in progress, let the event loop handle other tasks, then resume its execution when the operation is completed.
This is exactly the principle of asyncio: coroutines are based on generators, and await works like a sophisticated yield that communicates with the event loop.
Now that we understand generators and their fundamental suspension/resumption capability, we can address their evolution toward coroutines. This transition is not just a simple syntactic improvement: it represents a paradigm shift in how Python handles asynchronous programming.
The generators we've seen so far have an important limitation: they can only produce values outward. To build complex asynchronous systems, we need to be able to compose multiple generators together, make them communicate, and delegate execution from one to another.
Consider this naive attempt at composition:
def simple_operation():
    print("Simple operation start")
    yield "simple result"
    print("Simple operation end")

def composed_operation():
    print("Composed operation start")
    # Naive delegation attempt
    gen = simple_operation()
    result = next(gen)  # Retrieve the result
    print(f"Result received: {result}")
    yield "composed result"
    print("Composed operation end")

gen = composed_operation()
print(next(gen))
Composed operation start
Simple operation start
Result received: simple result
composed result
This approach works for simple cases, but it has a major flaw: composed_operation() must intimately know how simple_operation() works. If simple_operation() produced multiple values, or if it had more complex suspension logic, the composition code would quickly become unmanageable.
yield from: transparent delegation
Python 3.3 introduced yield from to solve exactly this problem. This construct allows a generator to completely delegate execution to another generator:
def simulated_io_operation():
    print(" ↳ I/O start")
    yield "in progress..."
    yield "50% progress"
    yield "90% progress"
    print(" ↳ I/O end")
    return "data loaded"

def complex_task():
    print("Complex task start")
    # Complete delegation with yield from
    result = yield from simulated_io_operation()
    print(f"Data processing: {result}")
    yield "processing completed"
    return "task accomplished"

# Execution
gen = complex_task()
try:
    while True:
        value = next(gen)
        print(f"Received: {value}")
except StopIteration as e:
    print(f"Final value: {e.value}")
Complex task start
↳ I/O start
Received: in progress...
Received: 50% progress
Received: 90% progress
↳ I/O end
Data processing: data loaded
Received: processing completed
Final value: task accomplished
Let's observe what happens here. yield from simulated_io_operation() completely delegates execution to simulated_io_operation(). All values produced by the delegated generator bubble up directly to the caller, and when the delegated generator terminates (with return), its return value is assigned to result.
This delegation is transparent: the caller of complex_task() doesn't know that any delegation is happening. It receives values directly from simulated_io_operation().
Until now, our generators were unidirectional: they produced values outward. Python coroutines introduce bidirectionality: they can also receive values.
def bidirectional_coroutine():
    print("Coroutine started")

    # Receive a value from outside
    received_value = yield "ready to receive"
    print(f"Value received: {received_value}")

    # Do something with this value
    result = received_value * 2

    # Send back the result
    new_value = yield f"result: {result}"
    print(f"New value: {new_value}")

    return "finished"

# Using a bidirectional coroutine
coro = bidirectional_coroutine()

# First next() to start the coroutine
message = next(coro)
print(f"Initial message: {message}")

# Send a value with send()
try:
    response = coro.send(42)
    print(f"Response: {response}")

    # Send another value
    coro.send("final")
except StopIteration as e:
    print(f"Coroutine finished: {e.value}")
Initial message: ready to receive
Value received: 42
Response: result: 84
New value: final
Coroutine finished: finished
The send() method allows sending a value into the coroutine. This value becomes the result of the current yield expression. It's this bidirectionality that opens the door to asynchronous programming.
Asynchronous programming in Python evolved in two major steps. Python 3.4 introduced asyncio using existing generators with @asyncio.coroutine and yield from. Python 3.5 then introduced async def and await to improve readability and type safety:
import asyncio

# Old syntax (Python 3.4), deprecated since 3.8 and removed in Python 3.11
@asyncio.coroutine
def old_syntax():
    result = yield from asyncio.sleep(1)
    return "finished"

# Modern syntax (Python 3.5+)
async def modern_syntax():
    await asyncio.sleep(1)
    return "finished"
This transition is not just a cosmetic change. Native coroutines (async def) are a distinct object type from generators, with their own type checking and protocol.
While native coroutines are based on generators, Python treats them as distinct object types. Both use the same fundamental suspension/resumption mechanism, but with different syntaxes for different uses.
Generators are designed for iteration and producing sequences of values. Native coroutines are specialized for asynchronous programming, with stricter type checking and dedicated syntax (async/await) that makes code more readable and less error-prone.
This evolution from generators to native coroutines doesn't erase the fundamental mechanisms we've described. Instead, it encapsulates them in safer and more expressive syntax.
Native coroutines still use the same suspension/resumption principle as generators. They still communicate via mechanisms similar to yield and send(). And the underlying infrastructure still relies on the same system polling mechanisms we saw previously.
Now that we understand generators and system polling mechanisms, we can reveal what really happens when you write async def and await. This modern syntax is just an elegant facade over the mechanisms we've just detailed.
Native coroutines implement the same protocol as generators, revealing their common nature:
import asyncio
import types

async def my_coroutine():
    await asyncio.sleep(0.1)
    return "finished"

# Create the coroutine
coro = my_coroutine()

# Check that it implements the generator protocol
print(f"Send method: {hasattr(coro, 'send')}")
print(f"Throw method: {hasattr(coro, 'throw')}")
print(f"Close method: {hasattr(coro, 'close')}")

# But Python distinguishes it from classic generators
print(f"Is a generator: {isinstance(coro, types.GeneratorType)}")
print(f"Is a coroutine: {isinstance(coro, types.CoroutineType)}")

coro.close()
The coroutine has the same methods (send, throw, close) as generators, because it uses the same internal suspension and resumption mechanisms.
The await mechanism: sophisticated delegation
When you write await expression, Python performs a series of operations that correspond exactly to yield from, with additional checks:
import asyncio

# Demonstration of the __await__ protocol
async def async_operation():
    # No await here: the coroutine completes on its very first send(),
    # so the manual send(None) below reaches StopIteration directly
    return "data"

async def detailed_coroutine():
    print("Before await")

    # What happens during an await:
    operation = async_operation()

    # 1. Python checks that the object is "awaitable"
    awaiter = operation.__await__()
    print(f"Awaiter type: {type(awaiter)}")

    # 2. Delegates to this awaiter (like yield from)
    try:
        awaiter.send(None)  # First send to start
    except StopIteration as e:
        result = e.value
        print(f"Result retrieved via StopIteration: {result}")

    operation.close()

asyncio.run(detailed_coroutine())
Before await
Awaiter type: <class 'coroutine'>
Result retrieved via StopIteration: data
The __await__() mechanism returns a generator (or generator-compatible object) that Python uses exactly like with yield from. The return value transits through the same StopIteration mechanism we saw with generators.
The event loop is an infinite loop that coordinates three fundamental components to orchestrate asynchronous execution: file descriptors ready for reading, file descriptors ready for writing, and timeouts for timed operations.
At the heart of the event loop lies a call to select() (the polling mechanism we saw previously). The event loop maintains lists of file descriptors to monitor for different types of operations.
When you do await reader.read(), the coroutine suspends and communicates with the event loop via the underlying yield mechanism. This communication carries the necessary information: "I'm waiting for file descriptor X to be ready for reading". The event loop then adds this file descriptor to its monitoring list.
Similarly, await writer.write() signals "I'm waiting for file descriptor Y to be ready for writing". The select() call monitors all these file descriptors simultaneously and immediately returns those that are ready for the requested operation.
The event loop must also handle timed operations like asyncio.sleep(). It maintains a queue of tasks ordered by temporal deadline. To calculate the timeout to pass to select(), the event loop simply looks at the next scheduled task: if it should execute in 1.3 seconds, then select() will receive a timeout of at most 1.3 seconds. If no task is scheduled, select() can wait indefinitely.
The event loop combines these elements in a simple but powerful loop. At each iteration, it calculates the timeout for the next timed task, then uses select() with this timeout to monitor I/O. When select() returns, either file descriptors are ready, or the timeout has elapsed. In the first case, the event loop wakes up the coroutines whose I/O is available. In the second case, it wakes up the coroutines whose time deadline has been reached. Finally, it executes all ready tasks by sending them None via the send() method to resume them exactly where they left off.
This loop elegantly solves the fundamental problem of asynchronous programming: never wait passively. Either I/O is ready (immediate return from select()), or a time deadline arrives (timeout from select()), or both. In all cases, the event loop only waits if it has nothing to do.
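To make this architecture tangible, here is a deliberately simplified event loop sketch. It is not asyncio's real implementation: plain generators stand in for coroutines and yield ("sleep", delay) or ("read", sock) requests to the loop, but it combines the same three ingredients (a ready queue, a deadline heap, and a select() call whose timeout comes from the next deadline):
import heapq
import itertools
import select
import time
from collections import deque

class ToyLoop:
    def __init__(self):
        self.ready = deque()        # tasks ready to run now
        self.scheduled = []         # heap of (deadline, seq, task) for timed waits
        self.waiting_read = {}      # fd -> task waiting for readability
        self._seq = itertools.count()

    def create_task(self, task):
        self.ready.append(task)

    def run(self):
        while self.ready or self.scheduled or self.waiting_read:
            # 1. The select() timeout comes from the next deadline
            if self.ready:
                timeout = 0
            elif self.scheduled:
                timeout = max(0, self.scheduled[0][0] - time.monotonic())
            else:
                timeout = None
            # 2. Poll the watched file descriptors (POSIX allows empty fd lists)
            readable, _, _ = select.select(list(self.waiting_read), [], [], timeout)
            for fd in readable:
                self.ready.append(self.waiting_read.pop(fd))
            # 3. Wake up the tasks whose deadline has passed
            now = time.monotonic()
            while self.scheduled and self.scheduled[0][0] <= now:
                _, _, task = heapq.heappop(self.scheduled)
                self.ready.append(task)
            # 4. Run every ready task until its next suspension point
            for _ in range(len(self.ready)):
                task = self.ready.popleft()
                try:
                    request, arg = next(task)
                except StopIteration:
                    continue
                if request == "sleep":
                    deadline = time.monotonic() + arg
                    heapq.heappush(self.scheduled, (deadline, next(self._seq), task))
                elif request == "read":
                    self.waiting_read[arg.fileno()] = task

def ticker(name, delay):
    for i in range(3):
        yield ("sleep", delay)  # the generator equivalent of await asyncio.sleep(delay)
        print(f"{name}: tick {i}")

loop = ToyLoop()
loop.create_task(ticker("A", 0.5))
loop.create_task(ticker("B", 0.2))
loop.run()
asyncio's real loop is of course far more sophisticated (callbacks, cancellation, exception handling, epoll/kqueue backends), but the skeleton is the same.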
Each await on an I/O operation ultimately translates into a coroutine suspension via the yield mechanism, registration of the file descriptor with the system select(), switching to other tasks during the wait, then waking up the coroutine when select() signals that the operation is ready, and resuming execution exactly where it left off.
This architecture explains why asyncio is more efficient than multithreaded code: a single thread can handle thousands of simultaneous connections, because it only coordinates the moments when each operation actually becomes ready to progress.
Complete execution of an await
To make these mechanisms concrete, let's follow the complete execution of this simple code:
import asyncio

async def simple():
    result = await asyncio.sleep(1.0)
    return "finished"

asyncio.run(simple())
Execution timeline:
1. Event loop startup: asyncio.run() creates an event loop and initializes its structures (self._ready = collections.deque() for ready tasks, self._scheduled = [] for timed tasks)
2. Adding the main coroutine: simple() is transformed into a Task and added to self._ready
3. Executing self._ready: the event loop finds simple() in self._ready and executes it via coroutine.send(None)
4. Encountering the await: the coroutine reaches await asyncio.sleep(1.0) and suspends
5. Communication with the event loop: the suspension transmits the information "wake me up in 1.0 second" via the underlying yield mechanism
6. Temporal scheduling: the event loop calculates the deadline (current timestamp + 1.0s) and adds the task to self._scheduled
7. Timeout calculation: self._ready is empty, next deadline in 1.0s → timeout = 1.0s for select()
8. Waiting in select(): select([], [], [], 1.0) - no file descriptor to monitor, timeout of 1 second
9. Wake-up after timeout: select() returns after 1 second (no I/O, timeout elapsed)
10. Checking deadlines: the event loop finds that the simple() task should be woken up now
11. Return to self._ready: the simple() task is moved from self._scheduled to self._ready
12. Execution resumption: coroutine.send(None) resumes simple() exactly after the await
13. End of coroutine: return "finished" raises StopIteration("finished")
14. Result retrieval: the event loop catches the exception and retrieves "finished" from exception.value
15. Final check: self._ready empty, self._scheduled empty, no file descriptors to watch → the event loop stops
This timeline reveals how a simple await asyncio.sleep(1.0) mobilizes all the technical arsenal we've described: generator mechanisms, communication by exceptions, timeout calculation, and the select() system call.
This syntactic elegance hides considerable technical complexity. When you write:
async def simple():
    result = await some_async_operation()
    return result
Python transforms your function into a sophisticated generator, handles the __await__ protocol for delegation, communicates with the event loop via suspension/resumption mechanisms, uses polling syscalls to optimize waiting, manages exceptions through multiple layers of abstraction, and potentially coordinates thousands of concurrent tasks.
This transformation is not trivial. It explains why certain asyncio behaviors can seem surprising if one ignores the underlying mechanisms. It also explains why mixing synchronous and asynchronous code creates problems: the two worlds use fundamentally different execution models.
The async/await syntax gives the illusion of simplicity by masking a system of formidable technical complexity. But this accessibility has a price: it hides the reality of the underlying mechanisms.
Understanding that await is just a sophisticated yield from, that coroutines use the same mechanisms as generators, and that the event loop relies on system polling syscalls, allows one to go beyond superficial use of asyncio.
This understanding becomes necessary when debugging performance problems, handling complex exceptions, or designing robust asynchronous architectures. Modern syntax hides complexity well, but doesn't eliminate it.
We've reached the end of our exploration of asyncio, and it's time to step back and reflect on what we've just discovered. This technical dive reveals a striking paradox: a language created for simplicity that now hides formidable technical complexity.
Python wasn't designed as a high-performance production language. Guido van Rossum created it with a clear philosophy: "Computer Programming for Everybody". The goal was to make programming accessible to non-computer scientist researchers, children, scientists who needed to automate their calculations without becoming computer experts.
This vision still shows through today in the Zen of Python: "There should be one obvious way to do it", "Simple is better than complex", "Readability counts". Python was meant to be the language of rapid prototyping, experimentation, learning. A pedagogical tool before being a production tool.
This philosophy explains why Python was long perceived as "slow but simple". The GIL (Global Interpreter Lock) prevented true parallelism of Python code, but guaranteed simplicity of the execution model. No race conditions, no memory corruption, no synchronization complexity. A single execution thread, a simple mental model.
But Python has experienced success that exceeded its creators' original intentions. For well-known economic reasons, prototypes invariably end up in production. A proof-of-concept developed quickly in Python to validate an idea gradually becomes a critical system that must be maintained, evolved, and above all scaled. This drift from prototyping to production is not specific to Python, but it hits this language full force due to its ease of learning and initial productivity.
The result is that Python now finds itself in production in contexts for which it was never designed: high-load web servers, real-time systems, critical applications requiring sustained performance. This mass adoption by default has created needs that Python was not initially equipped to satisfy.
Developers have demanded performance, parallelism, the ability to handle thousands of simultaneous connections. They wanted to keep Python's simplicity while rivaling the performance of languages like Go, Rust, or Node.js. This pressure has pushed the Python ecosystem toward increasingly sophisticated solutions.
Asyncio represents this tension perfectly. It allows Python to handle thousands of simultaneous connections without completely collapsing, which is already progress compared to threads, but at the cost of considerable technical complexity. Behind the elegant async/await syntax lies a system of formidable complexity.
Our technical exploration reveals the extent of this hidden complexity. A simple await asyncio.sleep(1.0) mobilizes:
The suspension/resumption mechanism of generators via yield
The __await__() protocol and transparent delegation
StopIteration exceptions to transport return values
System polling calls (select, epoll, kqueue) to optimize I/O waiting
This accumulation of abstraction layers transforms Python into an iceberg: a smooth and simple surface, but a technical depth dangerous for those who don't understand it. And for those who do understand it, it's equally dangerous because this smooth surface provides limited visibility into the mechanisms actually at work. This historical evolution is documented in PEP 3156 (Python 3.4, with @asyncio.coroutine and yield from), then PEP 492 (Python 3.5+, with async def and await).
Let's take this example that perfectly illustrates the multiple traps of asyncio:
async def download_data():
    try:
        data = await http_client.get("https://api.example.com/data")
        return data.json()
    except:
        return None  # Danger!
This code seems innocent, but the bare except: can mask critical exceptions like asyncio.CancelledError, KeyboardInterrupt, or SystemExit. These exceptions must bubble up to allow proper program shutdown. But this subtlety is only understandable if one understands the underlying mechanisms of asyncio.
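A safer variant, as a hedged sketch reusing the same hypothetical http_client as above: catch only what you can actually handle, and let cancellation bubble up:
import asyncio

async def download_data():
    try:
        data = await http_client.get("https://api.example.com/data")
        return data.json()
    except asyncio.CancelledError:
        raise  # let cancellation propagate to the event loop
    except (ConnectionError, TimeoutError) as exc:
        print(f"Download failed: {exc}")
        return None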
The trap is that the async/await syntax gives the illusion that one is writing classic synchronous code. In reality, one is manipulating a complex system of coroutines, event loops, and inter-task communication. This illusion pushes developers to apply synchronous patterns to asynchronous code, creating subtle bugs.
But there's worse: the Python ecosystem relies heavily on third-party libraries, some of which are poorly documented or of dubious origin. These libraries can make blocking calls within an asynchronous system without the developer knowing. Python provides no detection or protection mechanism against these involuntary sabotages.
The typical scenario is revealing: you integrate an apparently innocent library 6 months ago. In development, with one or two simultaneous users, everything works correctly. Tests pass, blocking calls don't last long enough to be visible. The service goes into production. A few months later, traffic increases and you find yourself with 5000 concurrent users. Gradually, the service performs very poorly: timeouts, unexplained slowdowns, mysterious blockages.
After hours of debugging, you discover that a library makes a blocking DNS request secretly, or uses an undocumented synchronous system call. This single poorly designed library is enough to paralyze your entire asynchronous system, because it blocks the main event loop. And the worst part is that you don't know it - the code seems perfectly asynchronous on the surface.
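One way to catch this kind of sabotage earlier is asyncio's debug mode, which logs any task step that keeps the event loop busy longer than loop.slow_callback_duration (0.1 s by default). A minimal sketch, where legacy_blocking_call() stands in for the offending library:
import asyncio
import time

def legacy_blocking_call():
    time.sleep(0.5)  # stand-in for a third-party call that blocks silently

async def main():
    legacy_blocking_call()  # debug mode will log that this step blocked the loop

    # The usual fix: push the blocking call to a thread pool
    await asyncio.get_running_loop().run_in_executor(None, legacy_blocking_call)

asyncio.run(main(), debug=True)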
Asyncio illustrates a fascinating paradox of programming language evolution. To remain relevant in the face of new needs, Python had to enrich itself with sophisticated features. But each addition complicates the language and moves away from the original philosophy of simplicity.
This evolution is not unique to Python. JavaScript has experienced the same trajectory with the addition of async/await on top of Promises and callbacks. But JavaScript succeeded in its implementation by forcing this mode: once async/await was adopted, the entire ecosystem aligned with this model. Python, on the other hand, must coexist with a mixed ecosystem where synchronous and asynchronous code dangerously intermingle. Java has also added streams, lambdas, and virtual threads. Every high-level language faces the same dilemma: stay simple and become obsolete, or evolve and lose its original simplicity.
This complexification fundamentally changes the nature of the language. Python is no longer the simple language of the 1990s. It has become a rich and complex ecosystem that requires deep expertise to be mastered.
Asyncio represents both the best and worst of Python's evolution. The best because it allows Python to remain relevant in a world where asynchronous performance is recognized and adopted. The worst because it betrays the original simplicity of the language.
This tension is not about to be resolved. Python will continue to evolve to meet the changing needs of developers. Each new feature will add its own complexity. The challenge for the Python community will be to maintain a balance between power and simplicity.
For developers, the lesson is clear: mastering modern Python requires going beyond surface syntax. One must understand the underlying mechanisms, accept hidden complexity, and develop the expertise necessary to navigate in this sophisticated ecosystem.
Asyncio is no longer the exception in modern Python: it's a representative example of the direction the language is taking. A language that keeps accessible syntax on the surface, but hides considerable technical depth. An iceberg whose beauty of the emerged part must not make us forget the dangers of the submerged part.
The lesson is brutal but clear: use Python for what it was designed for. Rapid prototyping, learning for children, occasional scientific calculations, script automation. But keep it away from critical production environments.
Modern Python has betrayed its original mission of simplicity. It has become a trap for developers: easy to learn, difficult to master, dangerous to deploy. Performance is dismal, hidden complexity is treacherous, and the production ecosystem is fragile.
For production, prefer languages designed for it: Go for simplicity AND relative performance, C, C++ or Rust for performance and control, or even Java which assumes its complexity rather than hiding it.