退回到 Python 2.2,生成器第一次在PEP 255中提出(那时也把它成为迭代器,因为它实现了迭代器协议)。主要是受到Icon编程语言的启发,生成器允许创建一个在计算下一个值时不会浪费内存空间的迭代器。例如你想要自己实现一个range() 函数,你可以用立即计算的方式创建一个整数列表:
def eager_range(up_to):
"""Create a list of integers, from 0 to up_to, exclusive."""
sequence = []
index = 0
while index < up_to:
sequence.append(index)
index += 1
return sequence
然而这里存在的问题是,如果你想创建从0到1,000,000这样一个很大的序列,你不得不创建能容纳1,000,000个整数的列表。但是当加入了生成器之后,你可以不用创建完整的序列,你只需要能够每次保存一个整数的内存即可。
def lazy_range(up_to):
"""Generator to return the sequence of integers from 0 to up_to, exclusive."""
index = 0
while index < up_to:
yield index
index += 1
你有可能已经发现,生成器完全就是关于迭代器的。有一种更好的方式生成迭代器当然很好(尤其是当你可以给一个生成器对象添加 __iter__() 方法时),但是人们知道,如果可以利用生成器“暂停”的部分,添加“将东西发送回生成器”的功能,那么 Python 突然就有了协程的概念(当然这里的协程仅限于 Python 中的概念;Python 中真实的协程在后面才会讨论)。将东西发送回暂停了的生成器这一特性通过 PEP 342添加到了 Python 2.5。与其它特性一起,PEP 342 为生成器引入了 send() 方法。这让我们不仅可以暂停生成器,而且能够传递值到生成器暂停的地方。还是以我们的 range() 为例,你可以让序列向前或向后跳过几个值:
def jumping_range(up_to):
"""Generator for the sequence of integers from 0 to up_to, exclusive.
Sending a value into the generator will shift the sequence by that amount.
"""
index = 0
while index < up_to:
jump = yield index
if jump is None:
jump = 1
index += jump
if __name__ == '__main__':
iterator = jumping_range(5)
print(next(iterator))# 0
print(iterator.send(2))# 2
print(next(iterator))# 3
print(iterator.send(-1))# 2
for x in iterator:
print(x)# 3, 4
直到PEP 380 为 Python 3.3 添加了 yield from之前,生成器都没有变动。严格来说,这一特性让你能够从迭代器(生成器刚好也是迭代器)中返回任何值,从而可以干净利索的方式重构生成器。
def lazy_range(up_to):
"""Generator to return the sequence of integers from 0 to up_to, exclusive."""
index = 0
def gratuitous_refactor():
while index < up_to:
yield index
index += 1
yield from gratuitous_refactor()
yield from 通过让重构变得简单,也让你能够将生成器串联起来,使返回值可以在调用栈中上下浮动,而不需对编码进行过多改动。
def bottom():
# Returning the yield lets the value that goes up the call stack to come right back
# down.
return (yield 42)
def middle():
return (yield from bottom())
def top():
return (yield from middle())
# Get the generator.
gen = top()
value = next(gen)
print(value)# Prints '42'.
try:
value = gen.send(value * 2)
except StopIteration as exc:
value = exc.value
print(value)# Prints '84'.
# Borrowed from http://curio.readthedocs.org/en/latest/tutorial.html.
@asyncio.coroutine
def countdown(number, n):
while n > 0:
print('T-minus', n, '({})'.format(number))
yield from asyncio.sleep(1)
n -= 1
关于基于生成器的协程和async定义的协程之间的差异,我想说明的关键点是只有基于生成器的协程可以真正的暂停执行并强制性返回给事件循环。你可能不了解这些重要的细节,因为通常你调用的像是asyncio.sleep() function 这种事件循环相关的函数,由于事件循环实现他们自己的API,而这些函数会处理这些小的细节。对于我们绝大多数人来说,我们只会跟事件循环打交道,而不需要处理这些细节,因此可以只用async定义的协程。但是如果你和我一样好奇为什么不能在async定义的协程中使用asyncio.sleep(),那么这里的解释应该可以让你顿悟。
到这里你的大脑可能已经灌满了新的术语和概念,导致你想要从整体上把握所有这些东西是如何让你可以实现异步编程的稍微有些困难。为了帮助你让这一切更加具体化,这里有一个完整的(伪造的)异步编程的例子,将代码与事件循环及其相关的函数一一对应起来。这个例子里包含的几个协程,代表着火箭发射的倒计时,并且看起来是同时开始的。这是通过并发实现的异步编程;3个不同的协程将分别独立运行,并且都在同一个线程内完成。
import datetime
import heapq
import types
import time
class Task:
"""Represent how long a coroutine should before starting again.
Comparison operators are implemented for use by heapq. Two-item
tuples unfortunately don't work because when the datetime.datetime
instances are equal, comparison falls to the coroutine and they don't
implement comparison methods, triggering an exception.
Think of this as being like asyncio.Task/curio.Task.
"""
def run_until_complete(self):
# Start all the coroutines.
for coro in self._new:
wait_for = coro.send(None)
heapq.heappush(self._waiting, Task(wait_for, coro))
# Keep running until there is no more work to do.
while self._waiting:
now = datetime.datetime.now()
# Get the coroutine with the soonest resumption time.
task = heapq.heappop(self._waiting)
if now < task.waiting_until:
# We're ahead of schedule; wait until it's time to resume.
delta = task.waiting_until - now
time.sleep(delta.total_seconds())
now = datetime.datetime.now()
try:
# It's time to resume the coroutine.
wait_until = task.coro.send(now)
heapq.heappush(self._waiting, Task(wait_until, task.coro))
except StopIteration:
# The coroutine is done.
pass
@types.coroutine
def sleep(seconds):
"""Pause a coroutine for the specified number of seconds.
Think of this as being like asyncio.sleep()/curio.sleep().
"""
now = datetime.datetime.now()
wait_until = now + datetime.timedelta(seconds=seconds)
# Make all coroutines on the call stack pause; the need to use `yield`
# necessitates this be generator-based and not an async-based coroutine.
actual = yield wait_until
# Resume the execution stack, sending back how long we actually waited.
return actual - now
async def countdown(label, length, *, delay=0):
"""Countdown a launch for `length` seconds, waiting `delay` seconds.
This is what a user would typically write.
"""
print(label, 'waiting', delay, 'seconds before starting countdown')
delta = await sleep(delay)
print(label, 'starting after waiting', delta)
while length:
print(label, 'T-minus', length)
waited = await sleep(1)
length -= 1
print(label, 'lift-off!')
def main():
"""Start the event loop, counting down 3 separate launches.
This is what a user would typically write.
"""
loop = SleepingLoop(countdown('A', 5), countdown('B', 3, delay=2),
countdown('C', 4, delay=1))
start = datetime.datetime.now()
loop.run_until_complete()
print('Total elapsed time is', datetime.datetime.now() - start)