Python Yield Keyword

The yield keyword in Python is closely associated with generators. In this tutorial, let's take a look at the yield keyword and at generators themselves. Generators are used in Python to process large amounts of data in a memory-efficient way, and iterating over a generator with a for loop gives us a clean, elegant syntax for consuming the values it produces.


A Standard Function

First off, let's look at a function that takes a list of numbers and returns the cube of each one. The function, named cubed_numbers(), takes in a list of numbers and cubes each number. As each number is cubed, it is added to a result list using the append() method. Lastly, the result list is returned.

def cubed_numbers(n):
    result = []
    for i in n:
        result.append(i ** 3)
    return result

Now we can make a call to the cubed_numbers() function, and it performs as we would expect. The numbers 1, 2, 3, 4, and 5 become 1, 8, 27, 64, and 125.

my_cubes = cubed_numbers([1, 2, 3, 4, 5])
print(my_cubes)
[1, 8, 27, 64, 125]

Converting To A Generator

To change the cubed_numbers() function into a generator function, we can make a few changes. We remove the result list and the return statement; since we no longer have a list, there is nothing to append() to. Inside the for loop, we have the first appearance of the yield keyword.

def cubed_numbers(n):
    for i in n:
        yield i ** 3

Calling this function now produces something different: instead of a list of results, we get a generator object.

my_cubes = cubed_numbers([1, 2, 3, 4, 5])
print(my_cubes)
<generator object cubed_numbers at 0x000002C0736DAC80>

The reason for this is that generators don’t hold the entire result in memory, they instead yield one result at a time. So this generator is waiting for us to ask for the next result.


Introducing next()

Ok, so the generator doesn't output anything and it uses very little memory. Great. Now how can we see a result? We can pull values out of a generator one at a time by calling next().

my_cubes = cubed_numbers([1, 2, 3, 4, 5])
print(next(my_cubes))
1

Hey, where are all my answers? Each call to next() runs the generator just far enough to produce a single value. The generator then pauses at the yield statement and waits until another value is requested. If we call next() again, we should see the next result.

my_cubes = cubed_numbers([1, 2, 3, 4, 5])
print(next(my_cubes))
print(next(my_cubes)) 
1
8

If we want to see all 5 results, we need to call next() five times like so.

my_cubes = cubed_numbers([1, 2, 3, 4, 5])
print(next(my_cubes))
print(next(my_cubes))
print(next(my_cubes))
print(next(my_cubes))
print(next(my_cubes))
1
8
27
64
125
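We can make this pause-and-resume behavior visible by adding a print() call inside the generator body (a small illustrative tweak to the example above). Nothing is printed when the generator is created; the body only runs when next() asks for a value.

```python
def cubed_numbers(n):
    for i in n:
        print(f'computing cube of {i}')
        yield i ** 3

my_cubes = cubed_numbers([1, 2, 3])
print('generator created, nothing computed yet')
# only now does the body run, and only up to the first yield
print(next(my_cubes))
```

The first print inside the function appears only after the next() call, confirming the generator does no work up front.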

StopIteration Error

If you try to call next() more times than there are values in the generator, you will get a StopIteration exception. This means the generator has been exhausted and has no more values to produce.

my_cubes = cubed_numbers([1, 2, 3, 4, 5])
print(next(my_cubes))
print(next(my_cubes))
print(next(my_cubes))
print(next(my_cubes))
print(next(my_cubes))
print(next(my_cubes))
1
8
27
64
125
Traceback (most recent call last):
  File "C:\python\justhacking\howtoyield.py", line 12, in <module>
    print(next(my_cubes))
StopIteration
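If we need to pull values manually and a bare StopIteration is unwelcome, next() also accepts a second argument: a default value to return once the generator is exhausted. A quick sketch:

```python
def cubed_numbers(n):
    for i in n:
        yield i ** 3

my_cubes = cubed_numbers([1, 2])
print(next(my_cubes))          # 1
print(next(my_cubes))          # 8
# the generator is now exhausted; the default is returned
# instead of raising StopIteration
print(next(my_cubes, 'done'))  # done
```

This pattern is handy when you want to poll a generator without wrapping every call in try/except.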

Generators With For Loops

The code above is not something you will see when actually using generators, especially since the point of working with generators is to process large volumes of data without consuming large amounts of memory. In practice, a generator is usually consumed with a for loop. Let's see this in the full context of all the code so far.

def cubed_numbers(n):
    for i in n:
        yield i ** 3


my_cubes = cubed_numbers([1, 2, 3, 4, 5])

for cube in my_cubes:
    print(cube)
1
8
27
64
125

By simply looping over the generator, Python retrieves every value and stops cleanly once the generator is exhausted. The for loop handles the StopIteration signal for us, so no error is raised.
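Anything that iterates will drive a generator the same way: list(), sum(), max(), and friends all consume values until exhaustion and deal with StopIteration internally. For example:

```python
def cubed_numbers(n):
    for i in n:
        yield i ** 3

# each call creates a fresh generator, since a generator
# can only be consumed once
total = sum(cubed_numbers([1, 2, 3, 4, 5]))
as_list = list(cubed_numbers([1, 2, 3, 4, 5]))
print(total)    # 225
print(as_list)  # [1, 8, 27, 64, 125]
```

Note that we create a fresh generator for each consumer; once exhausted, a generator cannot be rewound.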


Generator Comprehension

We have seen how Python List Comprehensions work in a different tutorial, and generators have a similar feature. The difference is that instead of using the surrounding [ ] characters, you use the surrounding ( ) characters as we see below.

my_cubes = (i ** 3 for i in [1, 2, 3, 4, 5])

for cube in my_cubes:
    print(cube)
1
8
27
64
125
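A handy consequence: when a generator expression is the only argument to a function call, the extra parentheses can be dropped. We can feed the cubes straight into sum() or max() without ever building a list:

```python
# the generator expression is consumed directly; no list is built
total = sum(i ** 3 for i in [1, 2, 3, 4, 5])
largest = max(i ** 3 for i in [1, 2, 3, 4, 5])
print(total)    # 225
print(largest)  # 125
```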

Generator Performance

We can demonstrate the performance of yield vs return by setting up two different functions that cube five million integers. That is a fairly big number, and by using Python's time.perf_counter() function together with memory_usage() from the memory_profiler package, we can determine both how much memory and how much time each approach takes. The first function, cubed_list(), uses a standard for loop and an empty list: it cubes each integer one at a time, appends it to the list, and returns the result once all integers are cubed. The second function, cubed_generator(), skips the appending entirely and just uses the yield keyword.

The List Performance

import memory_profiler as mem_profile
import time

mem_before = mem_profile.memory_usage()[0]
print(f'Before calling the function, Python is using {mem_before} MB of memory')


def cubed_list(n):
    result = []
    for i in range(n):
        result.append(i ** 3)
    return result


def cubed_generator(n):
    for i in range(n):
        yield i ** 3


time_start = time.perf_counter()
cubes = cubed_list(5000000)
time_end = time.perf_counter()
elapsed = time_end - time_start

mem_after = mem_profile.memory_usage()[0]
mem_usage = mem_after - mem_before

print(f'After calling the function, Python is using {mem_after} MB of memory')
print(f'It Took {elapsed} Seconds to cube 5,000,000 integers')
Before calling the function, Python is using 39.82421875 MB of memory
After calling the function, Python is using 310.109375 MB of memory
It Took 4.24566814 Seconds to cube 5,000,000 integers

We can see that memory usage spiked by roughly 270 MB, and the task took a little over 4 seconds to complete.

The Generator Performance

import memory_profiler as mem_profile
import time

mem_before = mem_profile.memory_usage()[0]
print(f'Before calling the function, Python is using {mem_before} MB of memory')


def cubed_list(n):
    result = []
    for i in range(n):
        result.append(i ** 3)
    return result


def cubed_generator(n):
    for i in range(n):
        yield i ** 3


time_start = time.perf_counter()
cubes = cubed_generator(5000000)
time_end = time.perf_counter()
elapsed = time_end - time_start

mem_after = mem_profile.memory_usage()[0]
mem_usage = mem_after - mem_before

print(f'After calling the function, Python is using {mem_after} MB of memory')
print(f'It Took {elapsed} Seconds to cube 5,000,000 integers')
Before calling the function, Python is using 39.73046875 MB of memory
After calling the function, Python is using 39.7421875 MB of memory
It Took 2.166753844 Seconds to cube 5,000,000 integers

This time around memory usage barely moved and the task took about half the time. Keep in mind, though, that calling cubed_generator() only creates the generator; the cubes are computed lazily as each value is consumed. As we can see, the generator version using the yield keyword performs incredibly well with minimal impact on memory.
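A quick way to see the memory gap without memory_profiler is sys.getsizeof() from the standard library. Note that for the list it reports only the container itself (the array of object pointers), not the cubed integers it holds, so the true difference is even larger than shown:

```python
import sys

cubes_list = [i ** 3 for i in range(1_000_000)]
cubes_gen = (i ** 3 for i in range(1_000_000))

# the list grows with the number of elements; the generator
# object stays the same tiny size no matter how many values
# it will eventually yield
print(f'list object:      {sys.getsizeof(cubes_list):,} bytes')
print(f'generator object: {sys.getsizeof(cubes_gen):,} bytes')
```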

Python Yield Keyword Summary

The yield keyword and generators in Python provide a clean way to work with large datasets. They have a nice readable syntax, and they tend to be both memory-friendly and high-performing. In addition to the yield keyword itself, we also saw the shorthand comprehension-like syntax for creating a generator by using the surrounding ( ) characters.