What’s the problem?
Python is quite good at iteration. Its consistent object interface means that for loops with the for <var> in <container>: syntax can be applied to just about anything: strings, dicts, lists, sets, and even classes you’ve created. You’ve also probably seen comprehension syntax, for example, l = [item for item in container] to build a new list (l) from an iterable (container).
These are perfectly fine approaches, but they can sometimes be improved. You can end up creating extra variables that you don’t actually need. You can also end up using a lot of time or memory, because the whole iteration is performed when the list is created, even though the contents of the list aren’t needed until later.
How can it be solved?
We can use some of Python’s functional-programming-like tools to think about and perform iteration more cleverly. The map() and filter() functions operate on iterables: map() applies a function to each value, and filter() selects values based on a function. We’ll also look at lambdas: temporary single-line functions which are often used with map() and filter(), as well as with other functions that accept functions as arguments.
Since we’ll be using lambdas quite a lot, let’s start with a quick intro/refresher.
lambdas in Python
A lambda is a temporary function that always returns something. You can use one anywhere a normal function is used, with some caveats which we’ll cover soon. To show how they work, we’ll start by using a normal function, then replace it with a lambda.
The sorted() function takes an iterable and returns a list containing the iterable’s members, sorted. Sorting on built-in Python types (like str or int) basically Just Works™.
>>> sorted([5, 6, 1, 3, 2, 4])
[1, 2, 3, 4, 5, 6]
However, if you want to sort on types/classes you’ve created, you’ll need to tell sorted() how to retrieve the value to sort on. For example, you might have a list of Person objects that you want to sort on first_name. This is done by passing a function as the key argument to sorted(). The key function accepts each member as an argument and returns a value that the member can be sorted on.
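As a quick illustration of a key function before we get to Person, here is a minimal sketch that sorts strings by length, using the built-in len() as the key. The words list is made up purely for this example.

```python
# Hypothetical data, purely for illustration.
words = ["banana", "fig", "cherry", "kiwi"]

# sorted() calls len() on each member and orders the members by the results.
# Members with equal keys keep their original relative order.
by_length = sorted(words, key=len)
print(by_length)  # ['fig', 'kiwi', 'banana', 'cherry']
```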
Let’s look at the Person
example in detail.
This code uses data classes and type hinting. Check out my post on data classes and post on type hinting if you’re not familiar with them already.
First, here’s the Person
class definition.
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str
    age: int
Next, we’ll define a get_first_name() function that accepts a Person and returns their first_name. It’s pretty much as you would expect:
def get_first_name(p: Person) -> str:
    return p.first_name
Next, create a list of Persons (purposely unsorted to prove sorting works).
people = [
    Person("Ken", "Clark", 20),
    Person("John", "Hardin", 73),
    Person("Erin", "Taunton", 39),
    Person("Charles", "Blevins", 21),
]
Finally, we can create the sorted list with sorted(), remembering to pass in key=get_first_name to tell it to use the get_first_name function to find the sortable value of each member (Person object).
sorted_people = sorted(people, key=get_first_name)
print(sorted_people)
Here’s the output:
[Person(first_name='Charles', last_name='Blevins', age=21), Person(first_name='Erin', last_name='Taunton', age=39), Person(first_name='John', last_name='Hardin', age=73), Person(first_name='Ken', last_name='Clark', age=20)]
In this case, key is required to sort Person objects because Python doesn’t know how to compare them. It will actually raise an exception if called without key.
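To see that exception for yourself, here is a small self-contained sketch. It repeats the Person definition so it can run on its own, and uses a shortened two-person list as an assumption for brevity.

```python
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str
    age: int


people = [Person("Ken", "Clark", 20), Person("John", "Hardin", 73)]

error_message = None
try:
    # No key: sorted() has to compare Person objects directly with "<".
    sorted(people)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The message should say that the < comparison is not supported between two Person instances, which is Python’s way of saying it has no ordering for them.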
Now that you’ve seen it with a normal function, let’s look at lambda
s. They are defined in the form:
lambda <parameters>: <return value>
What you’re going to see now is something that you should never do, which is to assign a lambda to a variable. This is something that the Python linter flake8 will reject. It’s only being done here to demonstrate how to build a lambda by comparing it to a normal function.
With that warning out of the way, let’s see get_first_name() as a lambda.
get_first_name = lambda p: p.first_name
Again, never assign a lambda to a variable! This is just done to relate it back to the normal implementation.
The parameter of the function is p, and multiple parameters can be specified by separating them with commas, just like a normal function. Then, the return value comes after the :. Notice there is no return keyword. The return is implicit: it is the result of whatever expression follows the colon. Since it’s a lambda, the function body can only be a single expression, not a series of statements.
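Here is a short sketch of those two points: several parameters, and a body that is one expression. The lambdas are called immediately so they never need a name; the values 2, 3 and 73 are arbitrary.

```python
# A two-parameter lambda, called immediately with the arguments 2 and 3.
total = (lambda a, b: a + b)(2, 3)
print(total)  # 5

# The body must be a single expression, but that expression can still be
# a conditional expression.
label = (lambda age: "senior" if age >= 70 else "regular")(73)
print(label)  # senior
```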
We don’t have to make any changes to the sorting code, since the lambda was assigned to a variable with the same name as the original function, and it accepts the same parameters and returns the same type.
Let’s finish up the lambda intro by looking at the right way to pass the lambda to sorted(): by defining it inline. With what we’ve seen so far this should make sense to you:
sorted_people = sorted(people, key=lambda p: p.first_name)
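The same inline form works for any attribute. As a sketch, here are two more sorts of the same people: by last_name, and by age with the oldest first. The reverse argument is a standard sorted() parameter; the variable names are just for this example.

```python
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str
    age: int


people = [
    Person("Ken", "Clark", 20),
    Person("John", "Hardin", 73),
    Person("Erin", "Taunton", 39),
    Person("Charles", "Blevins", 21),
]

# Sort on a different attribute by changing only the lambda.
by_last_name = sorted(people, key=lambda p: p.last_name)

# reverse=True flips the order, so the largest age comes first.
oldest_first = sorted(people, key=lambda p: p.age, reverse=True)

print([p.last_name for p in by_last_name])   # ['Blevins', 'Clark', 'Hardin', 'Taunton']
print([p.first_name for p in oldest_first])  # ['John', 'Erin', 'Charles', 'Ken']
```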
If you’re curious, type errors in lambdas do get picked up by mypy. For example, if we were to try to access a missing attribute on p, then mypy would warn us.
Now that you’ve got the basics of lambdas, we’ll move on to actually using them, starting with map().
Calling a function on every item with map()
Quite often we want to perform a function on every item in an iterable container. The potential examples are endless, so here’s a basic one: uppercase every string in a list. With a list comprehension, you would do it like this:
strings = ["i", "Am", "a", "LiSt", "oF", "sTrIngs"]
uppercase_strings = [s.upper() for s in strings]
print(uppercase_strings)
The output is:
['I', 'AM', 'A', 'LIST', 'OF', 'STRINGS']
No surprises there.
We’ll return to this example soon, but first a bit about map(). It takes two arguments: the first is a callable, which gets passed each value in the container and returns some value, and the second is the iterable to take the values from. The callable can be a function, a method on an object or, yes, even a lambda.
Let’s try to convert this example to use map(). It might look something like this:
uppercase_strings = map(lambda s: s.upper(), strings)
print(uppercase_strings)
The output is:
<map object at 0x101204940>
Wait, what?
Your hexadecimal memory address may vary.
map() actually returns a map object, which doesn’t do anything until it is iterated over. The advantage here is that when iterating over large objects, the results can be processed one by one, rather than all being processed into a new list. With a list comprehension, for example, you can effectively double the amount of memory you are using, since you’re building a whole new list with the transformed copies of the input.
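To make the "doesn’t do anything until it is iterated over" part concrete, here is a minimal sketch. The shout() helper and the calls list are hypothetical names introduced only to record when the function actually runs; next() is the built-in that pulls a single value from an iterator.

```python
calls = []  # records every value that shout() actually processes


def shout(s: str) -> str:
    """Uppercase s and note that the call happened (hypothetical helper)."""
    calls.append(s)
    return s.upper()


lazy = map(shout, ["a", "b", "c"])
calls_before = len(calls)     # 0: creating the map has not called shout()

first = next(lazy)            # pull exactly one value through shout()
calls_after_one = len(calls)  # 1: only the first item has been processed

print(calls_before, first, calls_after_one)  # 0 A 1
```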
If you do need the mapped values in a list, then just call list() on the map to process them all.
print(list(uppercase_strings))
['I', 'AM', 'A', 'LIST', 'OF', 'STRINGS']
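The lambda in this example only forwards each string to its upper() method, so the method itself can be passed as the callable. This is a small sketch of that variation rather than a change to the example above; the shortened strings list is an assumption for brevity.

```python
strings = ["i", "Am", "a", "LiSt"]

# str.upper is already a callable that takes one string, so no lambda is needed.
uppercased = list(map(str.upper, strings))
print(uppercased)  # ['I', 'AM', 'A', 'LIST']
```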
You can also use list() as a shortcut to iterate over and evaluate the map if you don’t care about the function’s return value. Just ignore the list that’s created. Or, just iterate over the map with a for loop:
for uc_string in map(lambda s: s.upper(), strings):
    print(uc_string)
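One thing worth knowing when you iterate over a map directly: a map object is a one-shot iterator, so once it has been consumed it yields nothing more. A minimal sketch, with first_pass and second_pass as names made up for the illustration:

```python
strings = ["i", "Am", "a"]
uppercase_strings = map(lambda s: s.upper(), strings)

first_pass = list(uppercase_strings)   # consumes the map
second_pass = list(uppercase_strings)  # nothing left to yield

print(first_pass)   # ['I', 'AM', 'A']
print(second_pass)  # []
```

If you need the values more than once, keep the list, or create a fresh map() each time.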
Next, let’s look at filtering values in an iterable using filter().
Filtering an iterable’s values with filter()
filter() takes the same arguments as map(): a function and an iterable. The difference is that the function should return True to keep the value or False to skip it. The original iterable is unchanged.
We’re going to be using the terms truthy and falsy. Python considers True, non-empty strings, non-empty containers (lists, dicts, sets etc.), and many other values to be truthy; that is, they behave the same as True in an if. Values like False, None, empty strings and empty containers are falsy: they behave like False in an if.
OK, that short diversion was just to let you know that your filtering function doesn’t necessarily have to return True/False, just something truthy or falsy.
Anyway, let’s see filter() in action. Here’s a basic example: how to get the people whose first name is shorter than four characters:
people = [
    Person("Ken", "Clark", 20),
    Person("John", "Hardin", 73),
    Person("Erin", "Taunton", 39),
    Person("Charles", "Blevins", 21),
]
short_names = filter(lambda p: len(p.first_name) < 4, people)
print(short_names)
The output is:
<filter object at 0x10f05a8e0>
Yes, similar to map(), filter() returns a filter object which doesn’t do anything until it’s iterated over, either with a for loop or by calling list() on it.
print(list(short_names))
[Person(first_name='Ken', last_name='Clark', age=20)]
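Going back to the point about truthy and falsy return values, here is a minimal sketch where the filtering function returns a string rather than True or False. The lines list is hypothetical data made up for the example.

```python
lines = ["first", "", "   ", "second", ""]

# s.strip() returns a string, not a bool: it is empty (falsy) for blank lines
# and non-empty (truthy) otherwise. filter() keeps the truthy ones.
non_blank = list(filter(lambda s: s.strip(), lines))
print(non_blank)  # ['first', 'second']
```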
Of course, maps and filters can be chained together. Here’s filtering by short first names, then returning the first names uppercased.
short_names_uc = map(
    lambda p: p.first_name.upper(),
    filter(
        lambda p: len(p.first_name) < 4, people
    )
)
print(list(short_names_uc))
The filter() call is inside the map() call, so it is evaluated first, and it will only yield those Persons matching the criterion len(p.first_name) < 4. Then, each matching Person (there’s only one) is passed to the map() function to have their first_name uppercased and returned. Remember, the original people list and each Person remain unchanged.
Here’s the output:
['KEN']
Of course, you could achieve the same thing with list comprehension:
short_names_uc = [p.first_name.upper() for p in people if len(p.first_name) < 4]
So map() and filter() really only become advantageous over a list comprehension if you’re chaining lots of them together, if just-in-time execution is desired, or if you have a large dataset that you don’t want to duplicate in memory.
We’ll finish this post with two final interesting functions, any() and all(), which make it easy to find information about truthy values in a list.
any() and all()
It can be quite common to iterate over a list of items to see if all of them have some characteristic. Or, to iterate to see if at least one of them has a certain characteristic.
Let’s return to our list of Persons and pretend we run a restaurant that offers a discount if all people at a table are 30 or over. Using a normal for loop, we could figure it out like this:
all_over_30 = True
for person in people:
    if person.age < 30:
        all_over_30 = False
        break
Here we use the all_over_30 variable to track whether all patrons are 30 or over. If any are under, we set all_over_30 to False and exit the loop. Since two of our people are under 30, all_over_30 is False.
We can swap this with all(), combined with map(), to do the same thing in one line. all() will return True if all the values it iterates over are truthy. First we can map() all the people to a function that returns True if they are 30 or over. Then, pass that to all().
all_over_30 = all(map(lambda p: p.age >= 30, people))
all_over_30 is still False, but we did it in one line of code, and you could argue that this more declarative approach is easier to understand.
In a nutshell, declarative programming is when you say what you want instead of how to do things to figure out what you want. Kind of like saying "I want a peanut butter sandwich" instead of "Get bread, get knife, spread peanut butter on bread…", etc. The latter would be an example of imperative instructions.
In the for loop example, we’re listing the steps to execute to set a variable describing the result. In the use of all() we just say we want to know if all people are >= 30. It’s not a perfect example but hopefully you get the idea.
Back to the restaurant – we have the Senior’s Sharing Special which allows someone over 70 to get a discount for their table. We’ll allow this to be applied if anyone at the table is aged 70 or older.
You could solve this problem with a for loop:
any_over_70 = False
for person in people:
    if person.age >= 70:
        any_over_70 = True
        break
In our case we have one Person over 70, so any_over_70 is True.
But here’s how to solve it with any(), which returns True if any of the values it iterates over are truthy. Again, map() the people to a lambda, but this time it should return True if the Person is 70 or older.
any_over_70 = any(map(lambda p: p.age >= 70, people))
any_over_70 is True again here as we have one Person over 70. So the whole table gets the Special. Yay!
If you’re wondering, all() and any() will both stop iterating as soon as they "know" their result. So for all(), as soon as it finds a falsy value it can stop and return False, as it knows that not all the values are truthy.
any() will stop iterating as soon as it finds a truthy value, as it has satisfied the "any are true" requirement. In short, each will only perform the entire iteration if absolutely necessary.
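This early exit can be observed with a small sketch. The is_70_or_over() helper and the checked list are hypothetical names introduced only to record which ages were actually inspected; the ages are the same four used for our people.

```python
ages = [20, 73, 39, 21]
checked = []  # every age that the helper below actually inspects


def is_70_or_over(age: int) -> bool:
    """Record the age being inspected, then test it (hypothetical helper)."""
    checked.append(age)
    return age >= 70


any_over_70 = any(map(is_70_or_over, ages))
checked_by_any = list(checked)  # any() stopped at the first truthy result (73)

checked.clear()
all_over_70 = all(map(is_70_or_over, ages))
checked_by_all = list(checked)  # all() stopped at the first falsy result (20)

print(any_over_70, checked_by_any)  # True [20, 73]
print(all_over_70, checked_by_all)  # False [20]
```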
Let’s take one final quick detour before finishing to talk about solving these problems with filter().
Solving any and all problems by filtering?
We could approach these problems by filtering with filter(), and then counting the results. That is, we keep only the people under 30 and look for an empty result. Or, we keep only the people aged 70 or over and look for a non-empty result.
You might think about trying something like this:
all_over_30 = len(filter(lambda p: p.age < 30, people)) == 0
However that gives an error:
TypeError: object of type 'filter' has no len()
Since filter()s (and map()s) are executed just-in-time, over an iterable that might not have a known length, they too have no length (even if the input iterable does have a length). Therefore, to perform this comparison we have to do a list() conversion:
all_over_30 = len(list(filter(lambda p: p.age < 30, people))) == 0
This would work, but if your iterables have a lot of values, you would need to iterate over them all to build the list, and then check its length. This would not allow you to short-circuit the evaluation and exit early once a criterion had been satisfied, as is the case when using all().
And just for completeness’ sake, here’s how to check if any people are 70 or over using a filter() instead of any():
any_over_70 = len(list(filter(lambda p: p.age >= 70, people))) > 0
But once again, all the values must be filtered and converted to a list before the list length can be counted. We can’t short-circuit the evaluation as soon as one truthy value is found, like when using any().
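If you did want to stay with filter() and still stop early, one possible alternative (not part of the approach above, just a sketch) is to ask the filter for at most one value using the built-in next() with a default. This assumes None never appears as a genuine item in people; the first_under_30 and first_70_plus names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str
    age: int


people = [
    Person("Ken", "Clark", 20),
    Person("John", "Hardin", 73),
    Person("Erin", "Taunton", 39),
    Person("Charles", "Blevins", 21),
]

# next() pulls at most one value from the filter. The second argument is the
# default returned when the filter yields nothing at all.
first_under_30 = next(filter(lambda p: p.age < 30, people), None)
all_over_30 = first_under_30 is None

first_70_plus = next(filter(lambda p: p.age >= 70, people), None)
any_over_70 = first_70_plus is not None

print(all_over_30)  # False
print(any_over_70)  # True
```

Even so, all() and any() say the same thing more directly, so they remain the clearer choice here.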
Now, one final note about performance benefits when writing code using map().
Parallel Processing
Another advantage of using functional programming tools like map() is that the code can easily be refactored to be parallelised, which can lead to significant performance gains on multi-core systems. With just a few modifications, the code can be made to run on multiple cores, making use of all the resources available to you. This can be especially useful when processing large datasets, as it can dramatically reduce the time required to complete operations. While it’s beyond the scope of this post, the multiprocessing module is a good place to start: using Pool.map() is a simple way to split "mappable" tasks across multiple processes.
Conclusion
Python offers a number of options to perform iteration in an efficient manner. You can use for loops or comprehensions, but these methods may have drawbacks such as requiring extra variables or consuming excessive time and memory.
To address these limitations, Python offers functional programming-like tools such as map(), filter(), and lambda functions, which provide more sophisticated approaches to iteration. By combining these tools with any() and all(), expressive and declarative programming can be achieved.