Effortless Iteration with Lambdas, Map and Filter

Is this a loop?

What’s the problem?

Python is quite good at iteration. Its consistent object interface means that for loops with the for <var> in <container>: syntax can be applied to just about anything: strings, dicts, lists, sets, and even classes you’ve created. You’ve also probably seen comprehension syntax, for example, l = [item for item in container] to build a new list (l) from an iterable (container).
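As a quick illustration of that consistent interface, here is a minimal sketch. The Playlist class is hypothetical and exists only for this example; the one thing it needs in order to work in a for loop is an __iter__ method.

```python
class Playlist:
    """Hypothetical container class, used only for this illustration."""

    def __init__(self, *songs: str) -> None:
        self._songs = list(songs)

    def __iter__(self):
        # Defining __iter__ is what lets a for loop walk over the object.
        return iter(self._songs)


for char in "abc":  # strings yield their characters
    print(char)

for key in {"x": 1, "y": 2}:  # dicts yield their keys
    print(key)

for song in Playlist("Intro", "Outro"):  # classes you've created work too
    print(song)

# A comprehension builds a new list from any iterable, including Playlist.
titles = [song.upper() for song in Playlist("Intro", "Outro")]
```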

These are perfectly fine approaches, but can (sometimes) be improved. You can end up creating extra variables that you don’t actually need. Plus you can end up using a lot of time or memory because the iteration is performed when the list is created, even though the contents of the list aren’t needed until later.

How can it be solved?

We can use some of Python’s functional-programming-style tools to think about and perform iteration more cleverly. The map() and filter() functions operate on iterables: map() applies a function to each value, and filter() keeps only the values for which a function returns a truthy result. We’ll also look at lambdas: temporary single-line functions which are often used with map() and filter(), as well as other functions that accept functions as arguments.

Since we’ll be using lambdas quite a lot, let’s start with a quick intro/refresher.

lambdas in Python

A lambda is a temporary function that always returns something. You can use them anywhere normal functions are used, with some caveats which we’ll cover soon. To show how they work, we’ll start by using a normal function, then replace it with a lambda.

The sorted() function takes an iterable and returns a list containing the iterable’s members, sorted. Sorting on built-in Python types (like str or int) basically Just Works™.

>>> sorted([5, 6, 1, 3, 2, 4])
[1, 2, 3, 4, 5, 6]

However, if you want to sort on types/classes you’ve created you’ll need to tell sorted() how to retrieve the attribute of the value to sort on. For example, a list of Person objects that you want to sort on first_name. This is done by passing in a function to the key argument to sorted(). The key function accepts each member as an argument and then returns some value that the object can be sorted on.

Let’s look at the Person example in detail.

This code uses data classes and type hinting. Check out my post on data classes and post on type hinting if you’re not familiar with them already.

First, here’s the Person class definition.

from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str
    age: int

Next, we’ll define a get_first_name() function that accepts a Person and returns their first_name. It’s pretty much as you would expect:

def get_first_name(p: Person) -> str:
    return p.first_name

Next create a list of Persons (purposely unsorted to prove sorting works).

people = [
    Person("Ken", "Clark", 20),
    Person("John", "Hardin", 73),
    Person("Erin", "Taunton", 39),
    Person("Charles", "Blevins", 21),
]

Finally we can create the sorted list with sorted(), remembering to pass in key=get_first_name to tell it to use the get_first_name function to find the sortable value of each member (Person object).

sorted_people = sorted(people, key=get_first_name)
print(sorted_people)

Here’s the output:

[Person(first_name='Charles', last_name='Blevins', age=21), Person(first_name='Erin', last_name='Taunton', age=39), Person(first_name='John', last_name='Hardin', age=73), Person(first_name='Ken', last_name='Clark', age=20)]

In this case, key is required to sort Person objects because Python doesn’t know how to compare them. It will actually raise a TypeError if called without key.
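To see that failure for yourself, here’s a small self-contained sketch. It repeats the Person class so it can run on its own, and uses a shortened people list; the error_message variable is just for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str
    age: int


people = [Person("Ken", "Clark", 20), Person("Erin", "Taunton", 39)]

try:
    sorted(people)  # no key, so Python has to evaluate person_a < person_b
except TypeError as exc:
    error_message = str(exc)
    print(f"TypeError: {error_message}")

# With a key, only the extracted first_name strings are compared.
by_first_name = sorted(people, key=lambda p: p.first_name)
```

A plain data class like this one doesn’t define ordering methods such as __lt__, which is why the comparison fails.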

Now that you’ve seen it with a normal function, let’s look at lambdas. They are defined in the form:

lambda <parameters>: <return value>

What you’re going to see now is something that you should never do, which is to assign a lambda to a variable. This is something that the Python linter flake8 will reject. It’s only being done here to demonstrate how to build a lambda by comparing it to a normal function.

With that warning out of the way, let’s see get_first_name() as a lambda.

get_first_name = lambda p: p.first_name

Again, never assign a lambda to a variable! This is just done to relate it back to the normal implementation.

The parameter of the function is p, and multiple parameters can be specified by separating them with commas, just like a normal function. Then, the return value comes after the :. Notice there is no return keyword. The return is implicit: it is the result of whatever expression is there. Since it’s a lambda, the function body can only be a single expression, not a series of statements.

We don’t have to make any changes to the sorting code, since the lambda was assigned to a variable that had the same name as the original function, as well as accepting the same parameters and returning the same type.
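Here’s a rough sketch of lambdas with one and two parameters. The scores list is made up for this example, and functools.reduce isn’t otherwise used in this post; it’s only here because it calls its function with two arguments.

```python
from functools import reduce

# Hypothetical (name, score) pairs, used only for this illustration.
scores = [("Ken", 72), ("Erin", 95), ("John", 58)]

# One parameter: sort on the second element of each pair.
by_score = sorted(scores, key=lambda pair: pair[1])

# Two parameters, separated by a comma just like in a def.
total = reduce(lambda running, pair: running + pair[1], scores, 0)

# The body is a single expression, but that expression can still build
# something richer, such as a tuple to sort on two things at once.
by_length_then_score = sorted(scores, key=lambda pair: (len(pair[0]), pair[1]))
```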

Let’s finish up the lambda intro by looking at the right way to pass the lambda to sorted: by defining it inline. With what we’ve seen so far this should make sense to you:

sorted_people = sorted(people, key=lambda p: p.first_name)

If you’re curious, type errors in lambdas do get picked up by mypy, for example if we were to try to access a missing attribute on p then mypy would warn us.

Now that you’ve got the basics of lambdas, we’ll move on to actually using them. First by looking at map().

Calling a function on every item with map()

Quite often we want to perform a function on every item in an iterable container. The potential examples are endless, so here’s a basic one: uppercase every string in a list. With list comprehension, you would do it like this:

strings = ["i", "Am", "a", "LiSt", "oF", "sTrIngs"]

uppercase_strings = [s.upper() for s in strings]
print(uppercase_strings)

The output is:

['I', 'AM', 'A', 'LIST', 'OF', 'STRINGS']

No surprises there.

We’ll return to this example soon, but first a bit about map(). It takes two arguments: the first is a callable which gets passed each value in the container and returns some value, and the second is the iterable to read the values from. The callable can be a function, a method on an object or, yes, even a lambda.

Let’s try to convert this example to use map(). It might look something like this:

uppercase_strings = map(lambda s: s.upper(), strings)
print(uppercase_strings)

The output is:

<map object at 0x101204940>

Wait, what?

Your hexadecimal memory address may vary.

map() returns a map object which doesn’t do anything until it is iterated over. The advantage here is that when iterating over large objects, the results can be processed one by one, rather than all being processed into a new list up front. With a list comprehension, for example, you can effectively double the amount of memory you’re using, since you’re building a whole new list with the transformed copies of the input.

If you do need the mapped values in a list, then just call list() on the map to process them all.

print(list(uppercase_strings))
['I', 'AM', 'A', 'LIST', 'OF', 'STRINGS']
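If you’d like to watch that just-in-time behaviour happen, here’s a minimal sketch. The shout() helper and the calls list are hypothetical; they just record when the function actually runs.

```python
calls = []


def shout(s: str) -> str:
    """Hypothetical helper that records each call before uppercasing."""
    calls.append(s)
    return s.upper()


lazy = map(shout, ["a", "b", "c"])
calls_before = len(calls)    # creating the map has not called shout() yet

first = next(lazy)           # pulls exactly one value through
calls_after_one = len(calls)

rest = list(lazy)            # consumes the remaining values
leftover = list(lazy)        # the map object is now exhausted
```

One thing this shows is that a map object can only be iterated once. If you need the values again, keep the list or build a new map.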

You can also use list as a shortcut to iterate and evaluate the map, if you don’t care about the function’s return value. Just ignore the list that’s created. Or, just iterate over the map() with a for loop:

for uc_string in map(lambda s: s.upper(), strings):
    print(uc_string)
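As mentioned earlier, the callable doesn’t have to be a lambda. Here’s a short sketch of the other options; the shout() function is a made-up helper for this example.

```python
strings = ["i", "Am", "a", "LiSt", "oF", "sTrIngs"]


def shout(s: str) -> str:
    """Hypothetical helper: uppercase and add an exclamation mark."""
    return s.upper() + "!"


with_function = list(map(shout, strings))      # a normal function
with_method = list(map(str.upper, strings))    # a method, referenced via its class
with_builtin = list(map(len, strings))         # a built-in function
with_lambda = list(map(lambda s: s.upper(), strings))
```

When a lambda does nothing but call one method, as in the last line, passing the method itself (str.upper) is arguably a little tidier.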

Next let’s look at filtering values in an iterable using filter().

Filtering an iterable’s values with filter()

filter() takes the same arguments as map(): a function and an iterable. The difference is, the function should return True to keep the value or False to skip it. The original iterable is unchanged.

We’re going to be using the terms truthy and falsy. Python considers True, non-empty strings, non-empty containers (lists, dicts, sets etc), and many other values, to be truthy, that is they behave the same as True in an if. Values like False, None, empty strings and empty containers behave like False in an if.

OK, that short diversion was just to let you know that your filtering function doesn’t necessarily have to return True/False, just something truthy or falsy.

Anyway, let’s see filter() in action. Here’s a basic example: how to get the people whose first name is shorter than four characters:

people = [
    Person("Ken", "Clark", 20),
    Person("John", "Hardin", 73),
    Person("Erin", "Taunton", 39),
    Person("Charles", "Blevins", 21),
]

short_names = filter(lambda p: len(p.first_name) < 4, people)
print(short_names)

The output is:

<filter object at 0x10f05a8e0>

Yes, similar to map(), filter() returns a filter object which doesn’t do anything until it’s iterated over, either with a for loop or by calling list() on it.

print(list(short_names))
[Person(first_name='Ken', last_name='Clark', age=20)]
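Coming back to truthy and falsy for a moment, here’s a small sketch of filtering functions that don’t return True or False. The names list is made up for this example.

```python
# Hypothetical data with some falsy entries mixed in.
names = ["Ken", "", "Erin", None, "Charles", ""]

# The lambda returns the value itself, not a bool.
# Empty strings and None are falsy, so they are skipped.
non_empty = list(filter(lambda name: name, names))

# Passing None instead of a function means "keep the truthy values".
also_non_empty = list(filter(None, names))

# len() returns an int; 0 is falsy, so the empty list is dropped.
non_empty_lists = list(filter(len, [[1], [], [2, 3]]))
```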

Of course maps and filters can be chained together. Here’s filtering by short first names, then returning the first names uppercased.

short_names_uc = map(
    lambda p: p.first_name.upper(),
    filter(
        lambda p: len(p.first_name) < 4, people
    )
)
print(list(short_names_uc))

The filter() call is inside the map() call, so the filter object is created first, and it will only yield those Persons matching the criteria len(p.first_name) < 4. Then, each matching Person (there’s only one) will be passed to the map() function to have their first_name uppercased and returned. Remember, the original people list and each Person remain unchanged.

Here’s the output:

['KEN']

Of course, you could achieve the same thing with list comprehension:

short_names_uc = [p.first_name.upper() for p in people if len(p.first_name) < 4]

So map() and filter() really only become advantageous over list comprehension if you’re chaining lots of them together, or if just-in-time execution is desired, or if you have a large dataset that you don’t want to duplicate in memory.

We’ll finish this post with two final interesting functions, any() and all(), which make it easy to find information about truthy values in a list.

any() and all()

It can be quite common to iterate over a list of items to see if all of them have some characteristic. Or, to iterate to see if at least one of them has a certain characteristic.

Returning again to our list of Persons. Let’s pretend we run a restaurant that offers a discount if all people at a table are 30 or over. Using a normal for loop we could figure it out like this:

all_over_30 = True

for person in people:
    if person.age < 30:
        all_over_30 = False
        break

Here we use the all_over_30 variable to check if all patrons are 30 or over. If any are under, we set all_over_30 to False and exit the loop. Since two of our people are under 30, all_over_30 is False.

We can swap this with all(), combined with map(), to do the same thing in one line. all() will return True if all the values it iterates over are truthy. First we can map() all the people to a function that returns True if they are 30 or over. Then, pass that to all().

all_over_30 = all(map(lambda p: p.age >= 30, people))

all_over_30 is still False, but we did it in one line of code, and you could argue that this more declarative approach is easier to understand.

In a nutshell, declarative programming is when you say what you want instead of how to do things to figure out what you want. Kind of like saying "I want a peanut butter sandwich" instead of "Get bread, get knife, spread peanut butter on bread…", etc. The latter would be an example of imperative instructions.

In the for loop example, we’re listing the steps to execute to set a variable describing the result. In the use of all() we just say we want to know if all people are >= 30. It’s not a perfect example but hopefully you get the idea.

Back to the restaurant: we have the Senior’s Sharing Special, which allows someone aged 70 or over to get a discount for their table. We’ll allow this to be applied if anyone at the table is aged 70 or older.

You could solve this problem with a for loop:

any_over_70 = False

for person in people:
    if person.age >= 70:
        any_over_70 = True
        break

In our case we have one Person over 70 so any_over_70 is True.

But here’s how to solve it with any(), which returns True if any of the values it iterates over are truthy. Again, map() the people to a lambda, but this time, it should return True if the Person is 70 or older.

any_over_70 = any(map(lambda p: p.age >= 70, people))

any_over_70 is True again here as we have one Person over 70. So the whole table gets the Special. Yay!

If you’re wondering, all() and any() will both stop iterating as soon as they "know" their result. So for all(), as soon as it finds a falsy value it can stop and return False as it knows that not all the values are truthy.

any() will stop iterating as soon as it finds a truthy value, as that satisfies the "any are true" requirement. In short, each will only perform the entire iteration if absolutely necessary.
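Here’s a rough sketch that makes the early exit visible. The is_at_least() helper and the checked list are hypothetical; they just record which people were actually looked at. The Person class and people list are repeated so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str
    age: int


people = [
    Person("Ken", "Clark", 20),
    Person("John", "Hardin", 73),
    Person("Erin", "Taunton", 39),
    Person("Charles", "Blevins", 21),
]

checked = []


def is_at_least(p: Person, minimum_age: int) -> bool:
    """Hypothetical helper that records who was checked."""
    checked.append(p.first_name)
    return p.age >= minimum_age


all_over_30 = all(map(lambda p: is_at_least(p, 30), people))
checked_by_all = checked.copy()   # Ken is 20, so all() stops right away

checked.clear()

any_over_70 = any(map(lambda p: is_at_least(p, 70), people))
checked_by_any = checked.copy()   # any() stops once it reaches John (73)
```

In both cases Erin and Charles are never checked, because the answer is already known before the iteration reaches them.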

Let’s take one final quick detour before finishing to talk about solving these problems with filter().

Solving any and all problems by filtering?

We could approach these problems by filtering with filter(), and then counting the results. That is, we filter for anyone under 30 and look for an empty result. Or, we filter for everyone aged 70 or over and look for a non-empty result.

You might think about trying something like this:

all_over_30 = len(filter(lambda p: p.age < 30, people)) == 0

However that gives an error:

TypeError: object of type 'filter' has no len()

Since filter()s (and map()s) are executed just-in-time, over an iterable that might not have a known length, they too have no length (even if the input iterable does have a length). Therefore to perform this comparison we have to do a list() conversion:

all_over_30 = len(list(filter(lambda p: p.age < 30, people))) == 0

Which would work, but if your iterables have a lot of values, you would need to iterate over them all to build the list, and then check its length. This would not allow you to short-circuit the evaluation and exit early once a criterion had been satisfied, as is the case when using all().

And just for completeness’ sake, here’s how to check if any people are 70 or over using a filter() instead of any():

any_over_70 = len(list(filter(lambda p: p.age >= 70, people))) > 0

But once again, all the values must be filtered and converted to a list before the list length can be counted. We can’t short-circuit the evaluation as soon as one truthy value is found, like when using any().
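To put a rough number on that difference, here’s a sketch with a made-up list of 10,000 ages where the very first person qualifies. The is_senior() helper and the checked list are hypothetical and only count how many values each approach looks at.

```python
# Hypothetical data: one senior at the front, then 9,999 younger people.
ages = [75] + [25] * 9_999

checked = []


def is_senior(age: int) -> bool:
    """Hypothetical helper that records every age it is asked about."""
    checked.append(age)
    return age >= 70


with_any = any(map(is_senior, ages))
checked_by_any = len(checked)      # stops after the first value

checked.clear()

with_filter = len(list(filter(is_senior, ages))) > 0
checked_by_filter = len(checked)   # has to look at every value
```

Both approaches give the same answer, but the filter() version does 10,000 checks to get there where any() does one.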

Now one final note about performance benefits when writing code using map().

Parallel Processing

Another advantage of using functional programming tools like map() is that the code can easily be refactored to be parallelised, which can lead to significant performance gains on multi-core systems. With just a few modifications, the code can be made to run on multiple cores, making use of all the resources available to you. This can be especially useful when processing large datasets, as it can dramatically reduce the time required to complete operations. While it’s beyond the scope of this post, the multiprocessing module is a good place to start: using Pool.map() is a simple way to split "mappable" tasks across multiple processes.

Conclusion

Python offers a number of options to perform iteration in an efficient manner. You can use for loops or comprehensions, but these methods may have drawbacks such as requiring extra variables or consuming excessive time and memory.

To address these limitations, Python offers functional programming-like tools such as map(), filter(), and lambda functions, which provide more sophisticated approaches to iteration. By combining these tools with any() and all(), expressive and declarative programming can be achieved.

