What’s the problem?
You have different but related classes that only differ in some parts of their implementation, and you want to use them interchangeably. You can solve this by setting up a class hierarchy with inheritance. We would call this polymorphism, where different classes can be used in the same places because they have the same interface (set of methods).
However, this can have some drawbacks. For example, how does a developer know which methods on a class need to be overridden and which can be kept as is? And even worse, what if one of the subclasses has a mistake and "overrides" a base method that doesn’t exist?
How can it be solved?
By using abstract classes, which can explicitly mark methods that need to be implemented by subclasses. Abstract classes with abstract methods can’t be instantiated, which means you’ll get an error instead of using the wrong implementation. Unimplemented methods will be detected at runtime, but can also be found by mypy (check out my post on type hinting for more information about mypy), which means you can detect a whole class of problems before they hit production.
Now that we have a high level overview of the fix, let’s first dive deeper into how problems can manifest when not using abstract classes. We’ll look at a simple class hierarchy in a program that uses "normal" inheritance.
Inheritance without abstract classes
Let’s look at a simple example you might use a class hierarchy for: representing different shapes that can calculate their own area. This starts with a Shape base class which defines a get_area() method. This method on the base class doesn’t do anything though, it just raises a NotImplementedError.
class Shape:
    def get_area(self) -> float:
        raise NotImplementedError()
NotImplementedError is a built-in Python exception. This could be used to emulate abstract classes in older versions of Python, but with the introduction of abstract base classes it’s redundant, at least for this purpose.
Then we can define classes to actually implement the area calculation. First, there’s a Square class. Hopefully you remember from geometry that the area of a square is its side length, squared. Here it is:
class Square(Shape):
    def __init__(self, length: float) -> None:
        self.length = length

    def get_area(self) -> float:
        return self.length ** 2
Using it is pretty easy:
s = Square(5)
print(s.get_area())
Which outputs 25.
Next, a Circle class. The area of a circle is π * r^2 (pi times the radius squared). Note that there is an intentional mistake in this class:
from math import pi

class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return pi * self.radius ** 2
Notice that the class defines an area() method instead of get_area(), a mistake that could be quite easy to make. This mistake is not picked up by mypy:
c = Circle(5)
print(c.get_area())
$ mypy non_abstract.py
Success: no issues found in 1 source file
It makes sense that mypy doesn’t find this mistake, since the Circle.get_area() method does exist: it is inherited from the base Shape class. mypy just doesn’t know that all it does is raise an exception, which of course only shows up at runtime:
$ python non_abstract.py
Traceback (most recent call last):
  File "/Users/ben/non_abstract.py", line 34, in <module>
    main()
  File "/Users/ben/non_abstract.py", line 30, in main
    print(c.get_area())
  File "/Users/ben/non_abstract.py", line 6, in get_area
    raise NotImplementedError()
Which could happen in production!
Before looking at how abstract classes can solve this problem, let’s first talk about why we want to have the shared base classes at all: polymorphism.
Shared base classes and polymorphism
Polymorphism is a term that describes different types (classes) having the same interface (methods). This means that the classes can safely be used interchangeably which can cut down on code, including the code to do type checking.
Now that we have some Shape classes, we want to be able to use them to calculate the total area of all the shapes, perhaps to know how much material we would need to cut them out.
Some code to work out the total area of a list of shapes might be implemented as a simple for loop with a running total, like this:
import typing

def get_total_area(shapes: typing.List[Shape]) -> float:
    total_area = 0.0  # start from a float to match the return type
    for shape in shapes:
        total_area += shape.get_area()
    return total_area
We have specified that the shapes list only has Shape objects in it. Since the Shape class specifies a get_area() method, we know (and type checkers and IDEs also know) that it’s safe to call get_area() on each object.
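As a quick usage sketch, the function accepts any mix of Shape subclasses. The shapes and sizes below are made up for illustration, and Circle here is assumed to use the correct get_area() name rather than the intentionally broken area():

```python
from math import pi
from typing import List


class Shape:
    def get_area(self) -> float:
        raise NotImplementedError()


class Square(Shape):
    def __init__(self, length: float) -> None:
        self.length = length

    def get_area(self) -> float:
        return self.length ** 2


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    # Correctly named here, unlike the intentionally broken version.
    def get_area(self) -> float:
        return pi * self.radius ** 2


def get_total_area(shapes: List[Shape]) -> float:
    total_area = 0.0
    for shape in shapes:
        total_area += shape.get_area()
    return total_area


# Illustrative values only: a 2 x 2 square and a circle of radius 1.
shapes: List[Shape] = [Square(2), Circle(1)]
total = get_total_area(shapes)
print(total)  # 4 + pi
```

The caller never checks which kind of shape it has; it relies only on the interface that Shape promises.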
We could define some shape classes without using a Shape base class if we wanted:
# some methods trimmed for brevity
class Square:
    def get_area(self) -> float:
        return self.length ** 2

class Circle:
    def get_area(self) -> float:
        return pi * self.radius ** 2

# assume these classes also implement `get_area()`
class Triangle:
    ...  # snip

class Rectangle:
    ...  # snip

class Trapezoid:
    ...  # snip
But then for proper type checking, our type hints start to get out of hand:
def get_total_area(
    shapes: typing.List[
        typing.Union[
            Square,
            Circle,
            Triangle,
            Rectangle,
            Trapezoid
        ]
    ]
) -> float:
    total_area = 0
    for shape in shapes:
        total_area += shape.get_area()
    return total_area
And of course, this list of shapes only gets longer as you add more. Plus, you have to repeat this type definition every time a list of "shapes" is used. It’s a maintenance nightmare.
Hopefully now you can see the benefits of base classes to support polymorphism. Let’s get back to abstract classes in Python. Next we’ll see how to define abstract classes using the abc module.
Easy as abc
OK, I’m sure every single person who’s written about abstract base classes in Python has made that joke… But the abc module really is that simple.
To mark a class as abstract, just have it inherit from the abc.ABC class. Here’s our update to the Shape base class to make it abstract.
from abc import ABC

class Shape(ABC):
    def get_area(self) -> float:
        raise NotImplementedError()
In some programming languages, abstract classes can’t be instantiated. Not so in Python! With the above implementation, Shape can be instantiated just fine, mypy raises no errors, and we’ll still get NotImplementedError at runtime.
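A minimal sketch to confirm this behaviour: inheriting from ABC alone, with no abstract methods declared, does not block instantiation, and the failure only appears once get_area() is called. The error_raised flag is just a helper name for this illustration.

```python
from abc import ABC


class Shape(ABC):
    def get_area(self) -> float:
        raise NotImplementedError()


# No abstract methods are declared, so instantiation is allowed.
shape = Shape()

try:
    shape.get_area()
except NotImplementedError:
    error_raised = True
else:
    error_raised = False

print(error_raised)  # True
```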
How do we enforce "proper" abstract classes? We need to make use of the method decorator abc.abstractmethod. Under the hood, ABC has the checks in place to prevent instantiation, but they’re only applied when decorated methods are found on the class.
We’ll update the get_area() method to have this decorator. At the same time, since this will stop the class from being instantiated, raising NotImplementedError is no longer necessary. We can replace the method body with just pass.
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def get_area(self) -> float:
        pass
Now look at what happens when we try to instantiate Shape:
>>> s = Shape()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Shape with abstract method get_area
This runtime check may be more useful than raising NotImplementedError, as it’s raised as soon as the class is instantiated instead of when the method is used, so it’s likely to be picked up earlier. However, this error will be found even earlier by using mypy.
It should be noted here that not all methods on an ABC must be abstract. There can be non-abstract methods too, which can even call abstract methods. We’ll return to this concept with an example a bit later.
Let’s just quickly revisit the Square and Circle classes. Note that since they both inherit from Shape, and that has been updated to be abstract, no changes have been made to these classes.
Quick aside: what’s the opposite of abstract? In programming terms, it’s concrete. That is, Shape is an abstract class, but Square and Circle are both concrete classes.
To refresh your memory, here are the concrete classes:
class Square(Shape):
    def __init__(self, length: float) -> None:
        self.length = length

    def get_area(self) -> float:
        return self.length ** 2

class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return pi * self.radius ** 2
And just a reminder, Circle still incorrectly implements area instead of get_area. But luckily, that mistake can be detected with mypy now. All we need to do is have some code that tries to instantiate Circle, and it outputs this error:
$ mypy abstract.py
abstract.py:31: error: Cannot instantiate abstract class "Circle" with abstract attribute "get_area" [abstract]
Found 1 error in 1 file (checked 1 source file)
Because get_area() was not overridden in the concrete class, it’s still considered abstract and the error is shown. Hopefully this triggers someone to rename the area() method to get_area(), and everything will be right in the world.
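The same check also applies at runtime. The sketch below puts the two versions side by side; the name BrokenCircle is made up here purely so both can live in one file. The exact wording of the TypeError message differs between Python versions, so the example only prints it rather than assuming its text.

```python
from abc import ABC, abstractmethod
from math import pi


class Shape(ABC):
    @abstractmethod
    def get_area(self) -> float:
        pass


class BrokenCircle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    # Wrong name: get_area() is still abstract on this class.
    def area(self) -> float:
        return pi * self.radius ** 2


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    # Renamed, so the abstract method is now overridden.
    def get_area(self) -> float:
        return pi * self.radius ** 2


broken_error = None
try:
    BrokenCircle(5)
except TypeError as exc:
    # Raised at instantiation, before any method is called.
    broken_error = str(exc)
    print(broken_error)

circle = Circle(5)
print(circle.get_area())
```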
We’ll close this post by talking about the explicit information granted by abstract classes and methods.
What abstract methods tell you (and other programmers) about a class
If we only work with concrete classes, we’re kind of in the dark in terms of separation of responsibilities when creating subclasses. For example, let’s revisit the idea of cutting shapes out of material. We want to know how much each shape costs, which is equal to the cost of the material per unit area, times the area. This cost calculation doesn’t change based on the type of shape, just on the area. We can therefore create a get_cost() method on Shape like so:
from abc import ABC, abstractmethod
from math import pi

class Shape(ABC):
    @abstractmethod
    def get_area(self) -> float:
        pass

    def get_cost(self, cost_per_area: float) -> float:
        return cost_per_area * self.get_area()
The get_cost() method just takes the cost of the material per unit of area (which could be square metre or square foot, whichever you prefer) and multiplies it by the area of the shape to give the total cost of the shape. Note that this method is not abstract. Concrete methods can call abstract methods because:
- A class without concrete overrides of abstract methods can’t be instantiated.
- Therefore a class must have implemented the abstract methods once instantiated.
- And following from that, calling a method that was abstract is perfectly fine as it must no longer be abstract once on a concrete class.
To give an example of it in use, at $5/square metre, how much does a 3m x 3m square cost?
>>> s = Square(3)
>>> s.get_cost(5)
45
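The three-point argument above, that a concrete class has no abstract methods left, can also be checked by inspecting the classes. This is a minimal sketch: __abstractmethods__ is the attribute the abc machinery sets on each class, and inspect.isabstract() is a standard library helper that reports whether a class is still abstract.

```python
from abc import ABC, abstractmethod
from inspect import isabstract


class Shape(ABC):
    @abstractmethod
    def get_area(self) -> float:
        pass

    def get_cost(self, cost_per_area: float) -> float:
        return cost_per_area * self.get_area()


class Square(Shape):
    def __init__(self, length: float) -> None:
        self.length = length

    def get_area(self) -> float:
        return self.length ** 2


# Shape still has one unimplemented abstract method; Square has none left.
print(Shape.__abstractmethods__)   # frozenset({'get_area'})
print(Square.__abstractmethods__)  # frozenset()
print(isabstract(Shape), isabstract(Square))  # True False
```

Since Square has nothing abstract left, any Square instance that reaches get_cost() is guaranteed to have a real get_area() to call.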
As someone who is working on a new Shape subclass, what do these method decorators tell me?
I know that the only methods that must be overridden are the abstract ones. The original developer has given me a clue that get_cost() is the same calculation regardless of the type of shape, whereas get_area() is different and should be implemented for each subclass. That’s not to say that get_cost() can’t be implemented in a subclass. If there’s a lot of wastage when cutting out Hexagons then maybe the cost needs to be increased for that shape, but it’s certainly not the expected behaviour.
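As a hedged sketch of that Hexagon idea: the 20% wastage figure and the WASTAGE_FACTOR name are assumptions invented for this example, and the shape is assumed to be a regular hexagon. The subclass implements the required get_area() and also chooses to override the optional get_cost(), reusing the shared calculation through super().

```python
from abc import ABC, abstractmethod
from math import sqrt


class Shape(ABC):
    @abstractmethod
    def get_area(self) -> float:
        pass

    def get_cost(self, cost_per_area: float) -> float:
        return cost_per_area * self.get_area()


class Hexagon(Shape):
    # Hypothetical figure: assume 20% extra material is wasted when cutting.
    WASTAGE_FACTOR = 1.2

    def __init__(self, side: float) -> None:
        self.side = side

    def get_area(self) -> float:
        # Area of a regular hexagon with the given side length.
        return (3 * sqrt(3) / 2) * self.side ** 2

    def get_cost(self, cost_per_area: float) -> float:
        # Reuse the shared calculation, then account for wastage.
        return super().get_cost(cost_per_area) * self.WASTAGE_FACTOR


h = Hexagon(2)
print(h.get_cost(5))
```

Calling super().get_cost() keeps the base formula in one place, so a later change to the shared cost calculation still flows through to Hexagon.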
Conclusion
Using abstract classes can help solve the issue of related classes that differ in parts of their implementation and need to be used interchangeably. Abstract classes with abstract methods can explicitly mark methods that need to be implemented by subclasses, making it easier for developers to know which methods to override and reducing the risk of making mistakes.
Additionally, the use of abstract classes can help detect problems before runtime (in production!) using mypy. The shared base classes and polymorphism that result from the use of abstract classes allow for more streamlined code and less code for type checking.