What’s the problem?
You need to pass around a bunch of related values. Maybe, a username and password. Or, to make an HTTP request you need a method, URL and (optionally) some headers and data. It can be tempting to use a dict
as an easy way to store these. But, your IDE won’t be able to autocomplete the key names. You also won’t be able to find out before runtime if you’ve made a mistake (req["method"]
vs. req["METHOD"]
, for example). It’s also hard to do type checking, if your dictionary values are of different types.
You could implement custom classes every time, which is arguably a better solution. But writing classes requires a lot of boilerplate code, for example, an __init__()
method to set all the attributes you want to store.
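To make both pain points concrete, here is a minimal sketch. The class name ManualHttpRequest is hypothetical and only there for illustration; it shows the kind of boilerplate a hand-written class needs, next to a dict whose key typo is only caught at runtime.

```python
# A dict-based request: nothing checks the key names until the code runs.
req = {"method": "GET", "url": "https://www.example.com"}

try:
    req["METHOD"]  # typo in the key name, only discovered at runtime
except KeyError as exc:
    dict_error = f"KeyError: {exc}"

print(dict_error)  # KeyError: 'METHOD'


# The hand-written alternative: every attribute name is spelled three times.
class ManualHttpRequest:  # hypothetical name, for illustration only
    def __init__(self, method, url, headers=None, body=None):
        self.method = method
        self.url = url
        self.headers = headers
        self.body = body


manual = ManualHttpRequest("GET", "https://www.example.com")
print(manual.method)  # GET
```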
How can it be solved?
By using Data Classes. The dataclass
decorator from the dataclasses
module can be applied to a class to automatically set up the __init__()
method. The class can then be constructed with parameters automatically assigned to attributes. Type hints are used to define the attributes, and if your IDE understands data classes you’ll get lots of autocomplete goodness.
The code in this post uses type hinting. If you’re not familiar with the syntax, please check out my type hinting intro.
Let’s start with the basics.
Basic dataclass
usage
The dataclass
decorator can be imported from the dataclasses
module. Then, simply define a class with the attributes you need and add the @dataclass
decorator. Attributes must have type hints. They can optionally also have default values, just like normal classes. It’s simple, and not much code.
Let’s return to our HTTP request example. We’ll create a data class to represent an HTTP request. We won’t implement absolutely everything in the HTTP spec, just some common properties:
method: the method used to make the request, for example GET, POST, or PUT.
url: the URL to send the request to.
headers: a dictionary of HTTP headers to use for the request. This is optional.
body: a dictionary of data to send with the request. This is optional too.
Here’s how an HttpRequest
class could be defined as a dataclass
:
from dataclasses import dataclass
import typing

@dataclass
class HttpRequest:
    method: str
    url: str
    headers: typing.Optional[typing.Dict[str, str]] = None
    body: typing.Optional[typing.Dict[str, str]] = None
The code starts out by importing dataclass
and typing
. Then, the HttpRequest
class is defined. You can see we’ve just defined the four attributes we need. Two of them, headers
and body
, are both optional and so have been given the default value None
, exactly the same as how you’d define defaults on attributes on standard classes.
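One caveat with defaults is worth sketching: None works as shown, but a mutable default such as an empty dict is rejected by the dataclass decorator, and field(default_factory=...) is the usual alternative. The class names HttpRequestWithDefaults and BrokenRequest below are hypothetical variants made up for this illustration.

```python
from dataclasses import dataclass, field
import typing


@dataclass
class HttpRequestWithDefaults:  # hypothetical variant of HttpRequest
    method: str
    url: str
    # default_factory calls dict() separately for every new instance
    headers: typing.Dict[str, str] = field(default_factory=dict)


a = HttpRequestWithDefaults("GET", "https://www.example.com")
b = HttpRequestWithDefaults("GET", "https://www.example.com")
a.headers["Accept"] = "text/html"
print(b.headers)  # {} because b has its own dict

# Writing "= {}" directly is refused when the class is created.
try:
    @dataclass
    class BrokenRequest:  # hypothetical, intentionally invalid
        headers: typing.Dict[str, str] = {}
except ValueError as exc:
    mutable_default_error = str(exc)

print(mutable_default_error)
```

The second half only prints the error message rather than asserting its exact wording, since the text may differ between Python versions.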
The line above the class definition is where the magic happens, by decorating the class with @dataclass
. Let’s explore the magical properties that it grants.
First, we have an automatic __init__()
method that’s generated, with default/optional parameters. Let’s see what happens when just passing in two values:
>>> req = HttpRequest("GET", "https://www.example.com")
>>> req.method
'GET'
>>> req.url
'https://www.example.com'
>>> req.headers
>>> req.body
As you can see, the order of the parameters to __init__()
matches the order in which the attributes were defined. Since the headers
and body
parameters are unfilled, they default to None
.
Arguments can also be provided as keywords. For example, to make a POST
request with a body
dictionary:
>>> post_req = HttpRequest(
...     "POST",
...     "https://www.example.com/login",
...     body={"username": "ben", "password": "123456"}
... )
>>> post_req.method
'POST'
>>> post_req.url
'https://www.example.com/login'
>>> post_req.headers
>>> post_req.body
{'username': 'ben', 'password': '123456'}
The body
parameter that was passed in is now available in the expected place. headers
remains as None
, again, as you would expect.
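Besides __init__(), the decorator also generates __repr__() and __eq__() by default. The sketch below repeats the HttpRequest definition so it runs on its own; the variable name same is just an illustrative choice.

```python
from dataclasses import dataclass
import typing


@dataclass
class HttpRequest:
    method: str
    url: str
    headers: typing.Optional[typing.Dict[str, str]] = None
    body: typing.Optional[typing.Dict[str, str]] = None


req = HttpRequest("GET", "https://www.example.com")
same = HttpRequest("GET", "https://www.example.com")

# The generated __repr__ lists every field by name.
print(req)
# HttpRequest(method='GET', url='https://www.example.com', headers=None, body=None)

# The generated __eq__ compares field values, not object identity.
print(req == same)  # True
print(req is same)  # False
```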
Now let’s look at some of the benefits we get thanks to the type hinting.
Type checking with dataclass
Type checking data classes with mypy
should Just Work™. With what we have done so far, mypy
shouldn’t give any errors. However, if we make a change to how HttpRequest
is instantiated to pass in some incorrect types, we’ll get errors. Perhaps if we try to pass headers as a list
of tuple
s, for example:
h = HttpRequest(
    "GET",
    "https://www.example.com",
    [("User-Agent", "Next Level Python Client v/1.0")]
)
Then mypy
will alert us to this mistake:
$ mypy dataclasses_example.py
dataclasses_example.py:17: error: Argument 3 to "HttpRequest" has incompatible type "List[Tuple[str, str]]"; expected "Optional[Dict[str, str]]" [arg-type]
Found 1 error in 1 file (checked 1 source file)
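It is worth keeping in mind that these hints are checked by tools like mypy, not by Python itself. As a small sketch (repeating the HttpRequest definition so it is self-contained), the mistyped call still succeeds at runtime:

```python
from dataclasses import dataclass
import typing


@dataclass
class HttpRequest:
    method: str
    url: str
    headers: typing.Optional[typing.Dict[str, str]] = None
    body: typing.Optional[typing.Dict[str, str]] = None


# mypy rejects this call, but the generated __init__ does no runtime checks.
h = HttpRequest(
    "GET",
    "https://www.example.com",
    [("User-Agent", "Next Level Python Client v/1.0")],
)
print(type(h.headers).__name__)  # list
```

So the safety net only exists if a type checker is actually run, for example in an editor or as part of CI.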
As well as type-safety, data classes have one more advantage over dictionaries – they can be made read-only with the frozen
argument. Let’s look at this now.
Read-only (frozen) data classes
Sometimes we might want to make a data class read-only. Perhaps in a codebase with multiple contributors, you want to make it clear that the data shouldn’t be updated once written.
This is more for annotating the intention of non-writability to your fellow developers, rather than fool-proof security. Because of the dynamic nature of Python, someone clever is sure to find a way around it.
To freeze a data class, use the @dataclass()
decorator as a function instead, and pass in the frozen=True
argument. We’ll use a slightly shorter data class for these demonstrations so there’s less code to look at. We’ll define Point
, a class that stores an x
and y
coordinate.
@dataclass(frozen=True)
class Point:
    x: float
    y: float
Note the decorator is applied in its functional form with the frozen=True
keyword argument.
We can construct a Point
in the usual way, but watch what happens when we try to change a value.
>>> p = Point(1, 2)
>>> p.x = 10
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 4, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'x'
A dataclasses.FrozenInstanceError
exception is raised and the value is unable to be changed.
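If a changed value is needed, one option is to build a new instance instead of mutating the old one. The sketch below uses dataclasses.replace for that; the variable name moved is just an illustrative choice.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    x: float
    y: float


p = Point(1, 2)

try:
    p.x = 10  # assignment is blocked on a frozen instance
except FrozenInstanceError:
    print("p is frozen")

# replace() returns a new Point with the given fields changed.
moved = replace(p, x=10)
print(p)      # Point(x=1, y=2)
print(moved)  # Point(x=10, y=2)
```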
Another advantage of using frozen data classes is that they become hashable. In Python terms, this means a hash value can be computed from the object’s field values, with equal objects always producing the same hash. Only hashable objects can be used in certain contexts, for example, added to set
s or used as dict
keys.
Let’s demonstrate this with a couple of examples. First, trying to add the same Point
to a set twice. Since set
s are only allowed to contain unique values, it should only end up in the set
once.
>>> points = set() # create a set
>>> len(points) # see how many values it contains (0)
0
>>> points.add(Point(1,2)) # add in a point with coordinates (1, 2)
>>> len(points) # check the set's length again, only one value
1
>>> points.add(Point(1, 2)) # try to add the same point again
# the length is still 1 as an equivalent Point was already there
>>> len(points)
1
>>> points.add(Point(2, 1)) # add another point
>>> len(points) # since it's a different Point, the set grows
2
As another short example, we’ll create a dict
mapping of a Point
to its magnitude.
Technically a point doesn’t have magnitude, so we actually mean the distance from the origin
(0, 0)
, as if the point is a vector that starts from there. The important part is that the magnitude of such a vector is equal to the square root of x^2 + y^2.
>>> from math import sqrt # the sqrt function is needed
>>> magnitudes = {} # the dictionary of magnitudes
>>> p1 = Point(1, 2) # create some points
>>> p2 = Point(3, 5)
# calculate and store the magnitudes
>>> magnitudes[p1] = sqrt(p1.x ** 2 + p1.y ** 2)
>>> magnitudes[p2] = sqrt(p2.x ** 2 + p2.y ** 2)
>>> magnitudes # here's what they look like printed out
{Point(x=1, y=2): 2.23606797749979, Point(x=3, y=5): 5.830951894845301}
>>> magnitudes[p1] # and each one is accessible by the `Point`
2.23606797749979
Now let’s compare these examples with a non-frozen dataclass, in this case one called DefrostedPoint
. It can’t even be added to a set
:
>>> points.add(DefrostedPoint(1,2))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'DefrostedPoint'
And we also get a TypeError
if trying to use it as a dictionary key:
>>> p3 = DefrostedPoint(1, 2)
>>> magnitudes[p3] = sqrt(p3.x ** 2 + p3.y ** 2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'DefrostedPoint'
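For reference, a DefrostedPoint consistent with these tracebacks could look like the sketch below. The exact definition is an assumption (the same fields as Point, just without frozen=True).

```python
from dataclasses import dataclass


@dataclass  # no frozen=True, so instances stay mutable
class DefrostedPoint:  # assumed definition, mirroring Point
    x: float
    y: float


# With the default eq=True and no frozen=True, __hash__ is set to None.
print(DefrostedPoint.__hash__ is None)  # True

try:
    hash(DefrostedPoint(1, 2))
except TypeError as exc:
    hash_error = str(exc)

print(hash_error)  # unhashable type: 'DefrostedPoint'
```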
We previously saw a function to calculate magnitudes. It would make sense if this was a method on the Point
class, or maybe even better, a property. Do data classes support extra methods? Let’s find out.
Data classes and extra methods
Methods can be added to data classes just like normal classes. Here’s an implementation of the magnitude calculation in a property on Point
:
from math import sqrt

@dataclass(frozen=True)
class Point:
    x: float
    y: float

    @property
    def magnitude(self) -> float:
        # distance from the origin (0, 0)
        return sqrt(self.x ** 2 + self.y ** 2)
And here it is in action:
>>> p1 = Point(1, 2)
>>> p1.magnitude
2.23606797749979
So, perhaps not surprisingly at all, it’s just like executing a method on a normal class.
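Regular methods that take arguments work the same way. As a sketch, here is a hypothetical distance_to method (not part of the original Point) that measures the distance between two points:

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def distance_to(self, other: "Point") -> float:
        # hypothetical helper: straight-line distance between two points
        return sqrt((self.x - other.x) ** 2 + (self.y - other.y) ** 2)


origin = Point(0, 0)
print(origin.distance_to(Point(3, 4)))  # 5.0
```

Because the method only reads attributes, it works fine on a frozen data class.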
We’ll finish up this post by looking at a couple of helper functions the dataclasses
module provides for converting to primitive types: asdict
and astuple
.
Converting data classes to dict
and tuple
Sometimes we do want to be able to convert our data classes to more primitive types, maybe to dict
s for JSON serialization or tuple
s for compatibility with other code. Whatever the reason, asdict
and astuple
can help.
First, a basic example of converting an HttpRequest
object to a dictionary. To begin, the asdict
function needs to be imported.
>>> from dataclasses import asdict
Next, we’ll construct an example HttpRequest
(actually we’ll reuse the one from the mypy example, so its headers are a list of tuples rather than a dict), then pass it to asdict
.
>>> h = HttpRequest(
... "GET",
... "https://www.example.com",
... [("User-Agent", "Next Level Python Client v/1.0")]
... )
>>> asdict(h)
{'method': 'GET', 'url': 'https://www.example.com', 'headers': [('User-Agent', 'Next Level Python Client v/1.0')], 'body': None}
And again with astuple
. First import the function:
>>> from dataclasses import astuple
Then call it, passing in the particular dataclass
instance:
>>> astuple(h)
('GET', 'https://www.example.com', [('User-Agent', 'Next Level Python Client v/1.0')], None)
The attributes are output in the order they are specified.
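Since JSON serialization was mentioned as a motivation, here is a minimal sketch of passing the result of asdict to the standard json module. The variable name payload and the request values are just for illustration.

```python
import json
import typing
from dataclasses import asdict, dataclass


@dataclass
class HttpRequest:
    method: str
    url: str
    headers: typing.Optional[typing.Dict[str, str]] = None
    body: typing.Optional[typing.Dict[str, str]] = None


req = HttpRequest("POST", "https://www.example.com/login", body={"username": "ben"})

# asdict() yields plain dicts, which json.dumps() can handle directly.
payload = json.dumps(asdict(req))
print(payload)
# {"method": "POST", "url": "https://www.example.com/login", "headers": null, "body": {"username": "ben"}}
```

This works as long as every field value is itself something json can serialize.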
What’s even more impressive about asdict
and astuple
is they work fine with nested data classes, calling themselves recursively to build the primitive object.
We’ll quickly create another data class to demonstrate this. And yes, this once again shows how easy data classes are to quickly set up!
In this scenario, we want to make a number of authenticated HTTP requests. We need to store the username and password that will be used for every request, plus a list of requests that will be sent with those credentials.
Here is the RequestSet
data class.
@dataclass
class RequestSet:
    username: str
    password: str
    requests: typing.List[HttpRequest]
We’ll start by creating two HTTP requests (with some not-very-exciting variable names).
>>> r1 = HttpRequest("GET", "https://example.com/posts")
>>> r2 = HttpRequest("POST", "https://example.com/create-post", body={"Title": "My Cool Post"})
Next we’ll create the RequestSet
to store these two requests and the authentication.
>>> rs = RequestSet("ben", "123465", [r1, r2])
Then let’s see the output from asdict
. Remember this is just being called with rs
, the top-level RequestSet
.
>>> asdict(rs)
{'username': 'ben', 'password': '123465', 'requests': [{'method': 'GET', 'url': 'https://example.com/posts', 'headers': None, 'body': None}, {'method': 'POST', 'url': 'https://example.com/create-post', 'headers': None, 'body': {'Title': 'My Cool Post'}}]}
Notice how it recursively converts all the child HttpRequest
objects to dict
s as well.
Finally, we’ll see how astuple
behaves when used in the same manner.
>>> astuple(rs)
('ben', '123465', [('GET', 'https://example.com/posts', None, None), ('POST', 'https://example.com/create-post', None, {'Title': 'My Cool Post'})])
It’s obviously a little less verbose than asdict
but you can see it’s tuple
s all the way down.
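Going the other way is not automatic for nested data classes. The sketch below assumes the HttpRequest and RequestSet definitions from this post and shows that unpacking the dictionary with ** only rebuilds the top level, so the nested requests have to be rebuilt by hand. The variable names shallow and full are illustrative.

```python
import typing
from dataclasses import asdict, dataclass


@dataclass
class HttpRequest:
    method: str
    url: str
    headers: typing.Optional[typing.Dict[str, str]] = None
    body: typing.Optional[typing.Dict[str, str]] = None


@dataclass
class RequestSet:
    username: str
    password: str
    requests: typing.List[HttpRequest]


r1 = HttpRequest("GET", "https://example.com/posts")
rs = RequestSet("ben", "123465", [r1])
rs_dict = asdict(rs)

# Unpacking only rebuilds the outer class; the requests stay plain dicts.
shallow = RequestSet(**rs_dict)
print(type(shallow.requests[0]).__name__)  # dict

# Rebuilding the nested objects explicitly restores an equal RequestSet.
full = RequestSet(
    rs_dict["username"],
    rs_dict["password"],
    [HttpRequest(**r) for r in rs_dict["requests"]],
)
print(full == rs)  # True
```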
And with that, the introduction to dataclasses
is done.
Conclusion
Swapping data classes for dict
s has quite a few advantages. You get type safety and IDE autocompletion. As you should know from my post about enums, anything to reduce mistakes from using the wrong variable/key/attribute name is a good thing. And IDE autocompletion can save so much time, it should be used whenever possible. Being able to make them frozen, and therefore read-only and hashable, increases the number of places where you can use them (for example, did you know dict
s are unhashable and can’t be used in set
s?).
With the use of asdict
, you can even translate dataclasses back into dict
s, should you need to.
So next time you’re tempted to reach for dict
, try a dataclass
instead – it could just save you some problems in production!