Smarter Value Storage: Data Classes

Old Dictionary

What’s the problem?

You need to pass around a bunch of related values. Maybe, a username and password. Or, to make an HTTP request you need a method, URL and (optionally) some headers and data. It can be tempting to use a dict as an easy way to store these. But, your IDE won’t be able to autocomplete the key names. You also won’t be able to find out before runtime if you’ve made a mistake (req["method"] vs. req["METHOD"], for example). It’s also hard to do type checking, if your dictionary values are of different types.

You could implement custom classes every time, which is arguably a better solution. But writing classes requires a lot of boilerplate code, for example, an __init__() method to set all the attributes you want to store.
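To make the boilerplate concrete, here is a minimal sketch of what such a hand-written class might look like for the HTTP request case. The name ManualHttpRequest is hypothetical and used only for illustration.

```python
from typing import Dict, Optional


class ManualHttpRequest:
    """Hand-written value holder (hypothetical name, for illustration only)."""

    def __init__(
        self,
        method: str,
        url: str,
        headers: Optional[Dict[str, str]] = None,
        body: Optional[Dict[str, str]] = None,
    ) -> None:
        # Every attribute has to be copied from its parameter by hand.
        self.method = method
        self.url = url
        self.headers = headers
        self.body = body


req = ManualHttpRequest("GET", "https://www.example.com")
print(req.method, req.url, req.headers)
```

Each attribute name appears three times (parameter, assignment target, assignment value), and that repetition grows with every attribute you add.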

How can it be solved?

By using Data Classes. The dataclass decorator from the dataclasses module can be applied to a class to automatically set up the __init__() method. The class can then be constructed with parameters automatically assigned to attributes. Type hints are used to define the attributes, and if your IDE understands data classes you’ll get lots of autocomplete goodness.

The code in this post uses type hinting. If you’re not familiar with the syntax, please check out my type hinting intro.

Let’s start with the basics.

Basic dataclass usage

The dataclass decorator can be imported from the dataclasses module. Then, simply define a class with the attributes you need and add the @dataclass decorator. Attributes must have type hints. They can optionally also have default values, just like normal classes. It’s simple, and not much code.

Let’s return to our HTTP request example. We’ll create a data class to represent an HTTP request. We won’t implement absolutely everything in the HTTP spec, just some common properties:

  • method: the method used to make the request, for example GET, POST, or PUT.
  • url: the URL to send the request to.
  • headers: a dictionary of HTTP headers to use for the request. This is optional.
  • body: a dictionary of data to send with the request. This is optional too.

Here’s how an HttpRequest class could be defined as a dataclass:

from dataclasses import dataclass
import typing


@dataclass
class HttpRequest:
    method: str
    url: str
    headers: typing.Optional[typing.Dict[str, str]] = None
    body: typing.Optional[typing.Dict[str, str]] = None

The code starts out by importing dataclass and typing. Then, the HttpRequest class is defined. You can see we’ve just defined the four attributes we need. Two of them, headers and body, are both optional and so have been given the default value None, exactly the same as how you’d define defaults on attributes on standard classes.

The line above the class definition is where the magic happens, by decorating the class with @dataclass. Let’s explore the magical properties that it grants.

First, we have an automatic __init__() method that’s generated, with default/optional parameters. Let’s see what happens when just passing in two values:

>>> req = HttpRequest("GET", "https://www.example.com")
>>> req.method
'GET'
>>> req.url
'https://www.example.com'
>>> req.headers
>>> req.body

As you can see, the order of the parameters to __init__() matches the order in which the attributes were defined. Since the headers and body parameters are unfilled, they default to None.
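If you want to confirm the parameter order yourself, one option is to inspect the generated signature with the standard library’s inspect module. This is only a quick sketch; the class is repeated so the snippet runs on its own.

```python
import inspect
import typing
from dataclasses import dataclass


@dataclass
class HttpRequest:
    method: str
    url: str
    headers: typing.Optional[typing.Dict[str, str]] = None
    body: typing.Optional[typing.Dict[str, str]] = None


# The generated __init__() takes its parameters in definition order.
signature = inspect.signature(HttpRequest)
param_names = list(signature.parameters)
print(param_names)  # ['method', 'url', 'headers', 'body']

# Attributes that were given a default become optional parameters.
print(signature.parameters["headers"].default)  # None
```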

Arguments can also be provided as keywords. For example, to make a POST request with a body dictionary:

>>> post_req = HttpRequest(
...     "POST",
...     "https://www.example.com/login",
...     body={"username": "ben", "password": "123456"}
... )
>>> post_req.method
'POST'
>>> post_req.url
'https://www.example.com/login'
>>> post_req.headers
>>> post_req.body
{'username': 'ben', 'password': '123456'}

The body parameter that was passed in is now available in the expected place. headers remains as None, again, as you would expect.

Now let’s look at some of the benefits we get thanks to the type hinting.

Type checking with dataclass

Type checking data classes with mypy should Just Work™. With what we have done so far, mypy shouldn’t give any errors. However, if we make a change to how HttpRequest is instantiated to pass in some incorrect types, we’ll get errors. Perhaps if we try to pass headers as a list of tuples, for example:

h = HttpRequest(
    "GET",
    "https://www.example.com",
    [("User-Agent", "Next Level Python Client v/1.0")]
)

Then mypy will alert us of this mistake:

$ mypy dataclasses_example.py
dataclasses_example.py:17: error: Argument 3 to "HttpRequest" has incompatible type "List[Tuple[str, str]]"; expected "Optional[Dict[str, str]]"  [arg-type]
Found 1 error in 1 file (checked 1 source file)
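It’s worth keeping in mind that these checks come from mypy, not from Python itself: the dataclass decorator does not validate types at runtime. A small sketch, repeating the class so it runs standalone:

```python
import typing
from dataclasses import dataclass


@dataclass
class HttpRequest:
    method: str
    url: str
    headers: typing.Optional[typing.Dict[str, str]] = None
    body: typing.Optional[typing.Dict[str, str]] = None


# mypy flags this call, but Python runs it without complaint.
h = HttpRequest(
    "GET",
    "https://www.example.com",
    [("User-Agent", "Next Level Python Client v/1.0")],
)
print(type(h.headers).__name__)  # list
```

So the type safety only helps if a type checker is actually run, for example in your editor or as part of CI.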

As well as type-safety, data classes have one more advantage over dictionaries – they can be made read-only with the frozen argument. Let’s look at this now.

Read-only (frozen) data classes

Sometimes we might want to make a data class read only. Perhaps in a codebase with multiple contributors, you want to make it clear that the data shouldn’t be updated once written.

This is more for annotating the intention of non-writability to your fellow developers, rather than fool-proof security. Because of the dynamic nature of Python, someone clever is sure to find a way around it.

To freeze a data class, use the @dataclass() decorator as a function instead, and pass in the frozen=True argument. We’ll use a slightly shorter data class for these demonstrations so there’s less code to look at. We’ll define Point, a class that stores an x and y coordinate.

@dataclass(frozen=True)
class Point:
    x: float
    y: float

Note the decorator is applied in its functional form with the frozen=True keyword argument.

We can construct a Point in the usual way, but watch what happens when we try to change a value.

>>> p = Point(1, 2)
>>> p.x = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 4, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'x'

A dataclasses.FrozenInstanceError exception is raised and the value is unable to be changed.
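As mentioned, freezing signals intent rather than providing real protection. One well-known workaround, shown here only as an illustration and not as a recommendation, is to call object.__setattr__ directly, which skips the check that the frozen data class installs.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


p = Point(1, 2)

try:
    p.x = 10  # normal assignment is blocked
except FrozenInstanceError as exc:
    print("blocked:", exc)

# Going through object.__setattr__ bypasses the frozen check.
object.__setattr__(p, "x", 10)
print(p.x)  # 10
```

Doing this to an object that is already stored in a set or used as a dict key changes its hash and can make it impossible to look up, which is a good reason to treat frozen instances as read-only.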

Another advantage of using frozen data classes is that they become hashable. In Python terms, this means a hash value (an integer) can be computed from the object’s field values, and two objects that compare equal get the same hash. Only objects that are hashable can be used in certain contexts, for example, added to sets or used as dict keys.

Let’s demonstrate this with a couple of examples. First, trying to add the same Point to a set twice. Since sets are only allowed to contain unique values, it should only end up in the set once.

>>> points = set()  # create a set
>>> len(points)  # see how many values it contains (0)
0

>>> points.add(Point(1,2))  # add in a point with coordinates (1, 2)
>>> len(points)  # check the set's length again, only one value
1

>>> points.add(Point(1, 2))  # try to add the same point again
# the length is still 1 as an equivalent Point was already there
>>> len(points)
1

>>> points.add(Point(2, 1))  # add another point
>>> len(points)  # since it's a different Point, the set grows
2

And another short example, create a dict mapping of a Point to its magnitude.

Technically a point doesn’t have magnitude, so we actually mean the distance from the origin (0, 0), as if the point is a vector that starts from there. The important part is that the magnitude of such a vector is equal to the square root of x^2 + y^2.

>>> from math import sqrt  # the sqrt function is needed
>>> magnitudes = {}  # the dictionary of magnitudes

>>> p1 = Point(1, 2)  # create some points
>>> p2 = Point(3, 5)

# calculate and store the magnitudes
>>> magnitudes[p1] = sqrt(p1.x ** 2 + p1.y ** 2)
>>> magnitudes[p2] = sqrt(p2.x ** 2 + p2.y ** 2)
>>> magnitudes  # here's what they look like printed out
{Point(x=1, y=2): 2.23606797749979, Point(x=3, y=5): 5.830951894845301}
>>> magnitudes[p1]  # and each one is accessible by the `Point`
2.23606797749979

Now let’s compare these examples with a non-frozen dataclass, in this case one called DefrostedPoint. It can’t even be added to a set:

>>> points.add(DefrostedPoint(1,2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'DefrostedPoint'

And we also get a TypeError if trying to use it as a dictionary key:

>>> p3 = DefrostedPoint(1, 2)
>>> magnitudes[p3] = sqrt(p3.x ** 2 + p3.y ** 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'DefrostedPoint'
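The definition of DefrostedPoint isn’t shown above. A plausible version, assumed here, is simply Point without frozen=True. The sketch below also shows why it fails: with the default settings, a non-frozen data class has its __hash__ set to None.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass  # assumed definition: same fields, but not frozen
class DefrostedPoint:
    x: float
    y: float


# Non-frozen data classes (with the default eq=True) are unhashable.
print(DefrostedPoint.__hash__ is None)  # True

# Frozen ones hash by field values, so equal points share a hash.
print(hash(Point(1, 2)) == hash(Point(1, 2)))  # True
```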

We previously saw an expression to calculate magnitudes. It would make sense if this was a method on the Point class, or maybe even better, a property. Do data classes support extra methods? Let’s find out.

Data classes and extra methods

Methods can be added to data classes just like normal classes. Here’s an implementation of the magnitude calculation in a property on Point:

@dataclass(frozen=True)
class Point:
    x: float
    y: float

    @property
    def magnitude(self) -> float:
        return sqrt(self.x ** 2 + self.y ** 2)

And here it is in action:

>>> p1 = Point(1, 2)
>>> p1.magnitude
2.23606797749979

So, perhaps not surprisingly at all, it’s just like executing a method on a normal class.
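Regular methods that take arguments work the same way. As a small sketch, here is a hypothetical distance_to method (not part of the original Point) that measures the distance between two points:

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def distance_to(self, other: "Point") -> float:
        # Hypothetical helper: straight-line distance to another point.
        return sqrt((self.x - other.x) ** 2 + (self.y - other.y) ** 2)


a = Point(0, 0)
b = Point(3, 4)
print(a.distance_to(b))  # 5.0
```

Because the method only reads attributes, it works fine on a frozen data class.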

We’ll finish up this post by looking at a couple of helper functions the dataclasses module provides for converting to primitive types: asdict and astuple.

Converting data classes to dict and tuple

Sometimes we do want to be able to convert our data classes to more primitive types, maybe to dicts for JSON serialization or tuples for compatibility with other code. Whatever the reason, asdict and astuple can help.

We’ll start with a basic example of converting an HttpRequest object to a dictionary. First, the asdict function needs to be imported.

>>> from dataclasses import asdict

Next, we’ll construct an example HttpRequest (actually we’ll reuse one we’ve seen before), then pass it to asdict.

>>> h = HttpRequest(
...         "GET",
...         "https://www.example.com",
...         [("User-Agent", "Next Level Python Client v/1.0")]
...     )
>>> asdict(h)
{'method': 'GET', 'url': 'https://www.example.com', 'headers': [('User-Agent', 'Next Level Python Client v/1.0')], 'body': None}

And again with astuple. First import the function:

>>> from dataclasses import astuple

Then call it, passing in the particular dataclass instance:

>>> astuple(h)
('GET', 'https://www.example.com', [('User-Agent', 'Next Level Python Client v/1.0')], None)

The attributes are output in the order they are specified.
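Since JSON serialization is one of the likely reasons for converting, here is a minimal sketch of a round trip using asdict and the standard json module. It assumes a flat data class whose values are all JSON-friendly, as HttpRequest is when headers and body are dicts or None.

```python
import json
import typing
from dataclasses import asdict, dataclass


@dataclass
class HttpRequest:
    method: str
    url: str
    headers: typing.Optional[typing.Dict[str, str]] = None
    body: typing.Optional[typing.Dict[str, str]] = None


req = HttpRequest("POST", "https://www.example.com/login", body={"username": "ben"})

# Data class -> dict -> JSON string
payload = json.dumps(asdict(req))

# JSON string -> dict -> data class, unpacking the keys as keyword arguments
restored = HttpRequest(**json.loads(payload))
print(restored == req)  # True
```

The unpacking trick only rebuilds one level, so a data class that contains other data classes would need its children reconstructed separately.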

What’s even more impressive about asdict and astuple is they work fine with nested data classes, calling themselves recursively to build the primitive object.

We’ll quickly create another data class to demonstrate this. And yes, this once again shows how easy data classes are to quickly set up!

In this scenario, we want to make a number of authenticated HTTP requests. We need to store the username and password that will be used for every request, plus a list of requests that will be sent with those credentials.

Here is the RequestSet data class.

@dataclass
class RequestSet:
    username: str
    password: str
    requests: typing.List[HttpRequest]

We’ll start by creating two HTTP requests (with some not-very-exciting variable names).

>>> r1 = HttpRequest("GET", "https://example.com/posts")
>>> r2 = HttpRequest("POST", "https://example.com/create-post", body={"Title": "My Cool Post"})

Next we’ll create the RequestSet to store these two requests and the authentication details.

>>> rs = RequestSet("ben", "123465", [r1, r2])

Then let’s see the output from asdict. Remember this is just being called with rs, the top-level RequestSet.

>>> asdict(rs)
{'username': 'ben', 'password': '123465', 'requests': [{'method': 'GET', 'url': 'https://example.com/posts', 'headers': None, 'body': None}, {'method': 'POST', 'url': 'https://example.com/create-post', 'headers': None, 'body': {'Title': 'My Cool Post'}}]}

Notice how it recursively converts all the child HttpRequest objects to dicts as well.

Finally, we’ll see how astuple behaves when used in the same manner.

>>> astuple(rs)
('ben', '123465', [('GET', 'https://example.com/posts', None, None), ('POST', 'https://example.com/create-post', None, {'Title': 'My Cool Post'})])

It’s obviously a little less verbose than asdict, but you can see it’s tuples all the way down.

And with that, the introduction to dataclasses is done.

Conclusion

Swapping data classes for dicts has quite a few advantages. You get type safety and IDE autocompletion. As you should know from my post about enums, anything to reduce mistakes from using the wrong variable/key/attribute name is a good thing. And IDE autocompletion can save so much time, it should be used whenever possible. Being able to make them frozen, and therefore read-only and hashable, increases the number of places where you can use them (for example, did you know dicts are unhashable and can’t be used in sets?).

With the use of asdict, you can even translate dataclasses back into dicts, should you need to.

So next time you’re tempted to reach for a dict, try a dataclass instead – it could just save you some problems in production!

