One of the new features introduced in Python 3.7 is data classes. They are designed to automate the generation of boilerplate code for classes that mainly store data. Although they work through a different mechanism, they can be thought of as "mutable named tuples with default values".
Introduction
All of these examples require Python 3.7 or higher.
Most Python developers regularly have to write classes like this:
    class RegularBook:
        def __init__(self, title, author):
            self.title = title
            self.author = author
Redundancy is already visible in this example: the identifiers title and author each appear several times. A real class would also define the __eq__ and __repr__ methods.
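To make the redundancy concrete, here is a minimal sketch of what the hand-written class might look like once __repr__ and __eq__ are added. The exact method bodies are an assumption about a typical implementation, not code from any particular project.

```python
class RegularBook:
    """Hand-written data-holding class, for comparison with @dataclass."""

    def __init__(self, title, author):
        self.title = title
        self.author = author

    def __repr__(self):
        return f"RegularBook(title={self.title!r}, author={self.author!r})"

    def __eq__(self, other):
        # Only compare against instances of exactly the same class.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.title, self.author) == (other.title, other.author)


book = RegularBook("Fahrenheit 451", "Bradbury")
print(book)
print(book == RegularBook("Fahrenheit 451", "Bradbury"))
```

Every field name now appears in three methods, and each new field has to be added to all of them by hand.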
The dataclasses module provides the @dataclass decorator. With it, the equivalent code looks like this:
    from dataclasses import dataclass

    @dataclass
    class Book:
        title: str
        author: str
Note that type annotations are required: any attribute without an annotation is ignored and does not become a field. If you do not want to commit to a specific type, you can use Any from the typing module.
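The effect of a missing annotation can be checked with dataclasses.fields. In this sketch the pages attribute is a hypothetical addition used only to show an unannotated attribute being skipped.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Book:
    title: str
    author: Any   # annotated with Any: still a field, no specific type
    pages = 0     # hypothetical attribute without annotation: not a field


field_names = [f.name for f in fields(Book)]
print(field_names)  # pages is not listed

book = Book("Fahrenheit 451", "Bradbury")
print(book.pages)   # still reachable as an ordinary class attribute
```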
What do you get as a result? The class automatically receives implementations of the __init__, __repr__ and __eq__ methods (no separate __str__ is generated, so str() falls back to __repr__). Apart from that, it is a regular class: you can inherit from it or add arbitrary methods.
    >>> book = Book(title="Fahrenheit 451", author="Bradbury")
    >>> book
    Book(title='Fahrenheit 451', author='Bradbury')
    >>> book.author
    'Bradbury'
    >>> other = Book("Fahrenheit 451", "Bradbury")
    >>> book == other
    True
Alternatives
Tuple or dictionary
Of course, if the structure is simple enough, you can store the data in a dictionary or a tuple:
    book = ("Fahrenheit 451", "Bradbury")
    other = {'title': 'Fahrenheit 451', 'author': 'Bradbury'}
However, this approach has disadvantages:
- You have to remember that the variable holds data of this particular structure.
- With a dictionary, you have to keep track of the key names. An initialization such as {'name': 'Fahrenheit 451', 'author': 'Bradbury'} is also formally valid, even though the key is wrong.
- With a tuple, you have to keep track of the order of the values, since they have no names.
There is a better option:
Namedtuple
    from collections import namedtuple

    NamedTupleBook = namedtuple("NamedTupleBook", ["title", "author"])
If we use the class created this way, we get essentially the same behavior as with the data class.
    >>> book = NamedTupleBook("Fahrenheit 451", "Bradbury")
    >>> book.author
    'Bradbury'
    >>> book
    NamedTupleBook(title='Fahrenheit 451', author='Bradbury')
    >>> book == NamedTupleBook("Fahrenheit 451", "Bradbury")
    True
But despite the overall similarity, named tuples have limitations, which stem from the fact that they are still tuples.
First, instances of different classes still compare equal if their values match.
    >>> Car = namedtuple("Car", ["model", "owner"])
    >>> book = NamedTupleBook("Fahrenheit 451", "Bradbury")
    >>> book == Car("Fahrenheit 451", "Bradbury")
    True
Second, named tuples are immutable. This is useful in some situations, but sometimes more flexibility is needed.
Finally, a named tuple supports every operation a regular tuple does, for example iteration.
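The last point can be illustrated by comparing the two kinds of objects side by side. This is a small sketch; the variable names are chosen only for the illustration.

```python
from collections import namedtuple
from dataclasses import astuple, dataclass

NamedTupleBook = namedtuple("NamedTupleBook", ["title", "author"])


@dataclass
class Book:
    title: str
    author: str


nt_book = NamedTupleBook("Fahrenheit 451", "Bradbury")
title, author = nt_book              # unpacking works, as for any tuple
print(list(nt_book), nt_book[0])

dc_book = Book("Fahrenheit 451", "Bradbury")
unpack_error = None
try:
    title, author = dc_book          # a data class instance is not iterable
except TypeError as exc:
    unpack_error = exc
print("TypeError:", unpack_error)

# Conversion to a tuple has to be requested explicitly.
print(astuple(dc_book))
```

A data class instance also does not compare equal to a plain tuple or a named tuple with the same values, because the generated __eq__ checks the class first.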
Other projects
If you are not limited to the standard library, there are other solutions to this problem, in particular the attrs project. It can do even more than dataclass and, at the time of writing, also supported older Python versions such as 2.7 and 3.4. Nevertheless, the fact that it is not part of the standard library may be inconvenient.
Creation
To create a data class, you can use the @dataclass decorator. All class attributes defined with a type annotation are then used in the corresponding generated methods of the resulting class.
Alternatively, there is the make_dataclass function, which works much like creating a named tuple.
    from dataclasses import make_dataclass

    Book = make_dataclass("Book", ["title", "author"])
    book = Book("Fahrenheit 451", "Bradbury")
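make_dataclass also accepts (name, type) and (name, type, field) entries, which makes it possible to give types and defaults. The year field in this sketch is a hypothetical addition for illustration.

```python
from dataclasses import field, fields, make_dataclass

Book = make_dataclass(
    "Book",
    [
        "title",                          # name only: no specific type
        ("author", str),                  # (name, type)
        ("year", int, field(default=0)),  # (name, type, field); hypothetical
    ],
)

book = Book("Fahrenheit 451", "Bradbury")
print(book)
print([f.name for f in fields(Book)])
```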
Default values
One of the useful features is the ease of adding default values to fields. You still do not need to override the __init__ method; it is enough to specify the values directly in the class.
    @dataclass
    class Book:
        title: str = "Unknown"
        author: str = "Unknown author"
They are taken into account in the generated __init__ method:
    >>> Book()
    Book(title='Unknown', author='Unknown author')
    >>> Book("Fahrenheit 451")
    Book(title='Fahrenheit 451', author='Unknown author')
As with regular classes and methods, be careful with mutable defaults. If you need, for example, a list as a default value, there is another way to do it; more on that below.
It is also important to watch the order in which fields are defined, since it corresponds exactly to the parameter order in the __init__ method. In particular, a field without a default value cannot follow a field that has one.
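The ordering rule can be seen by trying to declare a field without a default after one that has a default. In this sketch BrokenBook is a hypothetical class that exists only to trigger the error.

```python
from dataclasses import dataclass

error = None
try:
    @dataclass
    class BrokenBook:
        title: str = "Unknown"
        author: str  # no default, but it follows a field that has one
except TypeError as exc:
    error = str(exc)

print("TypeError:", error)


@dataclass
class Book:
    title: str = "Unknown"
    author: str = "Unknown author"


# Keyword arguments let a caller skip an earlier default.
only_author = Book(author="Bradbury")
print(only_author)
```

The error is raised when the class is defined, not when an instance is created.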
Immutable data classes
Named tuple instances are immutable, which is a good idea in many situations. Data classes can do this too: pass the parameter frozen=True when creating the class, and any attempt to change its fields raises the FrozenInstanceError exception.
    @dataclass(frozen=True)
    class Book:
        title: str
        author: str
    >>> book = Book("Fahrenheit 451", "Bradbury")
    >>> book.title = "1984"
    dataclasses.FrozenInstanceError: cannot assign to field 'title'
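Since a frozen instance cannot be changed in place, a modified copy has to be created instead. One way to do that, not covered in the text above, is the dataclasses.replace function. The sketch below also relies on the fact that a frozen data class with the default eq=True is hashable.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Book:
    title: str
    author: str


book = Book("Fahrenheit 451", "Bradbury")

assign_error = None
try:
    book.title = "1984"
except FrozenInstanceError as exc:
    assign_error = exc
print("FrozenInstanceError:", assign_error)

# replace() builds a new instance; the original stays untouched.
other = replace(book, title="Dandelion Wine")
print(book.title, "/", other.title)

# Frozen instances can be used in sets and as dictionary keys.
shelf = {book, other}
print(len(shelf))
```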
Configuring a data class
In addition to the frozen parameter, the @dataclass decorator has other parameters:
- init: if it is True (the default), the __init__ method is generated. If the class already defines an __init__ method, the parameter is ignored.
- repr: enables (by default) generation of the __repr__ method. The generated string contains the class name plus the name and representation of every field defined in the class. Individual fields can be excluded (see below).
- eq: enables (by default) generation of the __eq__ method. Objects are compared as if they were tuples containing the corresponding field values. In addition, the types are checked for an exact match.
- order: enables (disabled by default) generation of the __lt__, __le__, __gt__ and __ge__ methods. Objects are compared as the corresponding tuples of field values, and the types of the objects are checked as well. If order is set and eq is not, a ValueError exception is raised. The class also must not already define any of the comparison methods.
- unsafe_hash: affects generation of the __hash__ method. The behavior also depends on the values of the eq and frozen parameters.
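A short sketch of how some of these parameters behave in practice. The Unorderable class is hypothetical and exists only to trigger the ValueError described for order.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Book:
    title: str
    author: str


books = [Book("Fahrenheit 451", "Bradbury"), Book("1984", "Orwell")]
ordered = sorted(books)  # compared like (title, author) tuples
print(ordered)

# order=True together with eq=False is rejected at class creation.
order_error = None
try:
    @dataclass(order=True, eq=False)
    class Unorderable:
        title: str
except ValueError as exc:
    order_error = exc
print("ValueError:", order_error)

# With eq=True and frozen=False (the defaults) instances are unhashable.
hash_error = None
try:
    hash(books[0])
except TypeError as exc:
    hash_error = exc
print("TypeError:", hash_error)
```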
Customization of individual fields
In most standard situations this is not needed, but the behavior of a data class can be customized down to individual fields with the field function.
Mutable defaults
A typical situation, mentioned above, is the use of a list or another mutable value as a default. Suppose you want a bookshelf class that contains a list of books. If you run the following code:
    from typing import List

    @dataclass
    class Bookshelf:
        books: List[Book] = []
The interpreter will report an error:
ValueError: mutable default <class 'list'> for field books is not allowed: use default_factory
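However, this check does not catch every mutable value (originally it covered only list, dict and set). A mutable default that goes undetected is shared between all instances, which leads to incorrect program behavior.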
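The problem with an undetected mutable default can be reproduced with a small custom class. Tags is a hypothetical mutable helper invented for this sketch; it is not part of the dataclasses module.

```python
from dataclasses import dataclass


class Tags:
    """Hypothetical mutable container used as a (bad) default value."""

    def __init__(self):
        self.items = []


@dataclass
class Book:
    title: str
    tags: Tags = Tags()  # evaluated once: every instance shares this object


first = Book("Fahrenheit 451")
second = Book("1984")

first.tags.items.append("classic")
print(second.tags.items)         # the change leaks into the other instance
print(first.tags is second.tags)
```

No error is reported when the class is defined, so the shared state only shows up later at run time.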
To avoid such problems, use the default_factory parameter of the field function. Its value can be any callable that takes no arguments.
The correct version of the class looks like this:
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Bookshelf:
        books: List[Book] = field(default_factory=list)
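As a quick check of this version, the sketch below creates two shelves and confirms that each one gets its own list. The Book class is repeated here only to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Book:
    title: str
    author: str


@dataclass
class Bookshelf:
    # The factory is called once per new instance.
    books: List[Book] = field(default_factory=list)


first = Bookshelf()
second = Bookshelf()
first.books.append(Book("Fahrenheit 451", "Bradbury"))

print(len(first.books), len(second.books))
print(first.books is second.books)
```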
Other options
In addition to default_factory, the field function has the following parameters:
- default: the default value. This parameter is needed because the call to field takes the place of the default value in the class body.
- init: enables (by default) the use of the field in the __init__ method.
- repr: enables (by default) the use of the field in the __repr__ method.
- compare: enables (by default) the use of the field in the comparison methods (__eq__, __le__ and others).
- hash: a boolean value or None. If it is True, the field is used when calculating the hash. If it is None (the default), the value of the compare parameter is used. One reason to set hash=False while keeping compare=True is that the hash of the field is expensive to compute even though the field is needed for comparison.
- metadata: an arbitrary dictionary or None. The value is wrapped in MappingProxyType so that it becomes read-only. This parameter is not used by the data classes themselves and is intended for third-party extensions.
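A few of these parameters in action, as a rough sketch. The isbn, notes and price fields, their values and the "unit" metadata key are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Book:
    title: str
    author: str
    isbn: str = field(default="", repr=False)      # hidden from __repr__
    notes: str = field(default="", compare=False)  # ignored by __eq__
    price: float = field(default=0.0, metadata={"unit": "USD"})


a = Book("Fahrenheit 451", "Bradbury", isbn="000-0", notes="first copy")
b = Book("Fahrenheit 451", "Bradbury", isbn="000-0", notes="second copy")

print(a)       # isbn does not appear in the output
print(a == b)  # notes differ, but they are excluded from comparison

price_field = next(f for f in fields(Book) if f.name == "price")
print(price_field.metadata["unit"])

# The metadata mapping is read-only.
metadata_error = None
try:
    price_field.metadata["unit"] = "EUR"
except TypeError as exc:
    metadata_error = exc
print("TypeError:", metadata_error)
```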
Processing after initialization
The auto-generated __init__ method calls the __post_init__ method if it is defined in the class. Normally it is called as self.__post_init__(); however, if the class defines fields of type InitVar, they are passed as arguments to the method.
If the __init__ method has not been generated, __post_init__ is not called.
For example, let's add a generated book description:
    @dataclass
    class Book:
        title: str
        author: str
        desc: str = None

        def __post_init__(self):
            self.desc = self.desc or "`%s` by %s" % (self.title, self.author)
    >>> Book("Fahrenheit 451", "Bradbury")
    Book(title='Fahrenheit 451', author='Bradbury', desc='`Fahrenheit 451` by Bradbury')
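The remark that __post_init__ is skipped when __init__ is not generated can be checked with a small experiment. WithInit, WithoutInit and the calls list are hypothetical names used only in this sketch.

```python
from dataclasses import dataclass

calls = []  # records which classes ran __post_init__


@dataclass
class WithInit:
    title: str

    def __post_init__(self):
        calls.append("WithInit")


@dataclass(init=False)
class WithoutInit:
    title: str

    def __init__(self, title):
        # Hand-written __init__: nothing here calls __post_init__.
        self.title = title

    def __post_init__(self):
        calls.append("WithoutInit")


WithInit("Fahrenheit 451")
WithoutInit("Fahrenheit 451")
print(calls)
```

If the hook is still needed with a hand-written __init__, it has to be called explicitly from that method.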
Parameters for initialization only
One feature tied to the __post_init__ method is parameters that are used only for initialization. If a field is declared with the type InitVar, its value is passed as an argument to the __post_init__ method. Such fields are not used by the data class in any other way.
    from dataclasses import InitVar, dataclass

    @dataclass
    class Book:
        title: str
        author: str
        gen_desc: InitVar[bool] = True
        desc: str = None

        def __post_init__(self, gen_desc: bool):
            if gen_desc and self.desc is None:
                self.desc = "`%s` by %s" % (self.title, self.author)
    >>> Book("Fahrenheit 451", "Bradbury")
    Book(title='Fahrenheit 451', author='Bradbury', desc='`Fahrenheit 451` by Bradbury')
    >>> Book("Fahrenheit 451", "Bradbury", gen_desc=False)
    Book(title='Fahrenheit 451', author='Bradbury', desc=None)
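That an init-only parameter is not stored can be verified with dataclasses.fields and vars. This sketch repeats the class above, with Optional[str] used for desc as a small typing adjustment of my own.

```python
from dataclasses import InitVar, dataclass, fields
from typing import Optional


@dataclass
class Book:
    title: str
    author: str
    gen_desc: InitVar[bool] = True
    desc: Optional[str] = None

    def __post_init__(self, gen_desc: bool):
        if gen_desc and self.desc is None:
            self.desc = "`%s` by %s" % (self.title, self.author)


book = Book("Fahrenheit 451", "Bradbury")

field_names = [f.name for f in fields(Book)]
print(field_names)                 # gen_desc is not a regular field
print("gen_desc" in vars(book))    # and it is not stored on the instance
print(book.desc)
```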
Inheritance
When you use the @dataclass decorator, it walks through all the parent classes, starting with object, and for each data class it finds, stores the fields in an ordered dictionary (mapping); it then adds the fields of the class being processed. All generated methods use the fields from the resulting ordered dictionary.
As a result, if the parent class defines default values for its fields, the fields you add in the subclass must have default values as well.
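A minimal sketch of what happens when this rule is broken: the subclass adds a field without a default after an inherited field that has one. The class names here are used only for the illustration.

```python
from dataclasses import dataclass


@dataclass
class BaseBook:
    title: str = "Unknown"


error = None
try:
    @dataclass
    class Book(BaseBook):
        desc: str  # no default, but it comes after the inherited title
except TypeError as exc:
    error = str(exc)

print("TypeError:", error)
```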
Since an ordered dictionary stores values in the order of insertion, for the following classes
    from typing import Any

    @dataclass
    class BaseBook:
        title: Any = None
        author: str = None

    @dataclass
    class Book(BaseBook):
        desc: str = None
        title: str = "Unknown"
an __init__ method with the following signature is generated:
def __init__(self, title: str="Unknown", author: str=None, desc: str=None)
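The resulting parameter order can be confirmed with inspect.signature. This sketch simply repeats the two classes above and prints what the generated __init__ accepts.

```python
import inspect
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class BaseBook:
    title: Any = None
    author: str = None


@dataclass
class Book(BaseBook):
    desc: str = None
    title: str = "Unknown"


# title keeps its original position but takes the overridden default.
param_names = list(inspect.signature(Book).parameters)
print(param_names)

defaults = {f.name: f.default for f in fields(Book)}
print(defaults)
```

The overriding title field keeps the position it had in BaseBook, while its type and default come from Book.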