Introduction to Data classes

One of the new features introduced in Python 3.7 is data classes. They are designed to automate the generation of boilerplate code for classes that are used to store data. Although they work differently under the hood, they can be thought of as "mutable named tuples with default values".

Introduction

All of the examples below require Python 3.7 or higher.

Most Python developers regularly have to write classes like this:

```python
class RegularBook:
    def __init__(self, title, author):
        self.title = title
        self.author = author
```

The redundancy is already visible in this example: the identifiers title and author are each repeated several times. A real class would also override the __eq__ and __repr__ methods.
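To make that redundancy concrete, here is a sketch of what RegularBook might look like once __repr__ and __eq__ are written by hand. The exact repr format is an assumption, chosen to mirror what @dataclass produces.

```python
class RegularBook:
    def __init__(self, title, author):
        self.title = title
        self.author = author

    def __repr__(self):
        # Same shape as the repr that @dataclass generates
        return f"{type(self).__name__}(title={self.title!r}, author={self.author!r})"

    def __eq__(self, other):
        # Only compare with objects of exactly the same class
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.title, self.author) == (other.title, other.author)


print(RegularBook("Fahrenheit 451", "Bradbury"))
# RegularBook(title='Fahrenheit 451', author='Bradbury')
```

Every field name now appears in four places, and each new field means editing all of them. Note also that defining __eq__ without __hash__ makes instances unhashable.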

The dataclasses module provides the @dataclass decorator. With it, the equivalent code looks like this:

```python
from dataclasses import dataclass

@dataclass
class Book:
    title: str
    author: str
```

It is important to note that type annotations are required: class attributes without annotations are not treated as fields and are ignored. If you do not want to commit to a specific type, you can use Any from the typing module.
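The following sketch checks this with dataclasses.fields(), which lists the fields a data class actually recognizes. The shelf attribute is a hypothetical example of an unannotated class attribute.

```python
from dataclasses import dataclass, fields
from typing import Any

@dataclass
class Book:
    title: str
    author: Any          # Any documents that no particular type is expected
    shelf = "main"       # no annotation: a plain class attribute, not a field

# Only annotated names become fields and appear in __init__/__repr__/__eq__
print([f.name for f in fields(Book)])   # ['title', 'author']

# Annotations are not checked at runtime
print(Book(42, None))                   # Book(title=42, author=None)
```

The annotations only mark which names are fields; the values passed to __init__ are not validated against them.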

What do you get as a result? A class with automatically generated __init__, __repr__ and __eq__ methods (str() falls back to the generated __repr__, so printing an instance works as well). Apart from that, it is an ordinary class: you can inherit from it or add arbitrary methods.
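A minimal sketch of both points, using a hypothetical label() method and a hypothetical AudioBook subclass:

```python
from dataclasses import dataclass

@dataclass
class Book:
    title: str
    author: str

    # Ordinary methods can be added as in any class
    def label(self) -> str:
        return f"{self.author}: {self.title}"


class AudioBook(Book):
    # A plain (non-dataclass) subclass inherits the generated methods
    def label(self) -> str:
        return super().label() + " (audio)"


book = AudioBook("Fahrenheit 451", "Bradbury")
print(book.label())   # Bradbury: Fahrenheit 451 (audio)
print(str(book))      # AudioBook(title='Fahrenheit 451', author='Bradbury')
```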

```python
>>> book = Book(title="Fahrenheit 451", author="Bradbury")
>>> book
Book(title='Fahrenheit 451', author='Bradbury')
>>> book.author
'Bradbury'
>>> other = Book("Fahrenheit 451", "Bradbury")
>>> book == other
True
```

Alternatives

Tuple or dictionary

Of course, if the structure is fairly simple, you can store the data in a dictionary or a tuple:

```python
book = ("Fahrenheit 451", "Bradbury")
other = {'title': 'Fahrenheit 451', 'author': 'Bradbury'}
```

However, this approach has disadvantages:

You have to remember that the variable holds data with this particular structure.
With a dictionary, you must keep track of the key names: initializing it as {'name': 'Fahrenheit 451', 'author': 'Bradbury'} is also formally valid.
With a tuple, you must keep track of the order of the values, since they have no names.
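To illustrate the key-name problem, this sketch compares a dictionary with a misspelled key against the data class from above. The dictionary accepts the typo silently, while the generated __init__ rejects the unknown name as soon as the object is created.

```python
from dataclasses import dataclass

@dataclass
class Book:
    title: str
    author: str

# A dict silently accepts a misspelled key...
other = {'name': 'Fahrenheit 451', 'author': 'Bradbury'}
print(other.get('title'))   # None: the mistake surfaces only later

# ...whereas the generated __init__ rejects an unknown field name immediately
try:
    Book(name='Fahrenheit 451', author='Bradbury')
except TypeError as exc:
    print("TypeError:", exc)
```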

There is a better option:

Namedtuple

```python
from collections import namedtuple

NamedTupleBook = namedtuple("NamedTupleBook", ["title", "author"])
```

A class created this way gives us practically the same result as the data class:

```python
>>> book = NamedTupleBook("Fahrenheit 451", "Bradbury")
>>> book.author
'Bradbury'
>>> book
NamedTupleBook(title='Fahrenheit 451', author='Bradbury')
>>> book == NamedTupleBook("Fahrenheit 451", "Bradbury")
True
```

Despite the overall similarity, named tuples have limitations, which stem from the fact that named tuples are still tuples.

First, you can still compare instances of different classes.

```python
>>> Car = namedtuple("Car", ["model", "owner"])
>>> book = NamedTupleBook("Fahrenheit 451", "Bradbury")
>>> book == Car("Fahrenheit 451", "Bradbury")
True
```

Second, named tuples are immutable. In some situations this is useful, but often you want more flexibility.
Finally, a named tuple supports all regular tuple operations; for example, you can iterate over it.
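For comparison, here is a sketch of the same three checks with data classes (the Car data class mirrors the named tuple above). Instances of different classes do not compare equal, fields can be reassigned, and instances are not iterable; dataclasses.astuple() converts explicitly when a tuple is needed.

```python
from dataclasses import dataclass, astuple

@dataclass
class Book:
    title: str
    author: str

@dataclass
class Car:
    model: str
    owner: str

book = Book("Fahrenheit 451", "Bradbury")

# Different classes never compare equal, even with identical field values
print(book == Car("Fahrenheit 451", "Bradbury"))   # False

# Fields can be reassigned (unless frozen=True is used)
book.author = "Ray Bradbury"

# Instances are not iterable; convert explicitly when a tuple is needed
print(astuple(book))   # ('Fahrenheit 451', 'Ray Bradbury')
```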

Other projects

If you are not limited to the standard library, you can find other solutions to this problem, in particular the attrs project. It can do even more than dataclasses and works on older Python versions such as 2.7 and 3.4. However, the fact that it is not part of the standard library may be inconvenient.

Creation

To create a data class, use the @dataclass decorator. All class fields defined with a type annotation are then used in the corresponding methods of the resulting class.

Alternatively, there is the make_dataclass function, which works much like creating named tuples with namedtuple.

```python
from dataclasses import make_dataclass

Book = make_dataclass("Book", ["title", "author"])
book = Book("Fahrenheit 451", "Bradbury")
```
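When only names are given, make_dataclass annotates the fields as typing.Any. A field list entry can also be a (name, type) pair or a (name, type, Field) triple, which allows types and defaults; a short sketch:

```python
from dataclasses import make_dataclass, field

# Each entry is a name, a (name, type) pair, or a (name, type, Field) triple
Book = make_dataclass(
    "Book",
    [("title", str), ("author", str, field(default="Unknown author"))],
)

book = Book("Fahrenheit 451")
print(book)   # Book(title='Fahrenheit 451', author='Unknown author')
```

make_dataclass also accepts the same keyword parameters as the decorator (for example, frozen=True).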

Default values

One of the useful features is how easy it is to add default values to fields. You still do not need to override the __init__ method; it is enough to assign the values directly in the class body.

```python
@dataclass
class Book:
    title: str = "Unknown"
    author: str = "Unknown author"
```

They are taken into account in the generated __init__ method:

```python
>>> Book()
Book(title='Unknown', author='Unknown author')
>>> Book("Fahrenheit 451")
Book(title='Fahrenheit 451', author='Unknown author')
```

But, as with regular classes and functions, you need to be careful with mutable default values. If you need, for example, a list as a default value, there is a different mechanism for it, described below.

In addition, pay attention to the order in which fields with default values are defined: it matches their order in the generated __init__ method, so a field without a default cannot follow a field with a default.
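A quick sketch of what happens when the order is wrong. The exact error message text varies between Python versions, so only the exception type is relied on here.

```python
from dataclasses import dataclass

try:
    @dataclass
    class Book:
        title: str = "Unknown"
        author: str          # no default after a field with a default
except TypeError as exc:
    # The decorator raises when it builds __init__
    print("TypeError:", exc)

# Fix: put fields without defaults first
@dataclass
class Book:
    author: str
    title: str = "Unknown"

print(Book("Bradbury"))   # Book(author='Bradbury', title='Unknown')
```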

Immutable Data Classes

Named tuple instances are immutable, which is a good idea in many situations. Data classes can be made immutable as well: just pass frozen=True when creating the class, and any attempt to change its fields raises a FrozenInstanceError exception.

```python
@dataclass(frozen=True)
class Book:
    title: str
    author: str
```

```python
>>> book = Book("Fahrenheit 451", "Bradbury")
>>> book.title = "1984"
dataclasses.FrozenInstanceError: cannot assign to field 'title'
```
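Since a frozen instance cannot be modified, the usual way to "change" it is to create a modified copy with dataclasses.replace(). A sketch, which also shows that a frozen data class with the default eq=True gets a generated __hash__ and can therefore be used in sets:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Book:
    title: str
    author: str

book = Book("Fahrenheit 451", "Bradbury")

# "Changing" a frozen instance means creating a modified copy
sequel = replace(book, title="Dandelion Wine")
print(sequel)   # Book(title='Dandelion Wine', author='Bradbury')
print(book)     # Book(title='Fahrenheit 451', author='Bradbury')

# eq=True (the default) plus frozen=True generates __hash__,
# so equal frozen instances collapse into one set element
print(len({book, Book("Fahrenheit 451", "Bradbury")}))   # 1
```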

Configuring a data class

In addition to the frozen parameter, the @dataclass decorator has other parameters:

init: if True (the default), the __init__ method is generated. If the class already defines __init__, this parameter is ignored.
repr: enables (by default) generation of the __repr__ method. The generated string contains the class name and the name and representation of each field defined in the class. Individual fields can be excluded (see below).
eq: enables (by default) generation of the __eq__ method. Objects are compared as if they were tuples of the corresponding field values; in addition, both objects must be of the same type.
order: enables (disabled by default) generation of the __lt__, __le__, __gt__ and __ge__ methods. Objects are compared as the corresponding tuples of field values, and both must be of the same type. If order=True is specified together with eq=False, a ValueError is raised. The class also must not already define any of these comparison methods; otherwise a TypeError is raised.
unsafe_hash: affects generation of the __hash__ method. The behavior also depends on the values of eq and frozen: with the default unsafe_hash=False, __hash__ is generated when eq and frozen are both True, and set to None (making instances unhashable) when eq is True and frozen is False.
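A short sketch of order=True and of the default hashing behavior, using a hypothetical Version class:

```python
from dataclasses import dataclass

@dataclass(order=True)
class Version:
    major: int
    minor: int

# Comparison works like comparing (major, minor) tuples
versions = [Version(3, 10), Version(3, 7), Version(2, 7)]
print(sorted(versions))
# [Version(major=2, minor=7), Version(major=3, minor=7), Version(major=3, minor=10)]

# eq=True and frozen=False (the defaults): __hash__ is set to None,
# so instances of this class are unhashable
print(Version.__hash__)   # None
```

Because the comparison is field by field, the order in which fields are declared determines the sort order.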

Customization of individual fields

In most standard situations this is not required, but you can customize the behavior of a data class down to individual fields using the field function.

Mutable Defaults

A typical case, mentioned above, is a list or another mutable default. Suppose you want a bookshelf class that holds a list of books. If you run the following code:

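```python
from dataclasses import dataclass
from typing import List

@dataclass
class Bookshelf:
    books: List[Book] = []   # a mutable default: rejected by @dataclass
```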

The interpreter reports an error:

```
ValueError: mutable default <class 'list'> for field books is not allowed: use default_factory
```

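However, this check only covers certain types (in Python 3.7, list, dict and set). For other mutable values it does not trigger: a single default object ends up shared between all instances, which leads to incorrect program behavior.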

To avoid these problems, use the default_factory parameter of the field function. Its value can be any callable that takes no arguments, such as a function or a class.
The correct version of the class looks like this:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Bookshelf:
    books: List[Book] = field(default_factory=list)
```
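To see why this works, this sketch creates two shelves and adds a book to only one of them. default_factory is called on every __init__ call, so each instance gets its own list.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Book:
    title: str
    author: str

@dataclass
class Bookshelf:
    books: List[Book] = field(default_factory=list)

first, second = Bookshelf(), Bookshelf()
first.books.append(Book("Fahrenheit 451", "Bradbury"))

# The factory ran once per instance, so the lists are independent
print(len(first.books), len(second.books))   # 1 0
```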

Other options

In addition to the specified default_factory , the field function has the following parameters:

default: the default value. This parameter is needed because a call to field replaces the usual way of specifying the field's default value.
init: enables (by default) the use of the field in the __init__ method.
repr: enables (by default) the use of the field in the __repr__ method.
compare: enables (by default) the use of the field in comparison methods (__eq__, __le__ and others).
hash: a boolean or None. If True, the field is used when computing the hash. If None (the default), the value of compare is used.
One reason to specify hash=False together with compare=True may be that the field's hash is expensive to compute, while the field is still needed for comparison.
metadata: an arbitrary mapping or None. The value is wrapped in MappingProxyType so that it becomes read-only. The data classes themselves do not use this parameter; it is intended for third-party extensions.
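Several of these parameters in one sketch. The notes, times_read and isbn fields and the metadata key "format" are hypothetical, chosen only to illustrate the options.

```python
from dataclasses import dataclass, field, fields

@dataclass
class Book:
    title: str
    author: str
    # Excluded from __repr__ and from comparisons
    notes: str = field(default="", repr=False, compare=False)
    # Not an __init__ parameter; always starts at the default
    times_read: int = field(default=0, init=False)
    # metadata is ignored by dataclasses and exposed read-only to other tools
    isbn: str = field(default="", metadata={"format": "ISBN-13"})

a = Book("Fahrenheit 451", "Bradbury", notes="first edition")
b = Book("Fahrenheit 451", "Bradbury")
print(a)        # Book(title='Fahrenheit 451', author='Bradbury', times_read=0, isbn='')
print(a == b)   # True: notes is not compared
print(fields(Book)[4].metadata["format"])   # ISBN-13
```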

Processing after initialization

The generated __init__ method calls __post_init__ if it is defined in the class. Normally it is called as self.__post_init__(); however, if the class defines fields of type InitVar, their values are passed to the method as arguments.

If the __init__ method has not been generated, then __post_init__ will not be called.

For example, let's add a generated book description:

```python
@dataclass
class Book:
    title: str
    author: str
    desc: str = None

    def __post_init__(self):
        # Generate a description unless one was passed explicitly
        self.desc = self.desc or "`%s` by %s" % (self.title, self.author)
```

```python
>>> Book("Fahrenheit 451", "Bradbury")
Book(title='Fahrenheit 451', author='Bradbury', desc='`Fahrenheit 451` by Bradbury')
```

Parameters for initialization only

One feature tied to the __post_init__ method is init-only parameters. If you declare a field with the type InitVar, its value is passed as a parameter to __post_init__. The data class does not use such fields in any other way.

```python
from dataclasses import dataclass, InitVar

@dataclass
class Book:
    title: str
    author: str
    gen_desc: InitVar[bool] = True
    desc: str = None

    def __post_init__(self, gen_desc: bool):
        # gen_desc arrives here but is not kept as a field
        if gen_desc and self.desc is None:
            self.desc = "`%s` by %s" % (self.title, self.author)
```

```python
>>> Book("Fahrenheit 451", "Bradbury")
Book(title='Fahrenheit 451', author='Bradbury', desc='`Fahrenheit 451` by Bradbury')
>>> Book("Fahrenheit 451", "Bradbury", gen_desc=False)
Book(title='Fahrenheit 451', author='Bradbury', desc=None)
```
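InitVar is also handy for passing in context that is needed only to compute another field. In this sketch the shelf_index mapping and the shelf field are hypothetical; the field marked init=False is filled in by __post_init__, and dataclasses.fields() confirms that the InitVar is not a regular field.

```python
from dataclasses import InitVar, dataclass, field, fields
from typing import Dict, Optional

@dataclass
class Book:
    title: str
    author: str
    # Init-only: passed to __post_init__, not a regular field
    shelf_index: InitVar[Optional[Dict[str, int]]] = None
    # Computed in __post_init__, so not an __init__ parameter
    shelf: int = field(default=-1, init=False)

    def __post_init__(self, shelf_index: Optional[Dict[str, int]]) -> None:
        if shelf_index is not None:
            self.shelf = shelf_index.get(self.title, -1)

index = {"Fahrenheit 451": 3}   # hypothetical lookup table
book = Book("Fahrenheit 451", "Bradbury", index)
print(book)                            # Book(title='Fahrenheit 451', author='Bradbury', shelf=3)
print([f.name for f in fields(Book)])  # ['title', 'author', 'shelf']
```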

Inheritance

When the @dataclass decorator processes a class, it walks through all of its base classes, starting with object, and for each data class it finds, adds that class's fields to an ordered mapping; then it adds the fields of the class being processed. All generated methods use the fields from this resulting ordered mapping.

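As a result, if a parent class defines fields with default values, every field added in a subclass must also have a default value; otherwise a field without a default would follow one with a default in __init__, which raises a TypeError.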

Since an ordered mapping keeps keys in insertion order, a field redefined in a subclass keeps its original position but takes the new type and default. For the following classes

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class BaseBook:
    title: Any = None
    author: str = None

@dataclass
class Book(BaseBook):
    desc: str = None
    title: str = "Unknown"
```

an __init__ method with the following signature is generated:

```python
def __init__(self, title: str = "Unknown", author: str = None, desc: str = None)
```
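Both points can be checked with a sketch like the following: inspect.signature() shows the merged parameter order, and a hypothetical AnnotatedBook subclass that adds a field without a default fails when the decorator runs.

```python
import inspect
from dataclasses import dataclass
from typing import Any

@dataclass
class BaseBook:
    title: Any = None
    author: str = None

@dataclass
class Book(BaseBook):
    desc: str = None
    title: str = "Unknown"

# title keeps its original position but takes the subclass's type and default
params = inspect.signature(Book).parameters
print(list(params))               # ['title', 'author', 'desc']
print(params["title"].default)    # Unknown

try:
    @dataclass
    class AnnotatedBook(BaseBook):   # hypothetical subclass
        isbn: str                    # no default after inherited defaults
except TypeError as exc:
    print("TypeError:", exc)
```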

Source: https://habr.com/ru/post/415829/
