Introduction to Data classes

One of the new features introduced in Python 3.7 is data classes. They automate the generation of boilerplate code for classes that are used to store data. Although they work through different mechanisms, they can be compared to "mutable named tuples with default values".



Introduction


All of the examples below require Python 3.7 or higher.

Most Python developers regularly have to write classes like this:


    class RegularBook:
        def __init__(self, title, author):
            self.title = title
            self.author = author

Even this small example shows redundancy: the title and author identifiers are each used several times. A real class would also override the __eq__ and __repr__ methods.
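To make the redundancy concrete, here is a minimal sketch of what such a hand-written class might look like once those two methods are added. The name VerboseBook is only illustrative and does not come from the original example.

```python
class VerboseBook:
    """Hand-written data holder; every field name is repeated many times."""

    def __init__(self, title, author):
        self.title = title
        self.author = author

    def __repr__(self):
        return f"VerboseBook(title={self.title!r}, author={self.author!r})"

    def __eq__(self, other):
        # Only instances of exactly the same class are compared field by field.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.title, self.author) == (other.title, other.author)


book = VerboseBook("Fahrenheit 451", "Bradbury")
print(book)
print(book == VerboseBook("Fahrenheit 451", "Bradbury"))
```

Adding a third field here would mean touching four separate places in the class.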


The dataclasses module provides the @dataclass decorator. With it, equivalent code looks like this:


    from dataclasses import dataclass

    @dataclass
    class Book:
        title: str
        author: str

It is important to note that type annotations are required. Class attributes without a type annotation are ignored. Of course, if you do not want to commit to a specific type, you can use Any from the typing module.
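The effect of a missing annotation can be checked with the fields function from the same module. The class and attribute names below (AnnotatedBook, payload, shelf) are made up for this sketch.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class AnnotatedBook:
    title: str           # annotated: becomes a field
    payload: Any = None  # annotated with Any: still a field
    shelf = "A1"         # no annotation: plain class attribute, not a field


field_names = [f.name for f in fields(AnnotatedBook)]
print(field_names)  # ['title', 'payload']
```

The shelf attribute is still readable on instances, but it does not appear in the generated __init__, __repr__ or __eq__.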


What do you get as a result? You automatically get a class with generated __init__, __repr__ and __eq__ methods (no separate __str__ is generated; str() falls back to __repr__). Apart from that, it is a regular class: you can inherit from it or add arbitrary methods.


    >>> book = Book(title="Fahrenheit 451", author="Bradbury")
    >>> book
    Book(title='Fahrenheit 451', author='Bradbury')
    >>> book.author
    'Bradbury'
    >>> other = Book("Fahrenheit 451", "Bradbury")
    >>> book == other
    True

Alternatives


Tuple or dictionary


Of course, if the structure is fairly simple, you can save the data to a dictionary or tuple:


    book = ("Fahrenheit 451", "Bradbury")
    other = {'title': 'Fahrenheit 451', 'author': 'Bradbury'}

However, this approach has disadvantages:



There is a better option:


Namedtuple


    from collections import namedtuple

    NamedTupleBook = namedtuple("NamedTupleBook", ["title", "author"])

A class created this way gives us essentially the same behavior as the data class:


    >>> book = NamedTupleBook("Fahrenheit 451", "Bradbury")
    >>> book.author
    'Bradbury'
    >>> book
    NamedTupleBook(title='Fahrenheit 451', author='Bradbury')
    >>> book == NamedTupleBook("Fahrenheit 451", "Bradbury")
    True

But despite the overall similarity, named tuples have limitations, and they stem from the fact that named tuples are still tuples.


First, instances of different classes can still compare equal.


    >>> Car = namedtuple("Car", ["model", "owner"])
    >>> book = NamedTupleBook("Fahrenheit 451", "Bradbury")
    >>> book == Car("Fahrenheit 451", "Bradbury")
    True

Second, named tuples are immutable. This is useful in some situations, but sometimes more flexibility is needed.

Finally, anything you can do with a regular tuple you can also do with a named tuple, for example iterate over it.
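The three points can be checked side by side against a data class. This is only a sketch: the Book and Car data classes below are defined just for the comparison.

```python
from collections import namedtuple
from dataclasses import dataclass

NamedTupleBook = namedtuple("NamedTupleBook", ["title", "author"])


@dataclass
class Book:
    title: str
    author: str


@dataclass
class Car:
    model: str
    owner: str


nt_book = NamedTupleBook("Fahrenheit 451", "Bradbury")
book = Book("Fahrenheit 451", "Bradbury")

# 1. Data classes of different types do not compare equal.
different_types_equal = book == Car("Fahrenheit 451", "Bradbury")

# 2. A named tuple rejects attribute assignment; a data class allows it.
try:
    nt_book.title = "1984"
    namedtuple_is_mutable = True
except AttributeError:
    namedtuple_is_mutable = False
book.title = "1984"

# 3. A named tuple unpacks like a tuple; a data class is not iterable.
title, author = nt_book
try:
    iter(book)
    dataclass_is_iterable = True
except TypeError:
    dataclass_is_iterable = False

print(different_types_equal, namedtuple_is_mutable, dataclass_is_iterable)
```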


Other projects


If you are not limited to the standard library, there are other solutions to this problem, in particular the attrs project. It can do even more than dataclasses and works on older versions of Python such as 2.7 and 3.4. However, the fact that it is not part of the standard library may be inconvenient.


Creation


To create a data class, use the @dataclass decorator. All class attributes defined with a type annotation become fields and are used in the generated methods of the resulting class.


Alternatively, there is the make_dataclass function, which works much like creating a named tuple.


    from dataclasses import make_dataclass

    Book = make_dataclass("Book", ["title", "author"])
    book = Book("Fahrenheit 451", "Bradbury")
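When only names are given, as above, the fields are typed as Any. make_dataclass also accepts (name, type) and (name, type, field) tuples, so types and defaults can be supplied as well. A small sketch, with TypedBook as an illustrative name:

```python
from dataclasses import field, fields, make_dataclass

TypedBook = make_dataclass(
    "TypedBook",
    [
        ("title", str),                                   # name and type
        ("author", str, field(default="Unknown author")),  # name, type, default
    ],
)

book = TypedBook("Fahrenheit 451")
print(book)
print([f.type for f in fields(TypedBook)])
```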

Default values


One of the useful features is how easy it is to add default values to fields. You still do not need to override the __init__ method; it is enough to specify the values directly in the class.


    @dataclass
    class Book:
        title: str = "Unknown"
        author: str = "Unknown author"

They are taken into account in the generated __init__ method:


    >>> Book()
    Book(title='Unknown', author='Unknown author')
    >>> Book("Fahrenheit 451")
    Book(title='Fahrenheit 451', author='Unknown author')

But, as with regular classes and functions, you need to be careful with mutable defaults. If you need, for example, a list as a default value, there is another way to do it; more on that below.


In addition, pay attention to the order in which fields with default values are defined, since it corresponds exactly to the order of the parameters in the __init__ method: a field without a default cannot follow a field with one.
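The ordering rule is enforced when the class is created, not when it is instantiated. The sketch below (BrokenBook is a deliberately invalid, made-up class) captures the resulting TypeError; the exact wording of the message may differ between Python versions.

```python
from dataclasses import dataclass

try:
    @dataclass
    class BrokenBook:
        title: str = "Unknown"
        author: str  # no default after a field that has one
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)
```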


Immutable data classes


Named tuple instances are immutable, which is a good idea in many situations. Data classes can do this too: pass frozen=True when creating the class, and any attempt to change its fields will raise a FrozenInstanceError.


    @dataclass(frozen=True)
    class Book:
        title: str
        author: str

    >>> book = Book("Fahrenheit 451", "Bradbury")
    >>> book.title = "1984"
    dataclasses.FrozenInstanceError: cannot assign to field 'title'
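If a changed copy of a frozen instance is needed, one option is the replace function from the dataclasses module, which builds a new object instead of modifying the existing one. A minimal sketch, with FrozenBook as an illustrative name:

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenBook:
    title: str
    author: str


book = FrozenBook("Fahrenheit 451", "Bradbury")

try:
    book.title = "1984"
except FrozenInstanceError:
    assignment_failed = True
else:
    assignment_failed = False

# replace() returns a new instance; the original is left untouched.
other = replace(book, title="1984", author="Orwell")
print(assignment_failed, book.title, other.title)
```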

Configuring a data class


In addition to the frozen parameter, the @dataclass decorator has other parameters:
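In Python 3.7 these are init, repr, eq, order and unsafe_hash. As one illustration, order=True additionally generates the comparison methods, which compare instances as tuples of their fields in definition order. The Version class here is invented for the sketch.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:
    major: int
    minor: int


# Sorting works because __lt__ and friends were generated.
versions = sorted([Version(3, 7), Version(2, 7), Version(3, 4)])
print(versions)
```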



Customization of individual fields


In most standard situations this is not needed, but the behavior of a data class can be customized down to individual fields using the field function.


Mutable defaults


A typical situation, mentioned above, is the use of lists or other mutable values as defaults. You may want a bookshelf class containing a list of books. If you run the following code:


    @dataclass
    class Bookshelf:
        books: List[Book] = []

The interpreter will report an error:


 ValueError: mutable default <class 'list'> for field books is not allowed: use default_factory 

However, for other mutable values this check may not trigger, and the shared default will lead to incorrect program behavior.
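As an illustration of a mutable default that slips past the check, consider an instance of an ordinary user-defined class. Tags and TaggedBook are hypothetical names used only in this sketch.

```python
from dataclasses import dataclass


class Tags:
    """Hypothetical mutable container, used only for this illustration."""

    def __init__(self):
        self.items = []


@dataclass
class TaggedBook:
    title: str
    tags: Tags = Tags()  # accepted, but every instance shares this one object


first = TaggedBook("Fahrenheit 451")
second = TaggedBook("1984")
first.tags.items.append("dystopia")
print(second.tags.items)  # ['dystopia']
```

The tag added through the first book shows up on the second one, because both refer to the same Tags object.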


To avoid such problems, use the default_factory parameter of the field function. Its value can be any callable that takes no arguments.
The correct version of the class looks like this:


    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Bookshelf:
        books: List[Book] = field(default_factory=list)
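A quick way to confirm that the factory is called once per instance is to create two shelves and modify only one. To keep this sketch self-contained, the shelf stores plain title strings rather than Book objects, and the class is named StringBookshelf.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StringBookshelf:
    # list() is called anew for every instance that omits `books`.
    books: List[str] = field(default_factory=list)


first = StringBookshelf()
second = StringBookshelf()
first.books.append("Fahrenheit 451")
print(first.books, second.books)
```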

Other options


In addition to the specified default_factory , the field function has the following parameters:
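In Python 3.7 these are default, init, repr, compare, hash and metadata. The sketch below shows two of them, repr and compare, on a made-up bookkeeping field (internal_id) that is hidden from the printed form and ignored in equality checks.

```python
from dataclasses import dataclass, field


@dataclass
class StoredBook:
    title: str
    author: str
    # Hypothetical bookkeeping field: hidden from repr and ignored by ==.
    internal_id: int = field(default=0, repr=False, compare=False)


a = StoredBook("Fahrenheit 451", "Bradbury", internal_id=1)
b = StoredBook("Fahrenheit 451", "Bradbury", internal_id=2)
print(a)
print(a == b)
```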



Processing after initialization


The auto-generated __init__ method calls __post_init__ if it is defined in the class. Normally it is called as self.__post_init__(); however, if the class defines variables of type InitVar, they are passed to it as arguments.


If the __init__ method has not been generated, then __post_init__ will not be called.
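This can be observed by comparing a regular data class with one created with init=False and a hand-written __init__. The class names and the calls list are assumptions made for this sketch.

```python
from dataclasses import dataclass

calls = []  # records which __post_init__ methods actually ran


@dataclass
class WithGeneratedInit:
    title: str

    def __post_init__(self):
        calls.append("WithGeneratedInit")


@dataclass(init=False)
class WithManualInit:
    title: str

    def __init__(self, title):
        # Hand-written __init__: nothing calls __post_init__ automatically.
        self.title = title

    def __post_init__(self):
        calls.append("WithManualInit")


WithGeneratedInit("Fahrenheit 451")
WithManualInit("1984")
print(calls)  # ['WithGeneratedInit']
```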


For example, let's add an automatically generated book description:


    @dataclass
    class Book:
        title: str
        author: str
        desc: str = None

        def __post_init__(self):
            self.desc = self.desc or "`%s` by %s" % (self.title, self.author)

    >>> Book("Fahrenheit 451", "Bradbury")
    Book(title='Fahrenheit 451', author='Bradbury', desc='`Fahrenheit 451` by Bradbury')

Parameters for initialization only


One feature tied to the __post_init__ method is init-only parameters. If you annotate a field with the InitVar type, its value is passed as an argument to __post_init__. Such fields are not used by the data class in any other way.


    from dataclasses import InitVar, dataclass

    @dataclass
    class Book:
        title: str
        author: str
        gen_desc: InitVar[bool] = True
        desc: str = None

        def __post_init__(self, gen_desc: bool):
            if gen_desc and self.desc is None:
                self.desc = "`%s` by %s" % (self.title, self.author)

    >>> Book("Fahrenheit 451", "Bradbury")
    Book(title='Fahrenheit 451', author='Bradbury', desc='`Fahrenheit 451` by Bradbury')
    >>> Book("Fahrenheit 451", "Bradbury", gen_desc=False)
    Book(title='Fahrenheit 451', author='Bradbury', desc=None)
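That init-only parameters are not real fields can be verified with the fields function, which does not report them. The class is named DescribedBook here only to keep the sketch separate from the example above.

```python
from dataclasses import InitVar, dataclass, fields
from typing import Optional


@dataclass
class DescribedBook:
    title: str
    author: str
    gen_desc: InitVar[bool] = True
    desc: Optional[str] = None

    def __post_init__(self, gen_desc):
        if gen_desc and self.desc is None:
            self.desc = f"`{self.title}` by {self.author}"


book = DescribedBook("Fahrenheit 451", "Bradbury")
field_names = [f.name for f in fields(book)]
print(field_names)  # ['title', 'author', 'desc']
```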

Inheritance


When you apply the @dataclass decorator, it walks through all parent classes, starting from object, and for each data class it finds it stores the fields in an ordered dictionary, then adds the fields of the class being processed. All generated methods use the fields from the resulting ordered dictionary.


As a result, if the parent class defines fields with default values, the fields you add in the subclass must also have default values.


Since an ordered dictionary stores values in insertion order, for the following classes


    @dataclass
    class BaseBook:
        title: Any = None
        author: str = None

    @dataclass
    class Book(BaseBook):
        desc: str = None
        title: str = "Unknown"

An __init__ method with the following signature will be generated:


 def __init__(self, title: str="Unknown", author: str=None, desc: str=None) 
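The resulting order can be confirmed by inspecting the fields of the subclass. The overridden title field keeps the position it had in the parent but takes the new type and default.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class BaseBook:
    title: Any = None
    author: str = None


@dataclass
class Book(BaseBook):
    desc: str = None
    title: str = "Unknown"


field_info = [(f.name, f.default) for f in fields(Book)]
print(field_info)
print(Book())
```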

Source: https://habr.com/ru/post/415829/

