Hibernate - what the tutorials are silent about

This article does not cover the basics of Hibernate (how to define an entity or write a criteria query). Instead, I will try to cover more interesting points that are genuinely useful in day-to-day work, information that I have not seen collected in one place.

A caveat up front: everything below applies to Hibernate 5.2. Mistakes are also possible where I have misunderstood something. If you find one, let me know.

Problems of mapping an object model to a relational one


But let's start with the basics of ORM. ORM stands for object-relational mapping, so we have a relational model and an object model. When we map one onto the other, problems arise that we have to solve ourselves. Let's go through them.

To illustrate, take the following example: we have a “User” entity, which can be either a Jedi or a stormtrooper. A Jedi must have a force, and a stormtrooper must have a specialization. Below is a class diagram.

image

Problem 1. Inheritance and polymorphic queries.


There is inheritance in the object model, but not in the relational one. So the first problem is how to represent inheritance correctly in the relational model.

Hibernate offers 3 options for mapping such an object model:

  1. All subclasses are in the same table:
    @Inheritance(strategy = InheritanceType.SINGLE_TABLE)

    image

    In this case, the common fields and the subclass fields are in the same table. With this strategy we avoid JOINs when selecting entities. On the downside, first, we cannot put a NOT NULL constraint on the “force” column in the relational model, and second, we lose third normal form (a transitive dependency between non-key attributes appears: force and disc).

    By the way, this is one of the reasons there are 2 ways to declare a field as not null: @NotNull is responsible for validation, while @Column(nullable = false) is responsible for the NOT NULL constraint in the database.

    In my opinion, this is the best way to map an object model to a relational one.
  2. The subclass-specific fields are in separate tables.

    @Inheritance(strategy = InheritanceType.JOINED)

    image

    In this case, the common fields are stored in a common table, and the fields specific to the child entities are stored in separate tables. With this strategy we get a JOIN when selecting an entity, but we keep third normal form and can also declare NOT NULL constraints in the database.
  3. Each entity has its own table

    @Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)

    image

    In this case, there is no common table. With this strategy, polymorphic queries are executed with UNION, and we run into problems with primary key generators and other integrity constraints. This type of inheritance mapping is strongly discouraged.

Just in case, I will also mention the @MappedSuperclass annotation. It is used when you want to “hide” the common fields of several entities of the object model in a superclass. The annotated class itself is not treated as a separate entity.

Problem 2. Composition relationships in OOP


Returning to our example, note that in the object model we extracted the user profile into a separate class, Profile. But in the relational model we did not allocate a separate table for it.

OneToOne is often bad practice, because the select gets an unjustified JOIN (even if we specify fetchType = LAZY, in most cases we will still have a JOIN; we will discuss this problem later).

To map a composition into the common table, there are the @Embeddable and @Embedded annotations. The first is placed on the class, and the second on the field. They are interchangeable: specifying either one is usually enough.

Entity manager


Each instance of EntityManager (EM) defines a session for interacting with the database. Each EM instance has a first-level cache. Here I will highlight the following significant points:

  1. Acquiring the database connection

    This is just an interesting point. Hibernate acquires a Connection not when the EM is obtained, but at the first access to the database or when a transaction is opened (although this problem can be solved). This is done to reduce the time the connection is held. When the EM is obtained, the existence of a JTA transaction is checked.
  2. Persisted entities always have an id
  3. Entities describing the same row in the database are equal by reference.
    As mentioned above, the EM has a first-level cache, and the objects in it are compared by reference. So the question arises: which fields should equals and hashCode be based on? Consider the following options:

    • Use all fields. A bad idea, because equals may touch lazy fields. By the way, the same is true for the toString method.
    • Use only the id. A reasonable idea, but there are nuances. Most often the id of a new entity is assigned by the generator at the moment of persist, so the following situation is possible:

      Entity foo = new Entity();     // new entity (id = null)
      set.add(foo);                  // put it into a HashSet
      em.persist(foo);               // persist assigns the id (id = some value)
      set.contains(foo) == false     // because hashCode has changed

    • Use a business key (roughly speaking, fields that are unique and NOT NULL). But this option is not always convenient.

      By the way, since we are talking about NOT NULL and UNIQUE: it is sometimes convenient to make a public constructor that takes the NOT NULL arguments and to make the no-argument constructor protected.
    • Do not override equals and hashCode at all.
  4. How flush works
    Flush executes the accumulated INSERT, UPDATE and DELETE statements in the database. By default, flush is executed in the following cases:

    • Before executing a query (with the exception of em.find); this is necessary to comply with the ACID principle. For example: we changed a stormtrooper's date of birth and then wanted to get the number of adult stormtroopers.

      If we are talking about CriteriaQuery or JPQL, then flush is executed only if the query touches a table whose entities are in the first-level cache.
    • When committing a transaction;
    • Sometimes when a new entity is persisted, in the case where its id can only be obtained through an INSERT.

    And now a little test. How many UPDATE operations will be performed in this case?

     SpaceCraft spaceCraft = em.find(SpaceCraft.class, 1L);
     spaceCraft.setCoords(...);
     spaceCraft.setCompanion( findNearestSpaceCraft(spaceCraft) );

    Behind the flush operation there is an interesting Hibernate feature: it tries to reduce the time rows stay locked in the database.

    Also note that there are different strategies for the flush operation. For example, you can prevent changes from being flushed to the database automatically; this mode is called MANUAL (it also disables the dirty checking mechanism).
  5. Dirty checking

    Dirty checking is a mechanism executed during the flush operation. Its goal is to find the entities that have changed and update them. To implement it, Hibernate must keep an original copy of the object (which the current object is compared against). To be precise, Hibernate stores a copy of the object's fields, not the object itself.

    It is worth noting that if the entity graph is large, dirty checking can be expensive. Do not forget that Hibernate stores 2 copies of each entity (roughly speaking).
    To make this process “cheaper”, use the following features:

    • em.detach / em.clear - detach entities from the EntityManager
    • FlushMode = MANUAL - useful for read operations.
    • @Immutable - also avoids dirty checking

  6. Transactions

    As you know, Hibernate only allows entities to be updated within a transaction. Read operations offer more freedom: we can perform them without explicitly opening a transaction. But that is exactly the question: is it worth opening a transaction explicitly for read operations?

    Here are a few facts:

    • Any statement is executed in the database inside a transaction, even if we did not open one explicitly (auto-commit mode).
    • As a rule, we are not limited to a single query to the database. For example: when fetching the first 10 records, you probably also want to return the total number of records. And that is almost always 2 queries.
    • If we are talking about Spring Data, the repository methods are transactional by default, and the read methods are read-only.
    • Spring's @Transactional(readOnly = true) annotation also affects FlushMode: Spring switches it to MANUAL, so Hibernate will not perform dirty checking.
    • Synthetic tests with one or two queries to the database will show that auto-commit is faster. But in production this may not be the case. (great article on this topic, plus see the comments)

    In a nutshell: it is good practice to perform all communication with the database inside a transaction.
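
The id-based equals/hashCode pitfall from point 3 can be reproduced without Hibernate. Below is a minimal plain-Java sketch: the classes IdBasedUser and BusinessKeyUser and the login field are hypothetical, and assigning the id by hand stands in for what the generator does during em.persist.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity whose equals/hashCode use only the generated id.
class IdBasedUser {
    Long id; // null until "persist" assigns it

    @Override
    public boolean equals(Object o) {
        return o instanceof IdBasedUser && Objects.equals(id, ((IdBasedUser) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id); // 0 while id is null
    }
}

// Hypothetical entity whose equals/hashCode use an immutable business key.
class BusinessKeyUser {
    Long id;
    private final String login; // assumed unique and NOT NULL

    BusinessKeyUser(String login) {
        this.login = Objects.requireNonNull(login);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BusinessKeyUser && login.equals(((BusinessKeyUser) o).login);
    }

    @Override
    public int hashCode() {
        return login.hashCode();
    }
}

class EqualsHashCodeDemo {
    // Adds the entity to a HashSet, then simulates the id assignment done on persist.
    static boolean idBasedFoundAfterPersist() {
        Set<IdBasedUser> set = new HashSet<>();
        IdBasedUser user = new IdBasedUser();
        set.add(user);
        user.id = 42L; // hashCode changes here
        return set.contains(user);
    }

    static boolean businessKeyFoundAfterPersist() {
        Set<BusinessKeyUser> set = new HashSet<>();
        BusinessKeyUser user = new BusinessKeyUser("luke");
        set.add(user);
        user.id = 42L; // hashCode is unaffected
        return set.contains(user);
    }

    public static void main(String[] args) {
        System.out.println(idBasedFoundAfterPersist());
        System.out.println(businessKeyFoundAfterPersist());
    }
}
```

The id-based entity is no longer found because the set stored it under the hash of a null id. The business-key variant survives, at the price of needing a key that never changes.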

Generators


Generators are needed to describe how the primary keys of our entities get their values. Let's quickly go over the options:


Let's talk a little more about sequences. To speed things up, Hibernate uses various optimizing algorithms. All of them aim to reduce the number of round trips to the database. Let's look at them in a little more detail:


Now let's see how the optimizer is selected. Hibernate has several sequence generators. We will be interested in 2 of them:


You can also customize the generator with the @GenericGenerator annotation.
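
To make the round-trip saving concrete, here is a simplified plain-Java sketch of block allocation, the general idea behind sequence optimizers. It is not Hibernate's implementation: FakeSequence, BlockIdGenerator and the allocation size of 50 are assumptions made for illustration, and the real optimizers differ in how they interpret the value returned by the sequence.

```java
// Stand-in for a database sequence created with INCREMENT BY = allocationSize.
class FakeSequence {
    private final int incrementBy;
    private long next = 1;
    int roundTrips = 0; // how many times the "database" was called

    FakeSequence(int incrementBy) {
        this.incrementBy = incrementBy;
    }

    long nextVal() {
        roundTrips++;
        long current = next;
        next += incrementBy;
        return current;
    }
}

// Hands out ids from an in-memory block and calls the sequence only when the block is used up.
class BlockIdGenerator {
    private final FakeSequence sequence;
    private final int allocationSize;
    private long nextId;
    private long blockEnd; // exclusive upper bound of the current block

    BlockIdGenerator(FakeSequence sequence, int allocationSize) {
        this.sequence = sequence;
        this.allocationSize = allocationSize;
    }

    long generate() {
        if (nextId == blockEnd) {
            nextId = sequence.nextVal(); // one round trip reserves a whole block
            blockEnd = nextId + allocationSize;
        }
        return nextId++;
    }
}

class SequenceOptimizerDemo {
    // Number of sequence calls needed to generate the given number of ids.
    static int roundTripsFor(int ids, int allocationSize) {
        FakeSequence sequence = new FakeSequence(allocationSize);
        BlockIdGenerator generator = new BlockIdGenerator(sequence, allocationSize);
        for (int i = 0; i < ids; i++) {
            generator.generate();
        }
        return sequence.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println(roundTripsFor(100, 1));
        System.out.println(roundTripsFor(100, 50));
    }
}
```

With a block of 50, generating 100 ids costs 2 sequence calls instead of 100. The trade-off is that ids reserved in a block are lost if the application stops before using them.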

Deadlock


Let's take a pseudo-code example of a situation that can lead to a deadlock:

    Thread #1:
    update entity(id = 3)
    update entity(id = 2)
    update entity(id = 1)

    Thread #2:
    update entity(id = 1)
    update entity(id = 2)
    update entity(id = 3)

To prevent such problems, Hibernate has a mechanism that avoids this type of deadlock: the hibernate.order_updates parameter. With it enabled, all updates are sorted by id before being executed. I will also mention once again that Hibernate tries to “delay” acquiring the connection and executing the INSERTs and UPDATEs.
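
Why ordering by id helps can be shown with a small plain-Java simulation. This is only a sketch of the principle, not of Hibernate internals: the class OrderedUpdatesDemo is hypothetical, and a ReentrantLock per id stands in for the row lock that a database takes on UPDATE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class OrderedUpdatesDemo {
    // One lock per row id stands in for the row lock taken by UPDATE.
    private static final Map<Long, ReentrantLock> ROW_LOCKS = new ConcurrentHashMap<>();

    // Sorts ids so that every transaction touches rows in the same order.
    static List<Long> ordered(List<Long> ids) {
        List<Long> sorted = new ArrayList<>(ids);
        Collections.sort(sorted);
        return sorted;
    }

    // Simulated transaction: lock rows in id order, "update", release on "commit".
    static void updateAll(List<Long> ids) {
        List<Long> sorted = ordered(ids);
        for (Long id : sorted) {
            ROW_LOCKS.computeIfAbsent(id, key -> new ReentrantLock()).lock();
        }
        try {
            // the UPDATE statements would be executed here
        } finally {
            for (Long id : sorted) {
                ROW_LOCKS.get(id).unlock();
            }
        }
    }

    // Runs the two threads from the pseudo-code; returns true if both finished.
    static boolean bothThreadsFinish() {
        Thread first = new Thread(() -> updateAll(Arrays.asList(3L, 2L, 1L)));
        Thread second = new Thread(() -> updateAll(Arrays.asList(1L, 2L, 3L)));
        first.start();
        second.start();
        try {
            first.join(5_000);
            second.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !first.isAlive() && !second.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(bothThreadsFinish());
    }
}
```

Because both threads acquire the locks in the order 1, 2, 3, neither can hold a lock the other needs while waiting for a lower id, so the circular wait from the pseudo-code cannot form.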

Set, Bag, List


In Hibernate, there are 3 main ways to represent the collection of a OneToMany relationship.


There is no class in core Java that describes a structure like Bag. Therefore, every List and Collection is a bag unless a column is specified by which the collection is ordered (the @OrderColumn annotation, not to be confused with @OrderBy). I strongly recommend not using the @OrderColumn annotation because of the poor (in my opinion) implementation of the feature: suboptimal SQL queries and the possible presence of NULLs in the list.

The question arises: which is better to use, bag or set? To begin with, the following problems are possible when using a bag:


When you want to add another entity to a @OneToMany relationship, it is more advantageous to use a bag, since it does not require loading all the related entities for this operation. Let's look at an example:

    // bag
    spaceCraft.getCrew().add( luke );    // the related entities are not loaded
    // set
    spaceCraft.getCrew().add( luke );    // all the related entities are loaded first
    // the ManyToOne side is updated as well:
    luke.setCurrentSpaceCraft( spaceCraft );

The power of references


A reference is a link to an object whose loading we decided to postpone. In the case of a ManyToOne relationship with fetchType = LAZY, we get exactly such a reference. The object is initialized at the moment the entity's fields are accessed, with the exception of id (because we already know the value of this field).

It is worth noting that in the case of lazy loading, a reference always points to an existing row in the database. This is why lazy loading does not work in most OneToOne cases: Hibernate needs a JOIN to check that the linked row exists, and since the JOIN is already there, Hibernate simply loads the row into the object model. If we declare the OneToOne link as non-nullable (optional = false), lazy loading should work.

We can also create a reference ourselves with the em.getReference method. In this case, however, there is no guarantee that the reference points to an existing row in the database.

Here is an example of using such a reference:

    // bag
    spaceCraft.getCrew().add( em.getReference( User.class, 1L ) );    // neither the collection nor the user is loaded

Just in case, let me remind you that we will get a LazyInitializationException if the EM is closed or the reference is detached.

Date and time


Although Java 8 has an excellent API for working with dates and times, the JDBC API still works only with the old date API. So let's look at some interesting points.

First, you need to clearly understand how LocalDateTime differs from Instant and ZonedDateTime. (I will not dwell on this, but here are excellent articles on the topic: the first and second)

In short
LocalDateTime and LocalDate represent an ordinary tuple of numbers. They are not tied to a specific moment in time. That is, the landing time of an aircraft cannot be stored in a LocalDateTime, while a date of birth in a LocalDate is perfectly fine. Instant, in turn, represents a point in time from which we can derive the local time at any place on the planet.

A more interesting and important point is how dates are stored in the database. If the column type is TIMESTAMP WITH TIME ZONE, there should be no problems. If it is TIMESTAMP (WITHOUT TIME ZONE), there is a chance that the date will be written or read incorrectly (except for LocalDate and LocalDateTime).

Let's see why:

When we save a date, the method is used with the following signature:

 setTimestamp(int i, Timestamp t, java.util.Calendar cal) 

As you can see, the old API is used here. The optional Calendar argument is needed to convert the timestamp to its string representation; that is, it carries the time zone. If no Calendar is passed, a default Calendar with the JVM time zone is used.

This problem can be solved in 3 ways:


An interesting question: why are LocalDate and LocalDateTime not affected by this problem?

Answer
To answer this question, you need to understand the structure of the java.util.Date class (java.sql.Date and java.sql.Timestamp are its subclasses, and their differences do not matter here). Date stores the date as milliseconds since 1970, roughly speaking in UTC, but the toString method converts the date according to the system time zone.

Accordingly, when we read a date without a time zone from the database, it is mapped to a Timestamp object in such a way that toString shows the expected value. The number of milliseconds since 1970 may therefore differ depending on the time zone. That is why only local time is always displayed correctly.

Here is also an example of the code that converts a Timestamp to LocalDateTime and Instant:

    // LocalDateTime
    LocalDateTime.ofInstant( ts.toInstant(), ZoneId.systemDefault() );
    // Instant
    ts.toInstant();
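
The claim that the wall-clock value survives while the milliseconds depend on the JVM time zone can be checked with the JDK alone. The sketch below is an illustration under assumptions: the class TimestampZoneDemo, the sample date and the zone ids UTC and GMT+03:00 are chosen for the example and are not taken from the article.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.TimeZone;

class TimestampZoneDemo {
    // Epoch millis of a Timestamp built from the same wall-clock time under a given JVM zone.
    static long epochMillis(LocalDateTime wallClock, String jvmZoneId) {
        TimeZone previous = TimeZone.getDefault();
        TimeZone.setDefault(TimeZone.getTimeZone(jvmZoneId));
        try {
            return Timestamp.valueOf(wallClock).getTime();
        } finally {
            TimeZone.setDefault(previous); // restore the original JVM zone
        }
    }

    // LocalDateTime -> Timestamp -> LocalDateTime under a given JVM zone.
    static LocalDateTime roundTrip(LocalDateTime wallClock, String jvmZoneId) {
        TimeZone previous = TimeZone.getDefault();
        TimeZone.setDefault(TimeZone.getTimeZone(jvmZoneId));
        try {
            return Timestamp.valueOf(wallClock).toLocalDateTime();
        } finally {
            TimeZone.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        LocalDateTime noon = LocalDateTime.of(2018, 7, 10, 12, 0);
        // Same wall-clock time, different instants: the difference is the zone offset.
        System.out.println(epochMillis(noon, "UTC") - epochMillis(noon, "GMT+03:00"));
        // The local value itself is preserved in both zones.
        System.out.println(roundTrip(noon, "UTC").equals(roundTrip(noon, "GMT+03:00")));
    }
}
```

The local date-time round-trips unchanged in both zones, while the underlying instant shifts by the 3-hour offset. That shift is what corrupts Instant and ZonedDateTime values when the writer and the reader run with different JVM time zones.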


Batching


By default, statements are sent to the database one at a time. With batching enabled, Hibernate can send several statements to the database in one request (i.e. batching reduces the number of round trips to the database).

For this you need:


I will also remind you how useful the em.clear() operation is: it detaches the entities from the EM, thereby freeing memory and reducing the time spent on dirty checking.
If we use Postgres, we can also tell Hibernate to use multi-row inserts.
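
As a hedged illustration of what such a setup can look like, the sketch below collects settings commonly used for JDBC batching. The property names are existing Hibernate and PostgreSQL JDBC driver settings, but the values, the class BatchingSettings and the connection details are assumptions for the example, and this is not necessarily the exact list the author had in mind.

```java
import java.util.Properties;

class BatchingSettings {
    // Hibernate properties that would be passed when building the EntityManagerFactory.
    static Properties hibernateProperties() {
        Properties properties = new Properties();
        // Maximum number of statements grouped into one JDBC batch (illustrative value).
        properties.setProperty("hibernate.jdbc.batch_size", "50");
        // Group statements by entity type so that consecutive statements can share a batch.
        properties.setProperty("hibernate.order_inserts", "true");
        properties.setProperty("hibernate.order_updates", "true");
        return properties;
    }

    // PostgreSQL JDBC driver option that rewrites batched inserts into multi-row inserts.
    // Host and database name are supplied by the caller; nothing here connects anywhere.
    static String postgresUrl(String hostAndPort, String database) {
        return "jdbc:postgresql://" + hostAndPort + "/" + database + "?reWriteBatchedInserts=true";
    }

    public static void main(String[] args) {
        System.out.println(hibernateProperties());
        System.out.println(postgresUrl("localhost:5432", "demo"));
    }
}
```

When inserting many entities in a loop, these settings are usually combined with a periodic em.flush() followed by em.clear(), so the first-level cache does not grow without bound.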

N + 1 problem


This topic is well covered elsewhere, so let's go over it quickly.

The N + 1 problem is a situation where, instead of one query to select N books, at least N + 1 queries are executed.

The easiest way to solve the N + 1 problem is to fetch the related tables in the same query. In this case, we may run into several other problems:
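
The query count can be illustrated with a toy simulation that needs no database. Everything in it is hypothetical: the class NPlusOneDemo, the book titles and the author lookup simply count simulated round trips, and the JPQL in the comments shows what the corresponding queries might look like.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NPlusOneDemo {
    static int queries; // counts simulated round trips to the database

    // e.g. "select b from Book b": one query returning N books
    static List<String> findBooks() {
        queries++;
        return Arrays.asList("Book A", "Book B", "Book C");
    }

    // Lazy load of a single book's author: one extra query per book.
    static String loadAuthor(String book) {
        queries++;
        return "Author of " + book;
    }

    // e.g. "select b from Book b join fetch b.author": books and authors in one query
    static Map<String, String> findBooksWithAuthors() {
        queries++;
        Map<String, String> result = new LinkedHashMap<>();
        for (String book : Arrays.asList("Book A", "Book B", "Book C")) {
            result.put(book, "Author of " + book);
        }
        return result;
    }

    // Iterating over books and touching each lazy author: 1 + N queries.
    static int lazyQueryCount() {
        queries = 0;
        for (String book : findBooks()) {
            loadAuthor(book);
        }
        return queries;
    }

    // Fetching authors together with the books: a single query.
    static int fetchQueryCount() {
        queries = 0;
        findBooksWithAuthors();
        return queries;
    }

    public static void main(String[] args) {
        System.out.println(lazyQueryCount());
        System.out.println(fetchQueryCount());
    }
}
```

For 3 books the lazy path issues 4 queries and the fetch path issues 1. The gap grows linearly with the number of books.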


There are also other ways to solve the N + 1 problem.


Testing


Ideally, the development environment should provide as much useful information as possible about what Hibernate is doing and how it interacts with the database. Namely:


Among useful utilities, the following stand out:

But I repeat once again that this is only for development; you should not enable it in production.

Literature


Source: https://habr.com/ru/post/416851/

