In the wake of the meetup "New PostgreSQL 11 Features"

Today we will talk about the most important new features of PostgreSQL 11. Why only those? Because not everyone needs every feature, so we limited ourselves to the most popular ones.

Content

JIT compilation
Partitioning
Indices
Performance
WAL
Backup and Replication
For DBA
Parallel execution
Window functions
Full-text search
Json(b)
PL/*
pgbench
PSQL

JIT compilation


JIT compilation has finally arrived in PostgreSQL, that is, compiling queries into native code. To use it, PostgreSQL must be built with JIT support (the --with-llvm configure option), and the machine must have LLVM version 3.9 or newer.

What can JIT accelerate?


If PostgreSQL is built without JIT support, JIT is simply never used. Otherwise, at query planning time the system decides what is worth JIT-compiling and what is not; at that point it compiles the chosen parts, and the executor then runs them as they are.

JIT is made pluggable. By default it works with LLVM, but any other JIT provider can be plugged in.



If you compiled PostgreSQL without JIT support, the very first setting (jit) simply has no effect. There are also developer options that toggle individual JIT features.

The next subtle point concerns jit_above_cost. JIT itself is not free, so by default PostgreSQL applies JIT only when the estimated cost of the query exceeds 100,000 arbitrary cost units, the same units in which explain and explain analyze report costs. This default was chosen fairly arbitrarily, so keep an eye on it.
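The relevant thresholds can be inspected directly; the defaults below are those of PostgreSQL 11:

```sql
-- JIT cost thresholds and their PostgreSQL 11 defaults
show jit_above_cost;          -- 100000: JIT-compile expressions at all
show jit_inline_above_cost;   -- 500000: additionally inline functions and operators
show jit_optimize_above_cost; -- 500000: additionally run expensive LLVM optimizations
```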

But turning JIT on does not always make everything faster right away. People usually start experimenting with JIT on a query like select * from table where id = 600, and it does nothing. The next step is to complicate the query and generate a giant table, and then PostgreSQL runs into disk throughput or runs out of shared buffers and caches, so the CPU savings from JIT are invisible.

Here is a completely abstract example. There are nine columns that are null with different frequencies, so the effect of tuple deforming is noticeable.

select i as x1,
case when i % 2 = 0 then i else null end as x2,
case when i % 3 = 0 then i else null end as x3,
case when i % 4 = 0 then i else null end as x4,
case when i % 5 = 0 then i else null end as x5,
case when i % 6 = 0 then i else null end as x6,
case when i % 7 = 0 then i else null end as x7,
case when i % 8 = 0 then i else null end as x8,
case when i % 9 = 0 then i else null end as x9
into t
from generate_series(0, 10000000) i;

vacuum t;
analyze t;


PostgreSQL has many optimizations, so to see the benefit of JIT clearly we disable parallelism with the first two settings, so it does not interfere, and reset the JIT thresholds to zero:

set max_parallel_workers=0;
set max_parallel_workers_per_gather=0;
set jit_above_cost=0;
set jit_inline_above_cost=0;
set jit_optimize_above_cost=0;


Here is the query itself:

set jit=off;
explain analyze
select count(*) from t where
sqrt(pow(x9, 2) + pow(x8,2)) < 10000;

set jit=on;
explain analyze
select count(*) from t where
sqrt(pow(x9, 2) + pow(x8,2)) < 10000;


And here is the result:

Planning Time: 0.71 ms
Execution Time: 1986.323 ms

VS

Planning Time: 0.060 ms
JIT:
Functions: 4
Generation Time: 0.911 ms
Inlining: true
Inlining Time: 23.876 ms
Optimization: true
Optimization Time: 41.399 ms
Emission Time: 21.856 ms
Execution Time: 949.112 ms


JIT sped the query up by about a factor of two. The planning times differ, but that is most likely because PostgreSQL cached something, so do not pay attention to it.

If you add up the JIT stages, compilation took about 88 ms. Why is JIT not free? Before running the query you first have to compile it, and that also takes time, roughly three orders of magnitude more than planning. Not a cheap pleasure, but it pays off over the duration of the execution.

This is how JIT can be used, although it does not always benefit.

Partitioning


If you paid attention to partitioning in earlier versions of PostgreSQL, you probably noticed it was done just to tick a box. The situation improved somewhat in version 10, when declarative partitioning appeared. Internally, however, everything remained the same and worked about as it did in previous versions, that is, badly.
To a large extent that problem was solved by the pg_pathman module, which made it possible to work with partitions and prune them fairly optimally at run time.

In version 11, partitioning is significantly improved:

- a default partition can be declared;
- partition-wise join and partition-wise aggregation;
- indexes can be created on the parent partitioned table;
- partitions can be eliminated at query execution time.

The big question is: how many partitions can you have? Honestly, with a large number of partitions (thousands and tens of thousands) the feature works poorly. pg_pathman copes better.

Default partitions were also added: everything that does not fit any other partition is dumped there. (Again, pg_pathman can instead create partitions automatically, which is more convenient.) If you rely on a default partition in a real system, after a while you end up with such a mess that you then have to rake it out.

PostgreSQL 11 can now optimize joins of two tables that are joined on the partitioning key and have identical partitioning schemes, joining them partition by partition. This is controlled by a special parameter (enable_partitionwise_join) that is off by default.

You can also compute aggregates for each partition separately and then combine the results (enable_partitionwise_aggregate, likewise off by default). Finally, you can create an index on the parent partitioned table, and local indexes will then be created on all the partitions attached to it.
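A minimal sketch of turning these features on; the measurement table name is made up for illustration:

```sql
-- enable partition-wise join and aggregation (both off by default in 11)
set enable_partitionwise_join = on;
set enable_partitionwise_aggregate = on;

-- an index on the parent cascades into local indexes on every partition
create index on measurement (logdate);
```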

In the “What's New” section a wonderful thing is mentioned: the ability to eliminate partitions while executing a query (run-time partition pruning). Let's check how it works. This is the table we ended up with:



We create a type and a two-column table with a primary key on a bigserial column, and insert data. Then we create a second table, partitioned, which will be a copy of the first, and add a primary key to the partitioned table as well.



The table will hold two kinds of records: female nannies and male drivers, plus exactly one female driver. We create two partitions, partition by list, add the primary key, and insert all the rows from the table where the data was generated. The result itself is completely uninteresting:
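The original listing was an image; here is a sketch of what the setup might have looked like, with all names and proportions being guesses for illustration:

```sql
-- a sketch reconstructing the demo; names and details are assumptions
create type occupation as enum ('nanny', 'driver');

create table people (
    id  bigserial primary key,
    occ occupation,
    sex char(1)
);
insert into people (occ, sex)
select case when i % 2 = 0 then 'nanny' else 'driver' end::occupation,
       case when i % 2 = 0 then 'f' else 'm' end
from generate_series(1, 100000) i;
insert into people (occ, sex) values ('driver', 'f');  -- the lone female driver

-- the partitioned copy; in 11 the primary key must include the partition key
create table people_part (
    id  bigint not null,
    occ occupation,
    sex char(1),
    primary key (id, occ)
) partition by list (occ);
create table people_nanny  partition of people_part for values in ('nanny');
create table people_driver partition of people_part for values in ('driver');
insert into people_part select * from people;
```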



Pay attention to the query. We select everything from the non-partitioned table and join it to the partitioned one, take a small slice, and filter on a single occupation value, so the result should contain drivers only.

During execution we deliberately disable parallelism, because by default PostgreSQL 11 parallelizes more or less complex queries very eagerly. If we look at the plan (explain analyze), we can see that the system looked into both partitions, nannies and drivers, although there was nothing to find among the nannies: there were no buffer accesses there, yet time was spent and the condition was checked, even though PostgreSQL could have figured all this out. In other words, run-time partition elimination does not kick in here; perhaps this will be fixed in later builds. The pg_pathman module, by the way, handles this case without problems.

Indices



Covering indexes


This functionality was implemented by PostgresPro three years ago, and all this time there were attempts to get it into PostgreSQL. Covering indexes mean that you can add extra columns to a unique index, placing them directly into the index tuple.

What for? Everyone loves index-only scans for their speed. To get them, people build what are effectively covering indexes:



But uniqueness has to be preserved too, so two indexes are built: a narrow unique one and a wide one.
The drawback is that every vacuum, insert, or update on the table must maintain both indexes, and index insertion is a slow operation. A covering index makes it possible to maintain just one index.
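The classic workaround and the new INCLUDE syntax side by side; this is a sketch, and the table and column names are made up:

```sql
create table tbl (a int, b int, c int, d int);

-- before 11: a narrow unique index plus a wide one for index-only scans
create unique index on tbl (a, b);
create index on tbl (a, b, c, d);

-- PostgreSQL 11: one covering index does both jobs
create unique index on tbl (a, b) include (c, d);
```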

True, it has some limitations, or rather properties that may not be obvious at once. The included columns c and d in the first create index do not have to be of scalar types with a b-tree opclass; they do not need a less-than/greater-than comparison at all and can be, say, points or polygons. The only requirement is that the index tuple fit in about 2.7 KB, because there is no TOAST inside indexes; within that limit you can store even things that cannot be compared.

However, no computation on these included columns is performed inside the index during the search; that is done by the filter node that sits above the index scan. On the one hand, why not evaluate it inside the index; on the other hand, it is just one extra function call, so it is not as scary as it sounds.

Well, in addition, you can add these covered columns to the primary key.

SP-GiST


Very few people use this index because it is rather specialized. Nevertheless, it gained the ability to store something other than exactly what was inserted: a lossy, compressed representation. Take polygons as an example. Instead of the polygon itself, the index stores its bounding box, that is, the minimal rectangle that contains the polygon. A rectangle is then represented as a point in four-dimensional space, and from there on we work with a classic quad-tree, just in four dimensions.
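A minimal sketch; PostgreSQL 11 ships an SP-GiST operator class for the polygon type, and the table name here is made up:

```sql
-- SP-GiST over polygons: the index stores bounding boxes, not the polygons
create table shapes (p polygon);
create index on shapes using spgist (p);
```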

SP-GiST also gained a prefix search operation: it returns true if one string is a prefix of another. This was introduced for a reason, namely for the sake of queries like the following, with SP-GiST support.

SELECT * FROM table WHERE c ^@ 'abc'

In b-tree there is the same roughly 2.7 KB limit on the whole string, while SP-GiST has no such limit. True, PostgreSQL's general limitation still applies: a single value cannot exceed 1 GB.
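A sketch of a text index supporting the prefix operator; the table and column names are assumed:

```sql
-- in 11 the SP-GiST text operator class supports ^@ (prefix match)
create table docs (c text);
create index on docs using spgist (c);
select * from docs where c ^@ 'abc';
```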

Performance



Now, with a little tweaking, you can change this behavior for a separate table (apparently the toast_tuple_target storage parameter, new in 11), so that such small overruns do not force the use of TOAST. But you have to understand what you are doing; without that, nothing good comes of it.
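A sketch, assuming the toast_tuple_target storage parameter is what is meant here:

```sql
-- raise the per-table TOAST threshold so slightly long rows stay in-line
alter table t set (toast_tuple_target = 4080);
```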

WAL



Backup and Replication



For DBA



Parallel execution



PostgreSQL 11 can build b-tree indexes in parallel, and the number of workers can be set per table:

alter table usr reset (parallel_workers);
create index on usr(lower((so).occ)); -- 2
alter table usr set (parallel_workers=2);
create index on usr(upper((so).occ)); -- 1.8


The number of workers for an index build is taken from the table's parallel_workers storage parameter, or derived from the table size and capped by max_parallel_maintenance_workers. In this run the build took about 2 time units with the defaults and about 1.8 with two workers: a modest but real gain that grows with table size.

Now let's look at parallel execution of a join between two tables:

explain analyze
select u1.* from usr u, usr1 u1 where
u.id=u1.id+0


The +0 in the join condition is a small trick: it keeps the planner from matching the join key directly to the index on u1.id and nudges it toward a hash join. The usr table is the same one used above; usr1 is its copy.

Here is how PostgreSQL 11 executes this query.



In this run the query finished in about 1.5 seconds. On 9.6 and 10 the same query behaved differently: the workers fed their tuples to the leader, which then did the join itself. In 11 the plan is parallel almost end to end: partial scans, a parallel hash, and a parallel append.

Compare the per-worker numbers in the plan:



The work is not split evenly between the workers: 211 versus 702 in one place of the plan, 510 versus 1473 in another. Even so, overall the query runs roughly twice as fast.

The most interesting node is the parallel hash join. In this plan there were 4 workers, and unlike before they cooperate rather than duplicate each other's work.

A parallel index scan also appeared, and the hash join processes its batches in parallel. What does this mean? Before version 11, each worker of a hash join built its own private hash table over the inner relation; with parallel hash they build one shared table together, which saves both memory and time.

Parallelism pays off mainly on OLAP-style queries that touch a lot of data. For short OLTP queries the overhead of launching parallel workers outweighs the gain, so they are better left unparallelized.




Window functions


Window functions in PostgreSQL 11 now support the SQL:2011 standard in full: frames can be defined not only in rows mode but also in groups and range modes, with offsets and with exclusion clauses.
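A small sketch of the new frame options; the data here is made up:

```sql
-- rows / groups / range framing plus exclude, per SQL:2011
select x,
       sum(x) over (order by x rows   between 1 preceding and 1 following) as by_rows,
       sum(x) over (order by x groups between 1 preceding and 1 following) as by_groups,
       sum(x) over (order by x range  between 1 preceding and 1 following
                    exclude current row)                                   as by_range
from (values (1), (2), (2), (4)) as t(x);
```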


Full-text search

Full-text search also received a convenient addition.

It is the websearch_to_tsquery function, which understands queries written the way people type them into a web search engine. Unlike to_tsquery, it does not raise errors on arbitrary input: whatever cannot be parsed is simply dropped.

# select websearch_to_tsquery('dog or cat');
----------------------
'dog' | 'cat'
# select websearch_to_tsquery('dog -cat');
----------------------
'dog' & !'cat'
# select websearch_to_tsquery('or cat');
----------------------
'cat'


In the first example, dog or cat, the word or becomes the | operator. In dog -cat the minus turns into negation. And in "or cat" the leading or has no left operand, so websearch simply throws it away instead of raising an error, which is exactly what you want for user-supplied input.

websearch_to_tsquery is a convenience function: you can pass user input to it almost as-is, without writing your own parser and without fearing syntax errors.

Json(b)


This was being prepared back in version 10 but only landed in 11: json and json(b) values can now be converted to tsvector, like ordinary text. When converting (here json(b)), you specify which JSON value types to index: bool, numeric, string, keys, or all of them.

# select jsonb_to_tsvector
('{"a":"texts", "b":12}', '"string"');
-------------------
'text':1
# select jsonb_to_tsvector
('{"a":"texts", "b":12}', '["string", "numeric"]');
-------------------
'12':3 'text':1


That is about all that json(b) received in this release; the bigger SQL/JSON patches did not make it into 11.

PL/*


Stored procedures appeared, and unlike functions they can manage transactions: you can commit and roll back right inside the procedure body.

CREATE PROCEDURE transaction_test1()
LANGUAGE plpgsql
AS $$
BEGIN
FOR i IN 0..9 LOOP
INSERT INTO test1 (a) VALUES (i);
IF i % 2 = 0 THEN
COMMIT;
ELSE
ROLLBACK;
END IF;
END LOOP;
END
$$;
CALL transaction_test1();


A procedure is invoked with call, not select, and it does not return a result set the way a function does. Inside it you can run select, insert and other statements, committing or rolling back along the way.
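After calling the procedure above, only the iterations that hit the commit branch survive; this assumes test1 was created beforehand as a single-column table:

```sql
-- even values of i were committed, odd ones rolled back
select a from test1 order by a;  -- 0, 2, 4, 6, 8
```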

Transaction management in procedures is supported in all the procedural languages shipped with PostgreSQL: PL/Perl, PL/Python, PL/Tcl and PL/pgSQL. In PL/pgSQL it is plain commit and rollback; in PL/Perl it goes through spi calls.

PL/pgSQL itself also received some smaller improvements.

pgbench


pgbench became noticeably more flexible. Its scripting language now has \if conditionals, and case expressions can be used, so scripts can branch. The new --init-steps option lets you choose which initialization steps to perform and in which order.

Two more additions: the --random-seed option makes runs reproducible, and the zipfian random distribution (random_zipfian) models realistic skewed access, where a few rows are hot and the long tail is touched rarely.
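A sketch of a custom pgbench script using these features; the file name is made up, and pgbench_accounts and :scale are pgbench's standard table and variable:

```
-- bench.sql: run with `pgbench -f bench.sql --random-seed=42 ...`
\set id random_zipfian(1, 100000 * :scale, 1.07)
\set coin random(0, 1)
\if :coin = 1
    SELECT abalance FROM pgbench_accounts WHERE aid = :id;
\else
    UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :id;
\endif
```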

All in all, pgbench is now much closer to a real-world load generator.

PSQL


Finally, the question everyone asks: how do you get out of psql? Now, in addition to \q, plain exit and quit work.

Source: https://habr.com/ru/post/414031/

