In the wake of the "New features of PostgreSQL 11" meetup (part 2)

In the first part, we talked about the major innovations and changes in PostgreSQL 11. This time we will go through, in question-and-answer format, some of the points that were raised at the meetup.

What is the best way to pass a large array of data as a set of input parameters to a stored procedure in PL/pgSQL?


The most convenient way is to create a temporary table, load the data into it with COPY, and then use that table inside the procedure.
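A minimal client-side sketch of this approach, assuming a psycopg2 connection is passed in. The table `batch_input` and the procedure `process_batch()` are hypothetical names chosen for illustration, and `CALL` assumes a PostgreSQL 11 procedure rather than a function.

```python
import csv
import io


def rows_to_copy_buffer(rows):
    """Serialize rows as in-memory CSV text, ready for COPY ... FROM STDIN."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.seek(0)
    return buf


def load_and_call(conn, rows):
    """Load rows into a temporary table with COPY, then call the procedure.

    `conn` is assumed to be a psycopg2 connection. `batch_input` and
    `process_batch()` are hypothetical names, not part of PostgreSQL.
    """
    with conn.cursor() as cur:
        # The temporary table is private to this session and is dropped
        # automatically when the transaction commits.
        cur.execute(
            "CREATE TEMP TABLE batch_input (id integer, payload text) "
            "ON COMMIT DROP"
        )
        # COPY streams the whole batch in one command instead of many INSERTs.
        cur.copy_expert(
            "COPY batch_input FROM STDIN WITH (FORMAT csv)",
            rows_to_copy_buffer(rows),
        )
        # The procedure reads its input from batch_input.
        cur.execute("CALL process_batch()")
    conn.commit()
```

The procedure then takes no large arguments at all: it simply reads from the temporary table, which only the current session can see.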

External engines (zheap) and the development of in-memory PostgreSQL


Not every workload suits the model in which old versions of records are stored in the table itself. In all other versioned DBMSs, they are stored in an undo log. You can argue about which is more sensible, but the bottom line is that old records have to be stored somewhere. If they are short-lived and rarely accessed, storing them in the table itself is harmful. The external zheap engine is EnterpriseDB's attempt to build a table engine for PostgreSQL with an undo log. It works, although there is still room for improvement.

Anyone who works with MS SQL Server at the SNAPSHOT isolation level knows that it has tempdb, where it puts old row versions, and that it comes with a fairly mature vacuum for cleaning tempdb up. On the other hand, the community is asking for in-memory tables in PostgreSQL. That can be done quite easily: put the data on tmpfs, and that's all. Postgres Pro has even released a first pilot version that you can try.

What PostgreSQL has never had is pluggable storage engines. There were pluggable indexes, which shared a common WAL. A lot in PostgreSQL is pluggable, and there is little that cannot be swapped on the fly. For example, the executor cannot be replaced, but you can already use custom nodes that you program yourself. The optimizer in PostgreSQL is fully pluggable: you can write your own and use PostgreSQL as an interpreter for your queries. The SQL parser is not pluggable.

The aim is to make engines pluggable in three directions:


Postgres Pro is discussing with EnterpriseDB how to design an API for plugging all of this in.

About foreign keys


Inside PostgreSQL, foreign keys are implemented with triggers. You can write your own trigger that implements whatever functionality you need. All possible constraints should be enforced in the trigger. Business logic does not particularly belong in triggers, but checks do: everything must be verified there.

Does Postgres Pro plan to do SaaS or PaaS?


Postgres Pro plans to make PostgreSQL better suited to the cloud, in particular by allowing shared_buffers to be changed dynamically and by reducing the number of parameters that require a PostgreSQL restart. The company itself is not going to build a cloud.

How do I set up disks to make parallel index builds faster? Which is better, several HDDs or one SSD?


Several SSDs are better. The more room for parallelism the hardware provides, the better. If you have one disk, little memory and a single processor, parallelization will not help you. But SSDs have a quirk: they start to slow down once more than about 80% of their capacity is occupied. So do not forget to configure TRIM, otherwise that 80% limit will arrive somewhere around 50%.

Managing dictionaries and adding words in full-text search


If you use ispell or snowball, it is enough to change the stop-word dictionary. The catch is this: if you add a stop word, there is no urgent need to reindex, and it can be done gradually, because the stop word is dropped from the query and never searched for. But if you remove a stop word, it is not present anywhere in the indexed collection, and you need to reindex. The problem is not in the dictionary itself but in the fact that you have already used it and stored the result.
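The asymmetry between adding and removing a stop word can be seen in a toy inverted index. This is an illustrative sketch only, not how PostgreSQL's full-text search is implemented; the documents and function names are made up for the example.

```python
def index_docs(docs, stop_words):
    """Build a toy inverted index, skipping stop words at indexing time."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            if word not in stop_words:
                index.setdefault(word, set()).add(doc_id)
    return index


def search(index, query, stop_words):
    """Return ids of documents containing every non-stop word of the query."""
    terms = [w for w in query.lower().split() if w not in stop_words]
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


docs = {1: "the cat", 2: "the dog"}

# Index built while "the" was a stop word: "the" was never stored.
old_index = index_docs(docs, stop_words={"the"})

# Removing "the" from the stop list: queries now look for it, but the old
# index has no entry for it, so nothing matches until we reindex.
stale = search(old_index, "the cat", stop_words=set())
fresh = search(index_docs(docs, stop_words=set()), "the cat", stop_words=set())

# Adding "cat" as a stop word: it is dropped from the query, so the leftover
# "cat" entry in the old index is simply never consulted.
still_ok = search(old_index, "the dog cat", stop_words={"the", "cat"})
```

Here `stale` comes back empty while `fresh` finds document 1, which is why removing a stop word forces a reindex, whereas `still_ok` keeps working against the old index.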

In many cases you can also use the little-known function ts_rewrite, which replaces a part of a query with another query. For example, when the submarine Kursk sank, everyone rushed to look for information about it. Fyodor Sigaev worked at Rambler at the time, and a search for "Kursk" returned information about the city. They quickly set up a substitution so that this word returned information about the submarine. But then users who were interested in the city itself began to complain. I don’t know whether they figured it out, but they now had to type “city of Kursk”. This is exactly the kind of change ts_rewrite lets you make. The function can also be used to smooth the transition while dictionaries are being changed.

Of course, changing the parser and dictionaries is a difficult task. Languages with different alphabets, such as Russian and English, get along well. Mixed French-English texts, for example, are currently indexed much worse. Sometimes it is unclear which language a word belongs to: it is spelled the same in both, but it is a stop word in one language and not in the other. Postgres Pro is currently working on dictionary configuration that will make it possible to describe more complex setups.

Covering indexes and HOT updates


They are fully compatible. However, if at least one column included in the covering index is updated, the index behaves as usual: all entries are replaced.

Inability to create temporary tables when running queries on a standby


PostgreSQL stores information about tables, temporary ones included, in the system catalog, which cannot be written to on a standby. There is a patch that keeps information about temporary tables outside the system catalog, so with this patch you can use temporary tables. But then another problem arises: there are no transactions on a standby. To work with a temporary table, you would have to use a second, virtual transaction ID that applies only to temporary tables and not to the main tables that come from the master. And when you look at the 32-bit number, it could be either of two different things.

Postgres Pro also has the pg_variables module, which works on a standby as well. It is not exactly a temporary table, but it lets you emulate the functionality you need.

Implementing a clustered index


Postgres Pro has made several attempts at implementing it. Today you can run CLUSTER on a table with an index, and the table will be laid out in the same order. The hard part was keeping the table in a clustered state afterwards. Different approaches were tried, but inserting into such a table was invariably very expensive, and nobody is interested in that. So for now the conclusion is that the way forward is Index Organized Tables.

Recommended autovacuum scale factor


The usual recommendation is 1-5%, but this is by no means mandatory. For small tables whose data distribution stays roughly the same on average despite the changes, you can set a larger value. If the table is large and rarely updated, but the updates are targeted and change the distribution significantly, you will have to come up with something else. It all depends on the distribution of your data.
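To see what the scale factor means in absolute numbers, here is a small sketch of the documented trigger formula: autovacuum vacuums a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * (number of tuples). The defaults used below (50 and 0.2) are PostgreSQL's stock values; the table sizes are arbitrary examples.

```python
def autovacuum_trigger(reltuples, scale_factor=0.2, base_threshold=50):
    """Dead-tuple count above which autovacuum will vacuum the table.

    Mirrors the documented formula:
    autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
    """
    return base_threshold + scale_factor * reltuples


# Compare the default 20% with the 5% and 1% settings on a small
# and on a large table (example sizes only).
for rows in (10_000, 100_000_000):
    for sf in (0.2, 0.05, 0.01):
        trigger = autovacuum_trigger(rows, sf)
        print(f"{rows:>11,} rows, scale_factor={sf}: {trigger:,.0f} dead tuples")
```

On a large table even a small percentage translates into a very large number of dead tuples, which is why the value is often lowered per table rather than globally.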

Hints in complex queries


In Oracle, complex queries periodically need help from hints, because a full scan suddenly shows up. In Postgres Pro hints are rather finicky, but you can get them working. Vanilla PostgreSQL, however, has no hints, and they are unlikely to appear. If hints are built in, users who run into an optimizer problem insert a hint, calm down and never report the problem, and development of the optimizer stalls.

By the way, the PostgreSQL optimizer does have a problem. When it estimates a selection from a table, even a reasonably sized one, the estimate carries some error. Then that result is joined with something, the result of that join is joined with something else, the error accumulates, and by the third or fourth level PostgreSQL is far off.
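A back-of-the-envelope sketch of that accumulation, under the simplifying assumption that each level misestimates by the same factor and that the errors multiply. Real estimation errors vary per table and per predicate, so the numbers are purely illustrative.

```python
def compounded_error(per_step_factor, levels):
    """Worst-case misestimate factor after `levels` dependent estimates.

    Assumes every level is off by the same factor and errors multiply,
    which is a simplification for illustration.
    """
    return per_step_factor ** levels


# An estimate that is off by 2x at every level.
for level in range(1, 5):
    print(f"level {level}: off by a factor of {compounded_error(2.0, level):g}")
```

Under this assumption, a modest error per level is already an order of magnitude by the fourth level, which is enough to make the planner pick the wrong join method.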

There is a setting called join_collapse_limit. PostgreSQL reorders JOINs to execute them more efficiently, but the default reordering limit is 8. If a query has more than 8 JOINs in a row, the planner will not reorder all of them, and the plan starts to depend on the order of the JOINs in the query.
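One reason such a limit exists is that the number of possible join orders grows very quickly. The sketch below counts only left-deep orderings (n! for n tables), which is a simplification: the real planner also considers other plan shapes and prunes the search rather than enumerating everything.

```python
import math


def left_deep_join_orders(n_tables):
    """Number of left-deep join orders for n tables (n factorial)."""
    return math.factorial(n_tables)


# How the search space grows around the default limit of 8.
for n in (4, 8, 12):
    print(f"{n:>2} tables: {left_deep_join_orders(n):,} left-deep orders")
```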

There is also a genetic optimizer with its own parameters. You can turn on various settings within a session and describe, more or less, how a query should be executed. Relying on the JOIN order and using parentheses, you can steer the plan, and you can switch off certain operations, such as seq scan. Another option is to attach specific settings to functions. In a sense, these are hints too. Not very convenient, but better than nothing.
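A hedged sketch of using session-level settings as pseudo-hints from client code, assuming a psycopg2 connection that is not in autocommit mode. The helper name is hypothetical. `set_config(name, value, true)` is PostgreSQL's function equivalent of SET LOCAL, so the settings last only until the transaction ends.

```python
def run_with_planner_settings(conn, query, settings):
    """Run `query` with planner settings scoped to the current transaction.

    `conn` is assumed to be a psycopg2 connection (autocommit off).
    `settings` maps PostgreSQL parameter names to values, for example
    {"enable_seqscan": "off", "join_collapse_limit": 1}.
    """
    with conn.cursor() as cur:
        for name, value in settings.items():
            # Third argument true = local to this transaction (like SET LOCAL).
            cur.execute("SELECT set_config(%s, %s, true)", (name, str(value)))
        cur.execute(query)
        rows = cur.fetchall()
    # Ending the transaction also reverts the local settings.
    conn.commit()
    return rows
```

For instance, passing `{"join_collapse_limit": 1}` makes the planner follow the JOIN order written in the query, and `{"enable_seqscan": "off"}` strongly discourages sequential scans for that one query without affecting the rest of the session.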

Source: https://habr.com/ru/post/416187/

