
Imagine an Oracle DBA. He is past thirty, a bit plump, wears a vest, carries a badge with secret access to every database around his neck, and half of his résumé is a page of certificates. Saturday. Big release day. The climax: time to roll the changes onto the database. He launches sqlplus, hits ENTER, and kilometers of SQL commands rush up the black screen into the void, just like in Star Wars. Five minutes later, everything is done. An hour later, the release is complete. The work is finished, the day was a success. Now he can have a couple of beers.
Monday is quite another thing. It turns out that some commands failed with errors, which, however, did not stop the script in its unrestrained rush into the black void. The already difficult task of figuring out what is broken is complicated by a certain pressure from management. In short, nobody asked for a Monday like this.
Of course, this is a made-up story. It has never happened to anyone. At least, it could not have happened if changes to the database schema were managed through migrations.
What is a database migration tool?
The idea of managing database schema changes through migrations is extremely simple:
- Each change is recorded as a separate migration file.
- Each migration file contains both the forward and the reverse change.
- Migrations are applied to the database by a dedicated utility.
Here is the simplest example of a migration.
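In the MyBatis Migrations format, a script contains the forward change and, after the `--//@UNDO` marker, the reverse one. The table and columns here are made up for illustration:

```sql
-- // add table for user sessions
CREATE TABLE user_session (
    id         bigserial PRIMARY KEY,
    user_id    bigint NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- //@UNDO
DROP TABLE user_session;
```

Applying the migration runs the first half; rolling it back runs everything after the `--//@UNDO` marker.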
This approach provides many advantages compared to keeping changes in one general SQL file. The mere absence of merge conflicts is worth a lot.
It is all the more surprising that the approach gained popularity relatively recently. Broad familiarity with it seems to have come from the Ruby on Rails framework, which had a migration tool built in from the start, at the end of 2005. A little earlier, in 2003, Martin Fowler wrote about the approach. The point is probably that development teams only began to actively adopt version control at the beginning of this century. Back in 2000, the first item of the Joel Spolsky test was “Do you use source control?”, which suggests that not everyone was using version control systems yet. But we digress.
Eight years with MyBatis Migrations
At Wrike, we started managing database changes through migrations on March 29, 2010, at half past twelve. Since then, we have written 1440 migrations, containing 6436 forward changes and 5015 reverse ones. Along the way, we have gained some experience using the MyBatis Migrations tool together with PostgreSQL.
In short, we have never regretted it. If you are not yet using Migrations or something similar, it is time to start. Yes, and the Pentium 4 is outdated too.
But talking about the merits of something is boring, so let's go straight to the difficulties.
The specifics of working with PostgreSQL
There are no real difficulties in writing migrations for Postgres, except perhaps two:
- you cannot create indexes,
- you cannot add NOT NULL columns.
No, actually it is possible, just not in the most obvious way. When creating an index, you should always write `CREATE INDEX CONCURRENTLY`, otherwise you will break production: Postgres locks the table for the duration of index creation, and that can take quite a long time. Naturally, developers forget about this now and then, so you always have to keep this subtlety in mind. A test could be written for it. Still, this is only a minor inconvenience.
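A sketch with hypothetical table and index names; `CONCURRENTLY` builds the index without blocking writes, at the cost of not being runnable inside a transaction block:

```sql
-- Safe: does not block writes to the table while the index is built.
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY some_table_some_column_idx
    ON some_table (some_column);
```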
Creating NOT NULL columns is trickier; here the change has to be made in four steps:
- Create a nullable column (free in Postgres).
- Set a DEFAULT value for the column.
- Update the NULL values to the DEFAULT in a loop, in batches.
- Finally, SET NOT NULL.
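The DDL steps might look like this, with hypothetical table and column names:

```sql
-- 1. Add the column as nullable: a pure catalog change, effectively free.
ALTER TABLE some_table ADD COLUMN some_column text;

-- 2. Set the default for newly inserted rows (does not rewrite existing rows).
ALTER TABLE some_table ALTER COLUMN some_column SET DEFAULT '';

-- 3. Backfill existing NULLs in batches (the tricky step: a single big
--    UPDATE would lock the table for too long).

-- 4. Once no NULLs remain, forbid them.
ALTER TABLE some_table ALTER COLUMN some_column SET NOT NULL;
```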
The biggest catch is in the third step. The NULL values have to be updated in batches, because `UPDATE some_table SET some_column='' WHERE some_column IS NULL;` locks the table just as index creation does, with the same consequences. And Migrations can only execute plain SQL commands, so such scripts have to be rolled out to production by hand. The pleasure is below average. If it were possible to write a loop in Migrations, there would be no problem. Perhaps this could be implemented via hooks.
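Outside of Migrations, the batched backfill itself can be sketched in pure SQL. This is an illustration with hypothetical names; it assumes the table has an `id` primary key and requires PostgreSQL 11 or later, where `COMMIT` is permitted inside a procedure, so each batch releases its locks:

```sql
CREATE PROCEDURE backfill_some_column()
LANGUAGE plpgsql
AS $$
DECLARE
    rows_updated integer;
BEGIN
    LOOP
        -- Update at most 1000 rows per iteration.
        UPDATE some_table SET some_column = ''
        WHERE id IN (SELECT id FROM some_table
                     WHERE some_column IS NULL
                     LIMIT 1000);
        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        EXIT WHEN rows_updated = 0;
        COMMIT;  -- release locks before the next batch
    END LOOP;
END;
$$;

CALL backfill_some_column();
DROP PROCEDURE backfill_some_column();
```

On older Postgres versions the same loop has to live in an external script that commits between batches.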
Creating a `UNIQUE` index and changing the `PRIMARY KEY` also require some skill, but these operations are relatively rare.
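For the curious, a sketch with hypothetical names: the unique index is built concurrently first and only then attached as a constraint, using the `USING INDEX` form of `ALTER TABLE`:

```sql
-- Build the unique index without blocking writes.
CREATE UNIQUE INDEX CONCURRENTLY some_table_some_column_uniq
    ON some_table (some_column);

-- Attach it as a constraint; this is a quick catalog operation.
ALTER TABLE some_table
    ADD CONSTRAINT some_table_some_column_uniq
    UNIQUE USING INDEX some_table_some_column_uniq;
```

The same trick works for a primary key with `ADD CONSTRAINT ... PRIMARY KEY USING INDEX ...`.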
Cluster specifics
A database migration management tool is good as long as you have one database. The fun begins when you have several. Especially when you have several types of databases, each with several instances.
As a result, after `git pull` the developer must roll the changes onto the first instance of the first database, then onto the second instance, then onto the first instance of the second database, and so on down the line. At this point it is time to write a utility to manage the database migration management utility. Total automation.
Role Juggling
It is no secret that roles do not live at the level of an individual database but at the level of the whole database server, at least in Postgres. Meanwhile, a migration may need to say `REVOKE INSERT ON some_table FROM some_role;`. You can still expect roles to be pre-configured in production, but for dev or staging this is already a stretch. And in development, of course, all the databases live on the same local server, so you cannot simply write a `CREATE ROLE` migration, and `IF NOT EXISTS` is not supported for it. The problem is solved simply:
DO $$
BEGIN
    IF NOT EXISTS (SELECT * FROM pg_roles WHERE rolname = 'some_role') THEN
        CREATE ROLE "some_role" NOLOGIN;
    END IF;
END;
$$;
See? Catch and throw, catch and throw. It's that easy.
A bit of development reality
Developers make mistakes, and even in SQL migrations this happens. Usually errors are caught in review, but not always. As for forward changes, mistakes rarely reach production: there are too many verification steps along the way. But with reverse changes, incidents do occur. To avoid errors in the UNDO section, when testing a migration you need to run not just `./migrate up`, but `./migrate up`, then `./migrate down`, then `./migrate up` again. There is nothing difficult about it; you only need to make sure that forty developers always do it. Ideally, the utility could perform this combo automatically in the developer's environment.
Test environments
If the test environment is short-lived (say, you create a container, initialize the database, and run integration tests), there should be no problems. Call `./migrate bootstrap`, then `./migrate up`, and you are done. The only catch is that once the number of migrations exceeds 1000, this process starts to drag. It is a shame when initializing the database takes longer than running the tests. We have had to find workarounds.
Long-lived environments are harder still. QA engineers, you know, do not like coming to work to find an immaculately clean database. I do not know why, but it is a fact. So the state of the databases used in manual testing has to be kept intact, and that is not always easy.
The subtlety is that when a migration is applied to a database, its identifier is recorded there. If the migration code is changed afterwards, the database will not be affected. If the changes are not critical, the modified code can successfully reach production. Desynchronization. Of course, this is a disgrace. The first principle of working with migrations is to never modify applied migrations, but always create new ones. But sometimes you really want to: I'll just change a little here, nothing will break, honestly. Sure! Go right ahead!
If migrations were signed off after review, it would be possible to forbid applying drafts to staging. And it would also be useful to store in the `changelog` not only the identifier of the migration but also its `checksum`.
Put it back the way it was
An especially cunning turn happens when a task is cancelled: they worked and worked on it, and then changed their minds. A perfectly normal situation. Since the code is no longer needed, the branch is deleted. And the migration lived in that branch... and it had already been applied to staging... well... oops. A good occasion to check whether you can restore the repository from backup. Although, come to think of it, there was probably an easier way.
At the same time, a migration is just text. That text could be saved right there in the `changelog`. Then, if the migration disappears from the code, for whatever reason, it could always be rolled back. Even automatically.
Do UNDO again
The UNDO section is definitely needed. But why write it by hand? There are some intricate cases, of course, but most changes are `CREATE TABLE`, `ADD COLUMN`, or `CREATE INDEX`. For them, the utility could generate the inverse operations automatically, right from the SQL code. There is some specificity, of course: `CREATE TABLE ${name}` is a special command, suddenly non-standard, and to generate `DROP TABLE ${name}` you need to parse the statement all the way to the third word. Still, on the whole this is a perfectly solvable technical problem. It could work out of the box.
Conclusion
Of course, I am nitpicking. MyBatis Migrations was conceived as a simple and universal utility, minimally tied to the specifics of particular databases, and it lives up to that more than fully. But it seems that a few small improvements would make it much better, especially for long-term use.
-
Dmitry Mamonov / Wrike