Release management on GIS utilities: sharing experience and fighting intuition

Why is being late for a flight and missing it not always bad? Who is to blame for a missed connection? Why come to the airport in advance? Can the A380 fly to Astrakhan? Why does intuition not always work? Why do surprises that "never happened before" keep happening again? Why do passengers applaud the pilot after landing?

Suppose you are developing a state information system (GIS) on a national scale. The project team (analysts, developers, testers, support services, infrastructure services, etc.) numbers more than a hundred people. The system has been put into pilot or commercial operation. Thousands of organizations have integrated with your system and started working with it, and even more are planning to integrate. Tens of thousands of organizations work through the web interface. The system publishes useful information for citizens and offers them interesting features. The customer and the users demand new modifications. Millions of people across the country are registered and use the system. And the outside world brings gifts in the form of changes in oil prices, sanctions, restrictions, etc.

Can you picture it? The GIS housing and communal services project is exactly such a project at the moment. We started talking about it earlier, and now we want to continue.


The first experiments of the Wright brothers


Most likely, if you work in a small company and develop "small projects", you will not have much of a release process as such. The scheme of work looks like this. You estimate that it would be nice to release some features in 4-5 months. Then you spend a month writing specifications, then develop and develop, and then try to stabilize the whole thing in order to release a new version of the software. Of course, by release time it turns out that some changes cannot be stabilized at all, that flaws have surfaced in the specifications because they were written long ago and things have changed since, that somewhere the team signed up for unrealistic functionality, etc. Nevertheless, this approach works quite well, but as a rule only while the project team is small and the intensity of changes is low (the system has not been put into operation, or it is being piloted by a few users, etc.). In principle, you can scale without processes to fairly large projects, up to 20 or even 40 people; it all depends on how strong the individuals are and on their dedication. Surely many are familiar with the situation when a project team of tough and desperate specialists, with the deadlines burning and almost everything lost, pulls together and carries the version into operation "on its shoulders", on sheer willpower, over mountains of empty pizza boxes.

Wilbur and Orville Wright, who went down in history as the Wright brothers, were the first in the world to build an airplane and fly it.

Believe us, we went through this too (that is why we love all kinds of madness like Hero Races, marathons, etc.). But at some point we realized that everything has a limit, and that when building systems on the scale of GIS utilities without a distinct release process, we are guaranteed to run into three main problems:


The experience of LANIT in developing GIS utilities tells us that one of the most important steps in moving large projects onto "industrial rails" is building a high-quality release process. That alone, of course, is not enough for complete happiness, but the release process underlies everything else, and without it everything is guaranteed to be much worse than it could be.

In this article we describe the two practices that, in our opinion, underlie an effective release process in IT systems on the scale of GIS utilities: regular delivery of functionality and a short release cycle.


On the one hand, these practices are quite well known, are used in many methodologies, and are reflected in the Agile Manifesto. On the other hand, they largely contradict intuition and require certain qualifications and skills, so they are difficult to justify and meet with misunderstanding both from the customer and within the project team. To incorporate them into corporate standards, you need a very good understanding of "what we want to achieve" and "why introducing these practices will lead to the desired result". Below we look at these practices and the main problems in implementing them, with examples.

When we analyzed our internal release management processes, an association with the work of a modern airline came to mind.

Regular flights


The airline analyzed the market, predicted which routes would have heavy passenger traffic and be profitable next season, made the necessary agreements, obtained the rights to the routes it needed, and launched regular flights. Flight schedules are known well in advance, usually six months to a year. The airline, of course, does not know who exactly will fly; it relies on the average projected occupancy. It can then sign long-term contracts with airports, work out which aircraft are most profitable for each route, tune its internal procedures, etc. All of this helps optimize costs.


If a company employee has to attend an important meeting at a certain time and then fly to another city, he simply picks a flight that allows for the risk that the meeting runs late and for the traffic situation.

Regular flights have their drawbacks. If the employee bought a ticket but did not make it in time, the plane did not wait for him. Unpleasant? Yes. It may be that the employee needs to fly out today, and the next flight is only tomorrow or even a week later. Not great? Yes. But the big plus is that you can plan your trip six months or a year ahead and be sure that the flight will take place. You know exactly when you will arrive and can tell that time to the loved ones who will meet you, or you can fly with a transfer and not worry about missing the connection. And in general, just think about it: a huge, heavy iron airplane, which 90% of humanity cannot understand how it flies at all, takes you very quickly to the back of beyond and costs about as much as a day or two on a train.

Let's return to the project. GIS utilities uses regular delivery of functionality. The LANIT team draws up a release schedule for one year, fixing the release dates so that releases happen about once a month. Since everything can still change a hundred times during the year (more on this below), at the time the schedule is drawn up we do not yet know exactly which improvements will be implemented and put into operation.

The planning takes into account the schedule of maintenance work required by the infrastructure and maintenance services. Global and heterogeneous changes must not overlap each other: this reduces risk and makes it easier to keep everything under control. For example, installing a new version of the application software while simultaneously upgrading the DBMS version or updating the firmware of the storage controllers is a very bad idea.
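The scheduling rule above can be sketched in a few lines of Python. This is only an illustration, not the team's actual tooling: the function names, the release day (the 10th), the three-day buffer, and the maintenance windows are all hypothetical.

```python
from datetime import date, timedelta


def monthly_release_dates(year: int, day: int = 10) -> list[date]:
    """Return one fixed release date per month of the given year."""
    return [date(year, month, day) for month in range(1, 13)]


def conflicts(release_dates, maintenance_windows, buffer_days=3):
    """Return (release, window) pairs that lie within buffer_days of each other."""
    buffer = timedelta(days=buffer_days)
    found = []
    for release in release_dates:
        for start, end in maintenance_windows:
            if start - buffer <= release <= end + buffer:
                found.append((release, (start, end)))
    return found


# Hypothetical data, for illustration only.
schedule = monthly_release_dates(2018)
maintenance = [
    (date(2018, 3, 9), date(2018, 3, 11)),    # e.g. a DBMS upgrade
    (date(2018, 6, 20), date(2018, 6, 21)),   # e.g. a storage firmware update
]
overlaps = conflicts(schedule, maintenance)
for release, (start, end) in overlaps:
    print(f"Release {release} is too close to maintenance {start}..{end}")
```

Running such a check when the yearly schedule is drawn up flags the collision early, while it is still cheap to move the maintenance window rather than the release date.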

The key point is that the release dates do not change afterwards. If a version is scheduled for a certain date, we have to bend over backwards but meet the planned deadline.

Below we explain why.

The fact that the conditions for each release are approximately the same (duration, team, specifics of the tasks being implemented, etc.) makes it possible to plan and fine-tune all the procedures for preparing and shipping a release. People who are not immersed in production processes may not appreciate how complex releasing a version is on a project as large as GIS utilities. To release a version, you need to perform many operations: regression testing (possibly several times), load testing, testing of deployment and data migration scripts, and analysis of performance statistics (SQL queries, services, etc.). The situation is aggravated by the fact that these operations can be long; for example, deploying a version on a test bench can take a day. If something goes wrong, you can easily fall behind schedule. Therefore, if you do not have a schedule with regular cycles of approximately equal length, I guarantee 146% that each release will be a much more serious ordeal for you.

The schedule is also needed to set deadlines for fixing defects and to communicate them to users. As a rule, we fix most defects in the current version. However, some defects take more time or are found at the end of the release cycle, so they are moved to the next version. If for any reason (see below) we start shifting versions, users automatically receive fixes later, which is not good.

The release schedule is also needed to plan the rollout of changes that have a hard deadline. The production team knows exactly when a change must go into operation and picks the release in which it will ship. If the release date shifts, the change may reach operation late (see below for the consequences of shifting a version because of an important task).

Just as regular transport service is the basis of the global transport system, a release process based on regular delivery of functionality is the basis for effectively developing a system on the scale of GIS utilities. Once this process is running smoothly, plans for implementing and delivering functionality can be "strung" onto it.

Unfortunately, the path to introducing this seemingly all-round useful practice is difficult and thorny. Below are the main pitfalls that a team can fall into and that should be watched for and stopped.

Boarding after check-in


Airlines have a well-established check-in procedure, which includes the rule that a passenger must check in no later than 40 minutes before departure. Why are these 40 minutes needed? They leave enough time to check the baggage and load it onto the aircraft, to calculate the necessary fuel from the weight of the baggage and the number of checked-in passengers, to refuel the aircraft, etc. This time is also needed so that passengers can find the right gate or terminal. Clearly, something may go wrong with the cargo or the passenger (the passenger got lost in the airport, or something happened to the luggage), and even those 40 minutes will not be enough. Nevertheless, the check-in closing time is a compromise, developed over many years, between wasting passengers' time and the risk of emergencies.

If a passenger likes to arrive at the airport close to the end of check-in, this simply means that he accepts the increased risk that something will happen and he will miss the plane. If the airline accommodates such people, it increases its own risks. It may have to pay for a special bus run to the plane just for this passenger. It is also possible that in the last-minute rush at check-in the airport staff will make a mistake, and the luggage will fly to the wrong city.


When shipping releases, one of the most important points is the criteria a change must meet and the deadline by which it must meet them (by analogy with the end of check-in). Releasing a version every month means that if a task is not ready by the deadline for the current version, it moves to the next release a month later. Often you can live with that, but it is especially hard when the task is very important and is late by only a few days or a week. There is a very strong temptation to break the deadline, include the unfinished task in the release, and try to push it through.

Why is that bad?

First, we must courageously admit and accept that all this "pushing through" and "maybe we will make it after all" is most likely a failure of production management. It means that the necessary measures were not taken in advance and the situation was allowed to become critical. Now it will be corrected through the "heroism" of individuals or teams: overtime, crutches, luck, etc. In our experience, this may solve the problem in the short term, but if you do it all the time, it leads to people "burning out" and to other very unpleasant consequences.

Secondly, as in the airline example, the deadline for task readiness is a trade-off between timing and risk. If we start violating the deadline, we increase the risk of problems with the version: either the entire version will not be released on time, or its quality will suffer and we will receive a flood of support requests because of non-working functionality or problems under load.

Unfortunately, situations do arise where a very, very important task must ship in the current release. But the main idea is that such situations should not become an accepted and encouraged practice; they should be recognized as a problem and examined at "retrospectives".
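The readiness deadline described above works like the end of check-in, and the rule is simple enough to write down. The sketch below is illustrative only: the function name, the seven-day freeze before each release, and the dates are assumptions, not the project's actual parameters.

```python
from datetime import date, timedelta


def target_release(ready_on: date, release_dates: list[date], freeze_days: int = 7):
    """Return the first release whose readiness deadline the task meets.

    A task makes a release only if it is ready no later than freeze_days
    before the release date; otherwise it rolls over to the next release.
    Returns None if no scheduled release qualifies.
    """
    freeze = timedelta(days=freeze_days)
    for release in sorted(release_dates):
        if ready_on <= release - freeze:
            return release
    return None


# Hypothetical schedule and tasks, for illustration only.
releases = [date(2018, 4, 10), date(2018, 5, 10), date(2018, 6, 10)]
on_time = target_release(date(2018, 4, 2), releases)  # ready before the April deadline
late = target_release(date(2018, 4, 5), releases)     # two days past the April deadline
print(on_time, late)
```

The point of encoding the rule is that the late task lands in the May release automatically, with no room for "maybe we will still make it"; an exception has to be a conscious decision that is later reviewed.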

Flight delay due to VIP passenger


Suppose you bought a ticket with a transfer and allowed an adequate amount of time between the flights. You arrived at the airport in advance, checked in, boarded the plane, called your mom and dad, and then you hear an announcement that the flight is delayed because an important government delegation is running late (of course, you will be told about bad weather or additional checks instead :)). Together with the whole plane you wait for this delegation and eventually depart with a significant delay. Arriving at the transfer point, you discover that you have only 30 minutes left to find your way around a huge airport and physically get to another terminal. Maybe you will make it, of course, or maybe in the rush you will run to the wrong terminal, or simply not have time to go through all the necessary procedures. And what if you are also a member of some other government organization (just a modest one) and are also in a hurry?


Thus, if the airline regularly allowed flights to be shifted, it would cause problems on all sides. Passengers would have to allow more time for connections and would be more inclined to take risks and arrive right at the end of check-in. This would lead to more conflicts and wasted time. For the airline, it would also mean more downtime at airports and therefore higher costs.

In the release process, if a task cannot be finished by the scheduled date, there is also a strong temptation to move the entire release, for example by a week. It seems an easy and good solution, especially if the task is under the personal control of the main customer.

Let's look at what this all leads to in the end.

On the GIS utilities project, a regular release ships up to 100 tasks and even more defect fixes. Shifting the release means that users get the functionality and the defect fixes they need later. Of course, the task under discussion really is important, but many of the other 99 tasks are very important too; since everything is fine with them, we have simply forgotten about them.

Further. If we regularly shift versions, the customer begins to lose faith in the release plan. In the back of his mind he is always thinking: yes, of course, the next release is scheduled for the 10th, but most likely something will happen and it will slip by a week or even two. The reasons for the shifts may differ, but in the end they are all forgotten, and what remains is the feeling of an unstable and non-transparent process.

What does this lead to? When an urgent defect or task arises, the customer does not agree to ship it in a regular release but demands a special version or a hotfix. As a result, we incur significant additional costs.

Allowing version shifts also undermines our ability to optimize the process. Conversely, if the release procedure is uniform and regular, we have the opportunity to improve it. After each release we hold a "retrospective", where we discuss the main positive and negative points of the iteration. With each repetition we make some improvements; as a result, overhead falls, the number of errors decreases, and the result improves.

Why a big plane is not always better


Passenger aircraft come in very different sizes: 75-110 seats (SSJ-100), 140-180 seats (A320, Boeing 737), 200-300 seats (A330, A340), and the A380, which carries from 525 to 853 passengers.

At least three airlines now fly to my hometown of Astrakhan, each making one flight a day on regional or narrow-body aircraft. Even if the infrastructure of the Astrakhan airport could receive the A380 (the largest and most spacious Airbus aircraft), filling it would require drastically reducing the frequency of flights, which would be very inconvenient for passengers. If the price difference is small, passengers will prefer more flights per day.

Approximately the same logic works in the release process. The less time between releases, the better for the customer. The ideal for a customer would be a release process that is completely invisible and transparent: we implement a task, press a button, and it is immediately live with no production downtime. To release versions this frequently, you need to raise the level of automation and fine-tune the processes.

This is where the question of overhead arises in full force.

Indeed, releasing a version carries an overhead, the cost of the release itself. For example, we perform regression testing, load testing, analysis of performance statistics, and testing of deployment and data migration scripts. It is logical to assume that the longer the release cycle, the less overhead we carry per unit of "useful" delivered functionality. It would follow that the production team should try to lengthen the release cycle to reduce costs. However, this conclusion is incorrect. The dependence of the cost per unit of functionality on the length of the release cycle has the shape shown in the figure below (for our project, our team, and the current level of automation, competence, fitness, emotional balance, and results of the Russian national football team).

Figure 1. Dependence of overhead costs per unit of production on the length of the release cycle (for our conditions)

Indeed, if we want a continuous release process in the style of "finish the task, press the button, everything works in production", or want to release a version every week, this will definitely require substantial up-front effort to restructure the work. Most likely it will involve a further increase in automation, as well as more discipline and fine-tuning of all procedures. It may also entail some architectural changes. We have not yet worked for any long period with a release cycle of two weeks or less, so the behavior of the curve in that range is my guess, based on our experience of producing small intermediate versions and hotfixes.

Around a three-to-four-week cycle we most likely have a local minimum of cost per unit of output; as the release cycle grows longer, costs begin to rise sharply. More on this below.

Extra load - extra fuel


Suppose the airline takes on extra load. This will not be free for it, since at the very least it requires additional fuel. Unfortunately, aviation has seen sad cases when, out of greed or stupidity, an aircraft was overloaded, which led to a crash.

If we ship a release every three to four weeks, each version requires certain procedures: two cycles of regression testing, load testing, and testing of deployment and data migration scripts (we will write about this in a separate article). It is a mistake to assume that if we lengthen the cycle to, say, two months, the release will require the same amount of work. If the cycle is long, more changes fall into it. This, in turn, increases the number of potential problem areas, and in a complex system this growth, if not exponential, is certainly non-linear because the changes affect one another. In fact, we already see that for a four-week cycle one iteration of regression testing is not enough. During the first regression we find a number of defects and fix them, and the volume of fixes is such that another, "final" regression is required. This final regression is more compact and runs much faster and without problems, but it is still required. If we switched to a two-week release cycle, we could most likely manage with a single regression. On the other hand, if we lengthened the cycle to 1.5-2 months, stabilization would require not 1.5 regressions but two or three.

The flight attendant in the cabin of a new airliner announces what is on board:
- On the first deck there is luggage, on the second a bar, on the third a golf course, on the fourth a swimming pool.
And adds:
- And now, gentlemen, fasten your seat belts. With all this junk on board, we are going to try to take off.

If we extend the release cycle even further, the volume of changes and the associated risks grow so much that the stabilization process no longer converges. Most likely our plane will not take off with all these pools and golf courses.

Never happened before, and here it is again


Shipping a release requires a rather complex infrastructure: several test benches, a version control system, a build pipeline, etc. Therefore we try to organize our work so that only one release is in progress at a time. If it suddenly turns out that several releases must be supported at once, costs rise significantly, so we do our best to avoid it.


First, we must recognize that unplanned and urgent tasks happen. When one does, it is desirable to get it into commercial operation as soon as it is ready. If the release cycle is long, a surprise awaits you!

Suppose you have a release cycle of four months. The likelihood that a release happens to be scheduled in the coming days is extremely small. So you need to make a special version for this task, prepare the release, and ship it to production. This is an additional cost. And even if a version does happen to be due any day now, squeezing in an additional task will most likely disrupt the release preparation plan. You may well end up wanting to shift the version. In my opinion, the consequences of that are even worse than making a special version.

If, on the contrary, you ship a release every two weeks, you can try to include the task in the current version or, at worst, in the next one. In that case at most 4 weeks will pass from the moment the task is ready to its release, and most likely 2-3 weeks. This is usually perfectly acceptable, which means you will probably manage without additional infrastructure costs.

The more often urgent changes arise, the more you benefit from a short release cycle.

On projects of the scale of GIS utilities, unplanned urgent changes occur quite regularly. Admitting this takes some effort (and, to be honest, great courage), because here we come into conflict with intuition and with the reluctance to work with risks. If we consider any one particular risk that may force us to make a special version, its probability is likely to be microscopic. If last month there were no changes caused by new legislation or newly discovered regional peculiarities, we do not expect anything like that next month either. So the conclusion is drawn that since the probability of each event is small, nothing will happen at all. However, this conclusion is wrong. The probability that at least one risk materializes is, for small and independent risks, approximately the sum of the individual probabilities. Therefore, if we take into account the scale of everything that happens even during one month of the release cycle (the number of key employees involved in preparing the version, the decisions being made, the number of external systems integrated with the GIS, the complexity of the infrastructure, the complexity of the subject area, etc.), this probability can already be quite substantial. For example, even with a one-month release cycle, between January and May 2018 there were three occasions when, without exaggeration, critically important and urgent tasks arose that had to be done ASAP and required a special version. What can we say about a release cycle of 4-5 months! If we released every 5 months, most likely another 2-3 intermediate versions would be added to these special versions, which makes release cycles longer than 1-1.5 months completely uneconomical in our conditions.
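The argument about many small risks is easy to check numerically. If the risks are assumed to be independent (an assumption made here for illustration), the probability that at least one of them materializes is one minus the product of the probabilities that each does not; for small probabilities this is close to their sum. The function name and all numbers in the sketch below are hypothetical.

```python
import math


def prob_at_least_one(probabilities):
    """P(at least one event occurs), assuming the events are independent."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


# Hypothetical numbers: 30 independent risks, each with a 2% chance
# of materializing during a one-month release cycle.
monthly = prob_at_least_one([0.02] * 30)

# The same risks accumulated over a five-month release cycle.
five_months = 1.0 - (1.0 - monthly) ** 5

print(f"one-month cycle: {monthly:.2f}, five-month cycle: {five_months:.2f}")
```

With these made-up inputs, each individual risk looks negligible, yet the chance of at least one urgent surprise per month comes out at about 0.45, and over five months it is close to certainty. The real probabilities on a project are unknown, but the shape of the effect is the same.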

Therefore, release processes that allow you to react flexibly to changes are a great blessing on projects of the scale of GIS utilities.

Applaud the pilot only after landing!


Every unfinished change puts the release of the version at risk. Do not take production managers at their word, no matter how honest their eyes look! Better to trust the facts.

In our experience, you can breathe more or less easily only once a change has been fully tested and all critical defects are fixed. After that, the main production risks have, as a rule, been removed. Until then, we cannot rule out that something goes wrong and the change begins to threaten the release. How many times have we taken someone's word that the change would be tested tomorrow without problems, and then problems appeared.

If a change is partially or completely done but then postponed, most likely, when you return to it after half a year, you will find that it no longer works: everything in the system has changed and you need to figure it out all over again. Keeping deferred changes up to date is an additional useless expense that is also best avoided.


Moreover, because the subject area of GIS housing and communal services is so complex, there is always a risk that non-optimal decisions were built into the design or that some regional peculiarities were not taken into account. Many issues can be detected only during pilot operation. This is another argument for making the release cycle shorter and getting changes into operation faster. You get feedback sooner, test your decisions sooner, and can quickly do what the customer and the market really need. A long release cycle, on the contrary, encourages building redundant functionality, which increases the risk of spending a lot of effort on something that turns out to be wrong or unnecessary.

The crew says goodbye to passengers


We have reviewed the main release management practices that were foundational for us: regular delivery of functionality and a shorter release cycle. Although widely known, these practices are in fact not entirely obvious and in some ways contradict intuition, so they often run into obstacles when being introduced.

In the GIS housing and communal services project, these practices have been implemented, have been working successfully for a long time, and show good results. We have achieved strict adherence to the release schedule, release preparation has become more controlled, orderly, and calm, and we can react flexibly to changes.

Of course, real life is not limited to the recommendations in this article. Many interesting nuances and situations have been left overboard, for example:


This will be discussed next time.

It would be interesting to hear your opinion. Do you agree with the statements and recommendations given in the article? How do you manage releases on large projects? Was it easy to introduce these practices?

Source: https://habr.com/ru/post/415433/

