How to make your IT infrastructure boring

Michael DeHaan is the man who created Ansible. Much of what system administrators, release engineers, and DevOps engineers do day to day is, to put it mildly, uninteresting. DeHaan wants these people to free up time for more interesting things (at work or outside the office), so he writes product code that frees up administrators' time.
More time, less adrenaline during working hours, fewer scripts and fewer errors.
By the way, you can stop reading at this paragraph and join the livestream on June 6 instead.



If you are still reading...

Ansible: continuous integration and delivery


Ansible is a powerful open source automation language. It is well suited not only to configuration management but also to deploying and orchestrating IT systems. Ansible was created to handle a wide range of automation tasks and to serve as a simple, universal replacement for traditional management tools, and it has proved useful in many areas. One example is ensuring zero downtime during continuous integration and continuous delivery (CI/CD). Usually this problem is solved with extensive custom development, an assortment of software packages, and many tweaks unique to each configuration. Ansible was designed for such orchestration scenarios from the start and offers a ready-made all-in-one solution.

Continuous Integration and Continuous Delivery (CI/CD)


A few truisms. Software development practice over the past 10 years shows that a long release cycle (the waterfall model) carries much higher overhead than a short one (so-called iterative, or agile, development). The problem is uneven rhythm: when programmers are just starting work on a new version, the IT specialists responsible for testing and deployment have nothing to do. But the closer the release date, the busier those IT specialists become, and the more often programmers have to switch context, alternating between fixing bugs and planning the next version.

In addition, a long cycle lengthens the interval between finding and fixing software defects, which is especially critical for large web systems with audiences in the millions. The software industry is therefore rapidly adopting agile methodologies under the slogan “release faster and more often,” so that everyone involved in development switches context less often and can create, debug, and ship improvements much faster.

Automated quality control, test-driven development (TDD), and related techniques make the new ways of working even more effective. But where is the automation? Where are the technologies that make the gears spin faster and reduce human involvement to the strictly necessary minimum?

This is where, for example, Ansible and Ansible Tower from Red Hat come in, orchestrating IT systems within modern software development processes.

Zero downtime


A few more obvious points. Downtime means lost profit and dissatisfied customers. In large-scale web services whose users are spread across all time zones, scheduled outages are therefore allowed only in really serious cases, and updating application versions is clearly not one of them. The same holds in corporate environments, where an unavailable intranet or accounting system sharply reduces employee productivity. Any process automation must therefore deliver updates without interrupting operations, in other words, with zero downtime.

Zero downtime is quite achievable, but it takes the right tools: ones that provide extended, multi-tier, multi-stage orchestration, such as Ansible.

Application Build Systems


Continuous Delivery (CD) begins with Continuous Integration (CI). A CI system, such as Jenkins (jenkins.io), monitors source code repositories for changes and, each time the code is updated, automatically runs the corresponding tests and builds (and ideally tests) a new version of the application.

To hand the baton to the CD system after a successful build, the CI system can call Ansible to make the new version immediately available to those performing unit or integration testing. In particular, Jenkins can use Tower to deploy builds to various environments, and test or staging environments can be modeled on the production environment, which greatly improves predictability throughout the software life cycle. Data returned by Ansible automation runs can be used directly in build-system jobs through Tower. Tower even lets you test deployment scenarios in a staging environment before running them on production servers.

Rolling updates of multi-tier applications


A CD system must be able to orchestrate rolling updates of multi-tier applications. Thanks to its push architecture and multi-tier, multi-stage orchestration capabilities, Ansible handles this well, updating an application tier by tier while passing data between tiers.

To implement rolling updates, Ansible uses plays, which let you specify precisely the group of target hosts and the tasks or roles to run on them. Tasks are usually declarations that a particular IT resource must be in a given state: for example, that a package must be installed at a specific version, or that a source code repository must be checked out at a particular version. Web application topologies generally have to be updated in a strict sequence; you cannot update applications and system configurations on all machines at once.

When a service is restarted, it is unavailable for a while, and replacing the application version is not instantaneous either. So before updating a machine, we take it out of the load-balancing pool, which means we need to automate adding machines to the pool and removing them from it. “Rolling” is the key word: Ansible lets you control the size of the rolling update window precisely. Such updates proceed cautiously: if a failure occurs at some stage, the update is halted so that the rest of the IT infrastructure is not taken down.
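
As a minimal sketch of a rolling update window, the play below uses the `serial` and `max_fail_percentage` play keywords. The group name `webservers`, the package name `myapp`, and the batch size are assumptions made for illustration; they do not come from this article.

```yaml
---
# Sketch only: "webservers" and "myapp" are hypothetical names.
- name: Rolling update of the web tier
  hosts: webservers
  become: true
  serial: 2                # update two hosts at a time (the rolling window)
  max_fail_percentage: 0   # abort the play as soon as any host in a batch fails

  tasks:
    - name: Ensure the latest application package is installed
      ansible.builtin.yum:
        name: myapp
        state: latest
```

With these settings Ansible finishes each batch of two hosts before starting the next one, and hosts in later batches are left untouched if a batch fails.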

Continuous deployment for automation scripts


Besides CD for services running in production, you can also set up continuous deployment of the automation scripts themselves, that is, of Ansible playbooks (keep reading: the second part of this article has playbook examples). System administrators and developers can then manage playbooks in a source code repository, test them in a staging environment, and promote them to production automatically after a successful test run. In other words, your automation scripts get all the methodological and other advantages of a central code repository that you are used to in software development.

Changes to software and system configurations are one of the main causes of unplanned outages, so automated testing is supplemented with human review. This can be organized through integration with a code review system such as Gerrit, so that changes are applied only after the responsible reviewers approve them.

Rolling updates and load balancers


Ansible works with load balancers without extra effort on your part during rolling updates. In a playbook, inside any loop over a group of hosts, you can simply write something like “perform this action on system X on behalf of host Y,” and Ansible takes care of the rest.

Ansible works well with load balancers of all kinds and can also flag a host as temporarily down in the monitoring system, so that availability checks are muted for the duration of the update. A simple sequence of “disable monitoring, remove from the pool, update the required software tier, return to the pool, enable monitoring” yields a rolling update with zero downtime and no false alarms, fully automated and without operator involvement.
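
The sequence above can be sketched with `delegate_to`, which runs a task on another machine on behalf of the host being updated. This sketch assumes HAProxy as the load balancer, Nagios as the monitoring system, and an installed `community.general` collection. The group names, backend name, service name, socket path, and package name are all hypothetical.

```yaml
---
# Sketch only: groups "webservers", "lbservers", "monitoring", the backend
# "myapp_backend", the service "webserver" and the package "myapp" are assumptions.
- name: Zero-downtime rolling update
  hosts: webservers
  become: true
  serial: 1

  pre_tasks:
    - name: Silence monitoring alerts for this host
      community.general.nagios:
        action: disable_alerts
        host: "{{ inventory_hostname }}"
        services: webserver
      delegate_to: "{{ item }}"
      loop: "{{ groups['monitoring'] }}"

    - name: Take this host out of the load balancer pool
      community.general.haproxy:
        state: disabled
        host: "{{ inventory_hostname }}"
        backend: myapp_backend
        socket: /var/lib/haproxy/stats
      delegate_to: "{{ item }}"
      loop: "{{ groups['lbservers'] }}"

  tasks:
    - name: Update the application
      ansible.builtin.yum:
        name: myapp
        state: latest

  post_tasks:
    - name: Put this host back into the load balancer pool
      community.general.haproxy:
        state: enabled
        host: "{{ inventory_hostname }}"
        backend: myapp_backend
        socket: /var/lib/haproxy/stats
      delegate_to: "{{ item }}"
      loop: "{{ groups['lbservers'] }}"

    - name: Re-enable monitoring alerts for this host
      community.general.nagios:
        action: enable_alerts
        host: "{{ inventory_hostname }}"
        services: webserver
      delegate_to: "{{ item }}"
      loop: "{{ groups['monitoring'] }}"
```

The update task itself stays free of load balancer details, so swapping in another balancer or monitoring system should only affect the pre_tasks and post_tasks.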

Integrated staging tests


Tower can work with multiple inventory files, which makes it easy to test rolling update playbooks in a staging environment before running them on production servers. It is enough to model the production environment in the test environment and run Ansible with the “-i” option to specify which inventory file to use, the staging one or the production one. The playbook itself does not need to change.
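
For illustration, a staging inventory in Ansible's YAML inventory format might look like the sketch below. The file names, group names, and host names are assumptions; a production inventory would define the same groups with production hosts.

```yaml
# staging.yml - sketch of a staging inventory; all host names are hypothetical.
#
# Usage (the playbook file name site.yml is also an assumption):
#   ansible-playbook -i staging.yml site.yml      # rehearse in staging
#   ansible-playbook -i production.yml site.yml   # then run in production
all:
  children:
    webservers:
      hosts:
        stage-web1.example.com:
        stage-web2.example.com:
    lbservers:
      hosts:
        stage-lb1.example.com:
```

Because the playbook refers only to group names, the same file runs unchanged against either inventory.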

Deploying from version control


Some people like to package applications as OS packages (RPMs, debs, and so on), but often, especially for web applications, such packaging is unnecessary. Ansible therefore includes several modules for deploying applications directly from source control systems. In a playbook you can require a checkout of a given tag or version; Ansible then verifies that this condition holds on all target servers and triggers follow-up actions only if the version had to be replaced, which avoids unnecessary service restarts.
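
A minimal sketch of this pattern with the git module and a handler follows. The repository URL, destination path, tag, and service name are hypothetical.

```yaml
---
# Sketch only: repository URL, path, tag and service name are assumptions.
- name: Deploy the application from source control
  hosts: webservers
  become: true

  tasks:
    - name: Check out the requested release tag
      ansible.builtin.git:
        repo: https://git.example.com/myapp.git
        dest: /opt/myapp
        version: v1.2.3
      notify: Restart myapp   # fires only if the checkout actually changed

  handlers:
    - name: Restart myapp
      ansible.builtin.service:
        name: myapp
        state: restarted
```

If a host already has the requested version checked out, the task reports no change and the handler never runs, so the service is not restarted needlessly.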

Integration with monitoring tools


As a full-fledged orchestration system, Ansible integrates with application performance management (APM) systems at the monitoring level. For example, during deployment or integration testing you may need to install or update the APM agent alongside the application. Ansible can handle this with a dedicated role: after installing and activating the agent, it can register it in the APM monitoring stack (if it is not already configured), so that application managers can immediately verify that the new version is installed and running without problems.

If something goes wrong after the application is updated in production, the monitoring tools can trigger Ansible to roll back to the previous version, provided, of course, that such a rollback is allowed.

Event Notification


In the CI/CD paradigm, everyone wants to be notified of events as quickly as possible. Ansible offers built-in capabilities, including an email module, as well as integration with external notification tools such as instant messengers, social networks, and event-logging systems.
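
As a hedged sketch, an email notification at the end of a deployment could look like this. It assumes the `community.general` collection is installed; the SMTP host, recipient address, group name, and package name are hypothetical.

```yaml
---
# Sketch only: SMTP host, recipient, group and package names are assumptions.
- name: Deploy and notify
  hosts: webservers
  become: true

  tasks:
    - name: Update the application
      ansible.builtin.yum:
        name: myapp
        state: latest

  post_tasks:
    - name: Send one deployment notification by email
      community.general.mail:
        host: smtp.example.com
        port: 25
        to: ops@example.com
        subject: "Deployment finished"
        body: "myapp was updated on {{ ansible_play_hosts | join(', ') }}"
      delegate_to: localhost   # send from the control node
      run_once: true           # one message per run, not one per host
      become: false
```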

Deploying with the resource state model


One of the key features that makes Ansible so useful for application deployment is that it applies the resource state model, which has become popular in configuration management, to software update processes as well. Unlike traditional open source management tools, Ansible needs no additional software or special scripts to deliver applications.

In Ansible you can specify and control the order of events across different tiers of the architecture very precisely, which lets you delegate actions to other systems and combine resource-model directives (such as “package X must be in state Y”) with traditional script commands (such as “run script.sh”) in a single process.

Ansible also makes it easy to run commands that check various conditions and to make decisions based on their results. Combining system configuration and application deployment in a single toolchain is much more efficient than using several specialized tools, and it also makes operating system and application policies more consistent.
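
The sketch below shows one way to combine the two styles: a declarative package task, a check command whose result is registered, and a traditional script that runs only when the check says it is needed. The package name, the `schema-status` command, and `migrate.sh` are hypothetical placeholders.

```yaml
---
# Sketch only: "myapp", the schema-status command and migrate.sh are assumptions.
- name: Mix declarative resources with a traditional script
  hosts: webservers
  become: true

  tasks:
    - name: Package must be present (resource model)
      ansible.builtin.yum:
        name: myapp
        state: present

    - name: Check whether the database schema is up to date
      ansible.builtin.command: /opt/myapp/bin/schema-status
      register: schema_status
      changed_when: false   # a read-only check never counts as a change
      failed_when: false    # a non-zero exit code is a result here, not an error

    - name: Run the migration script only when the check reports it is needed
      ansible.builtin.script: ./migrate.sh
      when: schema_status.rc != 0
```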

Deployment Testing


With great power comes great responsibility. Automating continuous delivery sharply increases the risk of deploying a bad configuration to every node in the system. To reduce that risk, Ansible lets you embed control tests in playbooks that abort the rolling update if something goes wrong. To test various conditions, including the status of services, you can run arbitrary tests with the command or script modules, or even write such tests as separate Ansible modules.

The fail module can abort playbook execution on a host at any point, which lets you catch failures early in a rolling update. Suppose that, because of a difference between the staging and production environments, a configuration error appears in production that would take down the production servers. In the playbook you can define an emergency exit at the very first stage of the rolling update. If you have 100 servers and a rolling update window of 10, such an emergency stop gives you time to calmly work out what happened, fix the playbook, and continue the update.
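
A minimal sketch of such an emergency exit, using the window of 10 from the example above, might look like this. The health check URL, group name, and package name are assumptions.

```yaml
---
# Sketch only: "webservers", "myapp" and the /health endpoint are assumptions.
- name: Rolling update with an early sanity check
  hosts: webservers
  become: true
  serial: 10               # rolling update window of 10 hosts
  any_errors_fatal: true   # a failure in one batch stops the whole play

  tasks:
    - name: Update the application
      ansible.builtin.yum:
        name: myapp
        state: latest

    - name: Query the application health endpoint on the host itself
      ansible.builtin.uri:
        url: http://localhost/health
      register: health
      failed_when: false   # let the next task decide what counts as a failure

    - name: Abort the update if the application is not healthy
      ansible.builtin.fail:
        msg: "Health check failed on {{ inventory_hostname }}, stopping the rollout"
      when: health.status | default(0) != 200
```

If the first batch fails its check, the remaining 90 servers are never touched.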

When a failure occurs, Ansible does not carry on and leave the system half-configured. Instead it raises an error to get the operator's attention, reporting on which hosts the update went wrong and how many changes were made on each host. Ansible also has a dry-run mode, in which it reports what changes would have been made without actually executing them.

Compliance check


There are environments where configurations change only when there is no alternative, and every change is analyzed in advance. Such environments use continuous delivery “with reservations.”

Ansible has a dry-run mode (enabled with the “--check” flag) in which it reports what changes would be made if the playbook were executed. The playbook is not actually applied in this mode. A dry run will not catch every error, but it helps you understand and analyze the details and results of the proposed changes.
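
As a small illustration of how a playbook behaves under “--check”, consider the sketch below. The file names, template, and group name are hypothetical. The `check_mode: false` task keyword is used here to let a read-only command run even during a dry run, since command tasks are otherwise skipped in check mode.

```yaml
# Dry run (nothing is changed; --diff additionally shows file differences):
#   ansible-playbook -i staging.yml site.yml --check --diff
# The file names staging.yml and site.yml are assumptions.
---
- name: Preview configuration changes
  hosts: webservers
  become: true

  tasks:
    - name: Render the nginx configuration (only reported in check mode)
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf

    - name: Read the installed nginx version (read-only, so run it even in check mode)
      ansible.builtin.command: nginx -v
      check_mode: false
      changed_when: false
```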

On the other hand, even when new builds are deployed continuously, Ansible lets you run compliance checks much more often, to catch the moment when something in production has been changed by hand and needs to be corrected by running the appropriate playbook, for example to change a software version or adjust permissions.

Deploy on autopilot


If you live in a world of multi-tier, multi-stage orchestration of rolling updates with zero downtime, then most likely CI/CD is carried out only by operators (manually or with partial automation) and requires a carefully coordinated dance from everyone involved. With its unique architecture and the absence of software agents on target hosts (which improves security and removes the need to manage the management system itself), Ansible makes it easy to describe and automate complex deployment processes. In other words, it puts deployment on full autopilot.

Examples of Ansible playbooks can be found on GitHub. Below we cover the basics and walk through an example of writing a playbook that can be run in Ansible or Ansible Tower. Together with the module index and other documentation, it will help you learn to write your own playbooks.

What is a playbook?


A playbook is, in essence, a set of instructions (plays) that is sent for execution to a single remote host or a group of hosts. It is like an IKEA furniture assembly guide: follow the instructions and you get exactly what you saw in the store. That is how playbooks work.

Modules


We will create a playbook that installs a web server on a RHEL / CentOS 7 host and creates an index.html file on it from a template specified in the playbook. The sample provided here is fully functional and ready to use. Below we walk through the playbook and show how its modules are used.

Authors


The author is the one who writes the instructions that the modules will carry out (often with additional values: arguments, paths, and so on). Modules run on the target host in the order in which they appear in the playbook (including any includes and other files it pulls in). The host's state changes (or does not change) depending on the results of each module, which are shown in the Ansible and Tower output.

Running a playbook script


To run a playbook you first need to understand a few things. A playbook is a kind of notation that tells modules which tasks to perform. To run one successfully, keep the following points in mind:

1. Target system (Target)
Because playbooks give instructions to modules and coordinate them, Ansible assumes you understand what you are trying to do and simply automates it. That is why we say that playbooks are like instructions or a manual: you tell the automation how you want a task to be done. But you also need a very good understanding of how the target host on which the playbook runs works.

2. Tasks
If some part of the playbook needs to start a web server, you need to understand how that is done so that you know to use the service module and start the web server by name. If the playbook installs a software package, you need to know how that is done on the target host. You should also understand, at least at a basic level, the tasks being performed. Does the software you want to install need additional host configuration? Is there branching depending on conditions and argument values? If variables are passed along the way, you must understand exactly which ones and why.

Sample playbook
The following sample playbook will help clarify what you have just read. The target host is a RHEL / CentOS 7 server, on which the playbook installs the NGINX web server and then creates an index.html file in the default webroot directory. Once the installation is complete and the index file is in place, the web server is started.

* Note: to run this sample playbook in Ansible Tower, you first need to set up an inventory and credentials.

Playbooks start with the YAML document marker, three dashes (---), followed by:

name: simply a name for the play, to keep the playbook readable.

hosts: the list of target hosts on which Ansible should work.

become: set to true here so that the tasks run with elevated privileges and nginx installs without problems (this field is not always required).

    ---
    - name: Install nginx
      hosts: host.name.ip
      become: true

At the same indentation as the three preceding lines comes the tasks: directive, after which the tasks are listed with additional indentation (following YAML nesting rules). This example has two tasks, both using the yum module. The first adds the epel-release repository so that nginx can be installed. Once EPEL is on the system, the second task installs the nginx package.

The state: directive tells Ansible to check the state of the target host before doing anything else. In our example, if the repository or nginx is already on the host, Ansible recognizes that these two tasks need no action and moves on to the next one.

      tasks:
        - name: Add epel-release repo
          yum:
            name: epel-release
            state: present

        - name: Install nginx
          yum:
            name: nginx
            state: present

The default landing page that nginx serves is a fine way to verify that nginx was installed correctly, but you will most likely want to replace it with your own HTML file. In this example, for simplicity, the index file template sits in the same directory from which the playbook is launched. The destination is simply the default path nginx uses when no sites are configured.

        - name: Insert Index Page
          template:
            src: index.html
            dest: /usr/share/nginx/html/index.html

The last task in our playbook simply makes sure that the nginx service is running (and starts it if it is not).

        - name: Start NGiNX
          service:
            name: nginx
            state: started

The entire Playbook script turned out to be about the same length as the opening paragraph in this post:

    ---
    - name: Install nginx
      hosts: host.name.ip
      become: true

      tasks:
        - name: Add epel-release repo
          yum:
            name: epel-release
            state: present

        - name: Install nginx
          yum:
            name: nginx
            state: present

        - name: Insert Index Page
          template:
            src: index.html
            dest: /usr/share/nginx/html/index.html

        - name: Start NGiNX
          service:
            name: nginx
            state: started

Summary


Playbooks are a simple and convenient way to do a lot with a small amount of code. In the example above, we used three modules (yum, template, and service) to install a repository and a software package on the server, create a file from a local template, and then start the newly installed service. And our playbook came out only a little longer than this sentence! Although we ran it on a single host, it could just as well run on dozens or hundreds of servers with only very small changes. In addition, Tower lets you place a playbook in a job template and run it on a group of servers in the AWS cloud or in a corporate data center.

The architectural features of Ansible and the ability to integrate with CI systems, like Jenkins, provide automation not only of configuration management processes, but also of a much wider range of IT tasks. That is why we affectionately call Ansible a comprehensive orchestration system, and not just a software deployment and configuration management tool.

Source: https://habr.com/ru/post/412725/

