Puppet + Hiera. Squeezing out the maximum


In this article, I would like to talk about how we use Puppet and Hiera to configure our bare-metal and virtual servers. Basically, it is about the architecture and the hierarchy we came up with to make server configuration easier and more systematic.


What led me to write this article is that I could not really find good, working examples on the Internet of how to work with Hiera and why it is needed. Mostly there are tutorials with examples meant to get you into the topic, but the real practical application of Hiera is not covered there. Maybe I searched poorly, but here is a real example for you which, perhaps, will help you dot the i's, as it once did for me.


Who will find this article useful?


If you:



This article is for you.


Before you start


I'll warn you right away: the article turned out to be long but, I hope, useful. It is also assumed that you have already hooked Hiera up to Puppet and are at least somewhat familiar with Puppet. If Hiera is not connected yet, it is not difficult to do.



Background



A bit of history


Initially, everything lived in Puppet 3. Then we decided to roll out Puppet 4: all new servers were placed into it, and the old ones were gradually migrated to version 4.


In Puppet 3 we used the classic system of node files and modules. The modules were kept in a dedicated project group in GitLab and cloned onto the Puppet server (using r10k); then the agents came to the master and received a catalog to apply on the server.


Then we started to move away from that and stopped using local modules, instead putting links to the needed modules and their repositories into the Puppetfile. Why? Because those modules are constantly maintained and improved (well, ideally) by the community and the developers, while our local ones are not. Later we introduced Hiera and switched to it completely, and node files (such as nodes.pp) sank into oblivion.


In Puppet 4 we tried to abandon local modules entirely and use only remote ones. Unfortunately, a caveat is needed here again, since "entirely" did not quite work out: sometimes you still have to clone something and finish it yourself. Of course, there is only Hiera and no node files.


When you have 30 teams with a zoo of technologies, the problem of how to maintain this zoo on more than 1000 servers becomes especially acute. Below I will show how Hiera helps us with this.


Hierarchy


The main thing in Hiera (which is actually where it gets its name) is the configured hierarchy. Ours looks like this:


---
:hierarchy:
  - "nodes/%{::fqdn}"
  - "teams/%{::team}_team/nodes/%{::fqdn}"
  - "teams/%{::team}_team/projects/%{::project}/tiers/%{::tier}"
  - "teams/%{::team}_team/projects/%{::project}/%{::role}"
  - "teams/%{::team}_team/projects/%{::project}"
  - "teams/%{::team}_team/roles/%{::role}"
  - "teams/%{::team}_team/%{::team}"
  - "projects/%{::project}/tiers/%{::tier}/%{::role}"
  - "projects/%{::project}/tiers/%{::tier}"
  - "projects/%{::project}/%{::role}"
  - "projects/%{::project}"
  - "tiers/%{::tier}"
  - "virtual/%{::virtual}"
  - "os/%{::operatingsystem}/%{::operatingsystemmajrelease}"
  - "os/%{::operatingsystem}"
  - users
  - common

First, let's deal with the unfamiliar variables (facts).


Each server in SEMrush should ideally have four special facts exposed, describing where it belongs:

- team — the team that owns and works with the server;
- project — the project the server belongs to;
- tier — the environment: prod, dev, and so on;
- role — the purpose of the server within the project (frontend, db, docker, and so on).



How does it work? The Puppet agent comes to the Puppet master and, based on these facts, looks up data files for itself, walking the folders according to our hierarchy. There is no need to explicitly assign configuration files to servers; instead, the servers themselves know which files belong to them, judging only by the file paths and their own facts.
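To make this concrete, here is a hypothetical set of facts and the beginning of the lookup order that the hierarchy above produces for it (all the values are made up for illustration):

  # facts reported by the agent (illustrative values)
  fqdn: myserver.domain.net
  team: devops
  project: special
  tier: prod
  role: frontend

  # the first data files Hiera will try for this node (higher entries win)
  nodes/myserver.domain.net.yaml
  teams/devops_team/nodes/myserver.domain.net.yaml
  teams/devops_team/projects/special/tiers/prod.yaml
  teams/devops_team/projects/special/frontend.yaml
  ...
  projects/special/frontend.yaml
  projects/special.yaml
  ...
  common.yaml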


When a server is being set up, the admins contact the developers to refine these parameters (often it is the other way around: knowledgeable people contact the admins themselves) in order to build a hierarchy in Hiera, on the basis of which the server configuration is then described. Such a system helps to reuse code and to be more flexible about server configuration.


For example, we have a special project. In this project there may be a frontend server with nginx, a backend server with python, a db cluster with mysql, and a redis server for caching. All of these servers should be placed in one project called special, and each server should then be assigned its role.


In the project file we describe the parameters common to the whole project. The first thing that comes to mind is creating a deployment user on all servers, granting it the necessary rights and rolling out its ssh keys.


In the role of each server we usually describe and configure the service that defines the purpose of this server (nginx, python, mysql, etc.). The tier is definitely needed when we also have to deploy a copy of the production environment on the dev platform, but change something in it (passwords, for example). In that case the dev servers and the prod servers differ only in that the tier fact is set to the desired "position" (prod or dev). Then a bit of magic happens and Hiera does its job.


If we need to deploy two identical servers in the same role, but something in them should differ, for example, some lines in the configuration, then another part of the hierarchy comes to the rescue. We place files named {fqdn}.yaml in the right place (for example, nodes/myserver.domain.net), set the desired variable values at the server level, and Puppet will apply the configuration common to the role to both servers, plus the unique configuration to each of them.


Example: two backends with php code sit in the same role and are completely identical. It is clear that we do not want to take backups from both servers — there is no point. We can create a role describing the identical configuration for both servers, and then create one more file, nodes/backend1.semrush.net, in which we place the backup configuration.
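A sketch of what such a node-level file could look like (the backup profile and its parameter here are hypothetical, purely to illustrate the idea):

  # nodes/backend1.semrush.net.yaml
  ---
  classes:
    - profiles::backup
  profiles::backup::directories:
    - '/var/www/uploads'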


The team file teams/team-name.yaml specifies the configuration for all servers belonging to a team. Most often it describes the users who may interact with these servers and their access rights.
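Roughly, such a team file might look like this (the user names are made up, and the exact keys depend on your user management module):

  ---
  user_management::present:
    - alice
    - bob
  user_management::users:
    alice:
      groups:
        - sudo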


Based on these variables we built the hierarchy shown above. The higher up the hierarchy a file is found, the higher the priority of the configuration specified in it.


It follows that variables can be overridden according to this hierarchy. That is, a variable in the role file "projects/%{::project}/%{::role}" has a higher priority than a variable in the project file "projects/%{::project}". Variables can also be merged across all levels of the hierarchy, if your module and/or profile/role is written to allow it. Having specified the part of the mysql config common to all servers of a project, you can add the pieces specific to a role to the same variable at other levels of the hierarchy (for a slave there will be an additional section in the config).
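As a sketch (the mysql key names here are illustrative, and the merge behavior depends on how the profile looks the value up — for example via lookup_options with a deep merge, available in recent Puppet 4 releases):

  # projects/special.yaml — the part common to the whole project
  profiles::db::mysql::config:
    mysqld:
      max_connections: 500

  # projects/special/db_slave.yaml — the part that only this role adds
  profiles::db::mysql::config:
    mysqld:
      read_only: 1

  # tell Hiera to deep-merge this key across all hierarchy levels
  lookup_options:
    profiles::db::mysql::config:
      merge: deep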


It turns out that the file of a particular node, located at "hieradata/nodes/%{::fqdn}", has the highest priority. Next comes the node file, but at the team level. Below that is the block describing the other, more general facts:


  - "virtual/%{::virtual}"
  - "os/%{::operatingsystem}/%{::operatingsystemmajrelease}"
  - "os/%{::operatingsystem}"
  - users
  - common

Accordingly, the common.yaml file holds the configuration that must reach absolutely all servers, the users.yaml file describes all users (though not all of them are created on every server, of course), os/%{::operatingsystem} holds the general configuration for servers with a specific OS (using the ::operatingsystem fact), and so on.
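As a rough illustration (the class names below are hypothetical), such files may contain something like:

  # common.yaml — applied to every server
  classes:
    - base::sshd
    - base::ntp

  # os/Debian.yaml — only for Debian servers
  classes:
    - apt::unattended_upgrades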


I think that, looking at this hierarchy, everything becomes clear. Below I will walk through an example of using such a hierarchy, but first we need to talk about profiles.


Profiles


An important point in configuring servers with modules is the use of profiles. They live under the site/profiles path and serve as entry points into modules. Thanks to them, you can fine-tune the modules attached to a server and create any additional resources you need.


Consider a simple example. There is a module that installs and configures Redis. We also want, when attaching this module, to set the sysctl parameter vm.overcommit_memory to 1, for the reason described in the Redis FAQ (see the link in the profile below). So we write a small profile that provides this functionality:


# standalone redis server
class profiles::db::redis (
  Hash   $config                    = {},
  String $output_buffer_limit_slave = '256mb 64mb 60',
) {
  # https://redis.io/topics/faq#background-saving-fails-with-a-fork-error-under-linux-even-if-i-have-a-lot-of-free-ram
  sysctl { 'vm.overcommit_memory':
    ensure => present,
    value  => '1',
  }

  class { '::redis':
    * => $config,
  }
}
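In Hiera, attaching and tuning Redis on some role then comes down to something like this (the keys under config are whatever the redis module accepts; the values are purely illustrative):

  ---
  classes:
    - profiles::db::redis
  profiles::db::redis::config:
    bind: '127.0.0.1'
    maxmemory: '2gb'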

As mentioned above, profiles are a tool for changing or improving a module's behavior, as well as for reducing the amount of configuration in Hiera. If you use remote modules, you will often run into the problem that "approved" modules lack the functionality you need or have bugs and flaws. In principle, you can then clone such a module and fix or extend it yourself. But the right decision, whenever possible, is to write a good profile that can "prepare" the module the way you want. Below are a few more profile examples that should make it clearer what they are for.


Hiding secrets in hiera


One of the important advantages of Hiera compared to "bare" Puppet is its ability to store sensitive data in configuration files in the repository in encrypted form. Your passwords will be safe.


In short, you encrypt the necessary information with a public key and place the resulting string in a Hiera file. The private key, which allows this data to be decrypted, is stored on the Puppet master. More details can be found on the project page.
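On the master side this usually boils down to installing the hiera-eyaml gem and pointing Hiera at the keys. A sketch for a classic Hiera 3 style hiera.yaml (the paths are just examples) looks roughly like this:

  :backends:
    - eyaml
    - yaml
  :eyaml:
    :datadir: /etc/puppetlabs/code/environments/%{environment}/hieradata
    :pkcs7_private_key: /etc/puppetlabs/puppet/eyaml/private_key.pkcs7.pem
    :pkcs7_public_key: /etc/puppetlabs/puppet/eyaml/public_key.pkcs7.pem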


On the client (your working computer) the tool is installed easily, for example via gem install hiera-eyaml. Then, with a command like eyaml encrypt --pkcs7-public-key=/path/to/public_key.pkcs7.pem -s 'hello', you can encrypt the data and paste it into an eyaml file, or just a yaml one, depending on how you configure things — Puppet will figure it out by itself. The result looks something like this:


 roles::postgresql::password: 'ENC[PKCS7,MIIBeQYJKoZIhvcNAQcDoIIBajCCAWYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAbIz1ihQlThMWa9T+Lq194Y6QdElMD1XTev5y+VPSHtkPTu6Al6TJaSrXF+7phJIjue+NF4ZVtJCLkHxUR6nJJqks0fcGS1vF2+6mmM9cy69sIU1A3HqpOHZLuqHAc7jUqljYxpwWSIGOK6I2FygdAp5FfOTewqfcVVmXj97EJdcv3DKrbAlSrIMO2iZRYwQvyv+qnptnZ7pilR2veOCPW2UMm6zagDLutX9Ft5vERbdaiCiEfTOpVa9Qx0GqveNRVJLV/5lfcL5ajdNBJXkvKqDbx8d3ZBtEVAAqeKlw0LqzScgmCbWQx2kUzukX5LSxbTpT0Th984Vp1sl7iPk7UTA8BgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCp5GcwidcEMA+0wjAMblkKgBCR/f9KGXUgLh3/Ok60OIT5]'

Or a multiline string:


roles::postgresql::password: >
  ENC[PKCS7,MIIBeQYJKoZIhvcNAQcDoIIBajCCAWYCAQAxggEhMIIBHQIBADAFMAACAQEw
  DQYJKoZIhvcNAQEBBQAEggEAbIz1ihQlThMWa9T+Lq194Y6QdElMD1XTev5y
  +VPSHtkPTu6Al6TJaSrXF+7phJIjue+NF4ZVtJCLkHxUR6nJJqks0fcGS1vF
  2+6mmM9cy69sIU1A3HqpOHZLuqHAc7jUqljYxpwWSIGOK6I2FygdAp5FfOTe
  wqfcVVmXj97EJdcv3DKrbAlSrIMO2iZRYwQvyv+qnptnZ7pilR2veOCPW2UM
  m6zagDLutX9Ft5vERbdaiCiEfTOpVa9Qx0GqveNRVJLV/5lfcL5ajdNBJXkv
  KqDbx8d3ZBtEVAAqeKlw0LqzScgmCbWQx2kUzukX5LSxbTpT0Th984Vp1sl7
  iPk7UTA8BgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCp5GcwidcEMA+0wjAM
  blkKgBCR/f9KGXUgLh3/Ok60OIT5]

It seems we are done with the preparation; now we can move on to an example.


A hands-on example


Spoiler: there will be a lot of configuration below, so those for whom this article is of purely theoretical interest can skip this section and jump to the end.


Let's now look at an example of configuring servers with Hiera in Puppet 4. I will not publish the code of all the profiles, because otherwise the post would get quite large; I will focus on the Hiera hierarchy and configuration.


The task is this: for a project called kicker we need to deploy:

- a frontend server with nginx;
- a database server with PostgreSQL;
- a server with docker;

and all of that in two environments, prod and dev.



We will build our hierarchy step by step, starting with the project file.


Project


Create the project file projects/kicker.yaml. Put into it what is common to all servers: we need a couple of repositories and deployment directories, as well as the deploy user.


---
classes:
  - apt::debian::semrush

files:
  "/srv/data":
    ensure: 'directory'
    owner: 'deploy'
    group: 'www-data'
    mode: '0755'
  '/srv/data/shared_temp':
    ensure: 'directory'
    owner: 'deploy'
    group: 'www-data'
    mode: '0775'

user_management::present:
  - deploy

Db role


Create a role file for the database servers, projects/kicker/db.yaml. For now we can do without splitting the servers into environments:


---
classes:
  - profiles::db::postgresql

profiles::db::postgresql::globals:
  manage_package_repo: true
  version: '10'

profiles::db::postgresql::db_configs:
  'listen_addresses':
    value: '*'

profiles::db::postgresql::databases:
  kicker: {}

profiles::db::postgresql::hba_rules:
  'local connect to kicker':
    type: 'local'
    database: 'kicker'
    user: 'kicker'
    auth_method: 'md5'
    order: '001'
  'allow connect from 192.168.1.100':
    type: 'host'
    database: 'kicker'
    user: 'kicker'
    auth_method: 'md5'
    address: '192.168.1.100/32'
    order: '002'

Here we attach a single profile, written for general use by anyone who wants to install PostgreSQL on their servers. The profile is parameterized and allows the module to be configured flexibly before use.


For the most curious, the code of this profile is below:


profiles::db::postgresql
class profiles::db::postgresql (
  Hash $globals  = {},
  Hash $params   = {},
  Hash $recovery = {},
  Hash[String, Hash[String, Variant[String, Boolean, Integer]]] $roles = {},
  Hash[String, Hash[String, Variant[String, Boolean]]] $db_configs     = {},
  Hash[String, Hash[String, Variant[String, Boolean]]] $databases      = {},
  Hash[String, String] $db_grants              = {},
  Hash[String, Hash[String, String]] $extensions = {},
  Hash[String, String] $table_grants           = {},
  Hash[String, Hash[String, String]] $hba_rules = {},
  Hash[String, String] $indent_rules           = {},
  Optional[String] $role                 = undef, # 'master', 'slave'
  Optional[String] $master_host          = undef,
  Optional[String] $replication_password = undef,
  Integer $master_port      = 5432,
  String $replication_user  = 'repl',
  String $trigger_file      = '/tmp/pg_trigger.file',
){
  case $role {
    'slave': {
      $_params = {
        manage_recovery_conf => true,
      }

      if $globals['datadir'] {
        file { "${globals['datadir']}/recovery.done":
          ensure => absent,
        }
      }

      $_recovery = {
        'recovery config' => {
          standby_mode     => 'on',
          primary_conninfo => "host=${master_host} port=${master_port} user=${replication_user} password=${replication_password}",
          trigger_file     => $trigger_file,
        }
      }

      $_conf = {
        'hot_standby' => {
          value => 'on',
        },
      }

      file { $trigger_file:
        ensure => absent,
      }
    }
    'master': {
      $_conf = {
        'wal_level' => {
          value => 'replica',
        },
        'max_wal_senders' => {
          value => 5,
        },
        'wal_keep_segments' => {
          value => 32,
        },
      }

      file { $trigger_file:
        ensure => present,
      }
    }
    default: {
      $_params   = {}
      $_recovery = {}
      $_conf     = {}
    }
  }

  class { '::postgresql::globals':
    * => $globals,
  }

  class { '::postgresql::server':
    * => deep_merge($_params, $params),
  }

  create_resources('::postgresql::server::config_entry', deep_merge($_conf, $db_configs))
  create_resources('::postgresql::server::role', $roles)
  create_resources('::postgresql::server::database', $databases)
  create_resources('::postgresql::server::database_grant', $db_grants)
  create_resources('::postgresql::server::extension', $extensions)
  create_resources('::postgresql::server::table_grant', $table_grants)
  create_resources('::postgresql::server::pg_hba_rule', $hba_rules)
  create_resources('::postgresql::server::pg_indent_rule', $indent_rules)
  create_resources('::postgresql::server::recovery', deep_merge($_recovery, $recovery))
}

Thus, in one fell swoop we install PostgreSQL 10, set up the config (listen_addresses), create the kicker database, and add two rules to pg_hba.conf for access to this database. Cool!


Frontend role


Now let's take on the frontend. Create the projects/kicker/frontend.yaml file with the following content:


---
classes:
  - profiles::webserver::nginx

profiles::webserver::nginx::servers:
  'kicker.semrush.com':
    use_default_location: false
    listen_port: 80
    server_name:
      - 'kicker.semrush.com'

profiles::webserver::nginx::locations:
  'kicker-root':
    location: '/'
    server: 'kicker.semrush.com'
    proxy: 'http://kicker-backend.semrush.com:8080'
    proxy_set_header:
      - 'X-Real-IP $remote_addr'
      - 'X-Forwarded-for $remote_addr'
      - 'Host kicker.semrush.com'
    location_cfg_append:
      'proxy_next_upstream': 'error timeout invalid_header http_500 http_502 http_503 http_504'
      proxy_connect_timeout: '5'

Everything is simple here. We attach the profiles::webserver::nginx profile, which prepares the entry point into the nginx module, and we define the variables — specifically, the server and the location for this site.
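The code of this profile is not shown in the article, but a minimal sketch of such an entry point might look like this (assuming the widely used puppet-nginx module; the resource and parameter names below are assumptions):

  class profiles::webserver::nginx (
    Hash $params    = {},
    Hash $servers   = {},
    Hash $locations = {},
  ) {
    # pass global settings straight through to the nginx module
    class { '::nginx':
      * => $params,
    }

    # turn the hashes from Hiera into server and location resources
    create_resources('::nginx::resource::server', $servers)
    create_resources('::nginx::resource::location', $locations)
  }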


The attentive reader will notice that it would be more correct to put the site description higher in the hierarchy, since we will also have a dev environment where other values will be used (server_name, proxy), but this is not that important. Describing the role this way lets us see how these variables get redefined purely through the hierarchy.


Docker role


The docker role, projects/kicker/docker.yaml:


---
classes:
  - profiles::docker

profiles::docker::params:
  version: '17.05.0~ce-0~debian-stretch'
  packages:
    'python3-pip':
      provider: apt
    'Fabric3':
      provider: pip3
      ensure: 1.12.post1

user_management::users:
  deploy:
    groups:
      - docker

The profiles/docker.pp profile is very simple and elegant. Here is its code:


profiles::docker
class profiles::docker (
  Hash    $params         = {},
  Boolean $install_kernel = false,
){
  class { 'docker':
    * => $params,
  }

  if ($install_kernel) {
    include profiles::docker::kernel
  }
}

Everything is ready. This is already enough to deploy the product we need on any number of servers, simply by assigning them the right project and role (for example, by putting a file of the required format into the facts.d directory, whose location depends on how you installed Puppet).
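For example, with plain external facts it can be as simple as dropping a small YAML file into Facter's facts.d directory (the file name is made up; /etc/puppetlabs/facter/facts.d/ is the typical location for Puppet 4):

  # /etc/puppetlabs/facter/facts.d/membership.yaml
  ---
  team: devops
  project: kicker
  tier: prod
  role: frontend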


Now we have the following file structure:


.
├── kicker
│   ├── db.yaml
│   ├── docker.yaml
│   └── frontend.yaml
└── kicker.yaml

1 directory, 4 files

Now let's deal with environments and with configuration that is unique to a role in a particular environment.


Environments and overrides


Let's create a configuration common to the whole production environment. The file projects/kicker/tiers/prod.yaml states that a firewall class must be attached to this environment (it is production, after all), as well as a certain class that provides a higher level of security:


---
classes:
  - semrush_firewall
  - strict_security_level

If something specific needs to be described for the dev environment, the same kind of file is created and the necessary configuration goes into it.


Next, we need to redefine the variables for the nginx config of the frontend role in the dev environment. To do this, we create the file projects/kicker/tiers/dev/frontend.yaml. Note the new level of the hierarchy.


---
profiles::webserver::nginx::servers:
  'kicker-dev.semrush.com':
    use_default_location: false
    listen_port: 80
    server_name:
      - 'kicker-dev.semrush.com'

profiles::webserver::nginx::locations:
  'kicker-root':
    location: '/'
    server: 'kicker-dev.semrush.com'
    proxy: 'http://kicker-backend-dev.semrush.com:8080'
    proxy_set_header:
      - 'X-Real-IP $remote_addr'
      - 'X-Forwarded-for $remote_addr'
      - 'Host kicker-dev.semrush.com'
    location_cfg_append:
      'proxy_next_upstream': 'error timeout invalid_header http_500 http_502 http_503 http_504'
      proxy_connect_timeout: '5'

The classes key is no longer needed here; it is picked up from the previous levels of the hierarchy. We have changed server_name and proxy_pass. A server with the facts role = frontend and tier = dev will first find the projects/kicker/frontend.yaml file, but then the variables from it will be overridden by the higher-priority file projects/kicker/tiers/dev/frontend.yaml.
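A handy way to check what a particular node will actually receive is the puppet lookup command on the master (available in recent Puppet 4 releases and later; the node name below is hypothetical, and the master must already have facts for it):

  puppet lookup profiles::webserver::nginx::servers --node frontend1-dev.semrush.net --explain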


Hiding the PostgreSQL passwords


And so, the last item on the agenda is to set the passwords for PostgreSQL.


The passwords must differ between environments. We will use eyaml for safe storage of the passwords. Let's create them:


eyaml encrypt -s 'verysecretpassword'
eyaml encrypt -s 'testpassword'

Paste the resulting lines into projects/kicker/tiers/prod/db.yaml and projects/kicker/tiers/dev/db.yaml respectively (you can also use the eyaml file extension, which is configurable). Here is an example:


---
profiles::db::postgresql::roles:
  'kicker':
    password_hash: >
      'ENC[PKCS7,MIIBeQYJKoZIhvcNAQcDoIIBajCCAWYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAsdpb2P0axUJzyWr2duRKAjh0WooGYUmoQ5gw0nO9Ym5ftv6uZXv25DRMKh7vsbzrrOR5/lLesx/pAVmcs2qbhd/y0Vr1oc2ohHlZBBKtCSEYwem5VN+kTMhWPvlt93x/S9ERoBp8LrrsIvicSYZByNfpS2DXCFbogSXCfEPxTTmCOtlOnxdjidIc9Q1vfAXv7FRQanYIspr2UytScm56H/ueeAc/8RYK51/nXDMtdPOiAP5VARioUKyTDSk8FqNvdUZRqA3cl+hA+xD5PiBHn5T09pnH8HyE/39q09gE0pXRe5+mOnU/4qfqFPc/EvAgAq5mVawlCR6c/cCKln5wJTA8BgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBDNKijGHBLPCth0sfwAjfl/gBAaPsfvzZQ/Umgjy1n+im0s]'

The password for the kicker role will then arrive at the database server, be decrypted, and be applied in PostgreSQL.


That is, in fact, all. Yes, the example turned out to be massive, but, I hope, it is functional, leaves no questions, and is understandable and useful. The final hierarchy in Hiera looks like this:


.
├── db.yaml
├── docker.yaml
├── frontend.yaml
└── tiers
    ├── dev
    │   ├── db.yaml
    │   └── frontend.yaml
    ├── prod
    │   └── db.yaml
    └── prod.yaml

3 directories, 7 files

You can view these files live by cloning the specially created repository.


Conclusion


Puppet in conjunction with Hiera is nice and handy. I would not call it an ideal configuration tool for the modern world, far from it, but it deserves attention. It copes very well with certain tasks, and its "philosophy" of constantly maintaining the same state of resources and configuration can play an important role in ensuring the security and uniformity of configurations.


The modern world is gradually converging and evolving. Few people now use only one configuration system; devops engineers and admins often have several systems in their arsenal at once. And that is good, since there is plenty to choose from. The main thing is that it all stays logical, and that it is clear how and where things can be configured.


In the end, our goal as admins is not to configure anything ourselves. Ideally, the teams should do all of this themselves, and we should give them a tool or a product that lets them do it safely, easily and, most importantly, with a predictable result — and help with architectural and more serious tasks than "you need to install PostgreSQL on a server and create a user". Come on, it is 2018! So throw out puppet and ansible and move on to a serverless future.


With the development of clouds, containerization, and container orchestration systems, configuration management systems are slowly receding into the background for users and customers. You can spin up a fault-tolerant cluster of containers in the cloud and keep your applications in containers with auto-scaling, backups, replication and auto-discovery without writing a single line for ansible, puppet, chef, etc., and not worry about a thing (well, almost). On the other hand, the clouds have not reduced the number of bare-metal servers. You just no longer need to configure them; that is now the cloud provider's responsibility. But they are unlikely to use the same systems as mere mortals.


Credits


Thanks to:


Source: https://habr.com/ru/post/412587/

