On the eve of Siberian HighLoad++, we talked with one of our speakers, Yuri Nasretdinov, and asked him what it is like to work at VKontakte, lifting the veil of secrecy over the inner workings of the social network along the way.

At the conference itself, Yuri will talk about how the social network inserts data into ClickHouse from tens of thousands of servers, a topic we touch on only in passing in this conversation.
- Please tell us about your work.
I currently work at VKontakte, though I have not been here long: I joined at the beginning of this year. I work on video infrastructure and site infrastructure. The site is mainly written in PHP, and I develop services and utilities in PHP and Go.
- What prompted you to go to work at VK?
I was invited to work at VK, and I thought: why not? VKontakte is the most heavily loaded site in Russia and one of the largest sites on the whole Internet. I have always been interested in working on a project this large, taking part in the development of the site and the mobile application, and perhaps even influencing and improving them. Probably the most motivating factor is that VKontakte is well known and used by everyone. It is very pleasant to work on such a product and help it become better.
- Before that, had you dealt with anything similar in scale?
Yes, for about five years I worked at Badoo in a similar position, in infrastructure development. But the load at VK is much higher.
- Did you have any problems when moving to VK?
The VKontakte office is located in St. Petersburg, and before that I lived in the Moscow region, so I had to move. The move itself was quite easy because the company helps. But St. Petersburg is very cold in winter. That was probably the hardest thing to deal with.
- Is there no feeling that life has stayed behind, somewhere inside the Moscow Ring Road?
At first it really seemed that way to me, even though I had been to other Russian cities besides Moscow before. But in fact I like St. Petersburg, probably even more than Moscow. It is calmer: there are fewer people, they are not in a hurry, and that is nice.
- And in terms of the technologies used at work, what was fundamentally new for you?
VKontakte is very large, so there really are some nuances I had never come across before. For example, Badoo has practically no extremely popular profiles that are visited by a significant percentage of users. VKontakte does, along with a number of interesting tools that let you quickly scale very popular accounts.
In addition, VKontakte differs in that, for historical reasons, almost everything is built in-house. Unlike, for example, Badoo, which mainly uses MySQL and Memcache (plus its own services), VKontakte uses its own databases and even its own version of Memcache. VK developers could afford to create services that are more efficient than, say, MySQL and that work well at such a huge scale. Most off-the-shelf tools cannot be used without modification in an infrastructure of tens of thousands of servers like VK's, and this creates significant difficulties.
- Was it difficult to get up to speed quickly with such “internal” tools?
I work in the infrastructure department, where there are not that many non-standard things. If anything, the stack is even more standard than the one I worked with before. If I worked, for example, in a department that builds features on the back end, then of course it would be useful to know how the high-load system is built in general, but not the specific details. In such situations, help pages for new employees and descriptions of the internal mechanisms are what you rely on.
In fact, a substantial part of the VK infrastructure was released as open source by Pavel Durov, along with documentation. Anyone can get acquainted with it and read how it all works. But, of course, it is much easier to grasp in the context of how it is used internally. You come in and start doing tasks, gradually studying what you need in order to solve them. You see how things are already done and do the same, and that is enough. After all, even if you manage to read all the documentation on the VK infrastructure, until you start using it you will most likely not understand how it all works in detail.
I should note that all of the above applies to my department; in others it may be different.
- Do you have a specialization within VK? What tasks have you taken part in?
There is no specialization as such. I do whatever is currently required.
I work in a department whose work touches different parts of the social network's infrastructure, and this is a huge project: it takes a lot of time to fully understand how VK is put together.
For example, I participated in a partial transition to PHP 7. This basically concerns the entire site, but at the same time it is not tied to any specific part of it.
Another example is the problem of collecting logs, which we solved using ClickHouse. That is exactly what I will talk about at HighLoad++.
- Let's give a small spoiler: what was special about this problem?
The difficulty was a combination of two factors: on the one hand, we write a lot of logs, and on the other, we need to view them quickly.
The existing system could not really store large amounts of data and was quick to return only fresh information. To get the history, you had to run very heavy queries by hand.
- Readers might be wondering why the columnar ClickHouse was chosen for the solution?
It is columnar because of the specifics of the work. Often, when searching for information in the logs, we need to filter by server or user. When reading from disk, a columnar database can speed up reads many times over compared with a row-oriented one, which is achieved by reading only the necessary columns and by more efficient per-column compression. In addition, ClickHouse parallelizes queries well across cores. That is, unlike classic databases, it can execute a single query even across a whole cluster, using almost all available CPU and disk resources. There are not many such databases, especially free ones, if there are any at all. ClickHouse was a very good fit for the task of storing logs.
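The point about reading only the necessary columns can be illustrated with a small sketch. This is not how ClickHouse stores data internally; it is a toy in-memory model, and the sample records, field names, and function names are all made up for the example. Run-length encoding stands in here for "columns compress well" only because it is easy to show, not because the interview says ClickHouse uses it.

```python
from itertools import groupby

# Hypothetical log records: (server, user_id, message).
ROWS = [
    ("web1", 101, "request started"),
    ("web1", 102, "request finished"),
    ("web2", 101, "cache miss"),
    ("web2", 103, "request started"),
    ("web1", 101, "request finished"),
]


def to_columns(rows):
    """Turn a list of row tuples into one list per column."""
    server, user_id, message = (list(col) for col in zip(*rows))
    return {"server": server, "user_id": user_id, "message": message}


def count_by_server_rows(rows, server):
    """Row layout: every field of every row is read to test one field."""
    values_read = 0
    matches = 0
    for row in rows:
        values_read += len(row)  # the whole row is loaded from storage
        if row[0] == server:
            matches += 1
    return matches, values_read


def count_by_server_columns(columns, server):
    """Column layout: only the 'server' column is read."""
    column = columns["server"]
    matches = sum(1 for value in column if value == server)
    return matches, len(column)


def run_length_encode(column):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    return [[value, len(list(group))] for value, group in groupby(column)]
```

With these five sample records, filtering by server touches 15 values in the row layout and only 5 in the column layout, and the `server` column collapses to three runs. On real logs with dozens of fields and long runs of identical server names, the gap is correspondingly larger.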
I should also note that ClickHouse was already used by VK admins for a fairly specialized task, as a back end for Grafana, a system for collecting server data and charting it. True, only a few ClickHouse servers were deployed, so in practice it was not available to programmers.
After the decision was made to use ClickHouse to store logs, I helped create the corresponding infrastructure so that it would be convenient and understandable for everyone.
- Does VK use any significant number of third-party solutions (besides those mentioned above)?
Of course it does. For example, Linux, on which everything runs. The share of work that the operating system does for us should not be underestimated.
Oddly enough, PHP is used too. We have an in-house engine called KittenPHP, which translates PHP into C++, but for a number of tasks, including in production, regular PHP is used.
nginx is used as well. MySQL is still in use in some places, but we are gradually moving away from it in favor of an in-house database.
- And how is the development process organized?
I do not see big differences between the processes at VK and what is accepted in the industry. We have a bug tracker and departments that work on different functionality and are responsible for their part of the project; there are sprints, people responsible for components, and so on.
- Instead of a summary, can you outline the direction in which your department is developing?
As far as I know, the department I now work in used to consist of one or two people.
Work on infrastructure in its current form (which probably resembles DevOps) began not so long ago. Therefore, it is too early to talk about plans: we are solving existing problems, and so far we have enough work. After that, we will see.
Yuri will go into more detail about VK internals, the use of ClickHouse, and other topics in his talk at Siberian HighLoad++ on June 25-26. You will certainly also be interested in these talks: