5.94 meter docker image with Telegram MTProxy

As everyone has already heard , at the end of May Telegram was released by the official MTProto Proxy (aka MTProxy) server, written on this page . In 2018, without Docker, there is little where, because it is accompanied in the same “official” way in the zero-config format. Everything would be fine, but three “but” slightly spoiled the impression of the release: image weighs> 130 Mb (Debian is rather thick there, but not the usual Alpine), because of “zero-config” it is not always conveniently configured (only with environmental parameters) and the guys forgot to hike, put the Dockerfile.



TL; DR We will do almost 1-in-1 official alpine-based docker image of 5.94MB in size and put it here (and Dockerfile here ); Along the way, we will understand how sometimes you can make friends software with Alpine with the help of pliers and a file, and we’ll play with a bit of size, just a fan for.

Image content


Once again, why all the fuss? Let's see what is the official image of the history team:

$ docker history --no-trunc --format "{{.Size}}\t{{.CreatedBy}}" telegrammessenger/proxy 

The layers are read from the bottom above, respectively:



The thickest is Debian Jessie, from which the original image is inherited, and we have to get rid of it in the first place (alpine: 3.6, by comparison, it weighs 3.97MB); next in size are curl and fresh certificates. To figure out what the other two files and the directory mean, take a look inside using the run command, replace CMD with bash (this way you can walk around the running image, get to know each other more closely, launch certain fragments, copy what is useful):

 $ docker run -it --rm telegrammessenger/proxy /bin/bash 

Now we can easily restore the picture of the incident - something like the lost official Dockerfile looked like:

 FROM debian:jessie-20180312 RUN set -eux \ && apt-get update \ && apt-get install -y --no-install-recommends curl ca-certificates \ && rm -rf /var/lib/apt/lists/* COPY ./mtproto-proxy /usr/local/bin RUN mkdir /data COPY ./secret/ /etc/telegram/ COPY ./run.sh /run.sh CMD ["/bin/sh", "-c", "/bin/bash /run.sh"] 

Where mtproto-proxy is a compiled server, the secret folder only contains the hello-explorers-how-are-you file with the AES encryption key (see the server commands, by the way, there is an official recommendation to get the key via the API, but put it in this way probably to avoid a situation where the API is also blocked), and run.sh makes all the preparations for starting the proxy.

Content original run.sh
 #!/bin/bash if [ ! -z "$DEBUG" ]; then set -x; fi mkdir /data 2>/dev/null >/dev/null RANDOM=$(printf "%d" "0x$(head -c4 /dev/urandom | od -t x1 -An | tr -d ' ')") if [ -z "$WORKERS" ]; then WORKERS=2 fi echo "####" echo "#### Telegram Proxy" echo "####" echo SECRET_CMD="" if [ ! -z "$SECRET" ]; then echo "[+] Using the explicitly passed secret: '$SECRET'." elif [ -f /data/secret ]; then SECRET="$(cat /data/secret)" echo "[+] Using the secret in /data/secret: '$SECRET'." else if [[ ! -z "$SECRET_COUNT" ]]; then if [[ ! ( "$SECRET_COUNT" -ge 1 && "$SECRET_COUNT" -le 16 ) ]]; then echo "[F] Can generate between 1 and 16 secrets." exit 5 fi else SECRET_COUNT="1" fi echo "[+] No secret passed. Will generate $SECRET_COUNT random ones." SECRET="$(dd if=/dev/urandom bs=16 count=1 2>&1 | od -tx1 | head -n1 | tail -c +9 | tr -d ' ')" for pass in $(seq 2 $SECRET_COUNT); do SECRET="$SECRET,$(dd if=/dev/urandom bs=16 count=1 2>&1 | od -tx1 | head -n1 | tail -c +9 | tr -d ' ')" done fi if echo "$SECRET" | grep -qE '^[0-9a-fA-F]{32}(,[0-9a-fA-F]{32}){,15}$'; then SECRET="$(echo "$SECRET" | tr '[:upper:]' '[:lower:]')" SECRET_CMD="-S $(echo "$SECRET" | sed 's/,/ -S /g')" echo -- "$SECRET_CMD" > /data/secret_cmd echo "$SECRET" > /data/secret else echo '[F] Bad secret format: should be 32 hex chars (for 16 bytes) for every secret; secrets should be comma-separated.' exit 1 fi if [ ! -z "$TAG" ]; then echo "[+] Using the explicitly passed tag: '$TAG'." fi TAG_CMD="" if [[ ! -z "$TAG" ]]; then if echo "$TAG" | grep -qE '^[0-9a-fA-F]{32}$'; then TAG="$(echo "$TAG" | tr '[:upper:]' '[:lower:]')" TAG_CMD="-P $TAG" else echo '[!] Bad tag format: should be 32 hex chars (for 16 bytes).' echo '[!] Continuing.' fi fi curl -s https://core.telegram.org/getProxyConfig -o /etc/telegram/backend.conf || { echo '[F] Cannot download proxy configuration from Telegram servers.' exit 2 } CONFIG=/etc/telegram/backend.conf IP="$(curl -s -4 "https://digitalresistance.dog/myIp")" INTERNAL_IP="$(ip -4 route get 8.8.8.8 | grep '^8\.8\.8\.8\s' | grep -Po 'src\s+\d+\.\d+\.\d+\.\d+' | awk '{print $2}')" if [[ -z "$IP" ]]; then echo "[F] Cannot determine external IP address." exit 3 fi if [[ -z "$INTERNAL_IP" ]]; then echo "[F] Cannot determine internal IP address." exit 4 fi echo "[*] Final configuration:" I=1 echo "$SECRET" | tr ',' '\n' | while read S; do echo "[*] Secret $I: $S" echo "[*] tg:// link for secret $I auto configuration: tg://proxy?server=${IP}&port=443&secret=${S}" echo "[*] t.me link for secret $I: https://t.me/proxy?server=${IP}&port=443&secret=${S}" I=$(($I+1)) done [ ! -z "$TAG" ] && echo "[*] Tag: $TAG" || echo "[*] Tag: no tag" echo "[*] External IP: $IP" echo "[*] Make sure to fix the links in case you run the proxy on a different port." echo echo '[+] Starting proxy...' sleep 1 exec /usr/local/bin/mtproto-proxy -p 2398 -H 443 -M "$WORKERS" -C 60000 --aes-pwd /etc/telegram/hello-explorers-how-are-you-doing -u root $CONFIG --allow-skip-dh --nat-info "$INTERNAL_IP:$IP" $SECRET_CMD $TAG_CMD 

Assembly


Under CentOS 7 MTProxy on Habré already collected , we will try to assemble an image under Alpine and save megabytes, 130, in the resulting docker image.

A distinctive feature of Alpine Linux is the use of musl instead of glibc. Both are standard C libraries. Musl is miniaturized (it doesn’t have the fifth part of the “standard”), but the volume and performance (promised at least) decide when it comes to Docker. Yes, and put glibc on Alpine is not racially correct, uncle Jakub Jirutka will not understand , for example.

We will also collect in docker, to isolate dependencies and get freedom for experiments, so create a new Dockerfile:

 FROM alpine:3.6 RUN apk add --no-cache git make gcc musl-dev linux-headers openssl-dev RUN git clone --single-branch --depth 1 https://github.com/TelegramMessenger/MTProxy.git /mtproxy/sources RUN cd /mtproxy/sources \ && make -j$(getconf _NPROCESSORS_ONLN) 

From dependencies, we can use git (and not only for cloning the official repository, the make file will embed the sha of the commit into the version), make, gcc and header files (the minimum set is obtained experimentally). We only clone the master branch with a depth of 1 commit (we definitely do not need a story). Well, let's try to utilize all host resources when compiling with the -j key. Deliberately divided into more layers in order to get convenient caching during reassembly (usually there are a lot of them).

We will run the build command (being in a directory with Dockerfile):

 $ docker build -t mtproxy:test . 

Here is the first problem:

 In file included from ./net/net-connections.h:34:0, from mtproto/mtproto-config.c:44: ./jobs/jobs.h:234:23: error: field 'rand_data' has incomplete type struct drand48_data rand_data; ^~~~~~~~~ 

Actually, all the following will be associated with it. First, for those who are not familiar with the syami, the compiler actually swears at the absence of the declaration of the drand48_data structure. Secondly, the musl developers scored on thread-safe random functions (with the _r postfix) and on everything connected with them (including structures). And the Telegram developers, in turn, did not bother with compiling for systems where random_r and its brethren are not implemented (in many OS libraries you can see the HAVE_RANDOM_R flag or its arbitrary + the presence or absence of this group of functions is usually taken into account in the auto-configurator).

Well, now we will definitely install glibc? Not! We will copy from glibc at a minimum what we need and make a patch for the MTProxy sources.

In addition to problems with random_r, we hapnem the problem with the backtrace function (execinfo.h), which is used to output stack backtrace in the case of an exception: you could try replacing it with implementation from libunwind, but it's not worth it, because the call was framed by checking for __GLIBC__.

The contents of the patch random_compat.patch
 Index: jobs/jobs.h IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== --- jobs/jobs.h (revision cdd348294d86e74442bb29bd6767e48321259bec) +++ jobs/jobs.h (date 1527996954000) @@ -28,6 +28,8 @@ #include "net/net-msg.h" #include "net/net-timers.h" +#include "common/randr_compat.h" + #define __joblocked #define __jobref Index: common/server-functions.c IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== --- common/server-functions.c (revision cdd348294d86e74442bb29bd6767e48321259bec) +++ common/server-functions.c (date 1527998325000) @@ -35,7 +35,9 @@ #include <arpa/inet.h> #include <assert.h> #include <errno.h> +#ifdef __GLIBC__ #include <execinfo.h> +#endif #include <fcntl.h> #include <getopt.h> #include <grp.h> @@ -168,6 +170,7 @@ } void print_backtrace (void) { +#ifdef __GLIBC__ void *buffer[64]; int nptrs = backtrace (buffer, 64); kwrite (2, "\n------- Stack Backtrace -------\n", 33); @@ -178,6 +181,7 @@ kwrite (2, s, strlen (s)); kwrite (2, "\n", 1); } +#endif } pthread_t debug_main_pthread_id; Index: common/randr_compat.h IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== --- common/randr_compat.h (date 1527998264000) +++ common/randr_compat.h (date 1527998264000) @@ -0,0 +1,72 @@ +/* + The GNU C Library is free software. See the file COPYING.LIB for copying + conditions, and LICENSES for notices about a few contributions that require + these additional notices to be distributed. License copyright years may be + listed using range notation, eg, 2000-2011, indicating that every year in + the range, inclusive, is a copyrightable year that would otherwise be listed + individually. +*/ + +#pragma once + +#include <endian.h> +#include <pthread.h> + +struct drand48_data { + unsigned short int __x[3]; /* Current state. */ + unsigned short int __old_x[3]; /* Old state. */ + unsigned short int __c; /* Additive const. in congruential formula. */ + unsigned short int __init; /* Flag for initializing. */ + unsigned long long int __a; /* Factor in congruential formula. */ +}; + +union ieee754_double +{ + double d; + + /* This is the IEEE 754 double-precision format. */ + struct + { +#if __BYTE_ORDER == __BIG_ENDIAN + unsigned int negative:1; + unsigned int exponent:11; + /* Together these comprise the mantissa. */ + unsigned int mantissa0:20; + unsigned int mantissa1:32; +#endif /* Big endian. */ +#if __BYTE_ORDER == __LITTLE_ENDIAN + /* Together these comprise the mantissa. */ + unsigned int mantissa1:32; + unsigned int mantissa0:20; + unsigned int exponent:11; + unsigned int negative:1; +#endif /* Little endian. */ + } ieee; + + /* This format makes it easier to see if a NaN is a signalling NaN. */ + struct + { +#if __BYTE_ORDER == __BIG_ENDIAN + unsigned int negative:1; + unsigned int exponent:11; + unsigned int quiet_nan:1; + /* Together these comprise the mantissa. */ + unsigned int mantissa0:19; + unsigned int mantissa1:32; +#else + /* Together these comprise the mantissa. */ + unsigned int mantissa1:32; + unsigned int mantissa0:19; + unsigned int quiet_nan:1; + unsigned int exponent:11; + unsigned int negative:1; +#endif + } ieee_nan; +}; + +#define IEEE754_DOUBLE_BIAS 0x3ff /* Added to exponent. */ + +int drand48_r (struct drand48_data *buffer, double *result); +int lrand48_r (struct drand48_data *buffer, long int *result); +int mrand48_r (struct drand48_data *buffer, long int *result); +int srand48_r (long int seedval, struct drand48_data *buffer); \ No newline at end of file Index: Makefile IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== --- Makefile (revision cdd348294d86e74442bb29bd6767e48321259bec) +++ Makefile (date 1527998107000) @@ -40,6 +40,7 @@ DEPENDENCE_NORM := $(subst ${OBJ}/,${DEP}/,$(patsubst %.o,%.d,${OBJECTS})) LIB_OBJS_NORMAL := \ + ${OBJ}/common/randr_compat.o \ ${OBJ}/common/crc32c.o \ ${OBJ}/common/pid.o \ ${OBJ}/common/sha1.o \ Index: common/randr_compat.c IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== --- common/randr_compat.c (date 1527998213000) +++ common/randr_compat.c (date 1527998213000) @@ -0,0 +1,120 @@ +/* + The GNU C Library is free software. See the file COPYING.LIB for copying + conditions, and LICENSES for notices about a few contributions that require + these additional notices to be distributed. License copyright years may be + listed using range notation, eg, 2000-2011, indicating that every year in + the range, inclusive, is a copyrightable year that would otherwise be listed + individually. +*/ + +#include <stddef.h> +#include "common/randr_compat.h" + +int __drand48_iterate (unsigned short int xsubi[3], struct drand48_data *buffer) { + uint64_t X; + uint64_t result; + + /* Initialize buffer, if not yet done. */ + if (!buffer->__init == 0) + { + buffer->__a = 0x5deece66dull; + buffer->__c = 0xb; + buffer->__init = 1; + } + + /* Do the real work. We choose a data type which contains at least + 48 bits. Because we compute the modulus it does not care how + many bits really are computed. */ + + X = (uint64_t) xsubi[2] << 32 | (uint32_t) xsubi[1] << 16 | xsubi[0]; + + result = X * buffer->__a + buffer->__c; + + xsubi[0] = result & 0xffff; + xsubi[1] = (result >> 16) & 0xffff; + xsubi[2] = (result >> 32) & 0xffff; + + return 0; +} + +int __erand48_r (unsigned short int xsubi[3], struct drand48_data *buffer, double *result) { + union ieee754_double temp; + + /* Compute next state. */ + if (__drand48_iterate (xsubi, buffer) < 0) + return -1; + + /* Construct a positive double with the 48 random bits distributed over + its fractional part so the resulting FP number is [0.0,1.0). */ + + temp.ieee.negative = 0; + temp.ieee.exponent = IEEE754_DOUBLE_BIAS; + temp.ieee.mantissa0 = (xsubi[2] << 4) | (xsubi[1] >> 12); + temp.ieee.mantissa1 = ((xsubi[1] & 0xfff) << 20) | (xsubi[0] << 4); + + /* Please note the lower 4 bits of mantissa1 are always 0. */ + *result = temp.d - 1.0; + + return 0; +} + +int __nrand48_r (unsigned short int xsubi[3], struct drand48_data *buffer, long int *result) { + /* Compute next state. */ + if (__drand48_iterate (xsubi, buffer) < 0) + return -1; + + /* Store the result. */ + if (sizeof (unsigned short int) == 2) + *result = xsubi[2] << 15 | xsubi[1] >> 1; + else + *result = xsubi[2] >> 1; + + return 0; +} + +int __jrand48_r (unsigned short int xsubi[3], struct drand48_data *buffer, long int *result) { + /* Compute next state. */ + if (__drand48_iterate (xsubi, buffer) < 0) + return -1; + + /* Store the result. */ + *result = (int32_t) ((xsubi[2] << 16) | xsubi[1]); + + return 0; +} + +int drand48_r (struct drand48_data *buffer, double *result) { + return __erand48_r (buffer->__x, buffer, result); +} + +int lrand48_r (struct drand48_data *buffer, long int *result) { + /* Be generous for the arguments, detect some errors. */ + if (buffer == NULL) + return -1; + + return __nrand48_r (buffer->__x, buffer, result); +} + +int mrand48_r (struct drand48_data *buffer, long int *result) { + /* Be generous for the arguments, detect some errors. */ + if (buffer == NULL) + return -1; + + return __jrand48_r (buffer->__x, buffer, result); +} + +int srand48_r (long int seedval, struct drand48_data *buffer) { + /* The standards say we only have 32 bits. */ + if (sizeof (long int) > 4) + seedval &= 0xffffffffl; + + buffer->__x[2] = seedval >> 16; + buffer->__x[1] = seedval & 0xffffl; + buffer->__x[0] = 0x330e; + + buffer->__a = 0x5deece66dull; + buffer->__c = 0xb; + buffer->__init = 1; + + return 0; +} \ No newline at end of file 

Put it in the ./patches folder and slightly modify our Dockerfile to apply the patch on the fly:

 FROM alpine:3.6 COPY ./patches /mtproxy/patches RUN apk add --no-cache --virtual .build-deps \ git make gcc musl-dev linux-headers openssl-dev \ && git clone --single-branch --depth 1 https://github.com/TelegramMessenger/MTProxy.git /mtproxy/sources \ && cd /mtproxy/sources \ && patch -p0 -i /mtproxy/patches/randr_compat.patch \ && make -j$(getconf _NPROCESSORS_ONLN) \ && cp /mtproxy/sources/objs/bin/mtproto-proxy /mtproxy/ \ && rm -rf /mtproxy/{sources,patches} \ && apk add --no-cache --virtual .rundeps libcrypto1.0 \ && apk del .build-deps 

Now the assembled mtproto-proxy binary is at least launched, and we can move on.

Registration


It's time to turn the original run.sh into docker-entrypoint.sh. In my opinion, this is logical when the “binding binding” goes to ENTRYPOINT (it can always be overloaded from the outside), and the arguments for launching a docked application can be maximized in the CMD (+ environment variables as an understudy).

We could install bash and full-fledged grep in our alpine image (I will explain later) to avoid the headache and use the original code as it is, but it will disgrace our miniature image, so we will grow the real, his mother, bonsai.

Let's start with shebanga, replace #!/bin/bash with #!/bin/sh . The default for alpine ash is able to digest almost the entire bash "as is" syntax, but we will still face one problem - for unknown reasons, he refused to adopt a parenthesis in one of the conditions, so we will deploy it, inverting the comparison logic:



Now we are waiting for disassembly with grep'om, which in a busybox delivery is slightly different from the usual (and, by the way, much slower, keep in mind in your projects). First, he does not understand the expression {,15} , it will be necessary to explicitly specify {0,15} . Secondly, it does not support the -P (perl style) flag, but it calmly digests the expression when extended (-E) is enabled.

All we have in dependencies is curl (there is no point in replacing it with wget from busybox) and libcrypto (it’s enough, directly openssl is not required at all in this build).

A couple of years ago, Docker had a wonderful multi-stage build , it is ideal, for example, for Go applications or for situations where the build is complex and it is easier to transfer artifacts from the image to the image than to do the final cleanup. For planting our bonsai we use it, it will save a little.

 FROM alpine:3.6 #         ( ) RUN apk add --no-cache --virtual .build-deps \ # ... ,    && make -j$(getconf _NPROCESSORS_ONLN) FROM alpine:3.6 #   ,  ,   WORKDIR /mtproxy COPY --from=0 /mtproxy/sources/objs/bin/mtproto-proxy . #          #          

Bonsai should be bonsai - get rid of installing libcrypto. When building, we needed the header files from the openssl-dev package, which will pull up libcrypto in dependencies and our executable file will be oriented towards using libcrypto.so.1.0.0. But this is the only dependency, moreover, it is preinstalled in Alpine (in version 3.6 it is libcrypto.so.41, 3.7 - libcrypto.so.42, lies in / lib /). They will scold me now, this is not the most reliable way, but it is worth it and we will still add a symlink to the existing version (if you have a better way, I will accept PR with pleasure).

Final touches and results:

Docker hub
Github

I would appreciate any advice and contribution.

Source: https://habr.com/ru/post/413881/


All Articles