How do you discover the network topology at the data link layer when the subnet contains only unmanaged switches? In this article I will try to answer that question.
I'll start with what caused LLTR (Link Layer Topology Reveal) to appear.
I had a "pet project": a large-file synchronizer running "at full network speed", capable of pushing a 120 GiB file over Fast Ethernet (100 Mbps; 100BASE-TX; full duplex) in about 1 hour to 10, 30, or even more than 200 PCs. It was a very useful "pet project", because the synchronization speed was almost independent of the number of PCs the file had to be delivered to. Everything was fine, except that it needed to know the network topology in order to work.
Read more about it in the article. But why did a 120 GiB file need to be "pushed" over the network to so many PCs?
This file was a VHD with the operating system, programs, etc. The file was created on the master system, and then distributed to all other PCs. VHD was not only the way to deliver the system to the final PCs, but also made it possible to restore the original state of the system when the PC was restarted. Read more in the article: “ Freezing the system: the history of the transition from EWF to dVHD ”.
You can continue the chain further, but at this point I’ll break off.
Existing data-link-layer topology discovery protocols (LLDP, LLTD, CDP, …) require appropriate support on all intermediate network nodes, that is, at least managed switches that support the protocol in question. There was already an article on Habr on how to use these protocols to "determine the network topology at levels 2/3 of the OSI model".
But what to do if intermediate nodes are simple unmanaged switches?
If you are interested in how this can be done, welcome under the cut. I promise plenty of illustrations and examples.
{image size: 924 KiB; text: 69 KiB; emoticons: 9 pcs. }
Some UserCSS before reading
Note: initially this spoiler was placed before the cut.
Surely everyone who wanted to customize the styles for themselves has already done so. Nevertheless, here is part of my UserCSS. The major changes:
- you can see where an open spoiler ends (useful when a spoiler is large and you don't immediately notice the indentation difference between the main text and the spoiler text); more precisely, the old frame and background of the spoiler are back;
- the quote block got its previous appearance back (I showed the new "yellow quotes" to several people who don't know Russian and don't read Habr, and they said the quotes looked like contextual-advertising inserts…);
- the main font, its size, and the line spacing also went back (IMHO, long text is easier to read with them);
- removed publication rating and number of views, but left the number of bookmarks.
In general, I returned (with minor modifications) the familiar look of the article's main elements. A great many good articles have already been read in this design (pleasant memories come back).
```css
@charset "utf-8";
body { font-family: Verdana,sans-serif !important; }
.nav-links__item-link { font-family: Helvetica,Arial,sans-serif !important; }
.comment__message, .comment-form__preview { font-family:Arial !important; }
.post__text { font-size:13px !important; line-height:1.60 !important; }
.post__title-text, .post__title_link { font-family: Tahoma,sans-serif !important; line-height:118% !important; }
.post__title-text { font-size:30px !important; }
.post__title_link { font-size:26px !important; }
.post__text br { line-height:normal !important; }
.post__text img { -o-object-fit:contain; object-fit:scale-down; }
.post__text h1, .post__text h2, .post__text h3 { font-family: Helvetica,Arial,sans-serif !important; }
.post__text h2 { font-size:1.5em !important; }
.post__text h3 { font-size:1.375em !important; }
.post__text h4, .post__text h5, .post__text h6 { font-family: Verdana,sans-serif !important; font-size:1.125em !important; font-weight:bold !important; }
.post__text h5 { color:#3D3D3D !important; }
.post__text h6 { color:#5C5C5C !important; }
.post__text ul li, .post__text ul ul li, .post__text ul ol li, .post__text ol li, .post__text ol ul li, .post__text ol ol li { line-height:inherit !important; padding:0 !important; }
.post__text ul, .post__text ul ul, .post__text ul ol, .post__text ol, .post__text ol ul, .post__text ol ol { margin-left:35px !important; }
.post__text ul ul, .post__text ul ol, .post__text ol ul, .post__text ol ol { margin-bottom:9px !important; }
.post__text .spoiler .spoiler_title { color:#6DA3BD !important; font-size:inherit !important; font-weight:400 !important; }
.post__text .spoiler::before { margin-top:2px !important; }
.post__text .spoiler .spoiler_text { color:#666 !important; background:#F9F9F9 !important; border:1px solid #EEE !important; }
.post__text .spoiler .spoiler_text hr:first-child, .post__text .spoiler .spoiler_text hr:last-child { display:none !important; }
.post__text code { font-size:13px !important; border:1px solid #F2F2F2 !important; border-radius:3px !important; padding:0 0.25em !important; white-space:pre-wrap !important; }
.post__text strong > code { font-weight: normal !important; background-color: #F8F8F8 !important; }
.post__text pre code { line-height:1.5 !important; background-color:#F8F8F8 !important; border-color:#DDD !important; padding:0.5em 1em !important; white-space:pre !important; overflow-x:auto !important; }
.hljs-comment, .hljs-quote { color:#808080 !important; font-style:inherit !important; }
.hljs-doctag,.hljs-keyword,.hljs-formula, .hljs-section,.hljs-name,.hljs-selector-tag,.hljs-deletion,.hljs-subst { color:#4d7386 !important; }
.hljs-literal{ color:#7296a3 !important; }
.hljs-string,.hljs-regexp,.hljs-addition,.hljs-meta-string{ color:#390 !important; }
.hljs-built_in,.hljs-class .hljs-title{ color:#968e5b !important; }
.hljs-attr,.hljs-attribute,.hljs-variable,.hljs-template-variable,.hljs-type,.hljs-selector-class,.hljs-selector-attr,.hljs-selector-pseudo,.hljs-number{ color:#2f98ff !important; }
.hljs-symbol,.hljs-bullet,.hljs-link,.hljs-meta,.hljs-selector-id,.hljs-title{ color:#968e5b !important; }
.post__text kbd { display:inline-block !important; padding:0.2em 0.5em 0.3em !important; vertical-align:middle !important; background-color:#FCFCFB !important; border:1px solid #D9D9D9 !important; border-radius:3px !important; border-bottom-color:#C6CBD1 !important; box-shadow:inset 0px -1px 0px #C6CBD1 !important; font:11px/10px Tahoma, sans-serif !important; color:#575757 !important; text-shadow:0px 1px 0px #FFF !important; }
.post__text blockquote, .comment__message blockquote, .comment-form__preview blockquote { background:inherit !important; border-left:0.25em solid #DFE2E5 !important; color:#70767D !important; padding:0 1em !important; }
.post__text .user_link { display:inline-block !important; padding-left:14px !important; color:#666 !important; font:92.4%/1.5em Arial !important; background:url("https://hsto.org/storage/habrastock/i/bg-user2.gif") 0px 60% no-repeat !important; }
.post__text .user_link::before { content:'@' !important; color:transparent !important; position:absolute !important; left:0 !important; }
.voting-wjt_post>.voting-wjt__counter, .post-stats__item_views { display:none !important; }
```
UPDATE: UserJS - Anti-quotes-hell and `<code>` (see more in this comment).
The main styles are taken from previous versions of Habr:
- 2012-06 (UserCSS) - the main font Verdana 13px, line spacing 160%
- 2012-06 - the first appearance of the spoiler + block code
- 2012-04 - table + headers
- 2012-05 - more headlines
- 2012-05 is just a good article.
- 2011-05 - 1.54em line spacing (in a narrow column, with empty space on the left and right, the text reads worse than at a 160% interval); code blocks changed their background color and lost their border (what else: the hub "cap" changed - hubs [one article can be in many hubs] used to be blogs [one article could be in only one blog])
- 2011-01 - the content is stretched to the full width of the screen (in this format, text with reduced line spacing looks optimal, IMHO); IMHO the wide right column (at a screen width of 1600px) creates a feeling of comfort, and comments look better here than in other versions (good indentation sizes were chosen)
- 2010-08 , 2009-12 , 2009-05 , 2009-03 - changes on the example of the same article
- 2010-02 , 2009-06 - articles with denser text (longer paragraphs)
- 2010-03 , 2010-02 , 2009-06 , 2008-12 - positive memories about Opera
- 2007-12 - Habrazaphistka :) and tag cloud in the right column
- 2007-04 - the line spacing is even smaller
I'm going to write a series of articles about LLTR with the overall theme of "how to create a protocol". This article is number zero in the series.
# Analogy from the physical world - pneumatic
Imagine that we have two pneumatic tubes…
One tube is for incoming parcels (red containers), and one for outgoing ones (blue containers).
We have several friends connected to the same pneumatic transport system. We can send them containers, and they can send containers to us. The shipping address is put on the container itself (for example, as an RFID tag or a barcode).
At some point everyone wanted to know the route their parcels take every day, to find out which switching centers their pneumatic tubes are connected to - in other words, to learn the topology of the pneumatic network.
All that we and our friends can do is send and receive containers with some content.
I propose to think about how in this case it is possible to build a topology of this network. Several options are under the spoiler:
1. If the pipes are transparent, as in the pneumatic transport in Futurama (see the title picture), you can simply send a video camera that records the container's entire path.
2. You can place sensors in the container (gyroscope, compass, accelerometer, …) and build a map of its movements from the data they collect.
3. You can do without sensors by sending containers filled with air in special ways. This option is considered below, because it is applicable to ordinary wired Ethernet networks.
# LLTR Basic
Returning to wired Ethernet networks, let me remind you of the problem that caused LLTR to be created. It is described in detail in the section "PCs are connected to different switches" of the article about RingSync: the transfer slows down if the chain of peers in a network with several switches is built incorrectly. To build the chain of peers properly, you need information about the network topology.
A small example (no problem):
We have 2 switches (connected by one "wire") and 4 hosts (peers); all connections are duplex, and the speed of all links is the same. Arrows indicate the movement of data. In this case there is no problem: the data is transmitted at the maximum network speed.
Another example (there is a problem):
In this example, the chain of peers was constructed less successfully (because we do not know the network topology), which led to the formation of a “bottleneck” (two unidirectional data streams in one connection) and a drop in the overall data transfer rate.
In this case, the solution to the problem (determining the network topology) lies in the very reason we need to solve it - the formation of a bottleneck. The whole chain looks like this: we need to get rid of "bottlenecks" → we need to build the "correct" chain → we need to determine the network topology. Note that we will not brute-force all possible peer-chain combinations in search of a chain without "bottlenecks" - that would take too long; instead we'll act smarter:

Fill the network with traffic: select one of the hosts as a source of broadcast traffic, and start collecting statistics on the received broadcast traffic on all the other hosts. Next, select any 2 non-broadcast hosts and start sending unicast traffic from the first host to the second. The broadcast statistics collected on the hosts at this moment will show that the broadcast receive rate dropped on some hosts - in this case, the hosts connected to the right-hand switch. On all hosts connected to the left-hand switch, the broadcast receive rate did not change.
The connection between the two switches became a “bottleneck” and allowed us to select all the hosts connected to the right switch in a separate cluster.
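This clustering step can be sketched in a few lines of Python (the host names, rates, and the 80% threshold are hypothetical illustration values, not part of LLTR):

```python
def split_by_drop(broadcast_rate, nominal, threshold=0.8):
    """Partition hosts by whether their broadcast receive rate dropped
    during the unicast scan: hosts below threshold * nominal sit behind
    the bottleneck link and go into a separate cluster."""
    dropped = {h for h, rate in broadcast_rate.items() if rate < threshold * nominal}
    unchanged = set(broadcast_rate) - dropped
    return unchanged, dropped

# A, B on the left switch; C, D on the right switch (rates in Mbps, 100BASE-TX)
rates = {"A": 97.0, "B": 96.0, "C": 49.0, "D": 51.0}
left, right = split_by_drop(rates, nominal=100.0)
print(sorted(left), sorted(right))  # ['A', 'B'] ['C', 'D']
```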
Note: normally it is customary to fight broadcast traffic by all means, especially the kind that "utilizes all the bandwidth", but here we are dealing with a network of unmanaged switches, which may well have suffered from broadcast storms / floods, and for once in its life such a broadcast gets to be useful. By the way, broadcast can be replaced with unicast, but then the scan will take longer. For pneumatic transport you would also have to use unicast, unless you build a container-cloning plant and install one at each switching center.
To build the complete network topology, it remains to repeat the same procedure for all possible combinations of oriented pairs of non-broadcast hosts. "Oriented pairs" means that you first send unicast traffic from the first host to the second and collect statistics, and then reverse the direction (traffic from the second to the first) and collect separate statistics for that variant.
The number of combinations to check is given by the formula n × (n − 1) {each host (n) must "say hello" to all the others (n − 1), even if they have already greeted each other before}, where n is the total number of hosts minus one (the broadcast host).
As a result, all collected statistics are fed to the input of a special algorithm (more about it in one of the following articles), which builds the network topology (or rather, it does more - it immediately builds the correct chain of peers for RingSync).
By the way, the minimal network configuration worth scanning consists of two switches with two hosts connected to each. As for the broadcast and unicast rates: broadcast traffic can be kept in the range of 75%-100% of the "pure data transfer rate" (net bitrate; look up "Ethernet 100BASE-TX"), and unicast in the range of 80%-100%.
And what will happen if you remove one of the hosts?
We get a network in which 3 hosts are connected to one switch, and the other switch is inserted between one of the hosts and the first switch. That is, there is effectively only one switch in the network (the switch inserted into the link has turned into a "jumper", so we don't count it), and this is an ideal case for RingSync: a "bottleneck" has nowhere to come from. And since there is no problem, there is nothing to fix:
# LLTR Advanced
A UFO flew by and left this gap here? It resembles a picture from an IQ test.
As written above, when using LLTR Basic you need to scan the network n × (n − 1) times, which leads us to O(n²). For a small number of hosts this is not a problem:
- 4 hosts, n = 3 ⇒ 6 scans;
- 30 hosts, n = 29 ⇒ 812 scans.
But if we have 200 hosts: n = 199 ⇒ 39 402 scans. And it only gets worse from there…
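These counts follow from enumerating the ordered (src, dst) pairs; a tiny sketch (the helper name `scan_count` is mine, not from LLTR):

```python
from itertools import permutations

def scan_count(total_hosts):
    """LLTR Basic scans one ordered (src, dst) pair per iteration,
    over the n = total_hosts - 1 non-broadcast hosts: n * (n - 1)."""
    n = total_hosts - 1  # one host is reserved as the broadcast source
    return sum(1 for _ in permutations(range(n), 2))

print(scan_count(4), scan_count(30), scan_count(200))  # 6 812 39402
```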
Let's see how to improve the situation. We'll grok two simple tree topology variants:
- a star of switches;
- a serial connection of switches.
# Topology: “star of switches”
The action shown in this picture is explained step by step.
We have 5 switches and 12 hosts. Hosts are indicated by circles [●] inside the switch to which they are connected. A completely black switch is a “root” switch.
We will single out one of the hosts as a source of broadcast traffic (shown on the diagram by a circle [○]).
If you use LLTR Basic, then for 12 hosts, n = 11 ⇒ you need 110 scans (iterations).
Iteration 1. Choose one of the hosts (shown in blue) as the source (src) of unicast traffic, and one host (also shown in blue, ●) as the recipient (dst) of unicast traffic. Run the scan and collect statistics for a certain period of time.
The collected statistics showed a drop in the broadcast traffic rate on 9 hosts. For clarity, the rate drop is marked in the figure on every switch whose connected hosts are affected.
Based on the identified drop in speed, you can distribute the hosts into two clusters:
- yellow (initial) - hosts on which the broadcast speed remained close to the maximum;
- green - hosts on which a significant drop in broadcast speed was recorded.
Iteration 2 . Select other unicast src and dst hosts, and again run scanning and statistics collection.
The speed drop is recorded on 3 hosts. These hosts are currently in the green cluster, because a drop was also recorded on them in the previous iteration. Move these 3 hosts from the green cluster into a new red cluster.
Iteration 3 . Again we will select other unicast src and dst hosts, and again we will start scanning and collecting statistics.
The speed drop is recorded on 3 other hosts, which are currently in the green cluster. Move these 3 hosts from the green cluster into a new purple cluster.
The bottom line: after three iterations out of 110, all hosts were divided into clusters corresponding to their switches. The remaining 107 iterations would not produce any new clusters.
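The three iterations above can be replayed as repeated cluster splitting; a minimal sketch, where the host numbering and the "dropped" sets are hypothetical values chosen to match the figure:

```python
def refine(clusters, dropped):
    """One iteration: split every cluster into the hosts that did not see
    a broadcast-rate drop and the hosts that did."""
    result = []
    for cluster in clusters:
        for part in (cluster - dropped, cluster & dropped):
            if part:
                result.append(part)
    return result

clusters = [set(range(12))]                                  # all 12 hosts in one cluster
clusters = refine(clusters, {3, 4, 5, 6, 7, 8, 9, 10, 11})   # iteration 1
clusters = refine(clusters, {6, 7, 8})                       # iteration 2
clusters = refine(clusters, {9, 10, 11})                     # iteration 3
print([sorted(c) for c in clusters])
# [[0, 1, 2], [3, 4, 5], [9, 10, 11], [6, 7, 8]]
```

Three scans, four clusters of three hosts each, matching the yellow / green / red / purple split in the figure.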
It was a perfect case - we either knew the network topology, or we were very lucky.
# I wonder what could be the options for the first iteration?
Option 1: "unicast on broadcast sw". Unicast src is located on the same switch as broadcast src (as in iteration 1 of the example above). Unicast dst is located on any other (non-"broadcast src") switch.
In all cases the network's response to the scan is the same: a rate drop on all non-"broadcast src" switches. This is understandable: unicast src pours into the same uplink as broadcast src, hence the rate drop on all other switches, and it does not matter which of them hosts unicast dst.
Option 2: “unicast general” . Unicast src is located on any non-broadcast src switch. Unicast dst is located on any other (“not broadcast src” and “not unicast src”) switch.
In all cases the rate drop occurs only on the switch where unicast dst is located. The network behaved the same way in iterations 2 and 3 (see the figure) of the example at the beginning of the section.
Option 3: “unicast to broadcast sw” . Unicast src is located on any “non-broadcast src” switch (as in option 2). Unicast dst is located on the same switch as broadcast src.
In these cases there is no network response to the scan at all. Whereas the previous options (1 and 2) created a new cluster, this option creates none. This is partly bad: we get no new information about the network topology (actually, in some cases, if this iteration is not the first one, some information about the location of unicast dst can still be extracted).
Option 4: “unicast in sw” . Unicast src and dst are located on the same switch.
Here, the network also does not respond to scanning in any way, and accordingly there are no new clusters.
Summing up: options 1 and 2 are good - the network responds to them and gives us new information about its topology, and new clusters are created on the basis of this information.
Options 3 and 4 can only tell us that unicast dst is either on the same switch as broadcast src or on the same switch as unicast src. Moreover, a single iteration cannot pinpoint the exact location of unicast dst: it always remains ambiguous between "next to broadcast src" and "next to unicast src". The exact location can be obtained only by combining this information with information from other iterations.
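The four options can be summarized as a small classifier (a sketch; the switch labels and the returned descriptions are mine, not part of LLTR):

```python
def scan_option(src_sw, dst_sw, broadcast_sw):
    """Classify a scan by the switch placement of unicast src and dst
    relative to the broadcast source's switch (options 1-4 from the text)."""
    if src_sw == dst_sw:
        return 4, "no response"                  # "unicast in sw"
    if src_sw == broadcast_sw:
        return 1, "drop on all other switches"   # "unicast on broadcast sw"
    if dst_sw == broadcast_sw:
        return 3, "no response"                  # "unicast to broadcast sw"
    return 2, "drop on dst's switch"             # "unicast general"

# broadcast src sits on switch "A"
print(scan_option("A", "B", "A"))  # (1, 'drop on all other switches')
print(scan_option("B", "C", "A"))  # (2, "drop on dst's switch")
print(scan_option("B", "A", "A"))  # (3, 'no response')
print(scan_option("B", "B", "A"))  # (4, 'no response')
```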
# Was the selection of unicast src and dst in the example random?
Maybe the choice was not random, and there is a hidden pattern in it?
OK, we have just seen how the network responds to the various scan options. Remember the example from the beginning of the section? Perhaps it is time to look at it from a new angle and answer the question: did I choose unicast src and dst at random, or did I cheat?
This picture is similar to the one at the beginning of the section; the difference is that alternative choices of unicast src were added for each iteration, plus something on the right…
This "something" is the calculated probability of forming a new cluster (the number of placements that create a new cluster divided by the total: placements that create a new cluster plus placements that do not). The placements were counted for a fixed unicast src and a free unicast dst, where "free" means that all possible locations of unicast dst were considered.
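In code this estimate is simply a ratio; a minimal sketch with a hypothetical outcome list:

```python
from fractions import Fraction

def new_cluster_probability(outcomes):
    """Probability that a scan forms a new cluster: placements that create
    one, divided by all considered placements (creating + not creating)."""
    return Fraction(sum(outcomes), len(outcomes))

# hypothetical: 6 of the 10 possible unicast dst placements would split off
# a new cluster for the fixed unicast src
p = new_cluster_probability([True] * 6 + [False] * 4)
print(p, float(p))  # 3/5 0.6
```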
That is, in the example I deliberately chose the options with the highest probability of forming a new cluster. Unfortunately, in reality we cannot use such a probability estimate. No wonder I wrote at the end:
It was a perfect case - we either knew the network topology, or we were very lucky.
However, the ability to assess the probability of the formation of a new cluster will be useful to us below.
# Scan inside the cluster, or use intercluster scanning - that is the question ...
In the example above, the last (3rd) iteration used intercluster scanning. Was that a good choice, given that intra-cluster scanning was used at first?
Note: if unicast src and dst are in the same cluster, it is a scan within the cluster (intra-cluster scanning). If unicast src is in one cluster and unicast dst in another, it is intercluster scanning.
If you look at the probabilities in the 3rd iteration, the intercluster scan scores 0.60 versus 0.30 for the intra-cluster scan. Probability seems to be telling us "use intercluster scanning, bro", but is it worth blindly following its advice?
Note: using only one type of scan (either intra-cluster or intercluster) significantly reduces the number of iterations.
If, in the example above, starting from the 4th iteration we scan only inside clusters, it will take another 20 iterations (3×2 + 3×2 + 2×1 + 3×2), i.e. 23 iterations in total instead of 110 (as calculated at the beginning of the section). If, from the same 4th iteration, we use only intercluster scanning, it will take another 87 iterations (3×(3+2+3) + 3×(3+2+3) + 2×(3+3+3) + 3×(3+3+2) − 3), i.e. 90 iterations in total.
Intercluster scanning loses noticeably; moreover, it cannot be used on the 1st iteration (at that moment there is only one cluster). And if you look at the 2nd iteration of the example above, you can see that the last alternative scan option has a new-cluster probability of 0.00. I think the answer to the question "what kind of scan was that alternative?" is already clear.
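Assuming each remaining cluster is then scanned as a miniature LLTR Basic, these totals can be checked (cluster sizes count non-broadcast hosts only; the helper names are mine):

```python
def intra_scans(sizes):
    """Ordered (src, dst) pairs inside each cluster: sum of n * (n - 1)."""
    return sum(n * (n - 1) for n in sizes)

def inter_scans(sizes, already_done=0):
    """Ordered pairs whose src and dst lie in different clusters,
    minus the scans already performed in earlier iterations."""
    total = sum(sizes)
    return sum(n * (total - n) for n in sizes) - already_done

sizes = [3, 3, 2, 3]  # clusters after iteration 3, broadcast host excluded
print(3 + intra_scans(sizes))                  # 23 iterations in total
print(3 + inter_scans(sizes, already_done=3))  # 90 iterations in total
```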
# The charm of parallel scanning
If, when scanning inside clusters, we manage to run the scans in parallel in all clusters at once, the additional 20 iterations (3×2 + 3×2 + 2×1 + 3×2) shrink to 6 (3×2). Parallelism can also benefit intercluster scanning.
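The gain from parallelism is simply "max instead of sum" over the per-cluster scan counts (3×2, 3×2, 2×1, 3×2); a quick check:

```python
per_cluster = [3 * 2, 3 * 2, 2 * 1, 3 * 2]  # scans needed inside each cluster
sequential = sum(per_cluster)  # clusters scanned one after another
parallel = max(per_cluster)    # all clusters scanned simultaneously
print(sequential, parallel)    # 20 6
```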
The question is whether one scan, launched in parallel, will affect the results of another.
# Topology: “serial connection of switches”

We have 5 switches connected in a chain and 15 hosts. In the figures, (↼) and (⇁) denote unicast traffic sent along the chain in one direction or the other, and (⇋) in both directions. If you use LLTR Basic, then for 15 hosts, n = 14 ⇒ 182 scans (iterations) are needed.
In total, 30 scans ((4) + (3×2 + 3×2 + 3×2 + 2×1 + 3×2)) are enough, instead of 182 with LLTR Basic.
# TL;DR
In short: to reveal the topology of a network built on unmanaged switches, pick one host as a broadcast source and fill the network with its traffic, then send unicast traffic between pairs of the remaining hosts. The hosts whose broadcast receive rate drops sit behind the bottleneck link and form a cluster; repeating this for different oriented pairs (LLTR Basic, n × (n − 1) scans), or with the cluster-based optimizations (LLTR Advanced), yields the clusters from which the topology, and the correct RingSync peer chain, is built.
# To be continued…
- OMNeT++ INET [ tutorial ]
- OMNeT++
- Maths
- OMNeT++ 2
- Implementation
The series is also available on GitHub Pages.
