Comparing sorting exchanges



Spherical algorithms in a vacuum are great. However, let us go down from heaven to the sinful earth and see how all this theoretical beauty will show itself in practice.

Analysis of the next sorting class will end with tests for class sorting. Today we will drive away (not in the sense of throwing it out, but in the sense of running it on tests) sorting exchanges. Sorting other classes will be banished later.

In today's race are:

The weakest group


These algorithms do not claim anything due to depressing time complexity. We test them purely for reference.

Silly sorting
Sluggish sorting
Blunt sorting

Rather, it’s interesting not how good they are, but how bad they are, and which of them is the worst.

Core group


Here gathered exchange sorts a la O ( n 2 ). Dwarf sorting (and its optimization) + all sorts of variations of bubble sorting.

Dwarf sort
Dwarf sort (optimized)
Bubble Sort
Cocktail Sort
Even-odd sort

This is the most interesting part of today's measurements. It is among the representatives of the main group that the most persistent struggle is expected.

The strongest group


One type of bubble - sorting a comb - managed to overcome the O ( n 2 ) barrier and reach the respectable O ( n log n ). So in our competition, quick sorting has a worthy opponent. In addition, let's test the recently invented k-sort , which is a kind of quick sort.

Hairbrush sorting
Quick sort
K-sort

The strongest, by the way, is not only comparable with each other. For them on some data sets the main group will be a serious competitor.

Source


Sort exchanges in Python
def stooge_rec(data, i = 0, j = None): if j is None: j = len(data) - 1 if data[j] < data[i]: data[i], data[j] = data[j], data[i] if j - i > 1: t = (j - i + 1) // 3 stooge_rec(data, i, j - t) stooge_rec(data, i + t, j) stooge_rec(data, i, j - t) return data def stooge(data): return stooge_rec(data, 0, len(data) - 1) def slow_rec(data, i, j): if i >= j: return data m = (i + j) // 2 slow_rec(data, i, m) slow_rec(data, m + 1, j) if data[m] > data[j]: data[m], data[j] = data[j], data[m] slow_rec(data, i, j - 1) return data def slow(data): return slow_rec(data, 0, len(data) - 1) def stupid(data): i, size = 1, len(data) while i < size: if data[i - 1] > data[i]: data[i - 1], data[i] = data[i], data[i - 1] i = 1 else: i += 1 return data def gnome(data): i, size = 1, len(data) while i < size: if data[i - 1] <= data[i]: i += 1 else: data[i - 1], data[i] = data[i], data[i - 1] if i > 1: i -= 1 return data def gnome_opt(data): i, j, size = 1, 2, len(data) while i < size: if data[i - 1] <= data[i]: i, j = j, j + 1 else: data[i - 1], data[i] = data[i], data[i - 1] i -= 1 if i == 0: i, j = j, j + 1 return data def bubble(data): changed = True while changed: changed = False for i in range(len(data) - 1): if data[i] > data[i+1]: data[i], data[i+1] = data[i+1], data[i] changed = True return data def shaker(data): up = range(len(data) - 1) while True: for indices in (up, reversed(up)): swapped = False for i in indices: if data[i] > data[i+1]: data[i], data[i+1] = data[i+1], data[i] swapped = True if not swapped: return data def odd_even(data): n = len(data) isSorted = 0 while isSorted == 0: isSorted = 1 for i in range(1, n - 1, 2): if data[i] > data[i + 1]: data[i], data[i + 1] = data[i + 1], data[i] isSorted = 0 for i in range(0, n - 1, 2): if data[i] > data[i + 1]: data[i], data[i + 1] = data[i + 1], data[i] isSorted = 0 return data def comb(data): gap = len(data) swaps = True while gap > 1 or swaps: gap = max(1, int(gap / 1.25)) # minimum gap is 1 swaps = False for i in range(len(data) - gap): j = i + gap if data[i] > data[j]: data[i], data[j] = data[j], data[i] swaps = True return data def quick(data): less = [] pivotList = [] more = [] if len(data) <= 1: return data else: pivot = data[0] for i in data: if i < pivot: less.append(i) elif i > pivot: more.append(i) else: pivotList.append(i) less = quick(less) more = quick(more) return less + pivotList + more def k(data): return k_rec(data, 0, len(data) - 1) def k_rec(data, left, right): key = data[left] i = left j = right + 1 k = left + 1 p = left + 1 temp = False while j - i >= 2: if key <= data[p]: if p != j and j != right + 1: data[j] = data[p] elif j == right + 1: temp = data[p] j -= 1 p = j else: data[i] = data[p] i += 1 k += 1 p = k data[i] = key if temp: data[i + 1] = temp if left < i - 1: k_rec(data, left, i - 1) if right > i + 1: k_rec(data, i + 1, right) return data 

PHP sorting exchanges
  // function StoogeSort(&$arr, $i, $j) { if($arr[$j] < $arr[$i]) list($arr[$j], $arr[$i]) = array($arr[$i], $arr[$j]); if(($j - $i) > 1) { $t = ($j - $i + 1) / 3; StoogeSort($arr, $i, $j - $t); StoogeSort($arr, $i + $t, $j); StoogeSort($arr, $i, $j - $t); } return $arr; } // function SlowSort(&$arr, $i, $j) { if($i >= $j) return $arr; $m = ($i + $j) / 2; SlowSort($arr, $i, $m); SlowSort($arr, $m + 1, $j); if($arr[$m] > $arr[$j]) list($arr[$m], $arr[$j]) = array($arr[$j], $arr[$m]); SlowSort($arr, $i, $j - 1); return $arr; } // function StupidSort($arr) { $i = 1; $size = count($arr); while($i < $size) { if($arr[$i - 1] <= $arr[$i]){ ++$i; } else { list($arr[$i - 1], $arr[$i]) = array($arr[$i], $arr[$i - 1]); $i = 1; } } return $arr; } // function GnomeSort($arr) { $i = 1; $size = count($arr); while($i < $size) { if($arr[$i - 1] <= $arr[$i]){ ++$i; } else { list($arr[$i - 1], $arr[$i]) = array($arr[$i], $arr[$i - 1]); if($i > 1) --$i; } } return $arr; } // () function GnomeSort_opt($arr) { $i = 1; $j = 2; $size = count($arr); while($i < $size) { if($arr[$i - 1] <= $arr[$i]){ $i = $j; ++$j; } else { list($arr[$i - 1], $arr[$i]) = array($arr[$i], $arr[$i - 1]); if(--$i == 0){ $i = $j; $j++; } } } return $arr; } // function BubbleSort($arr) { $c = count($arr) - 1; do { $swapped = false; for ($i = 0; $i < $c; ++$i) { if ($arr[$i] > $arr[$i + 1]) { list($arr[$i + 1], $arr[$i]) = array($arr[$i], $arr[$i + 1]); $swapped = true; } } } while($swapped); return $arr; } // function ShakerSort($arr){ do{ $swapped = false; for($i = 0; $i < count($arr); $i++){ if(isset($arr[$i + 1])) { if($arr[$i] > $arr[$i+1]) { list($arr[$i], $arr[$i + 1]) = array($arr[$i + 1], $arr[$i]); $swapped = true; } } } if ($swapped == false) break; $swapped = false; for($i = count($arr) - 1; $i >= 0; $i--){ if(isset($arr[$i - 1])){ if($arr[$i] < $arr[$i - 1]) { list($arr[$i], $arr[$i - 1]) = array($arr[$i - 1], $arr[$i]); $swapped = true; } } } } while($swapped); return $arr; } //- function OddEvenSort($arr){ $n = count($arr); $sorted = false; while(!$sorted) { $sorted = true; for($i = 1; $i < $n - 2; $i += 2) { if($arr[$i] > $arr[$i + 1]) { list($arr[$i], $arr[$i + 1]) = array($arr[$i + 1], $arr[$i]); $sorted = false; } } for($i = 0; $i < $n - 1; $i += 2) { if($arr[$i] > $arr[$i + 1]) { list($arr[$i], $arr[$i + 1]) = array($arr[$i + 1], $arr[$i]); $sorted = false; } } } } // function CombSort($arr){ $gap = count($arr); $swap = true; while($gap > 1 || $swap){ if($gap > 1) $gap /= 1.25; $swap = false; $i = 0; while($i + $gap < count($arr)){ if($arr[$i] > $arr[$i + $gap]){ list($arr[$i], $arr[$i + $gap]) = array($arr[$i + $gap], $arr[$i]); $swap = true; } ++$i; } } return $arr; } // function QuickSort($arr) { $loe = $gt = array(); if(count($arr) < 2) { return $arr; } $pivot_key = key($arr); $pivot = array_shift($arr); foreach($arr as $val){ if($val <= $pivot){ $loe[] = $val; }elseif ($val > $pivot){ $gt[] = $val; } } return array_merge(QuickSort($loe), array($pivot_key => $pivot), QuickSort($gt)); } //k- function K_Sort($arr) { K_Sort_rec($arr, 0, count($arr) - 1); return $arr; } function K_Sort_rec(&$arr, $left, $right) { $key = $arr[$left]; $i = $left; $j = $right + 1; $k = $p = $left + 1; $temp = false; while($j - $i >= 2) { if($key <= $arr[$p]) { if(($p != $j) && ($j != ($right + 1))) { $arr[$j] = $arr[$p]; } elseif($j == ($right + 1)) { $temp = $arr[$p]; } --$j; $p = $j; } else { $arr[$i] = $arr[$p]; ++$i; ++$k; $p = $k; } } $arr[$i] = $key; if($temp) $arr[$i + 1] = $temp; if($left < ($i - 1)) K_Sort_rec($arr, $left, $i - 1); if($right > ($i + 1)) K_Sort_rec($arr, $i + 1, $right); } 

Time


Time is everywhere measured in seconds with an accuracy of microseconds.

If the algorithm is fed immediately with a pack of arrays (out of ten, one hundred, or even a million pieces), then the total time is indicated.

Iron - my faithful old laptop, who knew the best of times. So, it is quite possible that everything will work much faster for you.

Worst of the worst


Immediately deal with outsiders. For them, prepared arrays of 40/50/60 elements, containing random numbers from 0 to 9.
type = random, unique = False, min = 0, max = 9, count = 1
size
405060
PythonPhpPythonPhpPythonPhp
0.0040.0010.0050.0030.0070.004
0.0070.0090.0180.0090.0180.023
0.028.266470.05320.87220.1290145.9786

Blunt sorting is an order of magnitude faster than silly, and silly an order of magnitude faster than sluggish. There is nothing more to watch.

Middling


To test middling, packs of one hundred arrays of 100 elements each (unique numbers from 100 to 999), as well as packs of ten arrays of 1000 elements each (unique numbers from 1000 to 9999) are generated.

Randomized arrays, almost sorted arrays, and reverse-ordered arrays were prepared.

Serednyachki: Random


type = random, unique = True
size (from min to max) × count
100 (from 100 to 999) × 1001000 (1000 to 9999) × 10
PythonPhpPythonPhp
0.141010.189011.591091.7251
0.206010.223012.330132.09712
0.153010.229011.67112.23613
0.216010.264022.555152.73116
0.167010.334021.72513.13218

Approximately the same results for all. Implementations on Python are slightly faster than PHP.

Mets: Almost sorted


type = almost unique = true
size (from min to max) × count
100 (from 100 to 999) × 1001000 (1000 to 9999) × 10
PythonPhpPythonPhp
0.0090.0050.0090.005
0.0090.0140.010.014
0.010.010.0110.008
0.0110.0080.0130.008
0.0110.0170.0130.017

The same results for all, plus or minus. From the interesting: partial data ordering improves the performance of all algorithms tenfold.

Serednyachki: Reverse


type = reverse, unique = True
size (from min to max) × count
100 (from 100 to 999) × 1001000 (1000 to 9999) × 10
PythonPhpPythonPhp
0.268010.359023.100183.4292
0.396020.453034.497264.19524
0.227010.384022.481143.64421
0.302020.426033.344194.17524
0.304020.644043.365196.22036

Here is an interesting conclusion - the reverse ordering does not really affect the speed, which has fallen by only 2 times compared with the random number.

The Middle Games: Binary


type = random, unique = False, min = 0, max = 1
size × count
100 × 1001000 × 10
PythonPhpPythonPhp
0.0730.0940.811050.86905
0.104010.113011.163071.06606
0.088010.129010.864051.18207
0.115010.150011.301071.41608
0.114010.258011.319082.46314

From more or less interesting: if instead of numbers from 100 to 9999 to sort zeros and ones, then the speed indicators will not grow much.

Champions


Here the main intrigue is whether the sorting with a comb and k-sorting will be able to press the fast one off the pedestal?

Champions: Random


To find out, we will create packages of randomly arranged arrays: 10 pieces of 10 thousand elements and 10 pieces of 100 thousand elements each. I wanted to give millionaires to the input algorithms, but I didn’t pull the laptop. That RAM is not enough, the depth of recursion is too great.
type = random, unique = False, count = 10
size (from min to max)
10,000 (from 10,000 to 99999)100,000 (from 100,000 to 999999)
PythonPhpPythonPhp
0.802050.6310410.45068.24647
0.477031.640096.6513826.5435
0.900052.1861315.808929.4867

K-sorting on Python is slower, and PHP is faster than quick sort.
Sorting with a comb in comparison with a fast one looks solid, although it is inferior to it in all tests (and in these and subsequent ones) for any data sets.

However, there is a comb and an important distinct advantage. It has no recursion and therefore, unlike its colleagues, it successfully processes arrays consisting of millions of elements.

Special cases


Although champions are hundreds of times faster than average ones, on some specific data sets, sorting data more simply shows that they are not embroidered as well.

Special cases: Almost sorted


Let's try almost sorted arrays, just take a larger size than those that we tested for average ones.

A package of 10 arrays of 10 thousand each and a package of 10 arrays of 100 thousand elements each.
type = almost, unique = False
size (from min to max) × count
10,000 (from 10,000 to 99999) × 10100,000 (from 100,000 to 999999) × 10
PythonPhpPythonPhp
0.432020.0580.817050.58003
0.0840.0620.858050.54003
0.116010.124011.419081.36008
0.140010.119011.834111.41708
0.138010.231011.64912.56915
?122.088??
0.715041.589099.7835619.4731

Quick sorting is not present here at all - there was not enough RAM. At the same time, randomized arrays for 10/100 thousand quicksort elements process with a bang. k-sort faced similar problems, pulled only 10-thousandth of PHP. Large, almost sorted arrays are a degenerate case for quick sorting and its modifications. Because of this arrangement of elements, recursion divides an array into parts for almost every pair of adjacent elements, which leads to the maximum possible number of recursion calls.

By the way, the situation is similar with large reverse-ordered arrays. The same problem arises as with almost ordered - the algorithm has to call itself for each pair of adjacent elements.

Sorting a comb, though it works one and a half to two times slower than quick or k (in general), but it does not have the above-mentioned memory overflows. No recursion - no problem.

Well, we note that all middlings unanimously overtook the champions on large, almost sorted arrays. Here the principle has worked: "Hush and you go - you will continue."

Special cases: Million scarlet roses of small arrays


Small arrays - element middle. If you need to handle a huge number of small sets of numbers, you can get a time gain by taking a "primitive" bubble or dwarf sort. And quick sorting and others like it to use for larger tasks.
type = random, unique = True
size (from min to max) × count
5 (from 10 to 99) × 100000010 (10 to 99) × 1,000,000
PythonPhpPythonPhp
11.7726717.88824.2903933.6659
12.5187218.16529.3326835.202
15.4418817.86130.0667236.7911
14.1868119.783131.6598139.3533
13.6397824.337428.4166252.864
14.2188121.145225.8054832.5419
14.5868328.501624.8594250.3139
18.5380630.723829.664758.2493


With exchange sortings everything is clear. Now let's get down to the really interesting - sorting inserts.

Links


Excel-AlgoLab application , with which you can step by step to view the visualization of these (and not only these) sorts.

All arrays in the form of JSON are also laid out on the Google disk .

Wiki / Wiki - Silly / Stooge , Slow , Gnome / Gnome , Bubble / Bubble , Shaker / Shaker , Odd-Even / Odd-Even , Comb / Fast, Quick

Series articles


Source: https://habr.com/ru/post/415691/


All Articles