Dijkstra for linear time

Greetings to all, and especially those who are interested in the problems of discrete mathematics and graph theory.


Prehistory


It so happened that driven by interest, I was engaged in the development of the service of building a tour. routes. The task was to plan the best routes based on the city user of interest, the categories of establishments and the time frame. Well, one of the subtasks was to calculate travel time from one institution to another. Since I was young and stupid, I solved this problem head on with Dijkstra's algorithm, but it's fair to say that only with it you could run an iteration from one node to thousands of others, it was not an option to cache these distances; Moscow alone, and solutions like the Manhattan distance on our cities does not work at all from the word.


And so it turned out that the problem of productivity in the combinatorial problem was solved, but most of the time the request was eaten away by searching for uncached paths. The problem was complicated by the fact that the Osm road graph in Moscow is quite large (half a million nodes and 1.1 million arcs).


I will not talk about all the attempts and that in fact the problem could have been solved by cutting off the extra arcs of the graph, I’ll tell you only about the fact that at some point I was lit up and I realized that if I approach the Dijkstra algorithm from the point of view of the probabilistic approach, it can be linear.


Dijkstra for logarithmic time


Everyone knows, and it is not known to anyone, that the Dijkstra algorithm, by using a queue with logarithmic complexity of insertion and deletion, can lead to complexity of the form O (n log (n) + m log (n)). When using the Fibonacci heap, the complexity can be omitted to O (n * log (n) + m), but still not linear, but we would like.


In general, the queue algorithm is described as follows:


Let be:



While q is not empty:



If a red-black tree is used as a queue, where insertion and deletion take place behind log (n), and the search for the minimum element is similar to log (n), then the complexity of the algorithm is O (n log (n) + m log (n)) .


And here it is worth noting one important feature: nothing prevents, in theory, to examine the top several times. If the vertex was considered and the distance to it was updated to an incorrect, larger than the true value, then provided that sooner or later the system converges and the distance to u is updated to the correct value, it is possible to do such tricks. But it is worth noting that one vertex should be viewed more than 1 time with some small probability.


Sorting hash table


To reduce the time of Dijkstra’s algorithm to linear, a data structure is proposed, which is a hash table with node numbers (node_id) as values. I note that the need for the array d does not go anywhere, it is still needed, that getting the distance to the i-th node in constant time.


The figure below shows an example of the work of the proposed structure.


image


We describe in steps the proposed data structure:



Measurement results


The checks were made on the osm map of Moscow, unloaded via osm2po in postgres, and then loaded into memory. Tests were written in Java. There were 3 versions of the graph:



Below is a drawing with measurements on different versions of the graph:


image


Consider the dependence of the probability of re-viewing the vertex and the size of the graph:


number of node viewsnumber of vertices of the graphthe probability of re-viewing the site
1049151000154.8
1694291678920.9
4314904195942.8

It can be noted that the probability does not depend on the size of the graph and is rather specific to the request, but small and its range is configured by changing the power of the cell. I would be very grateful for the help in building a probabilistic modification of the algorithm with parameters guaranteeing a confidence interval within which the probability of repeated viewing will not exceed a certain percentage.


Qualitative measurements were also carried out to practically confirm the comparison of the correctness of the result of the algorithms with the new data structure, which showed the complete coincidence of the shortest path length from 1000 random nodes to 1000 other random nodes on the graph. (and so 250 iterations) when working with a sorting hash table and a red-ebony tree.


The source code of the proposed data structure is by reference.


PS: I know about Torup’s algorithm and the fact that it solves the same problem in linear time, but I couldn’t manage this fundamental work in one evening, although I understood the idea in general terms. At least, as I understand it, a different approach is proposed there, based on the construction of a minimum spanning tree.
PSS During the week I will try to find time and make a comparison with the Fibonacci heap and add github repu with examples and test codes a little later.

Source: https://habr.com/ru/post/416195/


All Articles