Automatic conflict resolution in an environment with more than one master node (in this article, a master node is a node that accepts data change requests) is a very interesting area of research. There are several different approaches and algorithms, depending on the application, and this article will consider Operational Transformations (OT) technology for conflict resolution in co-editing applications such as Google Docs and Etherpad.
1. Introduction
Collaboration is difficult from a technical point of view, because several people can make various changes to the same part of the text at almost the same points in time. Since the delivery of data via the Internet is not instantaneous (data transfer speed in fiber has limitations), then to simulate an instant response at each point in time, each client works with a local version (replica) of the edited document, which may differ from other versions. And the main snag is to ensure
consistency between local versions, or in other words, how to ensure that all local versions sooner or later
converge (converge) into the same, correct version of the document (we cannot require that all clients in some moments of time simultaneously had the same version, since the editing process can be endless).
Old version of google docs
Initially, Google Docs, like many other document co-editing applications, used a simple comparison of the contents of different versions of a document. Suppose that we have two clients, Petya and Vasya, and the current state of the document is synchronized between them. In the old version of Google Server, when receiving an update from Petya, it calculates the difference (diff) between its version and its version and tries to merge the two versions into one in the best available way. Then the server sends the received version of Vasya. If Vasya has any unsent changes sent to the server, then the process repeats - you need to merge the version from the server from the local Vasina and send it back to the server.
Very often, this approach does not work very well. For example, suppose that Peter, Vasya and the server begin with a synchronized document with the text “
The quick brown fox ”.
Petya bolds the words
brown fox , and Vasya at the same time replaces the word
fox with
dog . Let Petiny's changes come first to the server and he sends the updated version of Vasya. It is clear that the final result should be
The quick brown dog , but for the fusion algorithms there is not enough information to understand this, the options are absolutely correct. (For example, in git in such cases you will get a merge conflict and you will have to edit your hands). This is the main problem of this approach - it will not be possible to be confident in the results of the merger if you rely only on the contents of the document.
You can try to improve the situation and block the ability of other participants to edit parts of the text that someone already rules. But this is not what we want - such an approach is simply trying to avoid solving a technical problem through deteriorating user experience, and besides, there may be a situation when two participants started simultaneously editing the same sentence - and then one of them either loses the changes or it will have to manually merge the changes in case of the conflicts described above.
New version of Google Docs
In the new version of Google Docs, a completely different approach was applied: documents are stored as a sequence of changes and instead of comparing the content we roll changes
in order (Order here and hereinafter refers
to the order relationship ). Knowing how the user changed the document and considering his
intentions (intentions), we can correctly determine the final version after merging all the edits.
Okay, now we need to understand
when to apply changes - we need
a collaboration protocol .
In Google Docs, all changes to the document are reduced to three different
operations (operations) :
- Insert text
- Delete text
- Applying styles to text
Thus, when you edit a document, all your actions are saved exactly in this set of operations, and they are added to the end of the change log. When the document is displayed, the change log will be executed using the recorded operations.
A small remark: the first (public) product of Google with OT support was, apparently, Google Wave. He supported a much wider range of operations.
Example
Let Petya and Vasya start from the same state of “HABR 2017”.
Petya changes 2017 for 2018, it actually creates 2 operations:
At the same time, Vasya changes the text to “HELLO-HABR 2017”:
Vasya comes to Petina’s deletion operation, if he just applies it, then the wrong text will turn out (the number 7 should have been deleted):
To avoid this, Vasya has to
transform Petya's operation in accordance with his current local changes (in other words, the operations are context-sensitive):
\ {Delete \ \ @ 8 \} \ rightarrow \ {Delete \ \ @ 15 \}
Speaking a little more formally, consider the following example:
Then:
O1′(O2(X))=O2′(O1(X))
Voila! The described algorithm is called Operational Transformation.
2. Operational Transformation
Consistency model
To ensure consistency, several
consistency models have been developed, consider one of them - CCI.
CCI model provides the implementation of three properties:
- Convergence (converge): All replicas of a common document must be identical after performing all the operations:
In this example, both users start with identical replicas. Then they change the document and the replicas diverge (diverge) - this is how the minimum response time is achieved. After sending local changes to other clients, the convergence property requires that the final document state of all clients becomes identical. - Intentional intention (intentional preservation): The intention of an operation must be preserved on all replicas, regardless of the overlap of performing several operations simultaneously. The intention of the operation (intention of an operation) is defined as the effect of its execution on the copy where it was created.
Consider an example where this property does not hold:
Both clients start with identical replicas, then both make changes. For Petit, the intention of his operation is to insert '12' on the first index, and for Vasya - to delete characters with indexes 2 and 3. After synchronization, Petit Vasino has an intent and Vasya Petino's intention is violated. Note also that the replicas are diverged, but this is not a requirement for the property in question. The correct final version is proposed to define the reader as a small exercise.
- Causality Preservation: Operations must be performed in a causal order (based on the relationship happened- before ).
Consider an example where this property does not hold:
As you see, Petya sent Vasya and Agent Smith his own change of document, Vasya received it first and deleted the last character. Because of the network lag, Smith first gets Vasin the operation to remove a character that does not exist yet.
OT cannot ensure the fulfillment of the property of preserving causality; therefore, algorithms such as vector clocks are used for this purpose.
OT design
One of the strategy of the OT system design is the division into three parts:
- Algorithms of transformation control (transformation control algorithms): to determine when the operation (target) is ready for transformation, regarding what operations (reference) transformations to carry out, and in what order to perform them.
- Transformation functions (transformation functions): the transformation to the target operation, taking into account the influence of reference operations.
- Requirements and properties of transformations (properties and conditions): to provide communication between these components and to perform checks for correctness.
Conversion functions
Conversion functions can be divided into two types:
- Inclusion / Forward (Inclusion / Forward Transformation): Transform Operation Oa before surgery Ob in a way that takes into account the effect of Ob (for example, two inserts)
- Exceptions / Inverse (Exclusion / Backward Transformation): Transform Operation Oa before surgery Ob in such a way that the effect of Ob excluded (for example, insertion after deletion)
Example for character insertion / deletion (i - insert, d - delete):
Tii(Ins[p1, c1], Ins[p2, c2]) { if (p1 < p2) || ((p1 == p2) && (order() == -1)) // order() – return Ins[p1, c1]; // Tii(Ins[3, 'a'], Ins[4, 'b']) = Ins[3, 'a'] else return Ins[p1 + 1, c1]; // Tii(Ins[3, 'a'], Ins[1, 'b']) = Ins[4, 'a'] } Tid(Ins[p1, c1], Del[p2]) { if (p1 <= p2) return Ins[p1, c1]; // Tid(Ins[3, 'a'], Del[4]) = Ins[3, 'a'] else return Ins[p1 – 1, c1]; // Tid(Ins[3, 'a'], Del[1]) = Ins[2, 'a'] } Tdi(Del[p1], Ins[p2, c1]) { // , } Tdd(Del[p1], Del[p2]) { if (p1 < p2) return Del[p1]; // Tdd(Del[3], Del[4]) = Del[3] else if (p1 > p2) return Del[p1 – 1]; // Tdd(Del[3], Del[1]) = Del[2] else return Id; // Id – }
3. Interaction Protocol in Google Docs
Let's look at how the interaction protocol in Google Docs works in more detail. Suppose we have a server, Petya and Vasya, and they have a synchronized version of an empty document.
Each client remembers the following information:
- The version (id, revision) of the last change received from the server.
- All changes made locally but not sent to the server (pending submission)
- All changes made locally, sent to the server, but without confirmation from the server.
- The current state of the document that the user sees.
The server, in turn, remembers:
- A list of all received but not processed changes (pending).
- Full history of all processed changes (revision log).
- The status of the document at the time of the last processed change.
So, Peter begins with the word “Hello” at the beginning of the document:
The client first added this change to the waiting list, and then sent it to the server and moved the changes to the sent list.
Petya continues typing and has already added the word “Habr”. At the same time, Vasya printed “!” In his empty (he hadn’t received the change of Petina yet) document.
Petino
\ {Insert \ \ 'Habr', \ @ 5 \} was added to the waiting list, but has not yet been sent, because we are
not sending more than one change at a time, and the previous one has not yet been confirmed . Note also that the server has saved Petit's changes in its revision log. Then the server sends them to Vasya and sends confirmation to Petya (that Petiny’s first changes were successfully processed)
Vasya's client receives Petiny's changes from the server and applies OT with respect to those waiting to be sent to the server
\ {Insert \ \ '!', \ @ 0 \} . The result is a change in the insertion index from 0 to 5. Note also that both clients updated the number of the last synchronized revision with the server to 1. Finally, Petin the client removes the corresponding change from the list of pending confirmation from the server.
Then Petya and Vasya send their changes to the server.
The server receives the Petiny changes before (with respect to the entered order) Vasin, so he first processes them, and sends confirmation to Pet. He also sends them to Vasya, and his client transforms them in relation to as yet unconfirmed changes.
\ {Insert \ \ '!', \ @ 5 \} .
Then an important moment occurs. The server begins to process Vasya’s changes, those that Vasi has, have revision number 2. But the server has already recorded the changes under number 2, so it applies the conversion
to all changes that the client doesn’t yet know about (in this case -
\ {Insert \ \ 'Habr', \ @ 5 \} ), and saves the result at number 3.
As we can see, the index in Vasin’s change was increased by 5 to accommodate Petin’s text.
The process is completed for Petit, and Vasya is left to receive a new change from the server and send a confirmation.
4. Etherpad
Consider another similar application where OT is used - an open source project of an online editor with
etherpad.org co-editing
.In Etherpad, the conversion functions are slightly different — the changes are sent to the server as a
changeset , defined as
( ell1 rightarrow ell2)[c1,c2,c3,...],
Where
- ell1 : document length before editing.
- ell2 : document length after editing.
- c1,c2,c3,... - document description after editing. If a ci - a number (or a range of numbers), then these are the indexes of characters of the original document, which will remain after editing (retained), and if ci - a character (or string), this is an insertion.
Example:
- $ inline $ `` "+ \ (0 \ rightarrow 6) [` `Hello"] = `` Hello "$ inline $
- $ inline $ `` Beaver ”+ (4 \ rightarrow 4) [` `Ha”, \ 2-3] = `` Habr "$ inline $
As you already understand, the final document is formed as a sequence of such changes, applied in order to the empty document.
Note that we cannot simply apply changes from other participants (this is, in principle, possible in Google Docs), because the final document lengths may be different.
For example, if the source document was of length n, and we have two changes:
A=(n rightarrowna)[ cdots]B=(n rightarrownb)[ cdots],
that
A(B) impossible because
A requires length document
n , and then
B he will be length
nb .
Etherpad uses a
merge mechanism to resolve this situation: this is a function denoted as
m(A,B) , accepts two changes to the input
on the same document state (hereinafter referred to as
X ) and makes a new change.
Required that
m(A,B)=m(B,A)
Note that for the client with the changes
A that received changes
B there is no point in calculating
m(A,B) , because
m(A,B) It applies to
X , and
A Current state
A(x) . Actually, we need to calculate
A′ and
B′ such that
B′(A(X))=A′(B(X))=m(A,B)(X)
Function calculating
A′ and
B′ is called follow and is defined as:
f(A,B)(A)=f(B,A)(B)=m(A,B)=m(B,A)Construction algorithm
f(A,B) following:
- Insertion in A becomes indices in f(A,B)
- The insert in B becomes the insert in f(A,B)
- Matching indices in A and B are transferred to f(A,B)
Example:
$$ display $$ X = (0 \ rightarrow 8) [`` baseball ”] \\ A = (8 \ rightarrow 5) [0 - 1,` `si”, 7] \ / \! \! / == `` basil ”\\ B = (8 \ rightarrow 5) [0,` `e", 6, `ow"] \ / \! \! / == `` below "$$ display $$
Calculate:
A′=f(B,A)=(5 rightarrow6)[0,1,‘‘si",3,4]B′=f(A,B)=(5 rightarrow6)[0,“e”,2,3,‘‘ow"]m(A,B)=m(B,A)=A(B′)=B(A′)=(8 rightarrow6)[0,‘‘esiow"]
Calculate
m(A,B)(X) proposed as an exercise.
The interaction protocol is essentially the same as Google Docs, except that the calculations are a bit more complicated.
5. OT criticism
- Implementing OT is a very difficult task from a programming point of view. Quoting from Wikipedia : Joseph Gentle, developer of the Share.JS library and former Google Wave engineer, said that “Unfortunately, implementing OT sucks. There's a million algorithms with different tradeoffs, mostly trapped in academic papers. The algorithms are really hard and time consuming to implement correctly. [...] It’s been a long time. ”
(Unfortunately, it is very difficult to write OT. There are a million algorithms with their pros and cons, but most of them are only academic studies. Algorithms are actually very complex and take a lot of time to implement them correctly. [...] We spent 2 years on Writing Wave, and if I had to write it again today, we would need the same amount of time)
- You must save absolutely every change to the document.
6. References