Class serialization on JavaScript with support for circular references

Prologue


Currently, I am developing a schema editor for Javascript, and in the process of this work I encountered a problem to which this article will be devoted, namely, serialization and deserialization of complex data objects.


Without going into the details of the project, I note that, according to my idea, the diagram is an array of elements (vertices) inherited from the base class. Accordingly, each child class implements its own logic. In addition, vertices contain links to each other (arrows), which also need to be saved. In theory, vertices can refer to themselves directly or through other vertices. Standard JSON.stringify is not able to serialize such an array, so I decided to make my own serializer that solves the two problems described:


  1. The ability to save information about the class in the process of serialization and restore it during deserialization.
  2. The ability to save and restore object links, incl. cyclic.

Read more about the formulation of the problem and its solution under the cut.


The serializer project on github


Link to github project: link .
Complex examples are in the test-src folder.


Recursive serializer: link .
Flat serializer: link .


Formulation of the problem


As I have already noted, the initial task is the serialization of arbitrary schemes for the editor. In order not to waste time on the description of the editor, set the task easier. Suppose we want to make a formal description of a simple algorithm scheme using ES6 Javascript classes, and then serialize and deserialize this scheme.


On the Internet, I found a suitable image of the simplest algorithm for determining the maximum of two values:


image


Here I must say that I am not a Javascript developer, and my native language is C #, therefore the approach to solving the problem is dictated by the experience of object-oriented development in C #. Looking at this scheme, I see the vertices of the following types (the names are conditional and do not play a special role):



These vertices have some differences from each other in their data set or semantics, but all of them are inherited from the base vertex (Node). There, in the Node class, is described the links field, which contains links to other vertices, and the addLink method, which allows adding these links. The full code of all classes can be found here .


Let's write the code that collects the scheme from the image, and try to serialize the result.


Algorithm circuit creation code
//   let start = new Schema.Start(); let input = new Schema.Command(' A, B'); let check = new Schema.If('A > B'); let maxIsA = new Schema.Let('Max', 'A'); let maxIsB = new Schema.Let('Max', 'B'); let output = new Schema.Command(' Max'); let finish = new Schema.Finish(); //   start.addLink(input); input.addLink(check); check.addLink(maxIsA, { condition: 'true' }); check.addLink(maxIsB, { condition: 'false' }); maxIsA.addLink(output); maxIsB.addLink(output); output.addLink(finish); //    ( ) let schema = [ start, input, check, maxIsA, maxIsB, output, finish ]; 

If we serialize this scheme using JSON.stringify, we get something terrible. I will give the first few lines of the result, in which I added my comments:


JSON.stringify result
 [ /*    */ { "id": "d9c8ab69-e4fa-4433-80bb-1cc7173024d6", "name": "Start", "links": { "2e3d482b-187f-4c96-95cd-b3cde9e55a43": { "id": "2e3d482b-187f-4c96-95cd-b3cde9e55a43", "target": /*    */ { "id": "f87a3913-84b0-4b70-8927-6111c6628a1f", "name": "Command", "links": { "4f623116-1b70-42bf-8a47-da1e9be5e4b2": { "id": "4f623116-1b70-42bf-8a47-da1e9be5e4b2", "target": /*     */ { "id": "94a47403-13ab-4c83-98fe-3b201744c8f2", "name": "If", "links": { ... 

Since the first vertex contained a link to the second, and that to subsequent ones, then as a result of its serialization, the whole scheme was serialized. Then the second vertex and all its dependencies were serialized, and so on. It is possible to restore the original links from this mix only by identifiers, but they will not help either if any of the vertices refers to itself directly or through other vertices. In this case, the serializer will generate an Uncaught TypeError error : Converting the circular structure to JSON . If it is not clear, here is the simplest example that generates this error: https://jsfiddle.net/L4guo86w/ .


In addition, JSON does not contain any information about the source classes, so there is no way to understand what type each of the vertices was before serialization.


Realizing these problems, I got into the Internet and began to look for ready-made solutions. There were a lot of them, but most were very cumbersome or required a special description of the classes being serialized, so it was decided to make my own bike. And yes, I love bikes.


Serializer concept


This section is for those who want to participate in the creation of a serialization algorithm with me, albeit virtually.


Saving information about data types


One of the problems with Javascript is the lack of metadata that allows you to work wonders in languages ​​such as C # or Java (attributes and reflection). On the other hand, I do not need super complex serialization with the ability to define a list of serializable fields, validation and other chips. Therefore, the main idea is to add information about its type to the object and serialize it with the usual JSON.stringify.


During the search for solutions, I came across an interesting article, the title of which translates as "6 wrong ways to add type information to JSON" . In fact, the methods are quite good, and I chose the one that is number 5. If you are too lazy to read the article, and I strongly recommend it, I will briefly describe this method: when serializing an object, we wrap it into another object with a single field whose name has the format "@<type>" , and the value - the data object. During deserialization, we retrieve the type name, recreate the object from the constructor, and read the data of its fields.


If we remove the links from our example above, then the standard JSON.stringify serializes the data like this:


JSON.stringify
 [ { "id": "d04d6a58-7215-4102-aed0-32122e331cf4", "name": "Start", "links": {} }, { "id": "5c58c3fc-8ce1-45a5-9e44-90d5cebe11d3", "name": "Command", "links": {}, "command": " A, B" }, ... } 

And our serializer will wrap up like this:


Result of serialization
 [ { "@Schema.Start": { "id": "d04d6a58-7215-4102-aed0-32122e331cf4", "name": "Start", "links": {} } }, { "@Schema.Command": { "id": "5c58c3fc-8ce1-45a5-9e44-90d5cebe11d3", "name": "Command", "links": {}, "command": " A, B" } }, ... } 

Of course, there is a drawback: the serializer must know about the types that it can serialize, and the objects themselves should not contain fields whose name begins with a dog. However, the second problem is solved by agreement with the developers or replacing the dog symbol with something else, and the first problem is solved in one line of code (below is an example). We know what we are going to serialize, right?


Solving the problem with links


Here it is still simpler from the point of view of the algorithm, but more difficult with implementation.


When serializing instances of classes registered in the serializer, we will memorize them in the cache and assign them a sequence number. If we later meet this instance again, then we will add this number to its first definition (the field name takes the form "@<type>|<index>" ), and in the serialization place we insert the link as an object


  { "@<type>": <index> } 

Thus, during deserialization, we look at exactly what the field value is. If this number, then we retrieve the object from the cache by this number. Otherwise, this is his first definition.


Return the link from the first vertex of the circuit to the second and see the result:


Result of serialization
 [ { "@Schema.Start": { "id": "a26a3a29-9462-4c92-8d24-6a93dd5c819a", "name": "Start", "links": { "25fa2c44-0446-4471-a013-8b24ffb33bac": { "@Schema.Link": { "id": "25fa2c44-0446-4471-a013-8b24ffb33bac", "target": { "@Schema.Command|1": { "id": "4f4f5521-a2ee-4576-8aec-f61a08ed38dc", "name": "Command", "links": {}, "command": " A, B" } } } } } } }, { "@Schema.Command": 1 }, ... } 

It does not look very clear at first glance, because The second vertex is first defined internally in the first Link object in the link, but it is important that this approach works. In addition, I created the second version of the serializer, which bypasses the tree not in depth, but in width, which allows to avoid such "ladders".


Creating a serializer


This section is intended for those who are interested in the implementation of the ideas described above.


Serializer stub


Like any other, our serializer will have two main methods - serialize and deserialize. In addition, we need a method that tells the serializer about the classes that it should serialize (register) and the classes that should not (ignore). The latter is necessary in order not to serialize DOM elements, jQuery objects, or any other data types that cannot be serialized or which serialization is not needed. For example, in my editor I keep a visual element corresponding to the top or connection. It is created during initialization and, of course, should not fall into the database.


Serializer shell code
 /** *  */ export default class Serializer { /** *  */ constructor() { this._nameToCtor = []; //    this._ctorToName = []; //    this._ignore = [Element]; //    } /** *   * @param {string} alias  * @param {Function} ctor  */ register(alias, ctor) { if (typeof ctor === 'undefined' && typeof alias === 'function') { //    -  ctor = alias; alias = ctor.name; } this._nameToCtor[alias] = ctor; this._ctorToName[ctor] = alias; } /** *     * @param {Function} ctor  */ ignore(ctor) { if (this._ignore.indexOf(ctor) < 0) { this._ignore.push(ctor); } } /** *   * @param {any} val  * @param {Function} [replacer]       * @param {string} [space]   * @returns {string}  */ serialize(val, replacer, space) { return JSON.stringify(new SerializationContext(this).serialize(val), replacer, space); } /** *     json * @param {any} val    json * @returns {any}  */ deserialize(val) { //     if (isString(val)) val = JSON.parse(val); return new DeserializationContext(this).deserialize(val); } } 

Explanations.


To register a class, you must pass its constructor to the register method in one of two ways:


  1. register (MyClass)
  2. register ('MyNamespace.MyClass', MyClass)

In the first case, the class name will be extracted from the name of the constructor function (not supported in IE), in the second you specify the name yourself. The second method is preferable, because allows you to use namespaces, and the first, as planned, is designed to register built-in Javascript types with overridden serialization logic.


For our example, the initialization of the serializer is as follows:


 import Schema from './schema'; ... //   let serializer = new Serializer(); //   Object.keys(Schema).forEach(key => serializer.register(`Schema.${key}`, Schema[key])); 

The Schema object contains descriptions of all classes of vertices, so the code for registering classes fits into one line.


Context serialization and deserialization


You could pay attention to the cryptic classes SerializationContext and DeserializationContext. They are the ones who do all the work, and they are primarily needed to separate the data from different serialization / deserialization processes, since for each of their calls, it is necessary to store intermediate information — the cache of serialized objects and the sequence number for the link.


SerializationContext


I will analyze in detail only the recursive serializer, since their "flat" analog is somewhat more complicated, and differs only in the approach to processing the tree of objects.


Let's start with the constructor:


 /** *  * @param {Serializer} ser  */ constructor(ser) { this.__proto__.__proto__ = ser; this.cache = []; //    this.index = 0; //     } 

I this.__proto__.__proto__ = ser; to explain the mysterious line this.__proto__.__proto__ = ser;
At the input of the constructor, we take the object of the serializer itself, and this line inherits our class from it. This allows access to the serializer data through this .
For example, this._ignore refers to the list of ignored classes of the serializer itself ("black list"), which is very useful. Otherwise, we would have to write something like this._serializer._ignore .


Main serialization method:


 /** *   * @param {any} val  * @returns {string}  */ serialize(val) { if (Array.isArray(val)) { //  return this.serializeArray(val); } else if (isObject(val)) { //  if (this._ignore.some(e => val instanceof e)) { //   return undefined; } else { return this.serializeObject(val); } } else { //   return val; } } 

Here it should be noted that there are three basic data types that we process: arrays, objects and simple values. If the object constructor is blacklisted, then this object is not serialized.


Array serialization:


 /** *   * @param {Array} val  * @returns {Array}  */ serializeArray(val) { let res = []; for (let item of val) { let e = this.serialize(item); if (typeof e !== 'undefined') res.push(e); } return res; } 

You can write shorter via map, but this is not critical. Only one thing is important - checking the value to undefined. If there is a non-serializable class in the array, then without this verification it will fall into the array as undefined, which is not very good. Also in my implementation, arrays are serialized without keys. Theoretically, it is possible to refine the algorithm for serializing associative arrays, but for these purposes I prefer to use objects. In addition, JSON.stringify also does not like associative arrays.


Object serialization:


Code
 /** *   * @param {Object} val  * @returns {Object}  */ serializeObject(val) { let name = this._ctorToName[val.constructor]; if (name) { //     if (!val.__uuid) val.__uuid = ++uuid; let cached = this.cache[val.__uuid]; if (cached) { //     if (!cached.index) { //     cached.index = ++this.index; let key = Object.keys(cached.ref)[0]; let old = cached.ref[key]; cached.ref[`@${name}|${cached.index}`] = old; delete cached.ref[key]; } //     return { [`@${name}`]: cached.index }; } else { let res; let cached = { ref: { [`@${name}`]: {} } }; this.cache[val.__uuid] = cached; if (typeof val.serialize === 'function') { //     res = val.serialize(); } else { //   res = this.serializeObjectInner(val); } cached.ref[Object.keys(cached.ref)[0]] = res; return cached.ref; } } else { //   return this.serializeObjectInner(val); } } 

Obviously, this is the hardest part of the serializer, his heart. Let's sort it out on the shelves.


To begin with, we check whether the class constructor is registered in the serializer. If not, then this is a simple object for which the utility method serializeObjectInner called.


Otherwise, we check whether the object is assigned a unique identifier __uuid . This is a simple counter variable common to all serializers, and it is used to keep a reference to a class instance in the cache. It would have been possible to do without it, and store the instance itself without a key in the cache, but then to check whether the object is stored in the cache, one would have to run through the entire cache, and then it suffices to check the key. I think it is faster in terms of the internal implementation of objects in browsers. In addition, I deliberately do not serialize fields that begin with two underscores, so the __uid field will not fall into the final json, like other private class fields. If this is unacceptable for your task, you can change this logic.


Next, by the value of __uuid, we look for an object that describes an instance of the class in the cache ( cached ).


If there is such an object, then the value has already been serialized. In this case, we assign a sequence number to the object, if this has not been done before:


 if (!cached.index) { //     cached.index = ++this.index; let key = Object.keys(cached.ref)[0]; let old = cached.ref[key]; cached.ref[`@${name}|${cached.index}`] = old; delete cached.ref[key]; } 

The code looks confusing and can be simplified by assigning the number to all classes that we serialize. But for debugging and perception of the result it is better when the number is assigned only to those classes that are further referenced.


When the number is assigned, we return the link according to the algorithm:


 //     return { [`@${name}`]: cached.index }; 

If the object is serialized for the first time, we create an instance of its cache:


 let res; let cached = { ref: { [`@${name}`]: {} } }; this.cache[val.__uuid] = cached; 

And then serialize it:


 if (typeof val.serialize === 'function') { //     res = val.serialize(); } else { //   res = this.serializeObjectInner(val); } cached.ref[Object.keys(cached.ref)[0]] = res; 

There is a check for the implementation of the class serialization interface (which will be discussed below), as well as the construction Object.keys(cached.ref)[0] . The fact is that cached.ref stores a reference to the wrapper object { "@<type>[|<index>]": <> } , but we don’t know the name of the field of the object, since At this stage, we still do not know whether the name will contain the object number (index). This construct simply retrieves the first and only field of the object.


Finally, the utility method for serializing the viscera of an object:


 /** *   * @param {Object} val  * @returns {Object}  */ serializeObjectInner(val) { let res = {}; for (let key of Object.getOwnPropertyNames(val)) { if (!(isString(key) && key.startsWith('__'))) { //  ,      res[key] = this.serialize(val[key]); } } return res; } 

We create a new object and copy fields from the old one into it.


DeserializationContext


The deserialization process works in the reverse order and does not need any special comments.


Code
 /** *   */ class DeserializationContext { /** *  * @param {Serializer} ser  */ constructor(ser) { this.__proto__.__proto__ = ser; this.cache = []; //    } /** *   json * @param {any} val  json * @returns {any}  */ deserialize(val) { if (Array.isArray(val)) { //  return this.deserializeArray(val); } else if (isObject(val)) { //  return this.deserializeObject(val); } else { //   return val; } } /** *   * @param {Object} val  * @returns {Object}  */ deserializeArray(val) { return val.map(item => this.deserialize(item)); } /** *   * @param {Array} val  * @returns {Array}  */ deserializeObject(val) { let res = {}; for (let key of Object.getOwnPropertyNames(val)) { let data = val[key]; if (isString(key) && key.startsWith('@')) { //   if (isInteger(data)) { //  res = this.cache[data]; if (res) { return res; } else { console.error(`     ${data}`); return data; } } else { //   let [name, id] = key.substr(1).split('|'); let ctor = this._nameToCtor[name]; if (ctor) { //     res = new ctor(); //   ,    if (id) this.cache[id] = res; if (typeof res.deserialize === 'function') { //     res.deserialize(data); } else { //    for (let key of Object.getOwnPropertyNames(data)) { res[key] = this.deserialize(data[key]); } } return res; } else { //    console.error(`  "${name}"  .`); return val[key]; } } } else { //   res[key] = this.deserialize(val[key]); } } return res; } } 

Additional features


Serialization interface


There is no interface support in Javascript, but we can agree that if the class implements the serialize and deserialize methods, then these methods will be used for serialization / deserialization, respectively.


In addition, Javascript allows you to implement these methods for built-in types, for example, for Date:


ISO serialization
 Date.prototype.serialize = function () { return this.toISOString(); }; Date.prototype.deserialize = function (val) { let date = new Date(val); this.setDate(date.getDate()); this.setTime(date.getTime()); }; 

The main thing do not forget to register the type Date: serializer.register(Date); .


Result:


 { "@Date": "2018-06-02T20:41:06.861Z" } 

The only restriction: the result of serialization should not be an integer, since in this case, it will be interpreted as an object reference.


Similarly, you can serialize simple classes into strings. An example with serialization of the Color class describing a color to the string #rrggbb is on github .


Flat serializer


Especially for you, dear readers, I wrote the second version of the serializer , which bypasses the tree of objects not recursively into depth, but iteratively in width using a queue.


For comparison, I will give an example of the serialization of the first two vertices of our scheme in both versions.


Recursive serializer (deep serialization)
 [ { "@Schema.Start": { "id": "5ec74f26-9515-4789-b852-12feeb258949", "name": "Start", "links": { "102c3dca-8e08-4389-bc7f-68862f2061ef": { "@Schema.Link": { "id": "102c3dca-8e08-4389-bc7f-68862f2061ef", "target": { "@Schema.Command|1": { "id": "447f6299-4bd4-48e4-b271-016a0d47fc0e", "name": "Command", "links": {}, "command": " A, B" } } } } } } }, { "@Schema.Command": 1 } ] 

Flat serializer (serialized wide)
 [ { "@Schema.Start": { "id": "1412603f-24c2-4513-836e-f2b0c0392483", "name": "Start", "links": { "b94ac7e5-d75f-44c1-960f-a02f52c994da": { "@Schema.Link": { "id": "b94ac7e5-d75f-44c1-960f-a02f52c994da", "target": { "@Schema.Command": 1 } } } } } }, { "@Schema.Command|1": { "id": "a93e452e-4276-4d6a-86a1-0681226d79f0", "name": "Command", "links": {}, "command": " A, B" } } ] 

Personally, I like the second option even more than the first one, but it should be remembered that by choosing one of the options, you cannot use the other. It's all about the links. Notice that in the flat serializer the link to the second vertex goes before its description.


Pros and cons of the serializer


Pros:



Minuses:



Also a minus is that the code is written on ES6. Of course, conversion to earlier versions of Javascript is possible, but I did not check the compatibility of the resulting code with different browsers.


My other publications


  1. Localization of projects on .NET with an interpreter of functions
  2. Filling text templates with model-based data. Implementing on .NET using dynamic functions in bytecode (IL)

Source: https://habr.com/ru/post/413113/


All Articles