JOIN in NoSQL Databases

This post will look at how to connect collections in NoSQL mongodb, arangodb, orientdb and rethinkdb databases (besides being a NoSQL database, they are also united by the availability of a free version with a fairly loyal license). In relational databases, similar functionality is implemented using SQL JOIN. Despite the fact that CRUD operations in NoSQL databases are very similar and differ only in details, for example, in one database the function create ({...}) is used to create an object, in the other, insert ({...}), and the third is save ({...}), - the implementation of a sample of two or more collections in each of the databases is implemented quite differently. Therefore, it will be interesting to perform the same sample on all databases. For all databases, sampling (a many-to-many relationship) for two tables will be considered.

For example, tables will be used.

Author


Book


Books of Authors


For mongodb, the sample from the tables will be implemented as follows:

const mongo = require('mongodb-bluebird'); mongo.connect("mongodb://localhost:27017/test").then(async function(db) { const author = db.collection('author'); const book = db.collection('book'); const bookauthor = db.collection('bookauthor'); ['Joe', 'John', 'Jack', 'Jeremy'].map(async (name) => await author.insert({name}) ); ['Art', 'Paint'].map(async (title) => await book.insert({title}) ); let Author = await author.findOne({ name: 'Joe' }); let Book = await book.findOne({ title: 'Paint' }); await bookauthor.insert({author: Author._id, book: Book._id}) Author = await author.findOne({ name: 'John' }); await bookauthor.insert({author: Author._id, book: Book._id}) Book = await book.findOne({ title: 'Art' }); await bookauthor.insert({author: Author._id, book: Book._id}) const result = await author.aggregate([{ $lookup:{ from: 'bookauthor', localField: '_id', foreignField: 'author', as: 'ba' }}, { $lookup: { from: 'book', localField: 'ba.book', foreignField: '_id', as: 'books' }}],{ }) }); 

Unlike the SQL JOIN sample, the resulting sample will not be a flat table in which the Author will be repeated as many times as the books he composed, but a hierarchical object in which each Author will be represented by one object, which will have the books property containing an array of objects. . From my point of view, this is a very big plus in NoSQL databases. But it may be necessary to use a “flat” version, similar to SQL JOIN. In order to get it in the request, you need to add the “expanding” of arrays: { $unwind: '$books' } .

The sample that is presented in the example is analogous to SQL LEFT JOIN, that is, all authors will be included in the sample, even if they do not have books. In order to make an analogue of SQL [INNER] JOIN, you must add the condition { $match: { books: { $ne: [ ] } } } , or in the case of $ unwind:

 { $unwind: { path: "$role", preserveNullAndEmptyArrays: false } } 

So, go to arangodb, which is a hybrid database. In addition to working with documents, it implements work with graphs. First, let's see how in arangodb you can make a selection using only documents (not columns):

  FOR a IN author FOR ba IN bookauthor FILTER a._id == ba.author FOR b IN book FILTER b._id == ba.book SORT a.name, b.title RETURN { author: a, book: b } 

To connect collections, arangodb uses the FOR keyword, and to set the join condition, FILTER. The sample obtained in this case will be similar to SQL [INNER] JOIN (that is, it will be a “flat” object and will not contain empty values)

But it is much more convenient in arangodb for samples from several objects to use the possibilities of working with graphs:

 const { Database, aql } = require('arangojs'); const db = new Database({ url: "http://localhost:8529" }); db.useDatabase("test"); db.useBasicAuth("test", "test"); const author = db.collection('author') const book = db.collection('book') const bookauthor = db.edgeCollection('bookauthor') void async function() { ['Joe', 'John', 'Jack', 'Jeremy'].map(async (name) => await author.save({name}) ); ['Art', 'Paint'].map(async (title) => await book.save({title}) ); let Author = await author.firstExample({ name: 'Joe' }); let Book = await book.firstExample({ title: 'Paint' }); await bookauthor.save({date: 'Some data'}, Author._id, Book._id) Author = await author.firstExample({ name: 'John' }); await bookauthor.save({date: 'Some data'}, Author._id, Book._id) Book = await book.firstExample({ title: 'Art' }); await bookauthor.save({date: 'Some data'}, Author._id, Book._id) const cursor = await db.query(aql` FOR a IN author FOR book_vertex, book_edge IN OUTBOUND a bookauthor COLLECT a1 = a INTO b1 RETURN {author: a1, books: b1[*].book_vertex} `); }(); 

For us, the link is now not a document, but a collection of edges of the graph (edge) bookauthor. The IN OUTBOUND a bookauthor selects for a given Author a collection of related documents, which it places in the response under the name book_vertex . The COLLECT a1 = a INTO b1 operator COLLECT a1 = a INTO b1 is an analogue of SQL GROUP - it accumulates a value into an array, which in the response will be available under the name b1 for each value of Author , which in the answer will be available under the name a1 . The design b1[*].book_vertex allows b1[*].book_vertex to remove unnecessary levels of nesting from an object so that the result is convenient for further work.

Implementing queries like SQL LEFT JOIN in arangodb is more difficult, because the FOR - FILTER statement sets restrictions similar to SQL [INNER] JOIN. To implement the "left connections", the LET operator and the subquery are used:

  const cursor = await db.query(aql` FOR a IN author LET books = ( FOR book_vertex, book_edge IN OUTBOUND a bookauthor RETURN book_vertex ) RETURN {author: a, books} `); 

In this case, data grouping is not required, since A subquery is executed for each Author and in the response already contains a Gothic array of Book objects.

Moving on to the orientdb database. It is also a hybrid database that allows you to work with both documents and graphs. The ideology of working with graphs is similar to the previous example in arangodb. That is, for the connection of collections is the collection of edges of the graph (edge) bookauthor.

 const OrientDB = require('orientjs'); const server = OrientDB({ host: 'localhost', port: 2424, }); void async function() { const db = server.use({ name:'test', username: 'test', password: 'test' }); await db.open(); try { await db.class.drop('Author UNSAFE'); } catch(ex) { console.log(ex) } try { await db.class.drop('Book UNSAFE'); } catch(ex) { console.log(ex) } try { await db.class.drop('BookAuthor UNSAFE'); } catch(ex) { console.log(ex) } const author = await db.class.create('Author', 'V'); const book = await db.class.create('Book', 'V'); const bookauthor = await db.class.create('BookAuthor', 'E'); ['Joe', 'John', 'Jack', 'Jeremy'].map(async (name) => await author.create({name}) ); ['Art', 'Paint'].map(async (title) => await book.create({title}) ); await author.list(); await book.list(); let Author = await db.select().from('Author').where({name: 'Joe'}).one(); let Book = await db.select().from('book').where({ title: 'Paint' }).one(); await db.create('EDGE', 'BookAuthor').from(Author['@rid']).to(Book['@rid']).set({date: 'Some data'}).one(); Author = await db.select().from('Author').where({name: 'John'}).one(); await db.create('EDGE', 'BookAuthor').from(Author['@rid']).to(Book['@rid']).set({date: 'Some data'}).one(); Book = await db.select().from('book').where({ title: 'Art' }).one(); await db.create('EDGE', 'BookAuthor').from(Author['@rid']).to(Book['@rid']).set({date: 'Some data'}).one(); const cursor = await db.query(`select name, out('BookAuthor').title as books from Author`).all() } () 

Perhaps, in orientdb, the implementation is the most successful, since it is closest to the syntax of SQL and concise with regard to working with graphs. The out('BookAuthor').title as books from Author expression out('BookAuthor').title as books from Author means to select for the Author collection all outgoing edges from the BookAuthor collection that link the Author collection to the Book collection. In this case, the resulting object will be hierarchical (one object for each Author with an array of Book objects). If you need to "expand" an array into a flat object, use the UNWIND operator.

And finally, consider rethinkdb. Not so long ago, the team that developed this database ceased to exist and transferred the development to an open community. I say this immediately, because Someone might have missed the news. Before a more detailed acquaintance, the implementation of JOIN in rethinkdb seemed to be the most convenient. Perhaps because such a possibility was immediately incorporated into the database API, and even so was called join (). But then it turned out that everything is not so rosy, and not all functions implementing JOIN work equally efficiently and have enough flexibility to build the necessary queries. The same our end-to-end example is now implemented on rethinkdb:

 r = require('rethinkdb') void async function() { const conn = await r.connect({ host: 'localhost', port: 28015 }); try { await r.db('test').tableDrop('author').run(conn); await r.db('test').tableDrop('book').run(conn); await r.db('test').tableDrop('bookauthor').run(conn); } catch (ex) { console.log(ex) } await r.db('test').tableCreate('author').run(conn); await r.db('test').tableCreate('book').run(conn); await r.db('test').tableCreate('bookauthor').run(conn); await r.db('test').table('bookauthor').indexCreate('author').run(conn); await r.db('test').table('bookauthor').indexCreate('book').run(conn); await r.db('test').table('bookauthor').indexWait('author', 'book').run(conn); ['Joe', 'John', 'Jack', 'Jeremy'].map(async (name) => await r.db('test').table('author').insert({ name }).run(conn) ); ['Art', 'Paint'].map(async (title) => await r.db('test').table('book').insert({ title }).run(conn) ); let Author = await r.db('test').table('author').filter({ name: 'Joe' }).run(conn).then(authors => authors.next()); let Book = await r.db('test').table('book').filter({ title: 'Paint' }).run(conn).then(books => books.next()); await r.db('test').table('bookauthor').insert({author: Author.id, book: Book.id}).run(conn); Author = await r.db('test').table('author').filter({ name: 'John' }).run(conn).then(authors => authors.next()); await r.db('test').table('bookauthor').insert({author: Author.id, book: Book.id}).run(conn); Book = await r.db('test').table('book').filter({ title: 'Art' }).run(conn).then(books => books.next()); await r.db('test').table('bookauthor').insert({author: Author.id, book: Book.id}).run(conn); const cursor = await r.db('test').table('author') .eqJoin('id', r.db('test').table('bookauthor'), {index: 'author'}).zip() .eqJoin('book', r.db('test').table('book')).zip().run(conn); }(); 

Attention should be paid to such moments. In this example, the secondary index was implemented using the eqJoin () function, which can use pairs when connecting objects: a primary key with a primary key or a primary key with a secondary key (but not a secondary key with a secondary key). For more complex conditions, the map () function is used, which is an order of magnitude more difficult to understand. The rest of the functions that implement the JOIN are not optimized (it must be assumed that the full enumeration of values ​​is implemented).

The text of the examples is located in the repository .

apapacy@gmail.com
June 4, 2018

Source: https://habr.com/ru/post/413123/


All Articles