graphql - database query optimization

When working with databases, there is a problem which is called “SELECT N + 1” - this is when the application instead of a single database query that selects all the necessary data from several related tables, collections, makes an additional subquery for each row of the result of the first query, to get related data. For example, first we get a list of university students in which his specialty is identified by an identifier, and then for each of the students we make an additional subquery into a table or collection of specialties in order to get the name of the specialty by the specialty identifier. Since each of the subqueries may require another subquery, and another subquery - the number of database queries begins to grow exponentially.

When working with graphql, it’s very easy to generate the “SELECT N + 1” problem if you make a subquery to the related table in the resolver function. The first thing that comes to mind is to make a request at once, taking into account all the related data, but this, you see, is irrational if the associated data is not requested by the client.

One solution to the “SELECT N + 1” problem for graphql will be covered in this post.

For example, take two collections: "Authors" (Author) and "Books" (Book). Communication, as it should be supposed, is many-to-many. One Author may have several Books, and one Book may be written by several Authors. We will use the mongodb database and the mongoose.js library to store information.

The connection between the many-to-many collections is realized with the help of the auxiliary collection “BookAuthor” and “virtual” fields.

// Author.js const mongoose = require('mongoose'); const Schema = mongoose.Schema; const schema = new Schema({ name: String }); schema.virtual('books', { ref: 'BookAuthor', localField: '_id', foreignField: 'author' }); module.exports = schema; 

 // Book.js const mongoose = require('mongoose'); const Schema = mongoose.Schema; const schema = new Schema({ title: String }); schema.virtual('authors', { ref: 'BookAuthor', localField: '_id', foreignField: 'book' }); module.exports = schema; 

 // BookAuthor.js const mongoose = require('mongoose'); const Schema = mongoose.Schema; const schema = new Schema({ author: { type: mongoose.Schema.Types.ObjectId, ref: 'Author' }, book: { type: mongoose.Schema.Types.ObjectId, ref: 'Book' } }); module.exports = schema; 

 // mongoSchema.js const mongoose = require('mongoose'); const Author = require('./Author'); const Book = require('./Book'); const BookAuthor = require('./BookAuthor'); mongoose.connect('mongodb://localhost:27017/books') mongoose.set('debug', true); exports.Author = mongoose.model('Author', Author); exports.Book = mongoose.model('Book', Book); exports.BookAuthor = mongoose.model('BookAuthor', BookAuthor); 

Now we define the types Author and Book in graphql. There is a small problem with the fact that these types mutually refer to each other. Therefore, for their mutual access, the binding of links to the object of the exports module is used, and not the binding of the new object to module.exports (which replaces the original object), and also the fields field is implemented as a function, which allows you to “postpone” reading the link to the object when it is created until the point when all circular links become available:

 // graphqlType.js exports.Author = require('./Author'); exports.Book = require('./Book'); 

 // Author.js const graphql = require('graphql') const graphqlType = require('./index') module.exports = new graphql.GraphQLObjectType({ name: 'author', description: '', fields: () => ({ _id: {type: graphql.GraphQLString}, name: { type: graphql.GraphQLString, }, books: { type: new graphql.GraphQLList(graphqlType.Book), resolve: obj => obj.books && obj.books.map(book => book.book) } }) }); 

 // Book.js const graphql = require('graphql') const graphqlType = require('./index') module.exports = new graphql.GraphQLObjectType({ name: 'book', description: '', fields: () => ({ _id: {type: graphql.GraphQLString}, title: { type: graphql.GraphQLString, }, authors: { type: new graphql.GraphQLList(graphqlType.Author), resolve: obj => obj.authors && obj.authors.map(author => author.author) } }) }); 

Now we will define the request of Authors, perhaps, with the list of their books, and, possibly, with the list of authors (co-authors) of these books.

 const graphql = require('graphql'); const getFieldNames = require('graphql-list-fields'); const graphqlType = require('../graphqlType'); const mongoSchema = require('../mongoSchema'); module.exports = { type: new graphql.GraphQLList(graphqlType.Author), args: { _id: { type: graphql.GraphQLString } }, resolve: (_, {_id}, context, info) => { const fields = getFieldNames(info); const where = _id ? {_id} : {}; const authors = mongoSchema.Author.find(where) if (fields.indexOf('books.authors.name') > -1 ) { authors.populate({ path: 'books', populate: { path: 'book', populate: {path: 'authors', populate: {path: 'author'}} } }) } else if (fields.indexOf('books.title') > -1 ) { authors.populate({path: 'books', populate: {path: 'book'}}) } return authors.exec(); } }; 

To determine which fields came from the client, the graphql-list-fields library is used. And if a request comes with nested objects, the mongoose library's populate () method is called.

Now we experiment with queries. Maximum possible request for our implementation:

 { author { _id name books { _id title authors { _id name } } } } 

It will be executed by 5 calls to the database:

 authors.find({}, { fields: {} }) bookauthors.find({ author: { '$in': [ ObjectId("5b0fcab305b15d38f672357d"), ObjectId("5b0fcabd05b15d38f672357e"), ObjectId("5b0fcac405b15d38f672357f"), ObjectId("5b0fcad705b15d38f6723580"), ObjectId("5b0fcae305b15d38f6723581"), ObjectId("5b0fedb94ad5435896079cf1"), ObjectId("5b0fedbd4ad5435896079cf2") ] } }, { fields: {} }) books.find({ _id: { '$in': [ ObjectId("5b0fcb7105b15d38f6723582") ] } }, { fields: {} }) bookauthors.find({ book: { '$in': [ ObjectId("5b0fcb7105b15d38f6723582") ] } }, { fields: {} }) authors.find({ _id: { '$in': [ ObjectId("5b0fcab305b15d38f672357d"), ObjectId("5b0fcad705b15d38f6723580") ] } }, { fields: {} }) 

As you can see, the function mongoose.js - populate () - does not use the relatively new mongodb feature - $ lookup, but creates additional queries. But this is not a problem "SELECT N + 1" because A new query is created not for each row, but for the entire collection. (The desire to check how mongoose.js populate () function actually works — with one query or several — was one of the reasons for choosing a non-relational base for this example).

If we use a minimalist query:

 { author { _id name } } 

then it will generate only one database access:

  authors.find({}, { fields: {} }) 

This, in fact, I wanted. In conclusion, I will say that when I began to look for solutions for this task, I found very convenient and advanced libraries that solve this problem. One of them, for example, which I really liked, based on the structure of the relational database, formed a graphql diagram with all the necessary operations. However, this approach is valid if graphql is used on the backend side of the application. If you open access to such services from the frontend of the application (which I needed), then this is similar to placing an admin in the open access to the database server, since all tables become available out of the box

For the convenience of readers, the working example is located in the repository .

Addition on the comment of user joniks

User joniks in the feed referred to the https://github.com/facebook/dataloader library. Let's see how to allow this library to cope with the problem "SELECT N + 1"

Given this library, the definition of the graphql Authors type will look like this:

 // Autors.js const graphql = require('graphql') const DataLoader = require('dataloader') const graphqlType = require('./index') const mongoSchema = require('../mongoSchema'); const bookLoader = new DataLoader(async ids => { const data = await mongoSchema.Book.find({ _id: { $in: ids }}).populate('authors').exec(); const books = data.reduce((obj, item) => (obj[item._id] = item) && obj, {}) const response = ids.map(id => books[id]); return response; }); module.exports = new graphql.GraphQLObjectType({ name: 'authors', description: '', fields: () => ({ _id: {type: graphql.GraphQLString}, name: { type: graphql.GraphQLString, }, books: { type: new graphql.GraphQLList(graphqlType.Books), resolve: obj => obj.books && obj.books.map(book => bookLoader.load(book.book)) } }) }); 


What is the point of using this library: single bookLoader.load (id) requests are accumulated and sent for processing with an array of identifiers const bookLoader = new DataLoader (async ids => {...
At the output, we must return promises of array or array of promises which are arranged in the same order as the input array ids.

Now we can rewrite our Authors request like this:

 // authors.js const graphql = require('graphql'); const getFieldNames = require('graphql-list-fields'); const graphqlType = require('../graphqlType'); const mongoSchema = require('../mongoSchema'); module.exports = { type: new graphql.GraphQLList(graphqlType.Authors), args: { _id: { type: graphql.GraphQLString } }, resolve: (_, {_id}, context, info) => { const fields = getFieldNames(info); const where = _id ? {_id} : {}; const authors = mongoSchema.Author.find(where).populate('books') return authors.exec(); } }; 


As a result, we can conduct a query of related objects of arbitrary nesting level without worrying about the SELECT N + 1 problem (though at the price of an unconditional call to populate (), even where it was not needed):

 { authors { _id name books { _id title authors { _id name books { _id title authors { _id name } } } } } } 


But here you need to really understand that if we move on to working with SQL servers, then there will be one aggregated query for each level of nesting of objects. At the same time, sometimes it is required that it was nevertheless exactly one request. But this is not so direct when using the dataloader library. The modified example is available in the data-loader repository branch.

apapacy@gmail.com
May 31, 2018

Source: https://habr.com/ru/post/412847/


All Articles