19

From MongoDB documentation it is mentioned that:

When you need only a subset of fields from documents, you can achieve better performance by returning only the fields you need

How filtering fields affect performance? Is the performance related to the size of data that is transmitted over network? or the size of data that will be hold in memory? How exactly this performance is improved? What is this performance that is been mentioned in documentation?

I have slow MongoDB queries. Is returning a subset affect my slow query (I have compound index on the field)?

Alireza
  • 3,586
  • 10
  • 35
  • 43
  • Without the code , it's not possible to suggest you. what actually going to affect the performance in "MongoDB" projection query. It's always to better at least you mention the code. – Md Haidar Ali Khan Feb 21 '18 at 11:07
  • 1
    @MdHaidarAliKhan it is not about the code and my payload. I just want to know why mongoDB says filtering affects the performance? From what point of view this performance has been measured? For example does it help on memory usage of Mongo because of less data or less disk IO (for instance) and so on. – Alireza Feb 21 '18 at 11:19
  • I just want to know why mongoDB says filtering affects the performance? well, Use Projections to Return Only Necessary Data, I mean to say that you can achieve better performance by returning only that fields you need. For example db.posts.find({},{}).sort({}). – Md Haidar Ali Khan Feb 21 '18 at 11:46
  • ,For example does it help on memory usage of Mongo because of less data or less disk IO (for instance) and so on, well..could you update what MongoDB version and operating system in your environment? – Md Haidar Ali Khan Feb 21 '18 at 12:14
  • @MdHaidarAliKhan OS is `Debian 8`, `MongoDB 3.6.2` – Alireza Feb 21 '18 at 12:29

2 Answers2

27

By default, queries return all fields in matching documents. If you need all the fields, returning full documents is going to be more efficient than having the server manipulate the result set with projection criteria.

However, using projection to limit fields to return from query results can improve performance by:

  • removing unneeded fields from query results (saving on network bandwidth)
  • limiting result fields to achieve a covered query (returning indexed query results without fetching full documents)

When using projection to remove unused fields, the MongoDB server will have to fetch each full document into memory (if it isn't already there) and filter the results to return. This use of projection doesn't reduce the memory usage or working set on the MongoDB server, but can save significant network bandwidth for query results depending on your data model and the fields projected.

A covered query is a special case where all requested fields in a query result are included in the index used, so the server does not have to fetch the full document. Covered queries can improve performance (by avoiding fetching documents) and memory usage (if other queries don't require fetching the same document).

Examples

For demonstration purposes via the mongo shell, imagine you have a document that looks like this:

db.data.insert({
    a: 'webscale',
    b: new Array(10*1024*1024).join('z')
})

The field b might represent a selection of values (or in this case a very long string).

Next, create an index on {a:1} which is a commonly used field queried by your use case:

db.data.createIndex({a:1})

A simple findOne() with no projection criteria returns a query result which is about 10MB:

> bsonsize(db.data.findOne({}))
10485805

Adding the projection {a:1} will limit the output to the field a and the document _id (which is included by default). The MongoDB server is still manipulating a 10MB document to select two fields, but the query result is now only 33 bytes:

> bsonsize(db.data.findOne({}, {a:1}))
33

This query isn't covered because the full document has to be fetched to discover the _id value. The _id field is included in query results by default since it is the unique identifier for a document, but _id won't be included in a secondary index unless explicitly added.

The totalDocsExamined and totalKeysExamined metrics in explain() results will show how many documents and index keys were examined:

 > db.data.find(
     {a:'webscale'}, 
     {a:1}
 ).explain('executionStats').executionStats.totalDocsExamined
 > 1

This query can be improved using projection to exclude the _id field and achieve a covered query using only the {a:1} index. The covered query no longer needs to fetch a ~10MB document into memory, so will be efficient in both network and memory usage:

 > db.data.find(
     {a:'webscale'},
     {a:1, _id:0}
 ).explain('executionStats').executionStats.totalDocsExamined
 0

 > bsonsize(db.data.findOne( {a:'webscale'},{a:1, _id:0}))
 21

I have slow MongoDB queries. Is returning a subset affect my slow query (I have compound index on the field)?

This isn't answerable without the context of a specific query, example document, and the full explain output. However, you could run some benchmarks in your own environment for the same query with and without projection to compare the outcome. If your projection is adding significant overhead to the overall query execution time (processing and transferring results), this may be a strong hint that your data model could be improved.

If it's not clear why a query is slow, it would be best to post a new question with specific details to investigate.

Stennie
  • 10,020
  • 2
  • 27
  • 45
  • 1
    I really appreciate for the thorough explanation of the problem. It seems that it is not possible to have covered queries as my response has much more data than inside of index. My main question is here, I would be happy if you could have a look: https://dba.stackexchange.com/questions/195065/why-mongodb-is-so-slow-on-some-queries-while-using-ixscan-as-a-winning-plan – Alireza Feb 21 '18 at 15:02
2

With a projection, you can achieve a situation where the result set comes directly from the index.

If you have compound index {x:1, y:1, z:1} where none of x,y,z is _id, you need to project {_id:0, x:1, y:1, z:1} because _id is always returned as part of result set (when it's not projected away) and engine needs to read datafiles to get it. This because, index don't have value of _id, only pointer to that document where value is stored.

charles ross
  • 163
  • 10
JJussi
  • 5,558
  • 1
  • 14
  • 19
  • So if I remove `_id` from returned response, does that fit in RAM? Does that help? – Alireza Feb 21 '18 at 14:50
  • 1
    MongoD (tries) to keep at least indexes at memory (and as much data what fits). If you query can be filled directly from index and you project `_id:0` then result is returned fully from RAM, without reading data from disk. – JJussi Feb 21 '18 at 14:59