Thursday 3 May 2018

Mongo DB Schema Design

Hey Guys,

I just learned how to do better schema design in mongo db.

So basically a blog post in mongo db official website suggests 3 ways.


1. One to Few
2. One to Many
3. One to Squillions

So , one can choose which one is the best based your requirements, data(queries & updates) and read - update ratio


1. One to Few

Go for this when you a relation with less reference. In this case you can embed the referencing elements in the main document itself.

Consider an example of person and address. Instead of having address as separate collection, you can embed them in person. it will save read time but a con here is you cannot access them stand alone. Like if you want to get address alone you cannot.  So, choose this if you don't have requirements to pull standalone and few references and cardinality .


2. One to Many

Apt this if you have referencing documents less than 1000.  you can put all object references in an array of referencing object.

Consider an example of Product and it parts . so, you can put all the parts references in product document as an parts[] array. This help to retrieve data faster.  you can also refer the product id in parts side if you need more data access.

If you want to get the data then you should do application level Join on tables.

 db.products.findOne()
{
    name : 'left-handed smoke shifter',
    manufacturer : 'Acme Corp',
    catalog_number: 1234,
    parts : [     // array of references to Part documents
        ObjectID('AAAA'),    // reference to the #4 grommet above
        ObjectID('F17C'),    // reference to a different Part
        ObjectID('D2AA'),
        // etc
    ]

Application level join

// Fetch the Product document identified by this catalog number
> product = db.products.findOne({catalog_number: 1234});
   // Fetch all the Parts that are linked to this Product
> product_parts = db.parts.find({_id: { $in : product.parts } } ).toArray() ;

You can still denormalise this by placing needed fields in the product parts array like below but one drawback is that  if you change name of part collection which is referenced in product array then you need to update name in the parts array of entire product collection.
> db.products.findOne()
{
    name : 'left-handed smoke shifter',
    manufacturer : 'Acme Corp',
    catalog_number: 1234,
    parts : [
        { id : ObjectID('AAAA'), name : '#4 grommet' },         // Part name is denormalized
        { id: ObjectID('F17C'), name : 'fan blade assembly' },
        { id: ObjectID('D2AA'), name : 'power switch' },
        // etc
    ]
}
Making easier to get the part names would increase a bit of client-side work to the application-level join:
// Fetch the product document
> product = db.products.findOne({catalog_number: 1234});
  // Create an array of ObjectID()s containing just the part numbers
> part_ids = product.parts.map( function(doc) { return doc.id } );
  // Fetch all the Parts that are linked to this Product
> product_parts = db.parts.find({_id: { $in : part_ids } } ).toArray() ;

3. One to Squillions

Choose this if you have more than 1000 reference objects and you think it exceeds 16 MB of document  size.  Place the One side id in every document of referencing side. Its an example of parent-referencing.

> db.hosts.findOne()
{
    _id : ObjectID('AAAB'),
    name : 'goofy.example.com',
    ipaddr : '127.66.66.66'
}

>db.logmsg.findOne()
{
    time : ISODate("2014-03-28T09:42:41.382Z"),
    message : 'cpu is on fire!',
    host: ObjectID('AAAB')       // Reference to the Host document
}
Application Level Query :

// find the parent ‘host’ document
> host = db.hosts.findOne({ipaddr : '127.66.66.66'});  // assumes unique index
   // find the most recent 5000 log message documents linked to that host
> last_5k_msg = db.logmsg.find({host: host._id}).sort({time : -1}).limit(5000).toArray()






No comments:

Post a Comment