DF100

Storage

MongoDB stores data as BSON:

In the above example, you can enforce uniqueness of the "Dental" value for the "type" key by creating a unique index
Additionally, you can put validation on the data itself, but there may be downsides to that

Terminology

A namespace is nothing more than the database name + collection name, joined with a dot (database.collection), which helps you distinguish between two similarly named collections

Benefits of MongoDB

Agility
- The documents within a given collection are not required to have the same schema
  - ex. Data types do not have to match across documents
- MongoDB's take is that the application should be doing the validation, not the database (although it is possible to do so on the DB side)
Usability
- Option to use a wide array of tools and languages to query MongoDB
Utility
- Complex indexed queries, smart edits, aggregation frameworks
Availability and scalability
- HA via replica sets, multiple copies of data, different hosts/locations, continuous replication
- Scale via sharding
  - Partition data over multiple replica sets
  - Allows near-unlimited horizontal scaling by adding more hardware
  - Sharding can be complex so it is important to consider this as your data grows
    Do not plan to add sharding on day 1, but around 2 TB is when you want to start having that conversation
- Data compression
Enterprise tooling
- Atlas, Ops Manager, Cloud Manager, Kubernetes, Terraform, etc.
- Ops Manager - Very good to use once you start spinning up multiple replica sets and your environment gets more complicated. It is very difficult to do a point-in-time restore (PITR) correctly on a sharded DB without using this tool

When should MongoDB be used?

High speed access to complex objects
- atomic partial updates
- fast retrieval
- secondary indexes
- aggregation capabilities
When you want to store larger data structures together
- Large arrays
  - Exception: keep arrays under 200 elements for performance
- text fields
- binary data
Rapid development
When you need to store structures of varying shapes
Large data volumes
Distributed data

Things to be aware of

Easy to get things wrong and performance can suffer
DBAs need to be trained and certified
- Devs perform traditional DBA tasks, but DBAs have very important tasks as well

CRUD Operations

If multiple documents satisfy the query for a single-document command (such as findOne), it will return the first one it finds on disk, if one exists
- Searching the index takes precedence
- The same document may not always be returned unless you specify additional criteria

db.customers.insertOne({
    _id: "[email protected]",
    name: "Bob Smith",
    spend: 0,
    orders: [],
    lastpurchase: null
})

In the above command, mongo will create the customers collection if it does not already exist
Every document must have an ID field, and one will be generated for you if you do not specify it
- It is ALWAYS called _id
- This ID must be unique
- You can not change the ID of a document for any reason
- Mongo can and will generate a unique value for you
If you are querying a field repeatedly and you know it is going to be unique, then it makes sense to make that the ID
- For example, an email value.. although if the value changes you must delete the document and insert it again with the new ID
If you let MongoDB generate the ID, it will be unique across the entire database, but it only MUST be unique within the collection
- i.e. you could easily have the same ID across multiple collections if you set it to something like "[email protected]"

let friends = [ 
    {_id: "joe" }, 
    {_id: "bob" }, 
    {_id: "joe" }, 
    {_id: "jen" } 
]

db.collection1.insertMany(friends)

In the above example, the first Joe and Bob will be inserted into the database
- Both the second Joe (the duplicate record) and Jen are NOT inserted
  - This is the default insertMany behavior: inserts are ordered and stop at the first error
    Atomicity is at the document level
    You can do multi-document transactions, but there are tradeoffs!
    It is also possible to modify the default behavior
    db.collection2.insertMany(friends, {ordered: false} )
    In the above snippet, 3 records would be inserted, which includes "Jen"
Tip: positions are indexed from 0, so the duplicate Joe above is at index 2
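
The ordered vs unordered behavior above can be sketched in plain JavaScript. This is only a model of the semantics described in these notes, not MongoDB's implementation; insertManyModel and its return shape are made-up names for illustration.

```javascript
// Plain-JavaScript model of insertMany with a unique _id constraint.
// Hypothetical helper: it imitates the behavior described above only.
function insertManyModel(docs, { ordered = true } = {}) {
  const stored = new Map();   // _id -> document
  const errorIndexes = [];    // zero-based positions that failed
  for (let i = 0; i < docs.length; i++) {
    const doc = docs[i];
    if (stored.has(doc._id)) {
      errorIndexes.push(i);   // duplicate key at position i
      if (ordered) break;     // ordered: stop at the first error
      continue;               // unordered: skip it and keep going
    }
    stored.set(doc._id, doc);
  }
  return { insertedIds: [...stored.keys()], errorIndexes };
}

const friendsList = [{ _id: "joe" }, { _id: "bob" }, { _id: "joe" }, { _id: "jen" }];

const orderedResult = insertManyModel(friendsList);
const unorderedResult = insertManyModel(friendsList, { ordered: false });
console.log(orderedResult.insertedIds);   // joe and bob only
console.log(unorderedResult.insertedIds); // joe, bob and jen
```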

If you have to Insert 200 documents, is it better to use insertMany or insertOne in a FOR loop?

insertMany is better because it needs fewer network round trips

You can insert up to 100,000 documents or 48 MB in a single trip

Reading Documents

db.customers.findOne({})
- Returns the first record
db.customers.findOne({name: "Andy Smith"})
- Strings are case-sensitive
db.customers.findOne({name: "Andy Smith", spend: 0})
- The above is an AND operation
db.customers.findOne({name: /smi/i, spend: 0})
- Regular expressions work, but if you run this kind of search frequently you should use a text index (or Atlas Search if you are on Atlas)
db.customers.findOne({name: /smi/i, spend: 0.0000})
- Numbers are compared by value regardless of precision, so MongoDB will still match customers with a spend of 0
- In application code, you should use the proper data types as they will be mapped within Mongo

Looking for multiple documents?

db.customers.find({})
- Equivalent of SELECT *
db.customers.find({},{name: 1, spend: 1})
- 1 specifies that the field will be returned
- _id is always returned
- conversely, if you use 0, it will exclude just those fields
- you can NOT mix and match inclusion/exclusion, with ONE exception
  - You can explicitly exclude the _id field while also including others
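
A rough plain-JavaScript model of the projection rules above, assuming a flat document. The project helper is hypothetical and only mirrors the rules in these notes; it is not how the server implements projections.

```javascript
// Model of find()'s second argument: {field: 1} includes, {field: 0} excludes.
// _id is returned unless it is explicitly set to 0.
function project(doc, spec) {
  const keys = Object.keys(spec).filter((k) => k !== "_id");
  const including = keys.some((k) => spec[k] === 1);
  const excluding = keys.some((k) => spec[k] === 0);
  if (including && excluding) {
    throw new Error("cannot mix inclusion and exclusion (except for _id)");
  }
  const out = {};
  for (const k of Object.keys(doc)) {
    const keep =
      k === "_id" ? spec._id !== 0 : including ? spec[k] === 1 : spec[k] !== 0;
    if (keep) out[k] = doc[k];
  }
  return out;
}

const customer = { _id: 7, name: "Bob Smith", spend: 0, orders: [] };
console.log(project(customer, { name: 1, spend: 1 }));  // _id, name, spend
console.log(project(customer, { name: 1, _id: 0 }));    // name only
console.log(project(customer, { orders: 0 }));          // everything except orders
```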
db.customers.find({lastpurchase: null})
- Will return documents even where the lastpurchase field does not exist, because a missing field is treated as null
db.customers.find({gibberish:null})
- Returns every single document (assuming no document has a gibberish field)
- find vs findOne
  - find returns a cursor that fetches results in batches: the first batch holds about 100 documents (101 by default), and a batch is capped at 16 MB in most drivers (C# is 48 MB)
    A lot of drivers will hide this behavior and iterate through the cursor for you
  - If no documents are found, it will still return the cursor object, but it will be empty
db.customers.find({}).sort({age: -1}).skip(30).limit(10)
- Commands can be chained
- -1 is descending
- order of chained operations does not matter, but to avoid confusing people, always write sort, skip, and then limit as a best practice
  - You can change this behavior with an "aggregate query", but we have not yet learned what that is
db.customers.find({}).sort({age: -1, plate: -1}).skip(30).limit(10)
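
To make the sort / skip / limit chain concrete, here is a small in-memory sketch. The sortSkipLimit helper and the sample taxis are invented for illustration; the real work happens on the server.

```javascript
// Model of .sort(sortSpec).skip(skip).limit(limit) on an in-memory array.
// sortSpec uses the same convention as above: 1 ascending, -1 descending.
function sortSkipLimit(docs, sortSpec, skip, limit) {
  const keys = Object.entries(sortSpec); // e.g. [["age", -1], ["plate", -1]]
  const sorted = [...docs].sort((a, b) => {
    for (const [key, dir] of keys) {
      if (a[key] !== b[key]) return a[key] < b[key] ? -dir : dir;
    }
    return 0; // equal on every sort key
  });
  return sorted.slice(skip, skip + limit);
}

const taxis = [
  { age: 30, plate: 1 },
  { age: 40, plate: 2 },
  { age: 30, plate: 5 },
  { age: 20, plate: 9 },
];

// Sorted order is 40/2, 30/5, 30/1, 20/9; skipping 1 and taking 2 leaves the two 30s.
const pageOfTaxis = sortSkipLimit(taxis, { age: -1, plate: -1 }, 1, 2);
console.log(pageOfTaxis);
```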
.count() vs .countDocuments() vs .estimatedDocumentCount()
- they should almost always be the same, but there is a VERY extreme edge case when you are dealing with ultra-precise, high-frequency apps
  - .countDocuments(query) actually counts the matching documents, so it is accurate but slower
  - .count() with no query and .estimatedDocumentCount() cheat and use the collection metadata stats, which is faster but could be slightly off
  - .count() is deprecated in current shells and drivers
db.people.find({address: {city: "Houston"}})
- This will work ONLY if the field names, field order, and values match identically, because MongoDB compares the embedded document as a whole blob
- The reason is that MongoDB allows up to 100 levels of nesting, so it is more performant to just hash the document
- In the real world, you would typically not use the above syntax, but rather would use the below syntax
- VERY IMPORTANT: querying on an embedded document like above will rehash every single document on the fly; the hash is not saved on INSERT
- Embedded documents include arrays and nested documents
db.people.find({"address.city": "Houston"})
- when referencing a child field (nested document) with dot notation, you MUST put quotes around the key in your find operation
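
The difference between the two query styles can be sketched as follows. Both helpers are hypothetical plain-JavaScript stand-ins (string comparison here plays the role of the whole-document comparison), and the dot-notation model ignores arrays along the path.

```javascript
// {address: {city: "Houston"}}: the whole embedded document must be identical,
// including field order. JSON.stringify keeps key order, so it models that.
function matchesExactly(value, queryDoc) {
  return JSON.stringify(value) === JSON.stringify(queryDoc);
}

// {"address.city": "Houston"}: walk the path and compare only the leaf value.
function matchesPath(doc, path, expected) {
  let current = doc;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return false;
    current = current[key];
  }
  return current === expected;
}

const person = { name: "Ann", address: { city: "Houston", state: "TX" } };

console.log(matchesExactly(person.address, { city: "Houston" }));              // false: state is missing
console.log(matchesExactly(person.address, { state: "TX", city: "Houston" })); // false: field order differs
console.log(matchesExactly(person.address, { city: "Houston", state: "TX" })); // true
console.log(matchesPath(person, "address.city", "Houston"));                   // true
```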
db.fun.find({hobbies: "rockets"})
- MongoDB will walk an array and return a document if rockets exists in the array
db.fun.find({hobbies: ["rockets", "cars"]})
- Since this is matched as a whole embedded value, it must be an exact match (same elements in the same order)
db.taxis.find({age: {$gt: 37} } )
- Operator documents are always prefixed with a $
db.taxis.find({age: {$gt: 37}, plate: {$lt: 20, $lte: 50, $ne: 38, $in: [40,44,45] } } )
- This operates in AND behavior
- operators can be on the same field as well
db.pets.find({$or: [{species: "cat", color: "black"},{species: "dog", color: "brown"}] })
- nesting boolean logic is valid
- ORDER within the query document does not matter unless you are referencing embedded documents
- ORDER within the OR array does matter in the sense that the query will stop executing when the first condition is satisfied
  - In terms of performance, it may be faster for your query to start with the condition that is more frequent
db.customers.find({lastpurchase: {$exists: false}})
- Returns any document where lastpurchase does not exist
db.customers.find({lastpurchase: {$exists: true, $eq: null}})
- Returns any document where the lastpurchase field exists and the value is set to NULL
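
The three null-related queries in this section differ only in how they treat a missing field. A minimal model, assuming flat documents and made-up sample data:

```javascript
// {f: null}: matches when the field is missing OR explicitly null.
const matchesNull = (doc, f) => !(f in doc) || doc[f] === null;
// {f: {$exists: false}}: matches only when the field is missing.
const matchesMissing = (doc, f) => !(f in doc);
// {f: {$exists: true, $eq: null}}: matches only when the field is present and null.
const matchesExistingNull = (doc, f) => f in doc && doc[f] === null;

const sampleCustomers = [
  { name: "a", lastpurchase: null },
  { name: "b" },
  { name: "c", lastpurchase: "2024-01-01" },
];

const names = (pred) =>
  sampleCustomers.filter((d) => pred(d, "lastpurchase")).map((d) => d.name);

console.log(names(matchesNull));         // a and b
console.log(names(matchesMissing));      // b
console.log(names(matchesExistingNull)); // a
```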
db.fun.find({hobbies: {$all: ["rockets", "cars"]}})
- returns any documents where the array contains all of those values, but the array itself could contain more values
- It is an AND operator
db.fun.find({hobbies: {$in: ["rockets", "cars"]}})
- OR variant
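
As a quick illustration of the AND vs OR difference between $all and $in on an array field, here is a plain-JavaScript sketch (hasAll and hasAny are invented names, and the sample hobbies are made up):

```javascript
// {hobbies: {$all: [...]}}: every wanted value must be in the array (AND).
const hasAll = (arr, wanted) => wanted.every((v) => arr.includes(v));
// {hobbies: {$in: [...]}}: at least one wanted value must be in the array (OR).
const hasAny = (arr, wanted) => wanted.some((v) => arr.includes(v));

const wanted = ["rockets", "cars"];

console.log(hasAll(["rockets", "cars", "chess"], wanted)); // true: extra values are fine
console.log(hasAll(["rockets", "chess"], wanted));         // false: cars is missing
console.log(hasAny(["rockets", "chess"], wanted));         // true: rockets is enough
console.log(hasAny(["chess"], wanted));                    // false
```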
db.ages.find({age: {$lt: 39, $gt: 21}})
- when testing an array, the moment you use an operator document, you are testing whether the SET of values in the array meets the criteria
- age: [40,20,8] would evaluate as true for the above logic because a value of less than 39 exists and a value of greater than 21 exists
db.ages.find({age: {$elemMatch: {$lt: 39, $gt: 21}}})
- $elemMatch is an array operator that walks the array until it finds ONE element that matches ALL of the criteria
- The caveat to this is that $elemMatch only executes against arrays and will not return documents where the field holds a plain int
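
The [40,20,8] example is easier to see with a small model. The two functions below are hypothetical stand-ins for the two query shapes, with each condition written as a JavaScript predicate:

```javascript
// {age: {$lt: 39, $gt: 21}}: each condition may be satisfied by a DIFFERENT element.
function matchesPerCondition(values, conditions) {
  const list = Array.isArray(values) ? values : [values];
  return conditions.every((cond) => list.some(cond));
}

// {age: {$elemMatch: {$lt: 39, $gt: 21}}}: ONE element must satisfy every condition.
function matchesElemMatch(values, conditions) {
  if (!Array.isArray(values)) return false; // non-array values never match
  return values.some((v) => conditions.every((cond) => cond(v)));
}

const between21And39 = [(v) => v < 39, (v) => v > 21];

console.log(matchesPerCondition([40, 20, 8], between21And39)); // true: 20 < 39 and 40 > 21
console.log(matchesElemMatch([40, 20, 8], between21And39));    // false: no single element is in range
console.log(matchesElemMatch([40, 30, 8], between21And39));    // true: 30 is in range
console.log(matchesElemMatch(30, between21And39));             // false: not an array
```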

Updating Documents

updateOne vs updateMany
- updateOne(query, change) - changes only the first matching document
- updateMany(query, change) - changes all matching documents

Operators

Ex. {$inc: {score: 50, numGames: 1}, $push: {gameId: 22, winLoss: "win"}}
- Because all of this is being done in the same "mutation" document, you are guaranteed an atomic operation
$set - assign or replace a value on an existing document
- use dot notation to set a field in an embedded document
  - be extra cautious when setting a NEW value inside of an embedded document as you are going to erase the entire existing field and replace it with your new value unless you use the dot notation
  - {$set: {staff: {principal: "jones"}}} vs {$set: {"staff.principal": "jones"}}
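
The warning about dot notation can be demonstrated with a tiny model of $set. setField and the sample school document are made up for illustration; the model creates missing parent objects along the path and does not cover arrays.

```javascript
// Model of {$set: {[path]: value}} on a copy of the document.
function setField(doc, path, value) {
  const copy = JSON.parse(JSON.stringify(doc)); // deep copy, leave the input alone
  const keys = path.split(".");
  let target = copy;
  for (const key of keys.slice(0, -1)) {
    if (target[key] === undefined) target[key] = {}; // create missing parents
    target = target[key];
  }
  target[keys[keys.length - 1]] = value;
  return copy;
}

const school = { name: "Elm High", staff: { principal: "smith", nurse: "lee" } };

// Setting the whole field replaces the embedded document: nurse is lost.
const replaced = setField(school, "staff", { principal: "jones" });
// Dot notation changes only the one nested field: nurse survives.
const patched = setField(school, "staff.principal", "jones");

console.log(replaced.staff); // only principal remains
console.log(patched.staff);  // principal updated, nurse kept
```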
$unset - remove a field from a document
- the value you pass can be a "" blank string, it does not matter
- it does NOT set the value to NULL, it physically removes the field
- Ex. {$unset: {"Singer": ""}} is the same as {$unset: {"Singer": "myrandomvalue"}}
  - A value is required only because every update operator takes a {field: value} document
$inc / $mul - self explanatory
- There are no decrement or divide operators
  - Instead, use $inc with a negative value or $mul with a fractional value between 0 and 1
$max / $min - can modify a field depending on its current value
- you could do a read + update, but there are problems with that..
  - problem #1, the value you are trying to update could change between your read and update
  - problem #2, you incur additional load on the database because it is two operations
- $max / $min is essentially like adding a conditional to prevent an update
- $max lets you avoid an index on the compared field, which helps performance
  - This is more efficient than putting a $gt / $lt condition on that field in the query
  - You find the documents that satisfy the rest of your query, and on each one the current value is compared with the new value: $max updates only if the current value is less, $min only if it is greater, otherwise nothing happens
- Most common use cases are dates and numbers
  - e.g. use $max on a "date changed" field so that it only ever moves forward
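
A minimal sketch of the conditional behavior of $max and $min, using hypothetical helpers on a flat document. In MongoDB the comparison and the write happen in one atomic update; this model only shows the rule.

```javascript
// {$max: {[field]: candidate}}: write only if the candidate is greater than
// the current value (or the field does not exist yet).
function applyMax(doc, field, candidate) {
  if (!(field in doc) || doc[field] < candidate) {
    return { ...doc, [field]: candidate };
  }
  return doc; // current value is already at least as large: nothing changes
}

// {$min: {[field]: candidate}}: the mirror image.
function applyMin(doc, field, candidate) {
  if (!(field in doc) || doc[field] > candidate) {
    return { ...doc, [field]: candidate };
  }
  return doc;
}

const player = { name: "ann", highScore: 10 };

console.log(applyMax(player, "highScore", 7));  // unchanged, still 10
console.log(applyMax(player, "highScore", 15)); // raised to 15
console.log(applyMin(player, "highScore", 7));  // lowered to 7
```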

Deleting Documents

Recommended to find or findOne what you want to delete before deleting to verify the command is valid
deleteOne() and deleteMany()
- $unset of the UPDATE command only removes a field whereas delete removes a document
replaceOne() is typically not used
- It erases everything on the document except the _id field and replaces it with the content you are trying to set
- Typically you would just use $set

Updating, Locking, and Concurrency

If two processes attempt to update the same document at the same time they are serialised
The conditions in the query must always match for the update to take place
In the example, if the two updates take place in parallel - the result is the same
Locks are at the document level

Ex. In below example, transaction B does nothing

This is where stuff starts to back up and you run out of CPU
Furthermore, every single time the lead blocker runs its operation, all of the queries in the queue must re-evaluate

Advanced Arrays

$push - append an element to the end of an array

Can be used in updateOne and updateMany
Fails if the field is not an array
Creates an array field if it does not already exist
Can be used with multiple modifiers
Ex. db.playlists.updateOne({name: "funky"}, {$push: {name: {artist: "AC/DC", track: "Thunderstruck"}}})
- name must not be a string, it has to be an array (so this update fails when name holds the string "funky")

$pop - removes last or first element from an array

Can be used in updateOne or updateMany commands
Fails if the field is not an array
Removing the first element renumbers all array elements
Ex. db.playlists.updateOne({name: "Funky"}, {$pop: {tracks: 1}})
- 1 removes the last element; you can also use -1 to remove the first element in the array

$pull - remove specified elements from an array

Elements can be specified by value or condition
Will throw an error if not an array

$addToSet - appends an element to an array if it does not already exist

Does not affect existing duplicates in the array
Elements in the modified array can have any order
Fails if the field is not an array

$each - if you use $push to add an array to an existing array, it will nest the array so you need to use $each to add multiple values

Faster performance than using a FOR loop in code
You can also combine $each and $sort to insert new data in an ordered fashion
- To be clear, this reorganizes the entire array as the push occurs
- Note: this only pays off if you can trust that every application modifies these documents with $sort, because the benefit is being able to read a document without sorting the array yourself
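
The nesting problem that $each solves looks like this in a plain-JavaScript model (push and pushEach are invented helper names, not driver functions):

```javascript
// {$push: {f: value}}: the value is appended as ONE element, even if it is an array.
const push = (arr, value) => [...arr, value];
// {$push: {f: {$each: values}}}: every value is appended as its own element.
const pushEach = (arr, values) => [...arr, ...values];

const nested = push([1, 2], [3, 4]);   // the array [3, 4] becomes a single nested element
const flat = pushEach([1, 2], [3, 4]); // 3 and 4 are appended individually

console.log(JSON.stringify(nested));
console.log(JSON.stringify(flat));
```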

$sort and $slice - sort and keep the top (or bottom) N elements

This is an example of a design pattern
Used for high/low lists - high scores, top 10 temperatures, etc
The modifiers are always applied in the same order (append with $each, then $sort, then $slice), regardless of the order in which you write them
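
The high-score pattern can be modelled in a few lines. pushSortSlice is a hypothetical helper for numeric arrays; the equivalent update document is shown in a comment, with scores as an assumed field name.

```javascript
// Model of {$push: {scores: {$each: newValues, $sort: direction, $slice: keep}}}
// direction: 1 ascending, -1 descending
// keep: positive keeps the first N elements, negative keeps the last N
function pushSortSlice(arr, newValues, direction, keep) {
  const merged = [...arr, ...newValues];           // $each
  merged.sort((a, b) => direction * (a - b));      // $sort
  return keep >= 0 ? merged.slice(0, keep) : merged.slice(keep); // $slice
}

// Keep only the top 3 scores after two new results arrive.
const topThree = pushSortSlice([90, 70, 50], [80, 60], -1, 3);
// Sort ascending and keep the last 2 elements.
const lastTwo = pushSortSlice([3, 1], [2], 1, -2);

console.log(topThree);
console.log(lastTwo);
```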

Modifying a specific element in an array

Use the index of the array to target and modify the value

You can also use "hrs.$" syntax to change the first index that matches the condition:
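
Since the original example is not reproduced in these notes, here is a hedged stand-in. Both helpers are invented; they model {$set: {"hrs.1": 5}} (target by index) and {$set: {"hrs.$": 5}} combined with a query condition on hrs (target the first matching element).

```javascript
// Target by zero-based index, like the field path "hrs.1".
function setAtIndex(arr, index, value) {
  const copy = [...arr];
  copy[index] = value;
  return copy;
}

// Target the first element that satisfies the query condition, like "hrs.$".
function setFirstMatch(arr, predicate, value) {
  const i = arr.findIndex(predicate);
  if (i === -1) return arr; // the query matched nothing, so nothing changes
  return setAtIndex(arr, i, value);
}

const byIndex = setAtIndex([0.5, 2, 0.25], 1, 5);
const firstMatch = setFirstMatch([2, 0.5, 0.25], (v) => v < 1, 5);

console.log(byIndex);    // the element at index 1 is replaced
console.log(firstMatch); // only the first value under 1 is replaced
```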

Modifying all matching elements

The query that finds the documents is not used to decide which elements to change
Separate arrayFilters apply the update to the matching array elements
This example adds 2 to every element that is less than 1 hr
- nohrs is like a variable in the below example
You must use arrayFilters when updating multiple matching items within an array
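
The original example is missing from these notes, so the following is a reconstruction under assumptions: the field is called hrs, the filter identifier is nohrs, and the tasks collection name in the comment is made up. The JavaScript function only models what the update does to one array.

```javascript
// Assumed mongosh form of the update being described:
//   db.tasks.updateMany({}, {$inc: {"hrs.$[nohrs]": 2}},
//                       {arrayFilters: [{nohrs: {$lt: 1}}]})
// Model: add `amount` to EVERY element that passes the filter.
function incWhere(arr, predicate, amount) {
  return arr.map((v) => (predicate(v) ? v + amount : v));
}

const updatedHrs = incWhere([0.5, 2, 0.25], (nohrs) => nohrs < 1, 2);
console.log(updatedHrs); // both values under 1 are increased, 2 is left alone
```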

Expressive Updates

Mongo, unlike SQL Server, will actually persist the value of area so that subsequent reads do not have to recalculate the value
Note: if somebody modifies $w or $h field, I am not sure what happens as the instructor did not cover it

Upsert

Most MongoDB operations that update also allow the flag "upsert: true"
Upsert inserts a new document if none are found to update
Values in both the query and update are used to create a new record
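
A minimal model of how an upsert builds the new record from both the query and the update. upsertModel is hypothetical, handles only equality queries and $set-style fields, and leaves out the _id that MongoDB would also generate.

```javascript
// Model of updateOne(query, {$set: setFields}, {upsert: true}) on an array.
function upsertModel(collection, query, setFields) {
  const found = collection.find((doc) =>
    Object.entries(query).every(([k, v]) => doc[k] === v)
  );
  if (found) {
    Object.assign(found, setFields); // normal update path
    return "updated";
  }
  collection.push({ ...query, ...setFields }); // new doc: query values + update values
  return "inserted";
}

const visitors = [{ name: "Ann", visits: 1 }];

const first = upsertModel(visitors, { name: "Ann" }, { visits: 2 });  // Ann exists
const second = upsertModel(visitors, { name: "Bob" }, { visits: 1 }); // Bob does not

console.log(first, second);
console.log(visitors);
```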

findOneAndUpdate()

To understand this command, you must first understand updateOne()
- updateOne() finds and changes a document atomically but doesn't return the updated document unless you do a findOne() afterwards (two separate operations)
Imagine getting the next one-up number from a sequence
findOneAndUpdate() prevents a potential race condition by changing and returning the document in a single atomic operation
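
The sequence-number race can be simulated by interleaving two clients by hand. Everything here is an in-memory stand-in with made-up names; the assumed mongosh equivalent of the safe version is shown in the comment.

```javascript
// Assumed mongosh form of the safe version (collection and field names are made up):
//   db.counters.findOneAndUpdate({_id: "orderNumber"}, {$inc: {value: 1}},
//                                {returnDocument: "after"})
const counter = { _id: "orderNumber", value: 0 };

const updateOneModel = () => { counter.value += 1; };  // changes, returns nothing
const findOneModel = () => counter.value;              // separate read
const findOneAndUpdateModel = () => {                  // change and read in one step
  counter.value += 1;
  return counter.value;
};

// Unsafe interleaving: A updates, B updates, then A reads, then B reads.
updateOneModel();
updateOneModel();
const seenByA = findOneModel();
const seenByB = findOneModel(); // both clients get 2: a duplicate sequence number

// Safe version: each client gets its own number.
counter.value = 0;
const nextForA = findOneAndUpdateModel();
const nextForB = findOneAndUpdateModel();

console.log(seenByA, seenByB);   // same number twice
console.log(nextForA, nextForB); // two different numbers
```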

