🥒
Dill's Knowledge Base
  • Hello World
  • 💻SQL
    • ❌Error Handling
    • 🧀Parameter Sniffing
      • Indexes
      • Query Hints
      • RECOMPILE
      • Branching
      • Memory Grants
      • Summary
      • Bonus
    • SQL Server Buffer Pool
  • 🖱️MongoDB
    • Instructor Led Training
      • DF100
      • DF200
      • DF300
      • DF400
    • MongoDB DBA University
      • DBA Admin Tools
      • DBA Basics
      • Metrics & Monitoring
  • 💻Web Design
    • Oxygen Tips
    • Bricks Builder
      • Tips
      • Discovery Call
      • Utility vs Custom Classes
      • Math Functions
      • Static vs Relative Units
  • Azure
    • AZ-900
      • Benefit of Cloud Computing
      • CapEx, OpEx and Consumption-based
      • Differences Between Cloud Service Categories
      • Identify The Right Service Type
      • Differences Between Types of Cloud Computing
      • Reliability and Predictability
      • Regions and Region Pairs
      • Availability Zones
      • Resource Groups
      • Subscriptions
      • Management Groups
      • Azure Resource Manager
      • Azure ARC
      • Resources Required for VM
      • Benefits and Usage of Core Compute Resources
      • Benefits and Usage of Core Network Resources
      • Public/Private Endpoints
      • Benefits and Usage of Storage Accounts
      • Benefits and Usage of Database Resources
      • Data Movement and Migration Options
      • Benefits and Usage of IoT Services
      • Benefits and Usage of Big Data and Analytics Services
      • Benefits and Usage of AI Services
      • Benefits and Usage of Serverless Technologies
      • Benefits and Usage of DevOps Technologies
      • Functionality of Azure Management Solutions
      • Functionality and Usage of Azure Advisor
      • Functionality and Usage of ARM Templates
      • Functionality and Usage of Azure Monitor
      • Functionality and Usage of Azure Service Health
      • Functionality of Microsoft Defender for Cloud
      • Functionality and Usage of Key Vault
      • Functionality and Usage of Microsoft Sentinel
      • Azure Dedicated Host
      • Defense in Depth
      • Describe the Concept of Zero Trust
      • Functionality and Usage of NSGs
      • Functionality and Usage of Azure Firewall
      • Functionality and Usage of Azure DDoS Protection
      • Explain Authentication and Authorization
      • Functionality and Usage of Azure AD
      • Microsoft Entra Overview
      • Functionality of Conditional Access, MFA and SSO
      • Functionality and Usage of RBAC
      • Functionality and Usage of Resource Locks
      • Functionality and Usage of Tags
      • Functionality and Usage of Azure Policy
      • Governance Hierarchy Constructs
      • Azure Blueprints
      • Describe Microsoft Privacy Statement, OST and DPA
      • Purpose of Trust Center and Azure Compliance Documentation
      • Purpose of Azure Sovereign Regions
      • Factors That Affect Costs
      • Factors to Reduce Cost
      • Functionality and Usage of Azure Cost Management
      • Purpose of Service Level Agreements
    • DP-900
      • Study Cram
    • DP-300
      • Deploy IaaS Soluton with Azure SQL
  • 📦Kubernetes
    • Udemy: Kubernetes for Beginners
Powered by GitBook
On this page
  • Expression Stages
  • $project
  • $set
  • $group
  • $lookup
  • Output Stages
  • $out
  • $merge
  • $set
  • $replaceRoot
  • $sample
  • Database Internal Stats Stages
  • Aggregation Optimization
  • Aggregation and Memory Usage
  • Schema Validation
  • Change Streams
  • Sessions
  • Retryable Read and Writes
  • Multi-Document Transactions
  • Bulk Write Operations
  • Views
  • BSON Data Types
  • NULL Handling
  • Collation and Sort Ordering
  • Sorting by Objects
  • Sorting by Arrays
  • Type Bracketing
  • Locking
  • Longer Locking Semantics
  • Developer Best Practices
  • Idempotent Operations
  • Object Document Mappers (ODM)
  1. MongoDB
  2. Instructor Led Training

DF300

PreviousDF200NextDF400

Last updated 1 year ago

Expression Stages

$project

  • Good for creating brand new fields or renaming fields

  • Only sends new fields you specify from one stage to the next

$set

  • $set is very similar to project, but the major difference is that $set will forward ALL of the fields of the document as well as the ones you $set

  • From a performance perspective, the instructor said that $project is not necessarily faster than $set even though you are likely returning less fields in $project

$group

  • Roughly equivalent to GROUP BY in SQL

  • Groups documents by a "group key"

  • Output is one document per group

  • Additional fields can contain results of accumulator expressions

  • Common $group accumulators

    • $addToSet, $avg, $first, $last, $max, $min, $mergeObjects, $push, $stdDevSamp, $sum

    • Also.. $bucket, $bucketAuto, $facet

      • $bucket is good to segregate data

        • ex. Group data by month

    • $sortByCount

      • Really common for DBAs and developers

      • Returns a frequency count of a field across all of your documents

    • $unwind

      • Opposite of $group, it will unwind an array into multiple documents with the same _id

        • Because this is a stage during the pipeline in memory, this does not cause any issues

$lookup

  • Like a left outer join, but much more closely aligns to a nested select in terms of performance

    • MongoDB is NOT a relational database!

  • Good for analytic purposes mostly

    • Their is one edge case that is acceptable in OLTP applications..

      • ex. When one collection is very static and another is extremely active (updates every few milliseconds)

  • No referential integrity is enforced with this operation

  • If their is more than one match on the right side of the join, it will put the results into an array

  • For each document in the left side of the join, you will have to iteratively scan the right side collection

    • You MUST index the foreign field (right side)

    • It is also recommended to index the local field (left side)

  • If their is no match, their is nothing to attach the join to and thus that field will not be attached

  • Be careful of cross joining NULL values as NULL is considered a value in Mongo

Output Stages

$out

  • Write results to a new collection

$merge

  • Update an existing collection

    • Add new documents or edit existing

  • Updates existing fields or appends new ones

$set

  • Add extra fields without $projecting all of them

$replaceRoot

  • Create a whole new shape of top-level document

  • Very similar to $merge, but $replaceRoot will just fully replace the document

  • _id remains the same

$sample

  • Choose a random set of docs from the input

Database Internal Stats Stages

Requires elevated permissions to access diagnostic information

  • $collStats

    • Describe collection statistics

  • $currentOp

    • List ongoing database operations

  • $indexStats

    • Are people actually using indexes since boot

  • $listSessions

    • Show ho is connected and what are they doing

  • $planCacheStats

    • Show what query shapes are cached and the query plans

Aggregation Optimization

Common mistakes:

  • Often developers will think that $project reduces the fields outputted into the next stage, but Mongo will actually use the entire document throughout the pipeline

    • Generally speaking, $project should be towards the end

      • This doesn't actually have any performance impact, but it makes it clear that their is no advantage to putting it at the beginning

  • Stages can be streaming or blocking

    • A blocking stage will block subsequent streaming stages

Aggregation and Memory Usage

  • Documents inside the pipeline can be up to 64mb in size

  • Final stage must be 16mb or less as they need to be BSON

    • This can be broken during the pipeline/computation phase, but at the end of the process you have to be under 16mb

  • Blocking stages can use up to 100mb of heap RAM

    • If you need more, you can write to disk

      • $sort, $group, $bucket, and $bucketAuto can use disk with allowDiskUse:true

        • WARNING: THIS WILL INCREASE IOPs FOR THE ENTIRE DATBASE AND IT SHOULD BE AVOIDED AT ALL COSTS IN PRODUCTION

        • You can see this in db.currentOp and in the explain plan look for that specific type of operation

Schema Validation

  • Moderate validation allows documents out of validation to stay out of validation

  • Strict validation does not retroactively reapply to everything on disk

  • The rules are applied on new updates

Change Streams

  • Reads from the op log

  • Listen for writes to a collection, database, or instance

    • Filter notifications we want to receive

    • On notification, receive the delta or whole document (before or after change)

    • Can stop listening and resume where we left off

  • Drivers handle change stream differently

    • Some block until an event with or without a timeout

      • For C#, the default is technically blocking, but if you have an async methodology you can change that

      • Atlas triggers uses this same technology

    • Some allow to ask if there are any changes or not by setting timeout zero

    • Provide a function to be called if there are changes

Sessions

  • Support retryable writes and causal consistency

    • Diff than eventual consistency.. in mongo things have to occur in the correct order

  • Logical server sessions that can be synchronized between clients

    • Sessions are kept track of on the database side

  • Allows them to have the same causal view of the world

    • Allows read your own writes

      • findOneAndUpdate()

    • One sessions knows another's writes have completed

    • Allows retryable writes - ability to safely auto-retry a failover

    • Underpin transaction handling

  • Ultimately sessions revolve around a defined timestamp in the internal operation log

Retryable Read and Writes

  • Retryable writes automatically retry a write in the event of HA failover

    • Default for 4.2+ drivers and Atlas connection strings

    • Operations can be made idempotent but a simple $inc:{a:1} isn't

    • Retryable writes, with implicit sessions are able to ensure things happen once

    • Waits until the cluster is available then repeats the single write

    • Checks that the newly elected Primary has not already seen the operation

  • Retryable reads redo a read sent to a server if that if that server fails before responding

    • Does not apply to getmore operations

  • If a secondary goes down and comes back up, it will automatically recover itself

Multi-Document Transactions

  • Most locking occurs at the document level

  • Enable full ACID properties in MonogDB

    • Snapshot level isolation

    • Require at least a replica set

    • Work across sharded clusters from MongoDB 4.2 (distributed data)

  • Be aware transactions should not be the default approach

    • Can introduce contention

    • Might be used for <1% of update operations or less

  • Correct schema design often provides a simpler, more efficient solution

  • The majority of MongoDB developers do not use Transactions

  • Deadlocks do occur in Mongo

    • The second transaction is always the one that is deadlocked

  • Writes block writes, but not other reads

Bulk Write Operations

  • Group together multiple write operations in a single network call

    • insertMany is a bulk operation

    • updateMany is a SINGLE operation

    • bulkWrite API sends many changes in a single command

  • Avoids latency cost of network round trips

  • Can be ordered or unordered:

    • Unordered is faster on a sharded cluster as parallel

    • Ordered must stop on error

  • Can be used inside a transaction

Views

  • Views are virtual collections

    • Same concept as an RDBMS

    • Strictly read-only

    • Defined by an aggregation pipeline

    • Hard to index

      • You can not index the view itself

      • Views are not materialized

    • Allow us to encapsulate aggregation pipelines on a server

    • Have own security and can be used to redact fields

    • To materialize a view, you basically just need to create a new collection and use MERGE to maintain it

      • Of course.. you are adding more data to disk

BSON Data Types

  • Using the correct data type is important

    • Avoid data loss/rounding, e.g. double versus decimal

    • Enable range queries

    • Minimize data sizes

  • Default number in mongosh is int32 or double (if initialized with a decimal point)

    • ObjectId() is a compact data type - never store it as a string

    • Always store Dates as dates, not strings

    • Binary() is a useful data type

    • Booleans should not be strings

    • Special types: MinKey, MaxKey, Timestamp, Regex

NULL Handling

Collation and Sort Ordering

  • All text in MongoDB is unicode stored as UTF8

    • 1-9, A-Z, a-z, etc etc

  • Default sort order in MongoDB is the unicode code point order

  • The collation changes the sort order to be correct for a language

    • Can define a collation for a collection or an index, or a query

    • Can say whether to have case sensitivity in sort and query

    • Can say whether to match diacritics or not in sort/query

  • MongoDB also explicitly defines how different data types order:

    • NULL < Numbers < Strings < Objects

      • NULLs come first ^

  • If indexing text fields like a name, you need to make sure your queries match your index collations!

Sorting by Objects

  • Objects can be compared to each other

  • When objects are compared, the field order matters

    • Some programming languages don't preserve the field order by default

    • Objects are compared field by field

    • If field names differ, then the order is by first changed field name

    • If values differ, then the order is by value

      • {a : 1} < { a:1, b:2} < {b:1, a:1}

  • When ordering...

    • 1) Fields first (not values)

    • 2) If fields match, THEN you look at values

  • The order of the fields in each document MATTERS!

Sorting by Arrays

  • When querying an array, the semantics refers to the whole array or any member

  • So {$gt: 5} is true where any array member is greater than 5

  • When sorting, the highest or the lowest value in the array is used as the sort value

  • This also applies to strings

Ex.

Type Bracketing

  • MongoDB auto converts numeric data types when comparing

    • Searching for 5 will find Integer, Long, double, or decimal versions

    • It will NOT find "5" as a string

    • Indexes are created and stored in a special format to facilitate this - not BSON

    • Also works for range queries and sorting

  • Type bracketing prevents comparison of different type brackets in find *and $match)

    • ALLOWS comparison in aggregation and $expr

Locking

Longer Locking Semantics

  • Database locks are very short lived

  • Your application may need to reserve a document whilst manually editing it

  • One thing you could do is update a field and set a "status" field to something like "being_edited"

    • The problem with this is that other apps might not respect this behavior as it is not enforcable

    • Another issue is that an app could abandon the document during a crash

  • One way to improve the above process is to add owners and timestamps

    • Be careful of a race condition where new are prioritized over abandoned

    • If using this without a specific _id, what function do you need and why?

Ex.

Developer Best Practices

Idempotent Operations

  • For scale, operations need to be independent of each other

  • An operation changing many things should not be in a transaction

  • Use roll-forward, not roll-back

  • These write operation are idempotent:

    • Inserts with a supplied _id (2nd attempt fails)

    • Deletes cannot delete a thing twice

    • Updates is using explicitly set values

  • Set {paid:false} in 100,000 documents with updateMany() if it fails part way through, we can retry the whole operation

    • For efficiency, we might only set it where it's not set already

  • non-updating updates do NOT lock the document

Object Document Mappers (ODM)

🖱️