🥒
Dill's Knowledge Base
  • Hello World
  • 💻SQL
    • ❌Error Handling
    • 🧀Parameter Sniffing
      • Indexes
      • Query Hints
      • RECOMPILE
      • Branching
      • Memory Grants
      • Summary
      • Bonus
    • SQL Server Buffer Pool
  • 🖱️MongoDB
    • Instructor Led Training
      • DF100
      • DF200
      • DF300
      • DF400
    • MongoDB DBA University
      • DBA Admin Tools
      • DBA Basics
      • Metrics & Monitoring
  • 💻Web Design
    • Oxygen Tips
    • Bricks Builder
      • Tips
      • Discovery Call
      • Utility vs Custom Classes
      • Math Functions
      • Static vs Relative Units
  • Azure
    • AZ-900
      • Benefit of Cloud Computing
      • CapEx, OpEx and Consumption-based
      • Differences Between Cloud Service Categories
      • Identify The Right Service Type
      • Differences Between Types of Cloud Computing
      • Reliability and Predictability
      • Regions and Region Pairs
      • Availability Zones
      • Resource Groups
      • Subscriptions
      • Management Groups
      • Azure Resource Manager
      • Azure ARC
      • Resources Required for VM
      • Benefits and Usage of Core Compute Resources
      • Benefits and Usage of Core Network Resources
      • Public/Private Endpoints
      • Benefits and Usage of Storage Accounts
      • Benefits and Usage of Database Resources
      • Data Movement and Migration Options
      • Benefits and Usage of IoT Services
      • Benefits and Usage of Big Data and Analytics Services
      • Benefits and Usage of AI Services
      • Benefits and Usage of Serverless Technologies
      • Benefits and Usage of DevOps Technologies
      • Functionality of Azure Management Solutions
      • Functionality and Usage of Azure Advisor
      • Functionality and Usage of ARM Templates
      • Functionality and Usage of Azure Monitor
      • Functionality and Usage of Azure Service Health
      • Functionality of Microsoft Defender for Cloud
      • Functionality and Usage of Key Vault
      • Functionality and Usage of Microsoft Sentinel
      • Azure Dedicated Host
      • Defense in Depth
      • Describe the Concept of Zero Trust
      • Functionality and Usage of NSGs
      • Functionality and Usage of Azure Firewall
      • Functionality and Usage of Azure DDoS Protection
      • Explain Authentication and Authorization
      • Functionality and Usage of Azure AD
      • Microsoft Entra Overview
      • Functionality of Conditional Access, MFA and SSO
      • Functionality and Usage of RBAC
      • Functionality and Usage of Resource Locks
      • Functionality and Usage of Tags
      • Functionality and Usage of Azure Policy
      • Governance Hierarchy Constructs
      • Azure Blueprints
      • Describe Microsoft Privacy Statement, OST and DPA
      • Purpose of Trust Center and Azure Compliance Documentation
      • Purpose of Azure Sovereign Regions
      • Factors That Affect Costs
      • Factors to Reduce Cost
      • Functionality and Usage of Azure Cost Management
      • Purpose of Service Level Agreements
    • DP-900
      • Study Cram
    • DP-300
      • Deploy IaaS Soluton with Azure SQL
  • 📦Kubernetes
    • Udemy: Kubernetes for Beginners
Powered by GitBook
On this page
  • Indexes and Optimization
  • Index Prefix Compression
  • Index Misconceptions
  • When to use an Index
  • Using and choosing indexes
  • Explain Verbosity
  • Creating an Index
  • Index Types
  • Single-Field Indexes
  • Unique Indexes
  • Partial Indexes
  • Sparse Indexes
  • Hashed Indexes
  • Multikey Indexes
  • Compound Indexes
  • Time to Live (TTL) Indexes
  • Listing Indexes
  • Performance
  • Index Limitations
  • Indexing Arrays
  • Finding Slow Operations
  • Aggregations
  • Dollar Overloading
  • Expressions
  1. MongoDB
  2. Instructor Led Training

DF200

Indexes and Optimization

  • Values in an index point to document identity

    • As a result, if the document moves the index doesn't change

Index Prefix Compression

  • MongoDB indexes use a special compressed format and you can not change it

  • Each entry is a delta from the previous one

  • If there are identical entries, they need only one byte

  • As indexes are inherently sorted, this makes them much smaller

  • Small indexes require less RAM

  • Indexes and collections exist in memory

Index Misconceptions

  • MongoDB is so fast that it doesn't need indexes

  • Every field is automatically indexed

    • The only field is automatically indexed is _id

  • NoSQL uses hashes, not indexes

When to use an Index

  • A good index should support every query or update

  • Scanning records is very inefficient, even if it is not all of them

  • The developer should determine the best index

Using and choosing indexes

  • Mongodb checks in the plan cache to see if an optimal index has been chosen before and if not:

    • picks all candidate indexes

    • runs query using them to score

  • Plan cache entries are evicted when:

    • using that index becomes less efficient

    • a new relevant index is added

    • the server is restarted

Explain Verbosity

  • queryPlanner shows the winning query plan but does not execute query

  • executionStats executes the query and gathers stats

  • allPlansExecutions runs all candidate plans and gathers stats

  • default is "queryPlanner"

Ex.

  • key metrics are in green

    • nReturned is number of documents returns

    • totalKeysExamined is number of index keys examined

    • totalDocsExamined is number of documents examined

  • explain plans are complicated with nested stages of processing

Applying Explain()

  • A flag is sent to the server with the operation to say it's an explain command

  • If the fucntion does not return a cursor, the explain flag needs to be set in the colleciton object instead before calling the function

  • We can set this one on the cursor as we do for sort() or limit()

Explainable Operations:

Creating an Index

  • db.listings.createIndex({num ber_of_review:1})

    • name defaults to the field, but you can rename it

  • In production, you need to be careful running this as it will cause blocking

    • Their are workarounds, but the training does not talk about this

Index Types

Single-Field Indexes

  • Specified as field name and direction

  • The direction is irrelevant for a single field index

  • The field itself can be any data type including an object

    • Indexes the whole object essentially as a comparable blob

    • Can be used to run range searches or exact matches

  • Indexing an array is also possible, but it will be covered later

Unique Indexes

  • Indexes can enforce a unique constraint with the following flag:

    • {unique: true}

  • NULL is a value and so only one record can have a NULL in unique field

Partial Indexes

  • Partial indexes index a subset of documents based on values

  • Can greatly reduce index size

    • {partialFilterExpression: {archived: false } }

Sparse Indexes

  • Sparse indexes don't index missing fields

    • Ex. db.scores.createIndex({score:1},{sparse:true})

  • These indexes are superseded by partial indexes

Hashed Indexes

  • Hashed indexes index a 20 byte md5 of the BSON value

  • Support exact match only!

  • Cannot be used for unique constraints

  • Can potentially reduce index size if original values are large

  • Downside: random values in a BTree use excessive resources

    • Ex. ({name: "hashed"})

Multikey Indexes

  • A multikey index is an index that is on an array

  • Can index primitives, documents, or sub-arrays

  • Are created using createIndexes() just like single-field indexes

    • If any field in the index is ever found to be an array than the index is multikey

  • An index entry is created on each unique value found in the array

Compound Indexes

  • Based on more than one field

    • Most common type of index

    • Same concept as RDBMS indexes

    • Up to 32 fields

    • Created like a single field index

  • The field order and direction is very important

  • Can be used as long as the first field in index is in the query

    • Other fields in the index do not need to be in the query

Field Order Matters

  • Equality First

    • What fields, for a typical query, are filtered the most

    • Selectivity is NOT cardinaility, selective can be a boolean choice

    • Ex. Normally male/female is not selective

    • Dispatched vs delivered is selective though

  • Then sort and range

    • Sorts are much more expensive without indexes

    • Directions matter when doing range queries

ex:

  • equality, sort, range (ESR) - methodology used 85% of the time

    • 15% of the time, you may want to use ERS instead

Ex.

  • In above example, the index should swap and put the equality operator first like so:

  • Another example:

If you are creating a multi-key index, only one of those values can be an array

  • An error will be thrown if you try to index on multiple arrays within the same document

  • The error thrown can happen on either an INSERT or the creation of the index itself

Time to Live (TTL) Indexes

  • Not a special index, but rather a flag

  • MongoDB automatically deletes documents using the index based on time

  • Background thread in server runs regularly to delete expired documents

  • Alternative to the above example, you can use a field and set expireAfterSeconds: 0

  • Sometimes it is better to write your own programmatic data cleaner and schedule via CRON

Listing Indexes

  • getIndexes() on a collection gives us the info of existing indexes

Performance

  • Indexes improve read performance and potentially write performance

  • Each index adds ~10% overhead for writes (Hashed indexes, multikey indexes, text indexes, and wildcard indexes can add more)

  • An index is modified any time a document:

    • Is inserted (applies to most indexes)

    • Is delete (applies to most indexes)

    • Is updated in such a way that its indexed fields change

Index Limitations

  • Up to 64 indexes per collection, but you should avoid getting even remotely close to that

    • Write performance degrades to unusable between 20 and 30

  • 4 indexes per collection is a good recommended number, but the instructor recommended less than 10 in the real world

Indexing Arrays

  • 1st example below uses index 1

  • 2nd example below uses index 1

  • 3rd example below uses index 2

  • elements in a nested array cannot be indexed, but you can change the structure like this and index it:

Finding Slow Operations

  • Missing index is usually the most common cause

  • Bad client code is a close second cause

  • Two ways to locate:

    • database log

      • Gets last 1024 log entries

      • Returns a single document

    • profiling a database

  • You can filter on "s" attribute for severity

  • You can also download all of the logs directly from the mongo atlas

  • db.setProfilingLevels(level,slowms)

    • 0 - profiler is off

    • 1 - store only slow queries

    • 2 - store all queries

    • slowms: threshold in milliseconds for slow operations

Aggregations

  • Aggregation allows us to compute new data

    • Calculated fields

    • summarized and grouped values

    • reshaped documents

  • Aggregation is a pipeline

    • Each transformation is a single step known as a stage

    • This is easier to:

      • debug

      • understand

      • rewrite and optimize

    • Think of a powershell pipe!

Stages:

Ex.

  • Advised to put the $project at the end

  • Everything done in aggregate is done in memory

  • Unlike a find command, Aggregate is more controllable and reads from top to bottom

Dollar Overloading

  • In above image, the 3rd line reads like this:

    • Set a new field in the document called area with a value that is 5x10

  • 1st line

    • Match stage means you are filtering

  • 2nd line

    • You are altering or adding a new field

  • 4th line

    • Set a new field and use a $map function against prices, assigning p as your iterator variable and walk over the array multiplying each value $$p by 1.08

Expressions

  • In the above, the most nested value gets executed first and it goes up the chain

  • Other arithmetic expressions:

    • $subtract, $add, $multiply, $divide

  • String expressions:

    • $concat, $ltrim, $indexOf, etc

  • Expressions can affect index usage just like SQL Server

General recommendation when writing aggregations is to break up the commands into stages/variables:

PreviousDF100NextDF300

Last updated 1 year ago

🖱️