DF200

Indexes and Optimization

Values in an index point to document identity
- As a result, if the document moves the index doesn't change

Index Prefix Compression

MongoDB indexes use a special compressed format and you can not change it
Each entry is a delta from the previous one
If there are identical entries, they need only one byte
As indexes are inherently sorted, this makes them much smaller
Small indexes require less RAM
Indexes and collections exist in memory

Index Misconceptions

MongoDB is so fast that it doesn't need indexes
Every field is automatically indexed
- The only field is automatically indexed is _id
NoSQL uses hashes, not indexes

When to use an Index

A good index should support every query or update
Scanning records is very inefficient, even if it is not all of them
The developer should determine the best index

Using and choosing indexes

Mongodb checks in the plan cache to see if an optimal index has been chosen before and if not:
- picks all candidate indexes
- runs query using them to score
Plan cache entries are evicted when:
- using that index becomes less efficient
- a new relevant index is added
- the server is restarted

Explain Verbosity

queryPlanner shows the winning query plan but does not execute query
executionStats executes the query and gathers stats
allPlansExecutions runs all candidate plans and gathers stats
default is "queryPlanner"

Ex.

key metrics are in green
- nReturned is number of documents returns
- totalKeysExamined is number of index keys examined
- totalDocsExamined is number of documents examined
explain plans are complicated with nested stages of processing

Applying Explain()

A flag is sent to the server with the operation to say it's an explain command
If the fucntion does not return a cursor, the explain flag needs to be set in the colleciton object instead before calling the function
We can set this one on the cursor as we do for sort() or limit()

Explainable Operations:

Creating an Index

db.listings.createIndex({num ber_of_review:1})
- name defaults to the field, but you can rename it
In production, you need to be careful running this as it will cause blocking
- Their are workarounds, but the training does not talk about this

Index Types

Single-Field Indexes

Specified as field name and direction
The direction is irrelevant for a single field index
The field itself can be any data type including an object
- Indexes the whole object essentially as a comparable blob
- Can be used to run range searches or exact matches
Indexing an array is also possible, but it will be covered later

Unique Indexes

Indexes can enforce a unique constraint with the following flag:
- {unique: true}
NULL is a value and so only one record can have a NULL in unique field

Partial Indexes

Partial indexes index a subset of documents based on values
Can greatly reduce index size
- {partialFilterExpression: {archived: false } }

Sparse Indexes

Sparse indexes don't index missing fields
- Ex. db.scores.createIndex({score:1},{sparse:true})
These indexes are superseded by partial indexes

Hashed Indexes

Hashed indexes index a 20 byte md5 of the BSON value
Support exact match only!
Cannot be used for unique constraints
Can potentially reduce index size if original values are large
Downside: random values in a BTree use excessive resources
- Ex. ({name: "hashed"})

Multikey Indexes

A multikey index is an index that is on an array
Can index primitives, documents, or sub-arrays
Are created using createIndexes() just like single-field indexes
- If any field in the index is ever found to be an array than the index is multikey
An index entry is created on each unique value found in the array

Compound Indexes

Based on more than one field
- Most common type of index
- Same concept as RDBMS indexes
- Up to 32 fields
- Created like a single field index
The field order and direction is very important
Can be used as long as the first field in index is in the query
- Other fields in the index do not need to be in the query

Field Order Matters

Equality First
- What fields, for a typical query, are filtered the most
- Selectivity is NOT cardinaility, selective can be a boolean choice
- Ex. Normally male/female is not selective
- Dispatched vs delivered is selective though
Then sort and range
- Sorts are much more expensive without indexes
- Directions matter when doing range queries

ex:

equality, sort, range (ESR) - methodology used 85% of the time
- 15% of the time, you may want to use ERS instead

Ex.

In above example, the index should swap and put the equality operator first like so:
Another example:

If you are creating a multi-key index, only one of those values can be an array

An error will be thrown if you try to index on multiple arrays within the same document
The error thrown can happen on either an INSERT or the creation of the index itself

Time to Live (TTL) Indexes

Not a special index, but rather a flag
MongoDB automatically deletes documents using the index based on time
Background thread in server runs regularly to delete expired documents

Alternative to the above example, you can use a field and set expireAfterSeconds: 0
Sometimes it is better to write your own programmatic data cleaner and schedule via CRON

Listing Indexes

getIndexes() on a collection gives us the info of existing indexes

Performance

Indexes improve read performance and potentially write performance
Each index adds ~10% overhead for writes (Hashed indexes, multikey indexes, text indexes, and wildcard indexes can add more)
An index is modified any time a document:
- Is inserted (applies to most indexes)
- Is delete (applies to most indexes)
- Is updated in such a way that its indexed fields change

Index Limitations

Up to 64 indexes per collection, but you should avoid getting even remotely close to that
- Write performance degrades to unusable between 20 and 30
4 indexes per collection is a good recommended number, but the instructor recommended less than 10 in the real world

Indexing Arrays

1st example below uses index 1
2nd example below uses index 1
3rd example below uses index 2

elements in a nested array cannot be indexed, but you can change the structure like this and index it:

Finding Slow Operations

Missing index is usually the most common cause
Bad client code is a close second cause
Two ways to locate:
- database log
  - Gets last 1024 log entries
  - Returns a single document
- profiling a database
You can filter on "s" attribute for severity
You can also download all of the logs directly from the mongo atlas
db.setProfilingLevels(level,slowms)
- 0 - profiler is off
- 1 - store only slow queries
- 2 - store all queries
- slowms: threshold in milliseconds for slow operations

Aggregations

Aggregation allows us to compute new data
- Calculated fields
- summarized and grouped values
- reshaped documents
Aggregation is a pipeline
- Each transformation is a single step known as a stage
- This is easier to:
  - debug
  - understand
  - rewrite and optimize
- Think of a powershell pipe!

Stages:

Ex.

Advised to put the $project at the end
Everything done in aggregate is done in memory
Unlike a find command, Aggregate is more controllable and reads from top to bottom

Dollar Overloading

In above image, the 3rd line reads like this:
- Set a new field in the document called area with a value that is 5x10
1st line
- Match stage means you are filtering
2nd line
- You are altering or adding a new field
4th line
- Set a new field and use a $map function against prices, assigning p as your iterator variable and walk over the array multiplying each value $$p by 1.08

Expressions

In the above, the most nested value gets executed first and it goes up the chain
Other arithmetic expressions:
- $subtract, $add, $multiply, $divide
String expressions:
- $concat, $ltrim, $indexOf, etc
Expressions can affect index usage just like SQL Server

General recommendation when writing aggregations is to break up the commands into stages/variables:

PreviousDF100 NextDF300

Last updated 1 year ago