DF400
BSON
Binary JSON: a binary-encoded serialization of JSON-like documents
Cross-language: object -> binary -> object
Used internally and for network traffic
Converted to hash/dict objects in most drivers
Basically, a list of named, typed values:
Type, FieldName, <length>, Data
Type, FieldName, <length>, Data
<length> tells a program how many bytes to skip to get to the next field, which speeds up scanning without parsing every value
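A minimal round trip in Python using the bson package that ships with PyMongo (the document contents here are illustrative):

```python
import bson  # ships with PyMongo

# Object -> binary -> object: the same round trip drivers perform for you
doc = {"type": "score", "player": "ann", "value": 42}
raw = bson.encode(doc)       # bytes laid out as type, field name, length, data
print(len(raw), "bytes")
restored = bson.decode(raw)  # back to a Python dict
assert restored == doc
```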
Arrays vs Documents
The difference is in how they map to a programming language and what constraints apply to each
Can only $push to an array, not to a document
Arrays can contain duplicates
Arrays cannot be sparse; you must store nulls for missing values
Large arrays or large flat documents are equally bad
Max 200 elements strongly recommended in either
Recall previous discussions about 200 element arrays
Large or unbounded arrays can have a significant speed impact
Arrays are often used where a document would be a better choice!
Ex.
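A hypothetical sketch of the two designs discussed below; the field names Results, Player, and Score are assumptions:

```python
# Design 1: array of sub-documents - indexable on "Results.Player"
game_v1 = {
    "_id": 1,
    "Results": [
        {"Player": "ann", "Score": 120},
        {"Player": "bob", "Score": 95},
    ],
}

# Design 2: one sub-document keyed by player name - smaller on disk,
# but it is queried very differently and is hard to index
game_v2 = {
    "_id": 1,
    "Results": {"ann": 120, "bob": 95},
}
```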
In the above, we redesigned the schema to be 30% smaller
However, the way you query the second design is significantly different
Furthermore, indexing the second design is very difficult
In the first example, you could index Results.Player
When to Use Arrays
Use arrays when:
Just need to $push to the end
Don't have unique identifiers
Having sparse data is not an issue
Need to index the values to search them
Otherwise, consider objects
Projection is faster than $filter
Querying a single member is faster
One big advantage of arrays over joined tables is not needing an indexed (foreign) key for each element
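A sketch of the single-member point, reusing the hypothetical Results array from the example above; the positional $ projection returns just the matched element:

```python
from pymongo import MongoClient

games = MongoClient().test.games  # hypothetical database/collection names

# Multikey index so the values inside the array can be searched
games.create_index("Results.Player")

# Projection returns only the matching array element, not the whole array
doc = games.find_one({"Results.Player": "ann"}, {"Results.$": 1})
```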
Document model vs Relational
Histogram Exercise
24 hrs x 60 minutes gets you 1440
The question is, do you have a single array with 1440 elements or a two-dimensional array:
In regards to updating, one of the above will be significantly faster...
In the 1440-element array, on average you will have to scan past 720 elements before finding the value you need
By switching to a two-dimensional array, that number drops to 360
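A sketch of both layouts, assuming one histogram document per day; the dotted-path $inc shows how an update addresses a single minute:

```python
from pymongo import MongoClient

hist = MongoClient().test.histograms  # hypothetical collection

# Layout 1: one flat array of 1440 minute counters
hist.insert_one({"_id": "2024-01-15-flat", "m": [0] * 1440})
hist.update_one({"_id": "2024-01-15-flat"}, {"$inc": {"m.833": 1}})  # 13:53

# Layout 2: two-dimensional - 24 hour arrays of 60 minute counters each
hist.insert_one({"_id": "2024-01-15-2d", "h": [[0] * 60 for _ in range(24)]})
hist.update_one({"_id": "2024-01-15-2d"}, {"$inc": {"h.13.53": 1}})  # 13:53
```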
Schema Design in MongoDB
3 Considerations:
The data the application needs
Application's read usage of the data
Application's write usage of the data
Document Design Basics
To link or embed?
Do I want the embedded info a lot of the time?
Put it in the same collection!
Do I need to search with the embedded info?
Does the embedded info change often?
Do I need the latest version, or the version as it was at the time?
Is the embedded info shared - really?
Linking
One-to-One Linked
It is very important to index the proper fields to make these lookups faster
One-to-One Embedded
The problem with embedding this info is that the author could have changed their name
If you embed, it becomes difficult to keep the duplicated data up to date
Hybrid
On a webpage, you can render all of the books with the author's first and last name; if the user clicks on the author, you can then use the _id to query the authors collection
Yes, you are duplicating data, but that is why you only include very basic information
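A sketch of the hybrid approach with hypothetical books and authors collections: embed just enough to render a page, and keep the _id for the occasional full lookup:

```python
from pymongo import MongoClient
from bson import ObjectId

db = MongoClient().library  # hypothetical database name

book = {
    "title": "Some Title",
    # Duplicated on purpose, but minimal - just enough to render a list page
    "author": {
        "_id": ObjectId("65a000000000000000000001"),
        "first": "Ada",
        "last": "Lovelace",
    },
}
db.books.insert_one(book)

# Only when the user clicks through do we fetch the full author document
full_author = db.authors.find_one({"_id": book["author"]["_id"]})
```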
One-to-Many
Scalar in child
Array in Parent
Could be vulnerable in a scale-up situation (the array is unbounded)
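The two one-to-many shapes as hypothetical documents:

```python
# Scalar in child: each child points at its parent - scales to any count
order = {"_id": 1, "customer": "ann"}
line_item = {"_id": 101, "order_id": 1, "sku": "X-42", "qty": 2}

# Array in parent: the parent lists its children - simple, but the array
# is unbounded, which is the scale-up vulnerability noted above
order_with_array = {"_id": 1, "customer": "ann", "line_items": [101, 102]}
```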
Many-to-Many
Arrays on either side
Payload vs Process Fields
Two Types of Fields
Payload - Data we just store and retrieve
Processing - metadata we examine inside MongoDB
Querying, aggregating, filtering and projecting, using for workflow, using to implement patterns
The payload might be predefined; sometimes it is better to leave it alone and add processing fields alongside
For example, existing corporate data/object models might be used only for storage
JSON files you want to store are payload; fields you want to filter by are processing fields
Generally speaking, payload fields
Ex.
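A hypothetical sketch: a corporate object stored untouched as payload, with a few processing fields promoted alongside it for querying:

```python
from datetime import datetime, timezone

# Payload: an existing corporate object, stored as-is and never examined
corporate_po = {
    "hdr": {"num": "PO-991", "vendor": "ACME"},
    "lines": [{"sku": "X-42", "qty": 3}],
}

doc = {
    "payload": corporate_po,                  # store and retrieve only
    # Processing fields, added for MongoDB to query, filter, and index:
    "po_number": corporate_po["hdr"]["num"],
    "vendor": corporate_po["hdr"]["vendor"],
    "received": datetime.now(timezone.utc),
}
```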
Evolutionary Dynamic Schema
Schema changes as the application changes - a big benefit of using MongoDB
The application changes gradually, with no big conversion needed
Include a schema version number with the data as a field
Retain the ability to read previous schemas in application
Either convert in the background or on modification
Loosely coupled database objects and code objects
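A minimal sketch of the idea; the field name schema_version and the document shapes are assumptions:

```python
def read_customer(doc):
    """The application retains the ability to read every schema version."""
    version = doc.get("schema_version", 1)
    if version == 1:
        # v1 stored a single phone string; normalize it to the v2 shape
        doc["phones"] = [{"type": "main", "number": doc.pop("phone")}]
        doc["schema_version"] = 2
        # Either persist the converted document now ("on modification")
        # or leave the rewrite to a background migration job
    return doc
```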
Data Driven Dynamic Schema
Field names are the data values
Uncomfortable new concept for many designers
Requires a truly dynamic coding approach
Clean and performant for many things
Can be used to do some amazing optimization of retrieval using Tries
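A tiny illustration of field names as data values: a trie for prefix lookups, where each letter of a stored key becomes a field name (the shape is an assumption, not a MongoDB feature):

```python
# One document per trie; "c" -> "a" -> "t" spells the stored key "cat"
trie_doc = {
    "_id": "names",
    "c": {"a": {"t": {"_end": True}, "r": {"_end": True}}},  # cat, car
}

# A prefix/key check becomes a single dotted-path query:
#   db.tries.find_one({"_id": "names", "c.a.t._end": True})
```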
Design Patterns
Attribute Pattern
Used where documents have a number of searchable named attributes
Not all values may be present in a given record
Any new record might have a new field name not seen before
Ex. ecommerce, CRM, content management
The reason you use k/v pairs: you can more easily index those fields for how they will be searched
Also, it would cover NEW attributes being added
If you are looking for a specific SKU, then the attributes are payload; otherwise they could be used for processing, so keep that in mind
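A sketch of the pattern with hypothetical field names (attributes, k, v); one compound multikey index then covers every attribute, present or future:

```python
from pymongo import MongoClient, ASCENDING

products = MongoClient().shop.products  # hypothetical collection

products.insert_one({
    "sku": "X-42",
    "attributes": [                    # searchable named attributes as k/v
        {"k": "color", "v": "blue"},
        {"k": "size", "v": "M"},
        {"k": "fabric", "v": "wool"},  # a brand-new field name - no problem
    ],
})

# One index serves searches on any attribute name, old or new
products.create_index([("attributes.k", ASCENDING),
                       ("attributes.v", ASCENDING)])

match = products.find_one(
    {"attributes": {"$elemMatch": {"k": "color", "v": "blue"}}})
```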
Bucket Pattern
Most common design pattern - strongly matches the document model
Used to store many small, related data items
Bank transactions - related by account and date
IoT Readings - related by sensor and date
Reduces index sizes up to 200x
Speeds up retrieval of related data
Enables computed pattern
MongoDB 5.0 time series collections abstract this concept
Ex.
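A sketch of the bucket upsert described below, assuming sensor readings bucketed 200 at a time:

```python
from datetime import datetime, timezone
from pymongo import MongoClient

readings = MongoClient().iot.readings  # hypothetical collection

def record(sensor_id, value):
    # Match the current, not-yet-full bucket for this sensor.
    # A full bucket (count == 200) no longer matches the $lt filter,
    # so upsert=True quietly starts the next bucket.
    readings.update_one(
        {"sensor_id": sensor_id, "count": {"$lt": 200}},
        {
            "$push": {"r": {"t": datetime.now(timezone.utc), "v": value}},
            "$inc": {"count": 1},
        },
        upsert=True,
    )
```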
In the above, you fill the document up with 200 readings; the query then fails to find a document because it matches only buckets with fewer than 200 readings, and because upsert is set to true, a new document is created
If you then turn around and query on sensor ID, you can easily seek to those documents
Computed Pattern
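A hypothetical top-20 sketch: the list is recomputed when scores change, so readers never pay for the aggregation:

```python
from pymongo import MongoClient, DESCENDING

db = MongoClient().game  # hypothetical database

def refresh_top20():
    # Precompute on write (or on a timer) instead of on every read
    top = list(db.scores.find({}, {"player": 1, "score": 1})
                        .sort("score", DESCENDING).limit(20))
    db.leaderboards.replace_one(
        {"_id": "top20"}, {"_id": "top20", "players": top}, upsert=True)

# Readers just fetch one small, ready-made document
board = db.leaderboards.find_one({"_id": "top20"})
```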
In the above, you have a constantly updated top-20 list
You are precomputing values to avoid computation during subsequent reads
Subset Pattern
Outlier or Overspill Pattern
Creates a separate collection when there are too many array items
In this pattern, the overspill is not the typical case
There is a main record and a flag to say if there is an overspill situation
Additional collections only store the extra array items
For example, in a movie database:
Main cast and crew in the parent document
Full cast and crew in a second additional document (can be the same collection)
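A sketch of that movie example; the field and flag names are assumptions:

```python
# Parent document holds the typical case plus an overspill flag
movie = {
    "_id": "tt0000001",
    "title": "Some Film",
    "cast": ["Lead A", "Lead B", "Lead C"],  # main cast and crew only
    "has_overspill": True,                   # extra documents exist
}

# Overspill documents (same or separate collection) hold only the extras
overspill = {
    "movie_id": "tt0000001",
    "part": 1,
    "cast": ["Extra 1", "Extra 2"],
}
```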
Anti-Patterns
Replication
Multiple copies of each collection
On different physical servers
As far away from each other as possible
Nature is the best example of replication for resilience
Different Access Patterns
If 90% of the users look at the latest 1% of data, it is small enough to remain in cache = fast
If the rest (10%) look at all data (analysts)
Don't need such a fast response
Shouldn't be bad neighbors
Protect the cache and resources from these heavy users
Replica Set Components
1 primary at most
Responsible for accepting all writes
If primary goes down, a voting stage occurs
Up to 50 secondary members
Up to 7 can be voting members
Apps can read from a secondary
The largest MongoDB has seen is 9 members in a replica set, and that was a massive bank
3 members is the most common replica set
5 is for more critical processes
Secondary Server - more info
Priority 0 replica set member
Hidden replica set member
Delayed replica set member
Generally speaking, you should completely avoid hidden and delayed replica set members!
Drivers and Replica Sets
When coding with MongoDB, drivers keep track of the topology
Drivers know where and how to route requests
You don't need to know which member is a primary or secondary at any point
You can specify logically where you want a read to go (see the sketch after this list)
When upgrading the server, it is important to check driver compatibility
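What this looks like from code, as a sketch (hosts and replica set name are assumptions); the driver discovers the topology from the seed list and routes requests itself:

```python
from pymongo import MongoClient

# Seed list + replica set name; the driver finds the current primary
client = MongoClient(
    "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0"
)

# Writes are routed to whichever member is primary right now -
# the application never has to know which one that is
client.app.events.insert_one({"type": "login"})
```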
Oplog
Each write is a new entry in the oplog:
Oplog entries are always idempotent
Oplog Window
Oplogs are capped collections (fixed size in bytes)
990 MB to 1 PB, configurable
The bigger you make it, the less space you have for your data
Guarantee the preservation of insertion order
Support high-throughput operation
Once a collection fills its allocated space:
Makes room for new documents by deleting the oldest documents in the collection
Like circular buffers
If a secondary goes down for longer than the Oplog Window, the secondary will be out of sync once it comes back up
The DBA will need to manually trigger initial sync
Initial Sync
New/replacement replica set members need a full copy of the data
As do any that have been down too long where the oplog has rolled over
By default, initial sync pulls data from the primary, which is bad (it adds load to the busiest node)!
There is a way to sync from a secondary instead
Can take a long time and be relatively fragile on large systems
Recent changes (4.2/4.4):
Can auto restart on non-transient error
Resumable on transient error
Copy Oplog at the same time
Overall more resilient
Can use a secondary as source
Majority / Elections
In distributed systems, the idea of a majority or quorum is important:
A group that together contains MORE than half of the total members
How to choose a primary (leader)
Agreeing on a primary is surprisingly hard
There are many academic papers on how to do this
MongoDB uses a protocol based on Raft
Only one primary is chosen at a time (the chosen one)
Must be able to communicate with a majority of nodes
Should be the most up to date with the latest information
May prefer some over the others due to geography
Elections Explained
A secondary determines:
Has not heard from the primary recently - primary might be up, but not getting through
Can contact a majority of the replica set's members, including itself
Can vote and is not specifically ineligible
Secondary then proposes an election - with itself as Primary
States its latest transaction time
States the election term, which goes up by one with every election
Votes for itself by default
Any server that can vote only votes for a candidate in an election if:
The candidate is at the same or a higher transaction time
Has not already voted in that election term
If it's currently the primary, stands down
Once a candidate receives a majority of votes:
Converts itself from a secondary to a primary
Goes through an onboarding
Starts to accept writes
If elections keep tying, you basically just keep holding new elections (flipping quarters) until a primary is chosen
Ex.
If a connection blip occurs with 4 nodes, the primary (top left) can no longer see a majority, so it will turn itself from a primary into a secondary
Apps will still read, but fail to write
Ex 2:
If the entire data center 1 goes down, you can still read from data center 2, but you can't write
Ex 3:
This protects you against hurricanes/tornadoes/etc., but it causes the app to fail over to a new data center
Depending on where the application sits, reads could actually become faster
Ex 4:
Ideal setup assuming cost is not a concern
Stepping Down
Rolling Upgrade for Maintenance
Go into a secondary host, upgrade the version, restart the secondary
Apps will still write to the primary and be fine
You will then go to your other secondary and repeat the process
Go to the primary and run rs.stepDown()
Ops Manager and Cloud Manager automate this process
Automation is a MUST if you have 3+ replica sets with sharding - everything has to happen in the correct sequence
Versions are backwards/forwards compatible ±1 version
You can always rolling-upgrade up through multiple versions one step at a time, but rolling back down may be more difficult
Write Concerns
Devs are responsible for this!
On the DB side, you can only set the default
In the below example, if there is a failure between steps 3 and 4, you may lose data!
When the old primary comes back up, it has writes that don't exist on the new primary
It will silently roll back the original committed record
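A sketch of the developer-side fix with PyMongo: w="majority" makes the write wait until a majority of members have it, so a failover cannot silently roll it back:

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

orders = MongoClient().shop.orders  # hypothetical collection

# Acknowledged by a majority of the replica set before returning
safe_orders = orders.with_options(write_concern=WriteConcern(w="majority"))
safe_orders.insert_one({"order": 1001, "total": 99.90})
```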
Majority Commit Point
The primary knows what timestamp each secondary is asking for; therefore it knows:
Each secondary has everything before that timestamp stored durably
Up to what timestamp exists on a majority of nodes
Up to what point data is 100% safe
Until a change is 100% safe, the primary will keep it in memory
In a system with automated failover, "Safe" means "if it fails over, this won't be silently lost"
Read Concerns
ONLY devs can control read concerns
In Read Majority, that point always lags behind "real-time"
Read Linearizable will add latency to your query, waiting until the majority point catches up to where "real time" was when the query began
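The PyMongo equivalents, as a sketch (collection name assumed):

```python
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern

orders = MongoClient().shop.orders  # hypothetical collection

# majority: returns only data that can no longer be rolled back,
# so it lags slightly behind real time
majority_view = orders.with_options(read_concern=ReadConcern("majority"))

# linearizable: waits for the majority point to catch up to the moment
# the query began - stronger, but adds latency (primary reads only)
linear_view = orders.with_options(read_concern=ReadConcern("linearizable"))
doc = linear_view.find_one({"order": 1001})
```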
Read Preferences
When do you think you would use each of the following?
Read from primary only (primary - the default)
Read from primary unless no primary exists (primaryPreferred)
Read from any secondary only (secondary)
Read from a secondary unless no secondary exists, otherwise read from the primary (secondaryPreferred)
Read from nearest geographically (nearest)
Read from a specific set of servers (tag sets)
Data analysts and BI users should be reading from secondaries
Best Practice
Generally speaking, always use write concern majority
Read preference, use secondary preferred
Read concern, it depends
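The combination above as a PyMongo sketch (hosts are assumptions):

```python
from pymongo import MongoClient

# Write concern majority + secondaryPreferred reads, set once on the client
client = MongoClient(
    "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0",
    w="majority",
    readPreference="secondaryPreferred",
)
```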
Arbiter
Acts as a tie-breaker in elections
Stores no data
Cannot become a primary
Is strongly advised against in production systems
A system with arbiters can be highly available OR guarantee durability - but not both
When to Shard?
Sharding is a solution to Big Data problems
Sharding is needed when resource limitations require horizontal scale
How to tell if you need to Shard
Resources are maxed out but schema and code are already optimized
Need to increase overall throughput or increase in-memory cache/storage at an affordable cost
Need to meet a backup restore time (RTO) target and need parallelism to do so
Architecture