DF400
BSON
Binary JSON: a binary-encoded serialization of JSON-like documents
Cross-language: object -> binary -> object
Used internally and for network traffic
Converted to hash/dict objects in most drivers
Basically, a list of named, typed values:
Type, FieldName, <length>, Data
Type, FieldName, <length>, Data
<length> tells a program how many bytes to skip to get to the next field, which speeds up scanning without parsing every value
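A minimal round trip in Python using the bson package that ships with PyMongo (the document contents here are illustrative):

```python
import bson  # ships with PyMongo

# Object -> binary -> object: the same round trip drivers perform for you
doc = {"type": "score", "player": "ann", "value": 42}
raw = bson.encode(doc)       # bytes laid out as type, field name, length, data
print(len(raw), "bytes")
restored = bson.decode(raw)  # back to a Python dict
assert restored == doc
```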
Arrays vs Documents
The difference is in how they map to a programming language and what constraints apply to each
Can only $push to an array, not to a document
Arrays can contain duplicates
Arrays cannot be sparse; you must store nulls for missing values
Large arrays or large flat documents are equally bad
Max 200 elements strongly recommended in either
Recall previous discussions about 200 element arrays
Large or unbounded arrays can have a significant speed impact
Arrays are often used where a document would be a better choice!
Ex.
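A hypothetical sketch of the two designs discussed below; the field names Results, Player, and Score are assumptions:

```python
# Design 1: array of sub-documents - indexable on "Results.Player"
game_v1 = {
    "_id": 1,
    "Results": [
        {"Player": "ann", "Score": 120},
        {"Player": "bob", "Score": 95},
    ],
}

# Design 2: one sub-document keyed by player name - smaller on disk,
# but it is queried very differently and is hard to index
game_v2 = {
    "_id": 1,
    "Results": {"ann": 120, "bob": 95},
}
```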
In the above, we redesigned the schema to be 30% smaller
However, the way you query the second design is significantly different
Furthermore, indexing the second design is very difficult
In the first example, you could index Results.Player
When to Use Arrays
Use arrays when:
Just need to $push to the end
Don't have unique identifiers
Having sparse data is not an issue
Need to index the values to search them
Otherwise, consider objects
Projection is faster than $filter
Querying a single member is faster
One big advantage of arrays over joined tables is not needing an indexed (foreign) key for each element
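A sketch of the single-member point, reusing the hypothetical Results array from the example above; the positional $ projection returns just the matched element:

```python
from pymongo import MongoClient

games = MongoClient().test.games  # hypothetical database/collection names

# Multikey index so the values inside the array can be searched
games.create_index("Results.Player")

# Projection returns only the matching array element, not the whole array
doc = games.find_one({"Results.Player": "ann"}, {"Results.$": 1})
```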
Document model vs Relational
Histogram Exercise
24 hrs x 60 minutes gets you 1440
The question is, do you have a single array with 1440 elements or a two-dimensional array:
In regards to updating, one of the above will be significantly faster...
In the 1440-element array, on average you will have to scan past 720 elements before finding the value you need
By switching to a two-dimensional array, that number drops to 360
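A sketch of both layouts, assuming one histogram document per day; the dotted-path $inc shows how an update addresses a single minute:

```python
from pymongo import MongoClient

hist = MongoClient().test.histograms  # hypothetical collection

# Layout 1: one flat array of 1440 minute counters
hist.insert_one({"_id": "2024-01-15-flat", "m": [0] * 1440})
hist.update_one({"_id": "2024-01-15-flat"}, {"$inc": {"m.833": 1}})  # 13:53

# Layout 2: two-dimensional - 24 hour arrays of 60 minute counters each
hist.insert_one({"_id": "2024-01-15-2d", "h": [[0] * 60 for _ in range(24)]})
hist.update_one({"_id": "2024-01-15-2d"}, {"$inc": {"h.13.53": 1}})  # 13:53
```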
Schema Design in MongoDB
3 Considerations:
The data the application needs
Application's read usage of the data
Application's write usage of the data
Document Design Basics
To link or embed?
Do I want the embedded info a lot of the time?
Put it in the same collection!
Do I need to search with the embedded info?
Does the embedded info change often?
Do I need the latest version, or the version as it was at the time?
Is the embedded info shared - really?
Linking
One-to-One Linked
It is very important to index the proper fields to make these lookups faster
One-to-One Embedded
The problem with embedding this info is that the author could have changed their name
If you embed, it becomes difficult to keep the duplicated data up to date
Hybrid
On a webpage, you can render all of the books with the author's first and last name; if the user clicks on the author, you can then use the _id to query the authors collection
Yes, you are duplicating data, but that is why you only include very basic information
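A sketch of the hybrid approach with hypothetical books and authors collections: embed just enough to render a page, and keep the _id for the occasional full lookup:

```python
from pymongo import MongoClient
from bson import ObjectId

db = MongoClient().library  # hypothetical database name

book = {
    "title": "Some Title",
    # Duplicated on purpose, but minimal - just enough to render a list page
    "author": {
        "_id": ObjectId("65a000000000000000000001"),
        "first": "Ada",
        "last": "Lovelace",
    },
}
db.books.insert_one(book)

# Only when the user clicks through do we fetch the full author document
full_author = db.authors.find_one({"_id": book["author"]["_id"]})
```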
One-to-Many
Scalar in child
Array in Parent
Could be vulnerable in a scale-up situation (the array is unbounded)
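The two one-to-many shapes as hypothetical documents:

```python
# Scalar in child: each child points at its parent - scales to any count
order = {"_id": 1, "customer": "ann"}
line_item = {"_id": 101, "order_id": 1, "sku": "X-42", "qty": 2}

# Array in parent: the parent lists its children - simple, but the array
# is unbounded, which is the scale-up vulnerability noted above
order_with_array = {"_id": 1, "customer": "ann", "line_items": [101, 102]}
```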
Many-to-Many
Arrays on either side
Payload vs Process Fields
Two Types of Fields
Payload - Data we just store and retrieve
Processing - metadata we examine inside MongoDB
Querying, aggregating, filtering and projecting, using for workflow, using to implement patterns
The payload might be predefined; sometimes it is better to leave it alone and add processing fields alongside
For example, existing corporate data/object models might be used only for storage
JSON files you want to store are payload; fields you want to filter by are processing fields
Generally speaking, payload fields
Ex.
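A hypothetical sketch: a corporate object stored untouched as payload, with a few processing fields promoted alongside it for querying:

```python
from datetime import datetime, timezone

# Payload: an existing corporate object, stored as-is and never examined
corporate_po = {
    "hdr": {"num": "PO-991", "vendor": "ACME"},
    "lines": [{"sku": "X-42", "qty": 3}],
}

doc = {
    "payload": corporate_po,                  # store and retrieve only
    # Processing fields, added for MongoDB to query, filter, and index:
    "po_number": corporate_po["hdr"]["num"],
    "vendor": corporate_po["hdr"]["vendor"],
    "received": datetime.now(timezone.utc),
}
```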
Evolutionary Dynamic Schema
Schema changes as the application changes - a big benefit of using MongoDB
The application changes gradually, with no big conversion needed
Include a schema version number with the data as a field
Retain the ability to read previous schemas in application
Either convert in the background or on modification
Loosely coupled database objects and code objects
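A minimal sketch of the idea; the field name schema_version and the document shapes are assumptions:

```python
def read_customer(doc):
    """The application retains the ability to read every schema version."""
    version = doc.get("schema_version", 1)
    if version == 1:
        # v1 stored a single phone string; normalize it to the v2 shape
        doc["phones"] = [{"type": "main", "number": doc.pop("phone")}]
        doc["schema_version"] = 2
        # Either persist the converted document now ("on modification")
        # or leave the rewrite to a background migration job
    return doc
```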
Data Driven Dynamic Schema
Field names are the data values
Uncomfortable new concept for many designers
Requires a truly dynamic coding approach
Clean and performant for many things
Can be used to do some amazing optimization of retrieval using Tries
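A tiny illustration of field names as data values: a trie for prefix lookups, where each letter of a stored key becomes a field name (the shape is an assumption, not a MongoDB feature):

```python
# One document per trie; "c" -> "a" -> "t" spells the stored key "cat"
trie_doc = {
    "_id": "names",
    "c": {"a": {"t": {"_end": True}, "r": {"_end": True}}},  # cat, car
}

# A prefix/key check becomes a single dotted-path query:
#   db.tries.find_one({"_id": "names", "c.a.t._end": True})
```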
Design Patterns
Attribute Pattern
Used where documents have a number of searchable named attributes
Not all values may be present in a given record
Any new record might have a new field name not seen before
Ex. ecommerce, CRM, content management
The reason you use k/v pairs: you can more easily index those fields for how they will be searched
Also, it would cover NEW attributes being added
If you are looking for a specific SKU, then the attributes are payload; otherwise they could be used for processing, so keep that in mind
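A sketch of the pattern with hypothetical field names (attributes, k, v); one compound multikey index then covers every attribute, present or future:

```python
from pymongo import MongoClient, ASCENDING

products = MongoClient().shop.products  # hypothetical collection

products.insert_one({
    "sku": "X-42",
    "attributes": [                    # searchable named attributes as k/v
        {"k": "color", "v": "blue"},
        {"k": "size", "v": "M"},
        {"k": "fabric", "v": "wool"},  # a brand-new field name - no problem
    ],
})

# One index serves searches on any attribute name, old or new
products.create_index([("attributes.k", ASCENDING),
                       ("attributes.v", ASCENDING)])

match = products.find_one(
    {"attributes": {"$elemMatch": {"k": "color", "v": "blue"}}})
```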
Bucket Pattern
Most common design pattern - strongly matches the document model
Used to store many small, related data items
Bank transactions - related by account and date
IoT Readings - related by sensor and date
Reduces index sizes up to 200x
Speeds up retrieval of related data
Enables computed pattern
MongoDB 5.0 time series collections abstract this concept
Ex.
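A sketch of the bucket upsert described below, assuming sensor readings bucketed 200 at a time:

```python
from datetime import datetime, timezone
from pymongo import MongoClient

readings = MongoClient().iot.readings  # hypothetical collection

def record(sensor_id, value):
    # Match the current, not-yet-full bucket for this sensor.
    # A full bucket (count == 200) no longer matches the $lt filter,
    # so upsert=True quietly starts the next bucket.
    readings.update_one(
        {"sensor_id": sensor_id, "count": {"$lt": 200}},
        {
            "$push": {"r": {"t": datetime.now(timezone.utc), "v": value}},
            "$inc": {"count": 1},
        },
        upsert=True,
    )
```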
In the above, you fill the document up with 200 readings; the query then fails to find a document because it matches only buckets with fewer than 200 readings, and because upsert is set to true, a new document is created
If you then turn around and query on sensor ID, you can easily seek to those documents
Computed Pattern
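A hypothetical top-20 sketch: the list is recomputed when scores change, so readers never pay for the aggregation:

```python
from pymongo import MongoClient, DESCENDING

db = MongoClient().game  # hypothetical database

def refresh_top20():
    # Precompute on write (or on a timer) instead of on every read
    top = list(db.scores.find({}, {"player": 1, "score": 1})
                        .sort("score", DESCENDING).limit(20))
    db.leaderboards.replace_one(
        {"_id": "top20"}, {"_id": "top20", "players": top}, upsert=True)

# Readers just fetch one small, ready-made document
board = db.leaderboards.find_one({"_id": "top20"})
```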
In the above, you have a constantly updated top-20 list
You are precomputing values to avoid computation during subsequent reads
Subset Pattern
Outlier or Overspill Pattern
Creates a separate collection when there are too many array items
In this pattern, the overspill is not the typical case
There is a main record and a flag to say if there is an overspill situation
Additional collections only store the extra array items
For example, in a movie database:
Main cast and crew in the parent document
Full cast and crew in a second additional document (can be the same collection)
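A sketch of that movie example; the field and flag names are assumptions:

```python
# Parent document holds the typical case plus an overspill flag
movie = {
    "_id": "tt0000001",
    "title": "Some Film",
    "cast": ["Lead A", "Lead B", "Lead C"],  # main cast and crew only
    "has_overspill": True,                   # extra documents exist
}

# Overspill documents (same or separate collection) hold only the extras
overspill = {
    "movie_id": "tt0000001",
    "part": 1,
    "cast": ["Extra 1", "Extra 2"],
}
```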
Anti-Patterns
Replication
Multiple copies of each collection
On different physical servers
As far away from each other as possible
Nature is the best example of replication for resilience
Different Access Patterns
If 90% of the users look at the latest 1% of data, it is small enough to remain in cache = fast
If the rest (10%) look at all data (analysts)
Don't need such a fast response
Shouldn't be bad neighbors
Protect the cache and resources from these heavy users
Replica Set Components
1 primary at most
Responsible for accepting all writes
If primary goes down, a voting stage occurs
Up to 50 secondary members
Up to 7 can be voting members
Apps can read from a secondary
The largest MongoDB has seen is 9 members in a replica set, and that was a massive bank
3 members is the most common replica set
5 is for more critical processes
Secondary Server - more info
Priority 0 replica set member
Hidden replica set member
Delayed replica set member
Generally speaking, you should completely avoid hidden and delayed replica set members!
Drivers and Replica Sets
When coding with MongoDB, drivers keep track of the topology
Drivers know where and how to route requests
You don't need to know which member is a primary or secondary at any point
You can specify logically where you want a read to go (see the sketch after this list)
When upgrading the server, it is important to check driver compatibility
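What this looks like from code, as a sketch (hosts and replica set name are assumptions); the driver discovers the topology from the seed list and routes requests itself:

```python
from pymongo import MongoClient

# Seed list + replica set name; the driver finds the current primary
client = MongoClient(
    "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0"
)

# Writes are routed to whichever member is primary right now -
# the application never has to know which one that is
client.app.events.insert_one({"type": "login"})
```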
Oplog
Each write is a new entry in the oplog:
Oplog entries are always idempotent
Oplog Window
Oplogs are capped collections (fixed size in bytes)
990 MB to 1 PB, configurable
The bigger you make it, the less space you have for your data
Guarantee the preservation of insertion order
Support high-throughput operation
Once a collection fills its allocated space:
Makes room for new documents by deleting the oldest documents in the collection
Like circular buffers
If a secondary goes down for longer than the Oplog Window, the secondary will be out of sync once it comes back up
The DBA will need to manually trigger initial sync
Initial Sync
New/replacement replica set members need a full copy of the data
As do any that have been down too long where the oplog has rolled over
By default, initial sync pulls data from the primary, which is bad (it adds load to the busiest node)!
There is a way to sync from a secondary instead
Can take a long time and be relatively fragile on large systems
Recent changes (4.2/4.4):
Can auto restart on non-transient error
Resumable on transient error
Copy Oplog at the same time
Overall more resilient
Can use a secondary as source
Majority / Elections
In distributed systems, the idea of a majority or quorum is important:
A group that together contains MORE than half of the total members
How to choose a primary (leader)
Agreeing on a primary is surprisingly hard
There are many academic papers on how to do this
MongoDB uses a protocol based on Raft
Only one primary is chosen at a time (the chosen one)
Must be able to communicate with a majority of nodes
Should be the most up to date with the latest information
May prefer some over the others due to geography
Elections Explained
A secondary determines:
Has not heard from the primary recently - primary might be up, but not getting through
Can contact a majority of the replica set's members, including itself
Can vote and is not specifically ineligible
Secondary then proposes an election - with itself as Primary
States its latest transaction time
States the election term, which goes up by one with every election
Votes for itself by default
Any server that can vote only votes for a candidate in an election if:
The candidate is at the same or a higher transaction time
Has not already voted in that election term
If it's currently the primary, stands down
Once a candidate receives a majority of votes:
Converts itself from a secondary to a primary
Goes through an onboarding
Starts to accept writes
If elections keep tying, you basically just keep holding new elections (flipping quarters) until a primary is chosen
Ex.
If a connection blip occurs with 4 nodes, the primary (top left) can no longer see a majority, so it will turn itself from a primary into a secondary
Apps will still read, but fail to write
Ex 2:
If the entire data center 1 goes down, you can still read from data center 2, but you can't write
Ex 3:
This protects you against hurricanes/tornadoes/etc., but it causes the app to fail over to a new data center
Depending on where the application sits, reads could actually become faster
Ex 4:
Ideal setup assuming cost is not a concern
Stepping Down
Rolling Upgrade for Maintenance
Go into a secondary host, upgrade the version, restart the secondary
Apps will still write to the primary and be fine
You will then go to your other secondary and repeat the process
Go to the primary and run rs.stepDown()
Ops Manager and Cloud Manager automate this process
Automation is a MUST if you have 3+ replica sets with sharding - everything has to happen in the correct sequence
Versions are backwards/forwards compatible ±1 version
You can always rolling-upgrade up through multiple versions one step at a time, but rolling back down may be more difficult
Write Concerns
Devs are responsible for this!
On the DB side, you can only set the default
In the below example, if there is a failure between steps 3 and 4, you may lose data!
When the old primary comes back up, it has writes that don't exist on the new primary
It will silently roll back the original committed record
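A sketch of the developer-side fix with PyMongo: w="majority" makes the write wait until a majority of members have it, so a failover cannot silently roll it back:

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

orders = MongoClient().shop.orders  # hypothetical collection

# Acknowledged by a majority of the replica set before returning
safe_orders = orders.with_options(write_concern=WriteConcern(w="majority"))
safe_orders.insert_one({"order": 1001, "total": 99.90})
```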
Majority Commit Point
The primary knows what timestamp each secondary is asking for; therefore it knows:
Each secondary has everything before that timestamp stored durably
Up to what timestamp exists on a majority of nodes
Up to what point data is 100% safe
Until a change is 100% safe, the primary will keep it in memory
In a system with automated failover, "Safe" means "if it fails over, this won't be silently lost"
Read Concerns
ONLY devs can control read concerns
In Read Majority, that point always lags behind "real-time"
Read Linearizable will add latency to your query, waiting until the majority point catches up to where "real time" was when the query began
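The PyMongo equivalents, as a sketch (collection name assumed):

```python
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern

orders = MongoClient().shop.orders  # hypothetical collection

# majority: returns only data that can no longer be rolled back,
# so it lags slightly behind real time
majority_view = orders.with_options(read_concern=ReadConcern("majority"))

# linearizable: waits for the majority point to catch up to the moment
# the query began - stronger, but adds latency (primary reads only)
linear_view = orders.with_options(read_concern=ReadConcern("linearizable"))
doc = linear_view.find_one({"order": 1001})
```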
Read Preferences
When do you think you would use each of the following?
Read from primary only (primary - the default)
Read from primary unless no primary exists (primaryPreferred)
Read from any secondary only (secondary)
Read from a secondary unless no secondary exists, otherwise read from the primary (secondaryPreferred)
Read from nearest geographically (nearest)
Read from a specific set of servers (tag sets)
Data analysts and BI users should be reading from secondaries
Best Practice
Generally speaking, always use write concern majority
Read preference, use secondary preferred
Read concern, it depends
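The combination above as a PyMongo sketch (hosts are assumptions):

```python
from pymongo import MongoClient

# Write concern majority + secondaryPreferred reads, set once on the client
client = MongoClient(
    "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0",
    w="majority",
    readPreference="secondaryPreferred",
)
```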
Arbiter
Acts as a tie-breaker in elections
Stores no data
Cannot become a primary
Is strongly advised against in production systems
A system with arbiters can be highly available OR guarantee durability - but not both
When to Shard?
Sharding is a solution to Big Data problems
Sharding is needed when resource limitations require horizontal scale
How to tell if you need to Shard
Resources are maxed out but schema and code are already optimized
Need to increase overall throughput or increase in-memory cache/storage at an affordable cost
Need to meet a backup restore time (RTO) target and need parallelism to do so
Architecture