Dill's Knowledge Base

Metrics & Monitoring

Core Metrics

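  • Query targeting

    • Ratio of documents (or index keys) scanned to documents returned; the ideal ratio is 1, meaning one document is returned for every document read

    • A very high ratio means queries scan far more data than they return (often a sign of a missing index), which hurts performance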

  • Storage

    • Writes are refused when the disk reaches capacity, which can cause the process to crash

    • Key metrics include disk space percent free, disk latency, disk IOPS, and disk queue depth

  • CPU

    • High CPU usage may call for query optimization with indexes or a hardware upgrade

  • Memory

    • The system should have enough RAM to hold all indexes

    • Key metrics include swap usage and memory usage

  • Replication lag

    • Delay, in seconds, between a write on the primary and its application on a secondary
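Two of these metrics can be computed by hand from raw command output. A minimal sketch, assuming the document shapes returned by explain("executionStats").executionStats (totalDocsExamined, totalKeysExamined, nReturned) and by rs.status() (members[].name, stateStr, optimeDate). The helper names and the db.orders collection in the usage comment are hypothetical; mongosh's rs.printSecondaryReplicationInfo() gives a built-in view of the same lag.

```javascript
// Hypothetical helpers (not part of mongosh) for two core metrics.

// Query targeting: documents and index keys scanned per document returned.
// A value near 1 is ideal; large values suggest a missing or poor index.
function queryTargetingRatios(executionStats) {
  // Avoid dividing by zero when the query matched nothing.
  const returned = Math.max(executionStats.nReturned, 1);
  return {
    docsExaminedPerReturned: executionStats.totalDocsExamined / returned,
    keysExaminedPerReturned: executionStats.totalKeysExamined / returned,
  };
}

// Replication lag: seconds each SECONDARY's last applied op trails the PRIMARY.
function replicationLagSeconds(rsStatus) {
  const primary = rsStatus.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("rs.status() output has no PRIMARY member");
  return rsStatus.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      // Subtracting two Dates yields milliseconds.
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Usage in mongosh (db.orders is a hypothetical collection):
//   queryTargetingRatios(
//     db.orders.find({ status: "A" }).explain("executionStats").executionStats)
//   replicationLagSeconds(rs.status())
```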

Additional Metrics

  • opcounters

    • Rate of operations per second executed by the mongod process (serverStatus reports them as cumulative counts since startup)

      • It tracks: command, query, insert, delete, update, getMore

  • network traffic

    • Average rate of physical bytes sent to and received from the database

      • bytes in / bytes out (physical)

    • Number of requests received by the database

      • numRequests

  • connections

    • Includes connections from applications, shell clients, and internal processes

    • Each open connection consumes resources, so the count can affect system performance

    • A large connection count may indicate a suboptimal connection strategy

  • tickets available

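    • Tickets limit how many read and write operations run concurrently in the storage engine; when available tickets drop to zero, other operations must wait

    • Exhausted tickets typically indicate an undersized cluster or poorly performing queries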
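Because serverStatus reports opcounters as running totals since startup, a per-second rate needs two samples. A minimal sketch, assuming snapshots shaped like db.serverStatus() output (a localTime Date plus an opcounters document with insert, query, update, delete, getmore and command fields); opcounterRates is a hypothetical helper name.

```javascript
// Hypothetical helper: per-second opcounter rates from two serverStatus snapshots.
const OP_TYPES = ["insert", "query", "update", "delete", "getmore", "command"];

function opcounterRates(before, after) {
  // mongosh may return 64-bit counters as Long objects; convert to plain numbers.
  const num = (v) =>
    v !== null && typeof v === "object" && typeof v.toNumber === "function"
      ? v.toNumber()
      : Number(v);
  const elapsedSeconds = (after.localTime - before.localTime) / 1000;
  if (elapsedSeconds <= 0) throw new Error("take the second snapshot after the first");
  const rates = {};
  for (const op of OP_TYPES) {
    const delta = num(after.opcounters[op]) - num(before.opcounters[op]);
    rates[op] = delta / elapsedSeconds;
  }
  return rates;
}

// Usage in mongosh:
//   const a = db.serverStatus(); sleep(10000); const b = db.serverStatus();
//   opcounterRates(a, b)
```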

Atlas CLI

  • atlas metrics processes <host_name>:<port>

    • Optional flags include --period, --granularity, --output, and --type

  • Atlas also has real-time monitoring charts (the Real-Time Performance Panel)

  • You can kill long-running operations from this dashboard

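List the processes in the project to find each <host_name>:<port>:

atlas processes list

Retrieve one day of metrics for a process at one-minute granularity:

atlas metrics processes atlas-jj12z4-shard-00-00.p31f3ej.mongodb.net:27017 --period P1D --granularity PT1M

  • P1D is an ISO 8601 duration meaning 1 day

  • PT1M means 1-minute intervals

Or return only connection metrics as JSON and use jq to keep the last 10 data points:

atlas metrics processes atlas-jj12z4-shard-00-00.p31f3ej.mongodb.net:27017 --period P1D --granularity PT1M --output json --type CONNECTIONS | jq '.measurements[0].dataPoints |= .[-10:]'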
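For scripts that post-process the JSON output without jq, the filter above translates to a few lines of JavaScript. This sketch assumes only the structure the jq filter already relies on (a measurements array whose entries hold a dataPoints array); the value field on each data point, treating null values as empty intervals, the helper names, and the trim.js file name are assumptions. Unlike the jq filter, it trims every measurement rather than only the first.

```javascript
// Hypothetical helpers for `atlas metrics processes ... --output json` output.

// Keep only the last n data points of every measurement (jq: .dataPoints |= .[-n:]).
// n should be >= 1; slice(-0) would keep everything.
function keepLastDataPoints(metricsJson, n) {
  const copy = JSON.parse(JSON.stringify(metricsJson)); // leave the input untouched
  for (const m of copy.measurements) {
    m.dataPoints = m.dataPoints.slice(-n);
  }
  return copy;
}

// Most recent non-null value of one measurement, or null if there is none.
function latestValue(measurement) {
  for (let i = measurement.dataPoints.length - 1; i >= 0; i--) {
    const v = measurement.dataPoints[i].value;
    if (v !== null && v !== undefined) return v;
  }
  return null;
}

// Usage with Node.js, piping the CLI output into a hypothetical trim.js:
//   atlas metrics processes <host_name>:<port> --output json | node trim.js
//   const data = JSON.parse(require("fs").readFileSync(0, "utf8"));
//   console.log(JSON.stringify(keepLastDataPoints(data, 10), null, 2));
```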

Configure Alerts

  • You can configure alert settings at organization and project levels

  • Requires the "Project Owner" role

  • Shared-tier clusters only trigger alerts for:

    • connections, logical size, opcounters, network

  • All Atlas projects come with default alert settings, which can be edited

    • Atlas alerts are accessed through the bell icon in the top right

CLI

atlas alerts settings list --output json
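To review which alerts a project has enabled, the JSON from atlas alerts settings list can be summarized. A sketch under assumptions: the output is either an array of alert configurations or an object with a results array, and each configuration carries eventTypeName and enabled fields; verify this against your CLI version's actual output. summarizeAlertSettings is a hypothetical name.

```javascript
// Hypothetical helper: split alert configurations into enabled and disabled
// event types (field names eventTypeName/enabled are assumed, not confirmed).
function summarizeAlertSettings(json) {
  const settings = Array.isArray(json) ? json : json.results ?? [];
  const enabled = [];
  const disabled = [];
  for (const s of settings) {
    (s.enabled ? enabled : disabled).push(s.eventTypeName);
  }
  return { enabled: enabled.sort(), disabled: disabled.sort() };
}

// Usage with Node.js:
//   atlas alerts settings list --output json > settings.json
//   summarizeAlertSettings(JSON.parse(require("fs").readFileSync("settings.json", "utf8")))
```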

Responding to Alerts

Alerts are shown in the Atlas UI (bell icon in the top right).

  • Notifications continue until an alert is acknowledged

    • After acknowledgement, no further notifications are sent until the acknowledgement period ends, you resolve the condition, or you unacknowledge the alert

CLI

atlas alerts list --output json
atlas alerts acknowledge <alertId> --comment <comment>
atlas alerts unacknowledge <alertId>
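The acknowledgement rule above can be expressed as a filter over atlas alerts list --output json. This sketch assumes each alert has an id, a status (open alerts reported as "OPEN") and an optional acknowledgedUntil timestamp, and that the list arrives as an array or under a results key; these field names are assumptions to check against real output, and alertsStillNotifying is hypothetical. A resolved alert is no longer open, so it drops out of the filter.

```javascript
// Hypothetical helper: ids of open alerts that will keep sending notifications,
// i.e. never acknowledged, or whose acknowledgement period has already ended.
function alertsStillNotifying(json, now = new Date()) {
  const alerts = Array.isArray(json) ? json : json.results ?? [];
  return alerts
    .filter((a) => a.status === "OPEN")
    .filter((a) => !a.acknowledgedUntil || new Date(a.acknowledgedUntil) <= now)
    .map((a) => a.id);
}

// Usage with Node.js:
//   atlas alerts list --output json > alerts.json
//   alertsStillNotifying(JSON.parse(require("fs").readFileSync("alerts.json", "utf8")))
```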

Integrations

Integrations are useful for hybrid deployments or when migrating from on-premises to the cloud

Examples:

  • Prometheus, PagerDuty, Datadog, Sumo Logic, Splunk, custom webhooks, etc.

    • Prometheus and Datadog are only available on M10+ clusters

In the database dashboard, click the ellipsis (...) and go to Integrations

  • Typically you enter the credentials for the third-party service here

Self-Managed Monitoring

Use Cloud Manager or one of the integrations listed above

  • Note: Prometheus, for example, cannot collect metrics directly from an on-premises deployment, but there are open-source exporters such as Percona's MongoDB exporter

  • The account collecting the data needs the clusterMonitor role
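Granting that role is a single createUser command on the admin database. A minimal sketch: the user name metricsCollector and the helper monitoringUserCommand are hypothetical; clusterMonitor is a built-in role, and passwordPrompt() is the mongosh password prompt helper.

```javascript
// Hypothetical helper: the createUser command document for a monitoring account
// that holds only the built-in clusterMonitor role (granted on the admin database).
function monitoringUserCommand(user, pwd) {
  return {
    createUser: user,
    pwd: pwd,
    roles: [{ role: "clusterMonitor", db: "admin" }],
  };
}

// Usage in mongosh:
//   db.getSiblingDB("admin").runCommand(
//     monitoringUserCommand("metricsCollector", passwordPrompt()))
```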

Command Line Metrics

serverStatus

  • Diagnostic command that returns a document describing the current state of the instance

  • Monitoring platforms use this command to collect metrics

  • Ex:

    db.runCommand(
       {
         serverStatus: 1
       }
    )
  • Helper command: db.serverStatus()
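A short sketch of how a monitoring script might pull a few of the metrics from these notes out of serverStatus output. Field names follow the serverStatus document (uptime in seconds, connections.current and connections.available, mem.resident in MiB, network.physicalBytesIn, physicalBytesOut and numRequests); the connection-usage percentage and the helper name summarizeServerStatus are illustrative additions, not serverStatus fields.

```javascript
// Hypothetical helper: a compact summary of selected serverStatus fields.
function summarizeServerStatus(status) {
  // mongosh may return 64-bit counters as Long objects; convert to plain numbers.
  const num = (v) =>
    v !== null && typeof v === "object" && typeof v.toNumber === "function"
      ? v.toNumber()
      : Number(v);
  const current = num(status.connections.current);
  const available = num(status.connections.available);
  return {
    uptimeSeconds: num(status.uptime),
    connectionsCurrent: current,
    // Share of the connection limit in use: current / (current + available).
    connectionsUsedPct: (100 * current) / (current + available),
    residentMemoryMiB: num(status.mem.resident),
    physicalBytesIn: num(status.network.physicalBytesIn),
    physicalBytesOut: num(status.network.physicalBytesOut),
    numRequests: num(status.network.numRequests),
  };
}

// Usage in mongosh:
//   summarizeServerStatus(db.serverStatus())
```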

currentOp

  • Admin command that returns a document describing in-progress operations

  • Monitoring apps use this command to find slow operations

  • Ex.

    db.adminCommand(
       {
         currentOp: true,
         "$all": true   // also report idle connections and system operations
       }
    )
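Finding slow operations in that output is a filter on secs_running. This sketch assumes the currentOp result shape (an inprog array whose entries may carry opid, active, secs_running, op and ns); slowOperations is a hypothetical helper. The same filter can also be pushed to the server, for example db.currentOp({ active: true, secs_running: { $gt: 10 } }).

```javascript
// Hypothetical helper: active operations running at least thresholdSecs seconds,
// longest first. `result` is the document returned by
// db.adminCommand({ currentOp: true }).
function slowOperations(result, thresholdSecs) {
  // secs_running may be a Long in mongosh and is absent on idle entries.
  const secs = (op) =>
    op.secs_running && typeof op.secs_running.toNumber === "function"
      ? op.secs_running.toNumber()
      : Number(op.secs_running ?? 0);
  return result.inprog
    .filter((op) => op.active && secs(op) >= thresholdSecs)
    .sort((a, b) => secs(b) - secs(a))
    .map((op) => ({ opid: op.opid, op: op.op, ns: op.ns, secsRunning: secs(op) }));
}

// Usage in mongosh:
//   slowOperations(db.adminCommand({ currentOp: true }), 10)
```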

killOp

  • Terminates an operation by its opid (as reported by currentOp)

  • Ex.

    db.adminCommand(
       {
         killOp: 1,
         op: <opid>,        // opid of the operation to terminate
         comment: <any>     // optional comment attached to the command
       }
    )
  • Helper function: db.killOp(<opid>)
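Combining currentOp and killOp, a script can target runaway operations on a single namespace. A sketch under assumptions: inprog is the array from currentOp, its entries may carry ns, opid, secs_running and killPending, and the namespace app.orders in the usage comment is hypothetical. Review the generated commands before running them; MongoDB's documentation advises terminating only client-initiated operations, never internal ones.

```javascript
// Hypothetical helper: killOp command documents for operations on `ns` that have
// run at least minSecs seconds and are not already marked killPending.
function killOpCommands(inprog, ns, minSecs, comment) {
  // secs_running may be a Long in mongosh and is absent on idle entries.
  const secs = (op) =>
    op.secs_running && typeof op.secs_running.toNumber === "function"
      ? op.secs_running.toNumber()
      : Number(op.secs_running ?? 0);
  return inprog
    .filter((op) => op.ns === ns && !op.killPending && secs(op) >= minSecs)
    .map((op) => ({ killOp: 1, op: op.opid, comment: comment }));
}

// Usage in mongosh: inspect the list first, then run each command.
//   const cmds = killOpCommands(db.adminCommand({ currentOp: true }).inprog,
//                               "app.orders", 60, "runaway report query");
//   printjson(cmds);
//   cmds.forEach((cmd) => printjson(db.adminCommand(cmd)));
```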

Last updated 1 year ago
