🥒
Dill's Knowledge Base

Benefits and Usage of Big Data and Analytics Services

https://www.youtube.com/watch?v=LSVewE4mKfE&list=PLlVtbbG169nED0_vMEniWBQjSoxTsBYS3&index=24

Back in the day, ETL (extract, transform, load) was the most common approach because disk space was limited, so data was transformed before it was stored

  • In the modern world, we have ADLS Gen 2 (Azure Data Lake Storage Gen2), which sits on top of Blob Storage, so it makes sense to load the raw data into ADLS Gen 2 first (ELT: extract, load, transform)

    • If you ever want to come back and add more data to the pipeline, you can easily access everything stored in ADLS Gen 2 in its raw format

  • In the final transform phase of ELT, you clean and wrangle the data before loading it into the destination where it can be analyzed

    • Azure Data Factory is the orchestrator that facilitates this data movement
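The ELT flow described above (land the raw data first, transform it later) can be sketched in plain Python. This is only an illustration under stated assumptions: `raw_zone` and `warehouse` are hypothetical in-memory stand-ins for ADLS Gen 2 and the analytics destination, and the function names are made up for the example. A real pipeline would use Azure Data Factory and actual storage, not these dictionaries.

```python
import csv
import io

# Hypothetical in-memory stand-ins (assumptions, not Azure APIs):
raw_zone = {}    # plays the role of the raw area in a data lake
warehouse = []   # plays the role of the destination used for analysis


def extract_and_load(name, payload):
    """Extract + Load: store the source data untouched in the raw zone."""
    raw_zone[name] = payload


def transform(name):
    """Transform: clean the raw rows (drop missing amounts, normalise types)."""
    reader = csv.DictReader(io.StringIO(raw_zone[name]))
    cleaned = []
    for row in reader:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip rows with no amount
        cleaned.append(
            {"customer": row["customer"].strip().title(), "amount": float(amount)}
        )
    return cleaned


source_csv = "customer,amount\n alice ,10.5\nbob,\nCAROL,3\n"
extract_and_load("sales.csv", source_csv)
warehouse.extend(transform("sales.csv"))
print(warehouse)
```

Because the raw file is kept as-is in `raw_zone`, the transform step can be changed and re-run later without going back to the source system.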

HDInsight

  • HDInsight is Microsoft's managed service for running open-source big data frameworks

    • Hadoop is about dividing tasks into smaller parts

      • Disk based

        • MapReduce breaks things down into key-value pairs of data that can be shuffled around

    • Storm is real-time stream processing, for example for machine learning

    • Spark is mostly batch jobs and data transformation

      • Memory based

    • Kafka is all about big data streaming

    • Hive LLAP is for interactive querying, for example over a data lake

    • HBase is NoSQL storage
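The MapReduce idea mentioned under Hadoop (break the work into key-value pairs, shuffle them, then combine) can be sketched as a single-process word count. This is a toy illustration only: the function names are hypothetical, and real Hadoop distributes these phases across many machines and writes intermediate results to disk.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) key-value pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key into a single result."""
    return key, sum(values)


lines = ["big data big compute", "data lake"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Each line can be mapped independently of the others, which is what lets the work be split into smaller parts.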

Databricks

  • Built on Apache Spark

  • Azure Databricks is a managed Databricks service built into Azure

  • Has Delta Lake, a storage layer that sits on top of a data lake

Azure Synapse Analytics

  • Brings everything above under a single umbrella/workspace


Last updated 2 years ago