This page provides detailed documentation for each core component of the Azure Data Hub & Microsoft Fabric Terraform Accelerator.
Module: modules/storage_medallion
Implements a three-tier data lake using Azure Data Lake Storage Gen2 following the Medallion Architecture pattern:
- Bronze Container: Raw data ingestion from source systems
- Silver Container: Cleaned, validated, and conformed data
- Gold Container: Business-ready, aggregated datasets
- ✅ Hierarchical namespace enabled for big data analytics
- ✅ Soft delete and versioning for data protection
- ✅ Integration with Microsoft Fabric OneLake
- ✅ HIPAA-compliant lockdown (public access disabled when HIPAA mode is enabled)
- ✅ Lifecycle management policies
- ✅ Managed identity authentication
```hcl
module "storage_medallion" {
  source              = "../modules/storage_medallion"
  resource_group_name = module.resource_group.name
  location            = var.location
  prefix              = var.prefix
  environment         = var.environment
  tags                = var.tags
}
```
Resources created:
- Storage Account: `{prefix}{environment}sa` (e.g., `acmedevsa`)
- Containers: `bronze`, `silver`, `gold`
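The naming convention above can be expressed as Terraform locals — a minimal sketch (the accelerator's actual internal variable names may differ):

```hcl
locals {
  # e.g. prefix = "acme", environment = "dev" → "acmedevsa"
  storage_account_name = "${var.prefix}${var.environment}sa"
  medallion_containers = ["bronze", "silver", "gold"]
}
```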
Reference: Azure Data Lake Storage Gen2
Modules: modules/fabric_capacity, modules/fabric_workspace, modules/fabric_lakehouse
Microsoft Fabric provides a unified analytics platform integrating:
- Capacity: Compute resources (F-SKUs)
- Workspace: Collaboration and organization
- Lakehouse: Delta Lake-based storage with SQL query capabilities
| SKU | Capacity Units (CU) | Recommended Use Case |
|---|---|---|
| F2 | 2 CU | Development/Testing |
| F4 | 4 CU | Small workloads |
| F8 | 8 CU | Small production |
| F64 | 64 CU | Medium production |
| F128 | 128 CU | Large production |
| F256+ | 256+ CU | Enterprise scale |
```hcl
fabric_sku_name = "F64" # Choose based on workload requirements

fabric_admin_object_ids = [
  "00000000-0000-0000-0000-000000000001" # Service principal object ID
]

fabric_admin_upns = [
  "admin@yourcompany.com"
]
```
Limitations:
- Does NOT support Azure AD group object IDs
- Only supports user or service principal object IDs
- A minimum of one admin is required for capacity creation
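A sketch of how the three Fabric modules might be wired together; the module input and output names here are assumptions based on the patterns shown elsewhere on this page, not the accelerator's exact interface:

```hcl
module "fabric_capacity" {
  source              = "../modules/fabric_capacity"
  resource_group_name = module.resource_group.name
  location            = var.location
  sku_name            = var.fabric_sku_name         # e.g. "F64"
  admin_object_ids    = var.fabric_admin_object_ids
  admin_upns          = var.fabric_admin_upns
}

module "fabric_workspace" {
  source      = "../modules/fabric_workspace"
  capacity_id = module.fabric_capacity.id # assumed output name
}

module "fabric_lakehouse" {
  source       = "../modules/fabric_lakehouse"
  workspace_id = module.fabric_workspace.id # assumed output name
}
```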
Reference: Microsoft Fabric Documentation
Module: modules/data_factory
Orchestrates data pipelines for ETL/ELT workloads, providing:
- Integration runtime for data movement
- Pipeline authoring and scheduling
- Git integration for version control
- Managed identity for authentication
- ✅ Integration with medallion storage layers
- ✅ Managed identity authentication (no keys!)
- ✅ Diagnostic logging to Log Analytics
- ✅ Git repository integration
- ✅ Trigger-based and scheduled pipelines
- Ingest data from on-premises/cloud sources
- Transform Bronze → Silver → Gold
- Schedule batch processing jobs
- Trigger Microsoft Fabric notebook executions
- Data quality validation
```hcl
module "data_factory" {
  source              = "../modules/data_factory"
  count               = var.enable_data_factory ? 1 : 0
  resource_group_name = module.resource_group.name
  location            = var.location
  prefix              = var.prefix
  environment         = var.environment
  tags                = var.tags
}
```
Reference: Azure Data Factory
Module: modules/key_vault
Centralized secrets management service for:
- Connection strings and passwords
- API keys and tokens
- Certificates and keys
- Encryption keys for services
- ✅ Soft delete enabled (90-day recovery)
- ✅ Purge protection
- ✅ RBAC-based access control
- ✅ Managed identity integration
- ✅ Audit logging to Log Analytics
- ✅ Network isolation support
✅ DO:
- Use managed identities to access secrets
- Reference secrets in Terraform via data sources
- Enable soft delete and purge protection
- Rotate secrets regularly
- Use RBAC, not access policies
❌ DON'T:
- Hardcode secrets in Terraform
- Use access keys when managed identity is available
- Share secrets across environments
- Disable auditing
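The "use managed identities" and "use RBAC, not access policies" guidance above can be combined in a single role assignment — a hedged sketch (the `principal_id` output name is an assumption about the module interface):

```hcl
# Grant Data Factory's managed identity read access to secrets via RBAC
resource "azurerm_role_assignment" "adf_kv_secrets" {
  scope                = module.key_vault.id
  role_definition_name = "Key Vault Secrets User"            # built-in Azure RBAC role
  principal_id         = module.data_factory.principal_id    # assumed output name
}
```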
```hcl
# Reference an existing secret
data "azurerm_key_vault_secret" "db_password" {
  name         = "database-password"
  key_vault_id = module.key_vault.id
}

# Use it in a resource (the SQL Server arguments added here are illustrative)
resource "azurerm_mssql_server" "example" {
  name                         = "example-sqlserver"
  resource_group_name          = module.resource_group.name
  location                     = var.location
  version                      = "12.0"
  administrator_login          = "sqladmin"
  administrator_login_password = data.azurerm_key_vault_secret.db_password.value
}
```
Reference: Azure Key Vault
Module: modules/purview
Data governance and catalog service providing:
- Automated data discovery and classification
- Data lineage tracking
- Compliance reporting
- Integration with Azure data services
✅ Enable Purview if you need:
- Data catalog for search and discovery
- Automated data classification (PII, PHI, etc.)
- Lineage tracking across data sources
- Compliance reporting (GDPR, HIPAA)
❌ Skip Purview if:
- Budget-constrained (Purview has additional costs)
- Simple data estate (< 5 data sources)
- Time-sensitive deployment (Purview takes 10-20 min to provision)
```hcl
enable_purview = true # Set to false to skip

# Purview will automatically discover:
# - Storage accounts
# - Data Factory pipelines
# - SQL databases (if configured)
```
Reference: Microsoft Purview
Module: modules/networking
Enterprise-grade networking infrastructure:
- Virtual Network (VNet) with subnets
- Network Security Groups (NSGs)
- Service endpoints for Azure services
- Optional private endpoints
┌─────────────────────────────────────────┐
│ Virtual Network (VNet) │
│ │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ Data Subnet │ │ Compute Subnet │ │
│ │ (Storage, │ │ (Data Factory, │ │
│ │ Key Vault) │ │ VMs) │ │
│ └─────────────┘ └─────────────────┘ │
│ ▲ ▲ │
│ │ Service │ │
│ │ Endpoints │ │
└─────────┴──────────────────┴────────────┘
│ │
▼ ▼
Storage Account Data Factory
- ✅ Network isolation for data services
- ✅ Deny public access by default (when HIPAA enabled)
- ✅ NSG flow logs to Log Analytics
- ✅ Service endpoints for Azure services
- ✅ DDoS protection (optional, additional cost)
```hcl
enable_networking = true

# Networking is automatically configured when enabled
# Subnets and NSGs are created based on enabled modules
```
Reference: Azure Virtual Network
Module: modules/log_analytics
Centralized logging and monitoring workspace for:
- Diagnostic logs from all Azure resources
- Azure Policy compliance tracking
- NSG flow logs
- Custom queries and alerts
- ✅ Retention period configurable (30-730 days)
- ✅ Integration with Azure Monitor
- ✅ KQL queries for log analysis
- ✅ Alerting and automation
- ✅ Compliance reporting
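The configurable retention period above is typically exposed as a module input — a minimal sketch (the variable name is an assumption, not the accelerator's exact interface):

```hcl
# Must fall within the supported 30-730 day range
log_analytics_retention_days = 90
```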
View all resource logs:

```kql
AzureDiagnostics
| where TimeGenerated > ago(1h)
| summarize count() by ResourceType
```

Policy compliance violations:

```kql
AzurePolicyEvaluationDetails
| where ComplianceState == "NonCompliant"
| project ResourceId, PolicyDefinitionName, Reason
```

Reference: Azure Log Analytics
Understanding component dependencies helps with troubleshooting and planning:
Resource Group (Required)
│
├─> Networking (Optional)
│ └─> Required by: Storage, Key Vault, Data Factory
│
├─> Log Analytics (Optional)
│ └─> Used by: All components for diagnostics
│
├─> Storage (Medallion)
│ └─> Dependency: Resource Group
│ └─> Used by: Fabric, Data Factory
│
├─> Key Vault
│ └─> Dependency: Resource Group, Networking (optional)
│
├─> Data Factory
│ └─> Dependencies: Resource Group, Storage, Networking (optional)
│
├─> Purview (Optional)
│ └─> Dependencies: Resource Group, Storage
│
└─> Compliance Modules (HIPAA/GDPR)
└─> Dependencies: Resource Group
└─> Affects: All resources in scope
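Because optional modules such as Data Factory are instantiated with `count`, referencing their outputs downstream requires indexing — a minimal sketch (the `id` output name is an assumption):

```hcl
output "data_factory_id" {
  value = var.enable_data_factory ? module.data_factory[0].id : null
}
```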