| 1 | +--- |
| 2 | +title: Troubleshoot Geo-Replication and Redo Lag |
| 3 | +titleSuffix: Azure SQL Database |
| 4 | +description: Learn how to understand and troubleshoot geo-replication and redo lag in Azure SQL Database. |
| 5 | +author: WilliamDAssafMSFT |
| 6 | +ms.author: wiassaf |
| 7 | +ms.reviewer: mahyon, randolphwest |
| 8 | +ms.date: 03/02/2026 |
| 9 | +ms.service: azure-sql-database |
| 10 | +ms.subservice: high-availability |
| 11 | +ms.topic: troubleshooting |
| 12 | +ms.custom: |
| 13 | + - azure-sql-split |
| 14 | +monikerRange: "=azuresql || =azuresql-db" |
| 15 | +--- |
| 16 | + |
| 17 | +# Troubleshoot geo-replication and redo lag |
| 18 | + |
| 19 | +[!INCLUDE [appliesto-sqldb](../includes/appliesto-sqldb.md)] |
| 20 | + |
| 21 | +In active geo-replication, the geo-secondary replica continuously receives and applies transaction log records from the primary. When the secondary replica can't apply logs as fast as the primary generates them, a backlog builds (redo queue) and the time gap increases (redo lag). This situation can affect read-only freshness on the secondary and increase failover time. |
| 22 | + |
| 23 | +- **Redo queue**: The volume of transaction log records that have been shipped to the secondary but not yet applied. |
| 24 | +- **Redo lag**: The elapsed time between transaction commit on the primary and completion of replay on the secondary. |
| 25 | + |
| 26 | +Geo-replication is asynchronous. Redo lag on the secondary replica doesn't cause waits on the primary, but it does cause data on the secondary to fall behind the primary. |
| 27 | + |
| 28 | +## Symptoms |
| 29 | + |
| 30 | +- Stale data on the secondary for read-only workloads (reporting, analytics, or offloaded reads). |
| 31 | +- Longer failover time, which increases Recovery Time Objective (RTO). |
| 32 | +- Sustained resource pressure on the secondary, reducing its ability to catch up. |
| 33 | +- In the DMV [sys.dm_database_replica_states](/sql/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database?view=azuresqldb-current&preserve-view=true), `redo_queue_size` is greater than 0 and growing, and `secondary_lag_seconds` is increasing. This pattern confirms redo lag. |
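
The confirmation rule in the last bullet can be sketched in Python. This is a minimal illustration, not an official monitoring tool. The `Sample` type, the `redo_lag_confirmed` function, and the idea of comparing consecutive polls are assumptions made for this example; only the DMV name and the two column names come from this article. Collecting the samples (for example, by running the query on a schedule) is left out, and the sketch assumes both columns return non-null values.

```python
from typing import NamedTuple, Sequence

# Query text for collecting samples. The DMV and column names are the ones
# referenced in this article; how and where the query runs is up to you.
SAMPLE_QUERY = (
    "SELECT redo_queue_size, secondary_lag_seconds "
    "FROM sys.dm_database_replica_states;"
)


class Sample(NamedTuple):
    """One poll of the DMV (hypothetical container for this sketch)."""
    redo_queue_size: int
    secondary_lag_seconds: int


def redo_lag_confirmed(samples: Sequence[Sample]) -> bool:
    """Return True when the redo queue is non-empty and growing and lag is increasing.

    Assumed rule for this sketch: across the polls, neither value ever
    decreases, and both end higher than they started.
    """
    if len(samples) < 2:
        return False  # A trend needs at least two polls.

    pairs = list(zip(samples, samples[1:]))
    queue_never_shrinks = all(
        later.redo_queue_size >= earlier.redo_queue_size for earlier, later in pairs
    )
    lag_never_shrinks = all(
        later.secondary_lag_seconds >= earlier.secondary_lag_seconds
        for earlier, later in pairs
    )
    first, last = samples[0], samples[-1]
    return (
        last.redo_queue_size > 0
        and queue_never_shrinks
        and lag_never_shrinks
        and last.redo_queue_size > first.redo_queue_size
        and last.secondary_lag_seconds > first.secondary_lag_seconds
    )
```

A single nonzero reading isn't treated as a problem here, because a healthy secondary can briefly hold a small queue. The sketch flags only a sustained upward trend in both values.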
| 34 | + |
| 35 | +## Why redo backlog grows |
| 36 | + |
| 37 | +Although the secondary database is read-only, it still maintains a transaction log for internal operations, including replaying log records from the primary. When the redo queue grows, the secondary must retain more transaction log data. |
| 38 | + |
| 39 | +This situation can lead to: |
| 40 | + |
| 41 | +- Transaction log growth on the secondary. |
| 42 | +- Higher storage consumption, which can affect cost and performance. |
| 43 | +- Potential throttling scenarios when thresholds are exceeded. |
| 44 | + |
| 45 | +## Impact of replica size mismatch |
| 46 | + |
| 47 | +You should configure the primary and geo-secondary replica with the same service level objective (SLO), backup storage redundancy, [compute tier](service-tiers-sql-database-vcore.md#compute) (provisioned or serverless), and compute size (DTUs or vCores). |
| 48 | + |
| 49 | +If you configure a secondary database with a lower compute size than the primary database, you might experience: |
| 50 | + |
| 51 | +- Resource contention on the secondary (CPU, I/O), which slows down redo operations. |
| 52 | +- Inability to keep up with the transaction log generation rate of the primary. |
| 53 | +- Increased redo queue size, which worsens lag and reduces replication effectiveness. |
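
To make the effect concrete, the following Python sketch models the redo queue as a simple backlog: each second the primary adds log, and the secondary removes as much as its redo rate allows. The rates are made-up numbers for illustration, not measurements or documented limits of any service objective, and `simulate_redo_queue` is a hypothetical helper rather than part of any Azure tool.

```python
def simulate_redo_queue(generation_kb_per_s, redo_kb_per_s, seconds, start_kb=0):
    """Return the backlog (in KB) at the end of each simulated second."""
    queue = start_kb
    history = []
    for _ in range(seconds):
        # The backlog grows by what the primary generates and shrinks by
        # what the secondary can replay; it can't go below zero.
        queue = max(0, queue + generation_kb_per_s - redo_kb_per_s)
        history.append(queue)
    return history


# Illustrative rates only: a secondary that can replay faster than the
# primary generates log, and one that can't.
matched = simulate_redo_queue(generation_kb_per_s=5000, redo_kb_per_s=6000, seconds=60)
undersized = simulate_redo_queue(generation_kb_per_s=5000, redo_kb_per_s=3000, seconds=60)

print(matched[-1])     # 0
print(undersized[-1])  # 120000
```

With these assumed rates, the matched secondary keeps its queue at zero, while the undersized secondary accumulates 2,000 KB of backlog every second and can't catch up unless the primary's log generation rate drops.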
| 54 | + |
| 55 | +## Recommendations |
| 56 | + |
| 57 | +To reduce redo lag and maintain replication health and efficient log usage on the secondary: |
| 58 | + |
| 59 | +- Align SLO and compute sizes. Ensure the secondary database has the same service level objective and compute size as the primary. |
| 60 | + - Configure geo-secondary: [Active geo-replication](active-geo-replication-overview.md#configure-geo-secondary) |
| 61 | + - Scale a single database: [Scale single database resources in Azure SQL Database](single-database-scale.md) |
| 62 | + - Scale an elastic pool: [Scale elastic pool resources in Azure SQL Database](elastic-pool-scale.md) |
| 63 | + - Cost considerations: [Plan and manage costs for Azure SQL Database](cost-management.md) |
| 64 | + |
| 65 | +- Monitor regularly. Use dynamic management views (DMVs) such as [sys.dm_database_replica_states](/sql/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database?view=azuresqldb-current&preserve-view=true) to track redo lag and queue size. Redo lag is confirmed when `redo_queue_size` is greater than 0 and growing, and `secondary_lag_seconds` is increasing. |
| 66 | + |
| 67 | +- Optimize workloads: |
| 68 | + |
| 69 | +  - Reduce long-running transactions on the secondary and spikes in log generation on the primary. |
| 70 | +  - Avoid large index rebuilds during peak times. When a rebuild is replayed on the secondary, the redo thread needs schema modification (Sch-M) locks. Long-running queries on the secondary that hold conflicting locks might block the redo thread and contribute to redo queue build-up. |
| 71 | + |
| 72 | +## Related content |
| 73 | + |
| 74 | +- [Active geo-replication](active-geo-replication-overview.md) |
| 75 | +- [Configure active geo-replication and failover](active-geo-replication-configure-portal.md) |
| 76 | +- [Monitor geo-replication lag](active-geo-replication-overview.md#monitor-geo-replication-lag) |