Commit 016f291

Merge pull request #36705 from mhyon/20260224-geo-replication-redo
20260224 geo replication redo
2 parents 4c1ec7d + fcb13f5 commit 016f291

8 files changed

Lines changed: 107 additions & 8 deletions

azure-sql/database/active-geo-replication-overview.md

Lines changed: 7 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -106,6 +106,9 @@ Both the primary and geo-secondary are required to have the same service tier. I
106106

107107
Another consequence of an imbalanced geo-secondary configuration is that after failover, application performance can suffer due to insufficient compute capacity of the new primary. In that case, it's necessary to scale up the database to have sufficient resources, which might take significant time, and requires a [high availability](high-availability-sla-local-zone-redundancy.md) failover at the end of the scale up process, which can interrupt application workloads.
108108

109+
> [!TIP]
110+
> For detailed troubleshooting guidance on lag with geo-replication, see [Troubleshoot geo-replication redo lag](troubleshoot-geo-replication-redo.md).
111+
109112
If you decide to create the geo-secondary with a different configuration, you should monitor log I/O rate on the primary over time. This lets you estimate the minimal compute size of the geo-secondary required to sustain the replication load. For example, if your primary database is P6 (1000 DTU) and its log I/O is sustained at 50%, the geo-secondary needs to be at least P4 (500 DTU). To retrieve historical log I/O data, use the [sys.resource_stats](/sql/relational-databases/system-catalog-views/sys-resource-stats-azure-sql-database) view. To retrieve recent log I/O data with higher granularity that better reflects short-term spikes, use the [sys.dm_db_resource_stats](/sql/relational-databases/system-dynamic-management-views/sys-dm-db-resource-stats-azure-sql-database) view.
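
The sizing estimate in the preceding paragraph can be sketched in Python. This is a rough illustration, not an official sizing formula: it assumes the proportional heuristic implied by the P6/P4 example (required DTUs equal the primary's DTUs multiplied by its sustained log I/O percentage), and the helper name `min_geo_secondary_objective` is hypothetical. Only Premium objectives are listed, because the primary and geo-secondary must share a service tier.

```python
# DTU sizes of the Premium service objectives in the DTU purchasing model.
PREMIUM_DTUS = {"P1": 125, "P2": 250, "P4": 500, "P6": 1000, "P11": 1750, "P15": 4000}


def min_geo_secondary_objective(primary_objective: str, sustained_log_io_percent: float) -> str:
    """Return the smallest Premium objective that covers the primary's sustained log I/O.

    Assumption: log I/O capacity scales in proportion to DTUs, as in the
    P6 (1000 DTU) at 50% -> P4 (500 DTU) example.
    """
    if not 0 <= sustained_log_io_percent <= 100:
        raise ValueError("sustained_log_io_percent must be between 0 and 100")
    required_dtus = PREMIUM_DTUS[primary_objective] * sustained_log_io_percent / 100
    # Walk the objectives from smallest to largest and stop at the first that fits.
    for name, dtus in sorted(PREMIUM_DTUS.items(), key=lambda item: item[1]):
        if dtus >= required_dtus:
            return name
    # Not reached for percentages up to 100, because the primary's own size always fits.
    return primary_objective


print(min_geo_secondary_objective("P6", 50))  # P4
```

The percentage passed in should come from sustained values observed over time, not from a single sample, so that short-term spikes don't skew the estimate.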
110113

111114
> [!TIP]
@@ -243,6 +246,10 @@ Active geo-replication can also be managed programmatically using T-SQL, Azure P
243246

244247
---
245248

249+
## Troubleshooting
250+
251+
For more information on troubleshooting geo-replica lag, see [Troubleshoot geo-replication lag](troubleshoot-geo-replication-redo.md).
252+
246253
## Related content
247254

248255
Configure active geo-replication:

azure-sql/database/failover-group-sql-db.md

Lines changed: 2 additions & 1 deletion
@@ -160,7 +160,7 @@ A typical Azure application uses multiple Azure services and consists of multipl
160160
If an outage occurs in the primary region, recent transactions might not have been replicated to the geo-secondary and there might be data loss if a forced failover is performed.
161161

162162
> [!IMPORTANT]
163-
> Elastic pools with 800 or fewer DTUs or 8 or fewer vCores, and more than 250 databases can encounter issues including longer planned geo-failovers and degraded performance. These issues are more likely to occur for write intensive workloads when geo-replicas are widely separated by geography, or when multiple secondary geo-replicas are used for each database. A symptom of these issues is an increase in geo-replication lag over time, potentially leading to a more extensive data loss in an outage. This lag can be monitored using [sys.dm_geo_replication_link_status](/sql/relational-databases/system-dynamic-management-views/sys-dm-geo-replication-link-status-azure-sql-database). If these issues occur, then mitigation includes scaling up the pool to have more DTUs or vCores, or reducing the number of geo-replicated databases in the pool.
163+
> Elastic pools with 800 or fewer DTUs or 8 or fewer vCores, and more than 250 databases can encounter issues including longer planned geo-failovers and degraded performance. These issues are more likely to occur for write intensive workloads when geo-replicas are widely separated by geography, or when multiple secondary geo-replicas are used for each database. A symptom of these issues is an increase in geo-replication lag over time, potentially leading to a more extensive data loss in an outage. This lag can be monitored using [sys.dm_geo_replication_link_status](/sql/relational-databases/system-dynamic-management-views/sys-dm-geo-replication-link-status-azure-sql-database). If these issues occur, then mitigation includes scaling up the pool to have more DTUs or vCores, or reducing the number of geo-replicated databases in the pool. For detailed troubleshooting guidance on redo lag issues, see [Troubleshoot geo-replication redo lag](troubleshoot-geo-replication-redo.md).
164164
165165

166166
<a id="failback"></a>
@@ -221,3 +221,4 @@ In a scenario where high availability is enabled on the primary database, and th
221221
- To learn about Azure SQL Database automated backups, see [SQL Database automated backups](automated-backups-overview.md).
222222
- To learn about using automated backups for recovery, see [Restore a database from the service-initiated backups](recovery-using-backups.md).
223223
- To learn about authentication requirements for a new primary server and database, see [SQL Database security after disaster recovery](active-geo-replication-security-configure.md).
224+
- For troubleshooting geo-replication issues, see [Troubleshoot geo-replication redo lag](troubleshoot-geo-replication-redo.md).

azure-sql/database/troubleshoot-geo-replication-redo.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
1+
---
2+
title: Troubleshoot Geo-Replication and Redo Lag
3+
titleSuffix: Azure SQL Database
4+
description: Learn how to understand and troubleshoot geo-replication and redo lag in Azure SQL Database.
5+
author: WilliamDAssafMSFT
6+
ms.author: wiassaf
7+
ms.reviewer: mahyon, randolphwest
8+
ms.date: 03/02/2026
9+
ms.service: azure-sql-database
10+
ms.subservice: high-availability
11+
ms.topic: troubleshooting
12+
ms.custom:
13+
- azure-sql-split
14+
monikerRange: "=azuresql || =azuresql-db"
15+
---
16+
17+
# Troubleshoot geo-replication and redo lag
18+
19+
[!INCLUDE [appliesto-sqldb](../includes/appliesto-sqldb.md)]
20+
21+
In active geo-replication, the geo-secondary replica continuously receives and applies transaction log records from the primary. When the secondary replica can't apply logs as fast as the primary generates them, a backlog builds (redo queue) and the time gap increases (redo lag). This situation can affect read-only freshness on the secondary and increase failover time.
22+
23+
- **Redo queue**: The volume of transaction log records that geo-replication has shipped to the secondary but that the secondary hasn't applied yet.
24+
- **Redo lag**: The elapsed time between transaction commit on the primary and completion of replay on the secondary.
25+
26+
Geo-replication is asynchronous. Redo lag on the secondary replica doesn't cause waits on the primary, but it does cause data on the secondary to fall behind the primary.
27+
28+
## Symptoms
29+
30+
- Stale data on the secondary for read-only workloads (reporting, analytics, or offloaded reads).
31+
- Longer failover time, which increases Recovery Time Objective (RTO).
32+
- Sustained resource pressure on the secondary, reducing its ability to catch up.
33+
- In the DMV [sys.dm_database_replica_states](/sql/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database?view=azuresqldb-current&preserve-view=true), `redo_queue_size` is greater than 0 and growing, and `secondary_lag_seconds` is increasing. Together, these values confirm redo lag.
34+
35+
## Why redo backlog grows
36+
37+
Although the secondary database is read-only, it still maintains a transaction log for internal operations, including replaying log records from the primary. When the redo queue grows, the secondary must retain more transaction log data.
38+
39+
This situation can lead to:
40+
41+
- Transaction log growth on the secondary.
42+
- Higher storage consumption, which can affect cost and performance.
43+
- Potential throttling scenarios when thresholds are exceeded.
44+
45+
## Impact of replica size mismatch
46+
47+
You should configure the primary and geo-secondary replica with the same service level objective (SLO), backup storage redundancy, [compute tier](service-tiers-sql-database-vcore.md#compute) (provisioned or serverless), and compute size (DTUs or vCores).
48+
49+
If you configure a secondary database with a lower compute size than the primary database, you might experience:
50+
51+
- Resource contention on the secondary (CPU, I/O), which slows down redo operations.
52+
- Inability to keep up with the transaction log generation rate of the primary.
53+
- Increased redo queue size, which worsens lag and reduces replication effectiveness.
54+
55+
## Recommendations
56+
57+
To reduce redo lag and maintain replication health and efficient log usage on the secondary:
58+
59+
- Align SLO and compute sizes. Ensure the secondary database has the same performance tier as the primary.
60+
- Configure geo-secondary: [Active geo-replication](active-geo-replication-overview.md#configure-geo-secondary)
61+
- Scale a single database: [Scale single database resources in Azure SQL Database](single-database-scale.md)
62+
- Scale an elastic pool: [Scale elastic pool resources in Azure SQL Database](elastic-pool-scale.md)
63+
- Cost considerations: [Plan and manage costs for Azure SQL Database](cost-management.md)
64+
65+
- Monitor regularly. Use dynamic management views (DMVs) such as [sys.dm_database_replica_states](/sql/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database?view=azuresqldb-current&preserve-view=true) to track redo lag and queue size. Redo lag is confirmed when `redo_queue_size` is greater than 0 and growing, and `secondary_lag_seconds` is increasing.
66+
67+
- Optimize workloads:
68+
69+
- Reduce long-running transactions on the secondary and high log generation spikes on the primary.
70+
- Avoid large index rebuilds during peak times. Rebuilds can acquire schema modification (Sch-M) locks, which might block the redo thread on the secondary and contribute to redo queue build-up.
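
The monitoring check in the recommendations above (`redo_queue_size` greater than 0 and growing, `secondary_lag_seconds` increasing) could be automated along these lines. This is a minimal sketch under stated assumptions: the function name `redo_lag_confirmed` is hypothetical, "growing" is interpreted as strictly increasing across consecutive samples, and the sample values are made up for illustration. The query text uses only the view and columns named in this article. Running it requires a database connection, which the sketch leaves out, and the view might return more than one row.

```python
from typing import Sequence, Tuple

# Query to run on the geo-secondary at a fixed interval (connection handling omitted).
REPLICA_STATE_QUERY = """
SELECT redo_queue_size, secondary_lag_seconds
FROM sys.dm_database_replica_states;
"""

# One observation: (redo_queue_size, secondary_lag_seconds).
Sample = Tuple[int, int]


def redo_lag_confirmed(samples: Sequence[Sample]) -> bool:
    """Return True when the redo queue is non-empty and growing and the lag is increasing.

    Assumption: "growing" and "increasing" mean strictly greater in every
    consecutive pair of samples, ordered oldest to newest.
    """
    if len(samples) < 2:
        return False  # A trend needs at least two observations.
    pairs = list(zip(samples, samples[1:]))
    queue_growing = all(later[0] > earlier[0] for earlier, later in pairs)
    lag_increasing = all(later[1] > earlier[1] for earlier, later in pairs)
    return samples[-1][0] > 0 and queue_growing and lag_increasing


# Hypothetical samples, oldest first.
print(redo_lag_confirmed([(1200, 15), (4800, 42), (9100, 97)]))  # True
print(redo_lag_confirmed([(1200, 15), (800, 9), (0, 0)]))  # False
```

A stricter or looser trend test (for example, tolerating one flat sample) might suit noisy workloads better; the strict comparison is only the simplest reading of the rule.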
71+
72+
## Related content
73+
74+
- [Active geo-replication](active-geo-replication-overview.md)
75+
- [Configure active geo-replication and failover](active-geo-replication-configure-portal.md)
76+
- [Monitor geo-replication lag](active-geo-replication-overview.md#monitor-geo-replication-lag)

azure-sql/database/troubleshoot-memory-errors-issues.md

Lines changed: 1 addition & 0 deletions
@@ -190,6 +190,7 @@ If out of memory errors persist in Azure SQL Database, file an Azure support req
190190
- [Performance Center for SQL Server Database Engine and Azure SQL Database](/sql/relational-databases/performance/performance-center-for-sql-server-database-engine-and-azure-sql-database)
191191
- [Troubleshooting connectivity issues and other errors with Azure SQL Database and Azure SQL Managed Instance](troubleshoot-common-errors-issues.md)
192192
- [Troubleshoot transient connection errors in SQL Database and SQL Managed Instance](troubleshoot-common-connectivity-issues.md)
193+
- [Troubleshoot transaction log errors](troubleshoot-transaction-log-errors-issues.md)
193194
- [Demonstrating Intelligent Query Processing](https://github.com/Microsoft/sql-server-samples/tree/master/samples/features/intelligent-query-processing)
194195
- [Resource management in Azure SQL Database](resource-limits-logical-server.md#memory)
195196
- [Blog: A new way to troubleshoot out-of-memory errors in the database engine](https://techcommunity.microsoft.com/t5/azure-sql-blog/a-new-way-to-troubleshoot-out-of-memory-errors-in-the-database/ba-p/3271926)

azure-sql/database/troubleshoot-transaction-log-errors-issues.md

Lines changed: 2 additions & 0 deletions
@@ -151,4 +151,6 @@ To resolve this issue, try the following methods:
151151
- [Understand and resolve Azure SQL Database blocking problems](understand-resolve-blocking.md?view=azuresql-db&preserve-view=true#gather-blocking-information)
152152
- [Troubleshooting connectivity issues and other errors with Azure SQL Database and Azure SQL Managed Instance](troubleshoot-common-errors-issues.md?view=azuresql-db&preserve-view=true)
153153
- [Troubleshoot transient connection errors in Azure SQL Database and SQL Managed Instance](troubleshoot-common-connectivity-issues.md?view=azuresql-db&preserve-view=true)
154+
- [Troubleshoot geo-replication redo lag](troubleshoot-geo-replication-redo.md?view=azuresql-db&preserve-view=true)
155+
- [Troubleshoot out of memory errors](troubleshoot-memory-errors-issues.md?view=azuresql-db&preserve-view=true)
154156
- [Video: Data Loading Best Practices on Azure SQL Database](/shows/data-exposed/data-loading-best-practices-on-azure-sql-database?WT.mc_id=dataexposed-c9-niner)

azure-sql/toc.yml

Lines changed: 3 additions & 1 deletion
@@ -2036,7 +2036,9 @@
20362036
href: database/troubleshoot-common-connectivity-issues.md
20372037
- name: Troubleshoot out of memory errors
20382038
href: database/troubleshoot-memory-errors-issues.md
2039-
- name: Import/Export service hangs
2039+
- name: Troubleshoot geo-replication lag
2040+
href: database/troubleshoot-geo-replication-redo.md
2041+
- name: Troubleshoot Import/Export service
20402042
href: database/database-import-export-hang.md
20412043
- name: Transaction log errors in Azure SQL Database
20422044
href: database/troubleshoot-transaction-log-errors-issues.md

docs/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database.md

Lines changed: 5 additions & 0 deletions
@@ -76,7 +76,12 @@ Returns state information for each database that participates in primary and sec
7676

7777
Requires `VIEW DATABASE STATE` permission on the database.
7878

79+
## Remarks
80+
81+
For more information on troubleshooting geo-replication redo lag in Azure SQL Database, see [Troubleshoot geo-replication redo lag](/azure/azure-sql/database/troubleshoot-geo-replication-redo?view=azuresql-db&preserve-view=true).
82+
7983
## Related content
8084

8185
- [What is an Always On availability group?](../../database-engine/availability-groups/windows/overview-of-always-on-availability-groups-sql-server.md)
8286
- [Monitor Availability Groups (Transact-SQL)](../../database-engine/availability-groups/windows/monitor-availability-groups-transact-sql.md)
87+
- [sys.dm_geo_replication_link_status (Azure SQL Database and Azure SQL Managed Instance)](sys-dm-geo-replication-link-status-azure-sql-database.md)

docs/relational-databases/system-dynamic-management-views/sys-dm-geo-replication-link-status-azure-sql-database.md

Lines changed: 11 additions & 6 deletions
@@ -5,7 +5,7 @@ description: Contains a row for each replication link between primary and second
55
author: rwestMSFT
66
ms.author: randolphwest
77
ms.reviewer: wiassaf
8-
ms.date: 06/13/2025
8+
ms.date: 02/26/2026
99
ms.service: azure-sql-database
1010
ms.topic: reference
1111
f1_keywords:
@@ -24,7 +24,7 @@ monikerRange: "=azuresqldb-current || =azuresqldb-mi-current"
2424

2525
[!INCLUDE[Azure SQL Database Azure SQL Managed Instance](../../includes/applies-to-version/asdb-asdbmi.md)]
2626

27-
Contains a row for each replication link between primary and secondary databases in a geo-replication partnership. This includes both primary and secondary databases. If more than one continuous replication link exists for a given primary database, this table contains a row for each of the relationships. The view is created in all databases, including the `master` database. However, querying this view in the `master` database returns an empty set.
27+
Contains a row for each replication link between primary and secondary databases in a geo-replication partnership. This includes both primary and secondary databases. If more than one continuous replication link exists for a given primary database, this table contains a row for each of the relationships.
2828

2929
|Column name|Data type|Description|
3030
|-----------------|---------------|-----------------|
@@ -41,13 +41,18 @@ Contains a row for each replication link between primary and secondary databases
4141
| `secondary_allow_connections_desc` |**nvarchar(256)**|No<br /><br /> All|
4242
| `last_commit` |**datetimeoffset**|The time of last transaction committed to the database. If retrieved on the primary database, it indicates the last commit time on the primary database. If retrieved on the secondary database, it indicates the last commit time on the secondary database. If retrieved on the secondary database when the primary of the replication link is down, it indicates until what point the secondary has caught up.|
4343

44-
> [!NOTE]
45-
> If the replication relationship is terminated by removing the secondary database, the row for that database in the `sys.dm_geo_replication_link_status` view disappears.
46-
4744
## Permissions
4845

4946
Requires the `VIEW DATABASE STATE` permission in the database.
5047

48+
## Remarks
49+
50+
If the replication relationship is terminated by removing the secondary database, the row for that database in the `sys.dm_geo_replication_link_status` view disappears.
51+
52+
The view is created in all databases, including the `master` database. However, querying this view in the `master` database returns an empty set.
53+
54+
For more information on troubleshooting geo-replication redo lag in Azure SQL Database, see [Troubleshoot geo-replication redo lag](/azure/azure-sql/database/troubleshoot-geo-replication-redo?view=azuresql-db&preserve-view=true).
55+
5156
## Examples
5257

5358
This Transact-SQL query shows replication lags and last replication time of secondary databases.
@@ -63,7 +68,7 @@ FROM sys.dm_geo_replication_link_status;
6368

6469
## Related content
6570

66-
- [ALTER DATABASE (Transact-SQL)](../../t-sql/statements/alter-database-transact-sql.md)
71+
- [sys.dm_database_replica_states (Azure SQL Database)](sys-dm-database-replica-states-azure-sql-database.md)
6772
- [sys.geo_replication_links (Azure SQL Database)](sys-geo-replication-links-azure-sql-database.md)
6873
- [sys.dm_operation_status (Azure SQL Database)](sys-dm-operation-status-azure-sql-database.md)
6974
- [sp_wait_for_database_copy_sync](../system-stored-procedures/sp-wait-for-database-copy-sync-transact-sql.md)
