
Commit c7ee6b7

Fix Google Search Console indexing issues for spdlearn.org
Two root causes were preventing proper indexing:

1. Sitemap URLs had a broken `en/0.1/` prefix, so all 179 URLs listed in the sitemap returned 404. sphinx_sitemap defaults to `sitemap_url_scheme = "{lang}{version}{link}"`, which doesn't match our single-version deployment at the site root. Fixed by setting `sitemap_url_scheme = "{link}"`.

2. The homepage canonical tag included `index.html` (`https://spdlearn.org/index.html`), so Google overrode it with `https://spdlearn.org/`. Added a Sphinx `html-page-context` hook that strips the trailing `index.html` from canonical URLs.
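To illustrate root cause 1, here is a simplified model of how a sitemap entry is assembled. It assumes the URL is `html_baseurl` followed by the scheme string formatted with a language directory, a version directory, and the page link; the `en/` and `0.1/` values are inferred from the broken prefix above. `build_sitemap_url` and the page name `api.html` are hypothetical and not part of the sphinx_sitemap API.

```python
def build_sitemap_url(base_url: str, scheme: str, lang: str, version: str, link: str) -> str:
    """Hypothetical model of a sitemap <loc>: base URL + formatted URL scheme."""
    # str.format ignores unused keyword arguments, so "{link}" simply drops lang/version.
    return base_url + scheme.format(lang=lang, version=version, link=link)


BASE = "https://spdlearn.org/"

# Default scheme: language and version directories are prefixed to every page.
broken = build_sitemap_url(BASE, "{lang}{version}{link}", "en/", "0.1/", "api.html")
# Scheme set in this commit: the page link is appended directly to the site root.
fixed = build_sitemap_url(BASE, "{link}", "en/", "0.1/", "api.html")

print(broken)  # https://spdlearn.org/en/0.1/api.html
print(fixed)   # https://spdlearn.org/api.html
```

Because the site is served from the root with no `en/0.1/` directory, only the second form resolves.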
1 parent 46d7160 commit c7ee6b7

1 file changed

Lines changed: 15 additions & 0 deletions


docs/source/conf.py

@@ -303,6 +303,10 @@
 # Point 6: SEO - Add canonical URLs
 html_baseurl = "https://spdlearn.org/"
 
+# Fix sitemap URLs - current deployment is single-version at root,
+# so don't prefix with language/version directories
+sitemap_url_scheme = "{link}"
+
 # Point 7: Copy root-level site files (robots.txt, BingSiteAuth.xml, etc.)
 html_extra_path = ["_extra"]
 
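A quick post-build check can confirm the new scheme took effect: parse the generated sitemap and flag any `<loc>` that falls outside the site root or still carries the stale prefix. This is a sketch; `find_bad_locs` is a hypothetical helper, the sample sitemap is made up, and in practice you would read the file from your HTML build output directory (its exact path depends on your build setup).

```python
import xml.etree.ElementTree as ET

# Standard namespace used by sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_bad_locs(sitemap_xml: str, base_url: str, bad_prefix: str = "en/0.1/") -> list[str]:
    """Return <loc> URLs outside base_url or still starting with the stale prefix."""
    root = ET.fromstring(sitemap_xml)
    bad = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # Path relative to the site root; stale entries begin with "en/0.1/".
        if not url.startswith(base_url) or url[len(base_url):].startswith(bad_prefix):
            bad.append(url)
    return bad


# Made-up sample: one clean entry, one with the broken prefix.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://spdlearn.org/api.html</loc></url>
  <url><loc>https://spdlearn.org/en/0.1/api.html</loc></url>
</urlset>"""

print(find_bad_locs(SAMPLE, "https://spdlearn.org/"))
# ['https://spdlearn.org/en/0.1/api.html']
```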
@@ -549,3 +553,14 @@ def linkcode_resolve(domain, info):
         pass
 
     return f"https://github.com/{github_user}/{github_repo}/blob/{github_version}/{relpath}{linespec}"
+
+
+def _fix_index_canonical_url(app, pagename, templatename, context, doctree):
+    """Strip index.html from canonical URLs so Google indexes clean directory URLs."""
+    pageurl = context.get("pageurl", "")
+    if pageurl and pageurl.endswith("/index.html"):
+        context["pageurl"] = pageurl[: -len("index.html")]
+
+
+def setup(app):
+    app.connect("html-page-context", _fix_index_canonical_url)
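Because the hook only mutates the template context, its behavior can be checked without running a Sphinx build by passing a plain dict in place of the real context. In this sketch, `canonical_for` is a hypothetical test helper and the `api/index.html` path is made up. Note that the hook rewrites every page whose URL ends in `/index.html`, not just the homepage, while the leading slash in the check keeps pages such as `genindex.html` intact.

```python
def _fix_index_canonical_url(app, pagename, templatename, context, doctree):
    """Same logic as the hook in the diff: drop a trailing index.html from pageurl."""
    pageurl = context.get("pageurl", "")
    if pageurl and pageurl.endswith("/index.html"):
        context["pageurl"] = pageurl[: -len("index.html")]


def canonical_for(pageurl):
    """Hypothetical helper: run the hook on a stand-in context and return pageurl."""
    # app and doctree are unused by the hook, so None is enough here.
    context = {"pageurl": pageurl}
    _fix_index_canonical_url(None, "index", "page.html", context, None)
    return context["pageurl"]


print(canonical_for("https://spdlearn.org/index.html"))     # https://spdlearn.org/
print(canonical_for("https://spdlearn.org/api/index.html")) # https://spdlearn.org/api/
print(canonical_for("https://spdlearn.org/genindex.html"))  # https://spdlearn.org/genindex.html
```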
