Escape link/image destinations and image alt text (#261) by assinscreedFC · Pull Request #262 · matthewwithanm/python-markdownify

assinscreedFC · 2026-06-05T17:05:50Z

Summary

Fixes #261.

convert_a, convert_img and convert_video dropped raw attribute values and generated labels into Markdown link/image syntax without escaping them for that context. As reported, this lets a destination or alt value truncate the syntax, so downstream Markdown parsers read different HTML than the original — including substituting an attacker-controlled URL:

md('<img src="/safe" alt="](http://attacker)">')
# before: '![](http://attacker)](/safe)'  -> re-parses to <img src="http://attacker">
# after:  '![\](http://attacker)](/safe)' -> alt is literal text, src preserved

Changes

Two helpers in markdownify/__init__.py:

escape_link_destination(url) — wraps a destination containing whitespace or parentheses in angle brackets (CommonMark allows spaces and () inside <...>), escaping any </>. Applied to href (convert_a), src (convert_img, convert_video) and poster (convert_video).
escape_link_text(text) — escapes [ and ] (and backslashes) in alt text taken verbatim from HTML, so it can't close the [...] label early. Applied to alt in convert_img.

Verified against all reporter cases:

input	output
`<img src="/a" alt="]">`	`![\]](/a)`
`<img src="/a b" alt="x">`	`![x](</a b>)`
`<img src="/safe" alt="](http://attacker)">`	`![\](http://attacker)](/safe)`
`<a href="/a)b">click</a>`	`[click](</a)b>)`

Unaffected inputs (no spaces/parens/brackets) are emitted unchanged, so existing output is preserved.

Scope note

I left link text for convert_a/convert_video alone, since that is already-rendered Markdown from child nodes (escaped via self.escape()), not a verbatim attribute. Happy to extend if you'd prefer.

Test plan

Added test_a_escapes_destination, test_img_escapes_link_syntax, test_video_escapes_destination in tests/test_conversions.py.
pytest — 86 passed.
flake8 --ignore=E501,W503 markdownify tests — clean.

convert_a, convert_img and convert_video emitted attribute values and generated labels into Markdown link/image syntax without escaping them for that context. A destination containing spaces or parentheses, or an image alt containing ']', could truncate the syntax so downstream parsers read different HTML than the original -- including substituting an attacker-controlled URL (e.g. alt='](http://attacker)'). Add escape_link_destination() (wrap destinations with whitespace or parentheses in angle brackets) and escape_link_text() (escape '[' and ']' in verbatim alt text), and apply them in the link/image converters. Fixes matthewwithanm#261

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Escape link/image destinations and image alt text (#261)#262

Escape link/image destinations and image alt text (#261)#262
assinscreedFC wants to merge 1 commit into
matthewwithanm:developfrom
assinscreedFC:fix/261-escape-link-syntax

assinscreedFC commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

assinscreedFC commented Jun 5, 2026

Summary

Changes

Scope note

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant