Skip to content

Escape link/image destinations and image alt text (#261)#262

Open
assinscreedFC wants to merge 1 commit into
matthewwithanm:developfrom
assinscreedFC:fix/261-escape-link-syntax
Open

Escape link/image destinations and image alt text (#261)#262
assinscreedFC wants to merge 1 commit into
matthewwithanm:developfrom
assinscreedFC:fix/261-escape-link-syntax

Conversation

@assinscreedFC

Copy link
Copy Markdown

Summary

Fixes #261.

convert_a, convert_img and convert_video dropped raw attribute values and generated labels into Markdown link/image syntax without escaping them for that context. As reported, this lets a destination or alt value truncate the syntax, so downstream Markdown parsers read different HTML than the original — including substituting an attacker-controlled URL:

md('<img src="/safe" alt="](http://attacker)">')
# before: '![](http://attacker)](/safe)'  -> re-parses to <img src="http://attacker">
# after:  '![\](http://attacker)](/safe)' -> alt is literal text, src preserved

Changes

Two helpers in markdownify/__init__.py:

  • escape_link_destination(url) — wraps a destination containing whitespace or parentheses in angle brackets (CommonMark allows spaces and () inside <...>), escaping any </>. Applied to href (convert_a), src (convert_img, convert_video) and poster (convert_video).
  • escape_link_text(text) — escapes [ and ] (and backslashes) in alt text taken verbatim from HTML, so it can't close the [...] label early. Applied to alt in convert_img.

Verified against all reporter cases:

input output
<img src="/a" alt="]"> ![\]](/a)
<img src="/a b" alt="x"> ![x](</a b>)
<img src="/safe" alt="](http://attacker)"> ![\](http://attacker)](/safe)
<a href="/a)b">click</a> [click](</a)b>)

Unaffected inputs (no spaces/parens/brackets) are emitted unchanged, so existing output is preserved.

Scope note

I left link text for convert_a/convert_video alone, since that is already-rendered Markdown from child nodes (escaped via self.escape()), not a verbatim attribute. Happy to extend if you'd prefer.

Test plan

  • Added test_a_escapes_destination, test_img_escapes_link_syntax, test_video_escapes_destination in tests/test_conversions.py.
  • pytest — 86 passed.
  • flake8 --ignore=E501,W503 markdownify tests — clean.

convert_a, convert_img and convert_video emitted attribute values and
generated labels into Markdown link/image syntax without escaping them
for that context. A destination containing spaces or parentheses, or an
image alt containing ']', could truncate the syntax so downstream
parsers read different HTML than the original -- including substituting
an attacker-controlled URL (e.g. alt='](http://attacker)').

Add escape_link_destination() (wrap destinations with whitespace or
parentheses in angle brackets) and escape_link_text() (escape '[' and
']' in verbatim alt text), and apply them in the link/image converters.

Fixes matthewwithanm#261
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Image/link attributes containing ], ), or spaces produce broken Markdown output

1 participant