Commit af3ccc2

mattn authored and chrisbra committed
patch 9.2.0248: json_decode() is not strict enough
Problem:  json_decode() accepted keywords case-insensitively, violating
          RFC 7159. Both json_decode() and js_decode() silently accepted
          lone surrogates, which are invalid Unicode.
Solution: Only allow lowercase keyword in json_decode(), reject lone
          surrogates, improve encoding performance in write_string()
          and blob byte serialization.

1. Fix surrogate pair range check (0xDFFF -> 0xDBFF) so only high
   surrogates trigger pair decoding. Reject lone surrogates that do not
   form a valid pair instead of producing invalid UTF-8.

2. Use case-sensitive matching for JSON keywords (true, false, null,
   NaN, Infinity) in json_decode() per RFC 7159. js_decode() retains
   case-insensitive behavior.

3. Replace double ga_append() calls for escape sequences with single
   GA_CONCAT_LITERAL() calls, halving function call and buffer growth
   check overhead.

4. Replace vim_snprintf_safelen() for blob byte encoding (0-255) with
   direct digit conversion.

closes: #19807

Signed-off-by: Yasuhiro Matsumoto <mattn.jp@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
1 parent c0f0a34 commit af3ccc2

5 files changed

Lines changed: 95 additions & 33 deletions

runtime/doc/builtin.txt

Lines changed: 8 additions & 9 deletions
@@ -1,4 +1,4 @@
-*builtin.txt* For Vim version 9.2. Last change: 2026 Mar 17
+*builtin.txt* For Vim version 9.2. Last change: 2026 Mar 25
 
 
 VIM REFERENCE MANUAL by Bram Moolenaar
@@ -6432,6 +6432,8 @@ js_decode({string}) *js_decode()*
   - Strings can be in single quotes.
   - Empty items in an array (between two commas) are allowed and
     result in v:none items.
+  - Capitalization is ignored in keywords: true, false, null,
+    NaN, Infinity and -Infinity.
 
 Can also be used as a |method|: >
     ReadObject()->js_decode()
@@ -6470,23 +6472,20 @@ json_decode({string}) *json_decode()* *E491*
     same as {"1":2}.
   - More floating point numbers are recognized, e.g. "1." for
     "1.0", or "001.2" for "1.2". Special floating point values
-    "Infinity", "-Infinity" and "NaN" (capitalization ignored)
-    are accepted.
+    "Infinity", "-Infinity" and "NaN" are accepted.
   - Leading zeroes in integer numbers are ignored, e.g. "012"
     for "12" or "-012" for "-12".
-  - Capitalization is ignored in literal names null, true or
-    false, e.g. "NULL" for "null", "True" for "true".
   - Control characters U+0000 through U+001F which are not
     escaped in strings are accepted, e.g. " " (tab
     character in string) for "\t".
   - An empty JSON expression or made of only spaces is accepted
     and results in v:none.
   - Backslash in an invalid 2-character sequence escape is
     ignored, e.g. "\a" is decoded as "a".
-  - A correct surrogate pair in JSON strings should normally be
-    a 12 character sequence such as "\uD834\uDD1E", but
-    json_decode() silently accepts truncated surrogate pairs
-    such as "\uD834" or "\uD834\u"
+  - A surrogate pair in JSON strings is a 12 character sequence
+    such as "\uD834\uDD1E". A lone surrogate or an invalid
+    surrogate pair (e.g. "\uD800" or "\uD800\uD800") results
+    in an error.
 *E938*
 A duplicate key in an object, valid in rfc7159, is not
 accepted by json_decode() as the result must be a valid Vim

runtime/doc/version9.txt

Lines changed: 4 additions & 1 deletion
@@ -1,4 +1,4 @@
-*version9.txt* For Vim version 9.2. Last change: 2026 Mar 22
+*version9.txt* For Vim version 9.2. Last change: 2026 Mar 25
 
 
 VIM REFERENCE MANUAL by Bram Moolenaar
@@ -52620,6 +52620,9 @@ Add "-t" option to append a terminating NUL byte to C include output (-i).
 Changed~
 -------
 - Support for NeXTStep was dropped with patch v9.2.0122
+- |json_decode()| is stricter: keywords must be lowercase, lone surrogates are
+  now invalid
+- |js_decode()| rejects lone surrogates
 
 *added-9.3*
 Added ~

src/json.c

Lines changed: 46 additions & 20 deletions
@@ -163,6 +163,8 @@ write_string(garray_T *gap, char_u *str)
     }
 #endif
     ga_append(gap, '"');
+    // Pre-grow for the common case: input length + quotes + some escapes.
+    ga_grow(gap, (int)STRLEN(res) + 2);
     // `from` is the beginning of a sequence of bytes we can directly copy from
     // the input string, avoiding the overhead associated to decoding/encoding
     // them.
@@ -185,20 +187,19 @@ write_string(garray_T *gap, char_u *str)
 switch (c)
 {
     case 0x08:
-        ga_append(gap, '\\'); ga_append(gap, 'b'); break;
+        GA_CONCAT_LITERAL(gap, "\\b"); break;
     case 0x09:
-        ga_append(gap, '\\'); ga_append(gap, 't'); break;
+        GA_CONCAT_LITERAL(gap, "\\t"); break;
     case 0x0a:
-        ga_append(gap, '\\'); ga_append(gap, 'n'); break;
+        GA_CONCAT_LITERAL(gap, "\\n"); break;
     case 0x0c:
-        ga_append(gap, '\\'); ga_append(gap, 'f'); break;
+        GA_CONCAT_LITERAL(gap, "\\f"); break;
     case 0x0d:
-        ga_append(gap, '\\'); ga_append(gap, 'r'); break;
+        GA_CONCAT_LITERAL(gap, "\\r"); break;
     case 0x22: // "
+        GA_CONCAT_LITERAL(gap, "\\\""); break;
     case 0x5c: // backslash
-        ga_append(gap, '\\');
-        ga_append(gap, c);
-        break;
+        GA_CONCAT_LITERAL(gap, "\\\\"); break;
     default:
     {
         size_t numbuflen;
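
A rough illustration of the idea behind this hunk: each short escape can be treated as one two-byte literal instead of two single-byte appends. The helper below is a hypothetical stand-alone sketch (json_short_escape is not a Vim function). It only mirrors the case labels shown in the hunk and assumes the caller appends the returned literal in a single step.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of Vim: return the two-character JSON
 * escape for byte `c`, or NULL when `c` has no short escape and must be
 * handled some other way (copied verbatim or written as \uXXXX).
 */
static const char *json_short_escape(unsigned char c)
{
    switch (c)
    {
        case 0x08: return "\\b";
        case 0x09: return "\\t";
        case 0x0a: return "\\n";
        case 0x0c: return "\\f";
        case 0x0d: return "\\r";
        case 0x22: return "\\\"";   /* double quote */
        case 0x5c: return "\\\\";   /* backslash */
        default:   return NULL;
    }
}
```

Because every returned literal has a known length of two, a caller can copy it with one length-known append, which is the saving the commit message describes.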
@@ -341,13 +342,24 @@ json_encode_item(garray_T *gap, typval_T *val, int copyID, int options, int dept
     ga_append(gap, '[');
     for (i = 0; i < b->bv_ga.ga_len; i++)
     {
-        size_t numbuflen;
+        int byte = blob_get(b, i);
 
         if (i > 0)
-            GA_CONCAT_LITERAL(gap, ",");
-        numbuflen = vim_snprintf_safelen((char *)numbuf, sizeof(numbuf),
-                "%d", blob_get(b, i));
-        ga_concat_len(gap, numbuf, numbuflen);
+            ga_append(gap, ',');
+        // blob bytes are 0-255, use simple conversion
+        if (byte >= 100)
+        {
+            ga_append(gap, '0' + byte / 100);
+            ga_append(gap, '0' + (byte / 10) % 10);
+            ga_append(gap, '0' + byte % 10);
+        }
+        else if (byte >= 10)
+        {
+            ga_append(gap, '0' + byte / 10);
+            ga_append(gap, '0' + byte % 10);
+        }
+        else
+            ga_append(gap, '0' + byte);
     }
     ga_append(gap, ']');
 }
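
The digit conversion in this hunk can be checked in isolation. The function below is a hypothetical stand-alone sketch (byte_to_decimal is an illustrative name, not a Vim function) that writes into a caller-supplied buffer instead of a garray_T, so the arithmetic can be exercised without the rest of Vim.

```c
#include <stddef.h>

/*
 * Hypothetical stand-alone version of the conversion above.
 * Writes the decimal form of `byte` (0-255) into `out`, which must have
 * room for at least 4 bytes (3 digits plus the terminating NUL).
 * Returns the number of digits written.
 */
static size_t byte_to_decimal(unsigned char byte, char *out)
{
    size_t n = 0;

    if (byte >= 100)
        out[n++] = (char)('0' + byte / 100);        /* hundreds digit */
    if (byte >= 10)
        out[n++] = (char)('0' + (byte / 10) % 10);  /* tens digit */
    out[n++] = (char)('0' + byte % 10);             /* ones digit */
    out[n] = '\0';
    return n;
}
```

Since the input range is fixed at 0-255, at most three digits are ever produced, which is why no general-purpose formatting routine is needed here.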
@@ -610,7 +622,7 @@ json_decode_string(js_read_T *reader, typval_T *res, int quote)
         return FAIL;
     }
     p += len + 2;
-    if (0xd800 <= nr && nr <= 0xdfff
+    if (0xd800 <= nr && nr <= 0xdbff
             && (int)(reader->js_end - p) >= 6
             && *p == '\\' && *(p+1) == 'u')
     {
@@ -633,6 +645,13 @@ json_decode_string(js_read_T *reader, typval_T *res, int quote)
                 ((nr2 - 0xdc00) & 0x3ff)) + 0x10000;
         }
     }
+    // Lone surrogate is invalid.
+    if (0xd800 <= nr && nr <= 0xdfff)
+    {
+        if (res != NULL)
+            ga_clear(&ga);
+        return FAIL;
+    }
     if (res != NULL)
     {
         char_u buf[NUMBUFLEN];
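
The two surrogate hunks together amount to: combine a high surrogate with a following low surrogate, and reject any value still in the surrogate range afterwards. The function below is a hypothetical sketch of that rule on already-parsed \u values (decode_code_point and its parameters are illustrative names, not taken from json.c). It uses -1 where the patch returns FAIL.

```c
/*
 * Hypothetical sketch, not Vim code. `first` is the value of a \uXXXX
 * escape; `second` is the value of the escape that immediately follows,
 * if `have_second` is non-zero. On success returns the code point and sets
 * *used_second to 1 when the second escape was consumed as a low surrogate.
 * Returns -1 for a lone or mismatched surrogate.
 */
static long decode_code_point(unsigned first, int have_second,
                              unsigned second, int *used_second)
{
    *used_second = 0;

    /* Only a high surrogate (D800-DBFF) may start a pair. */
    if (first >= 0xd800 && first <= 0xdbff && have_second
            && second >= 0xdc00 && second <= 0xdfff)
    {
        *used_second = 1;
        return (long)((((first - 0xd800) << 10)
                    | ((second - 0xdc00) & 0x3ff)) + 0x10000);
    }

    /* Anything still in D800-DFFF here is a lone surrogate. */
    if (first >= 0xd800 && first <= 0xdfff)
        return -1;

    return (long)first;
}
```

With these rules the documented example "\uD834\uDD1E" yields U+1D11E, while "\uD800", "\uDC00" and "\uD800\uD800" are all rejected, matching the new tests.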
@@ -975,7 +994,13 @@ json_decode_item(js_read_T *reader, typval_T *res, int options)
         retval = OK;
         break;
     }
-    if (STRNICMP((char *)p, "false", 5) == 0)
+    // In strict JSON mode, keywords must be lowercase.
+    // In JS mode, keywords are case-insensitive.
+#define MATCH_KW(p, kw, len) \
+    ((options & JSON_JS) \
+        ? STRNICMP((char *)(p), (kw), (len)) == 0 \
+        : STRNCMP((char *)(p), (kw), (len)) == 0)
+    if (MATCH_KW(p, "false", 5))
     {
         reader->js_used += 5;
         if (cur_item != NULL)
@@ -986,7 +1011,7 @@ json_decode_item(js_read_T *reader, typval_T *res, int options)
         retval = OK;
         break;
     }
-    if (STRNICMP((char *)p, "true", 4) == 0)
+    if (MATCH_KW(p, "true", 4))
     {
         reader->js_used += 4;
         if (cur_item != NULL)
@@ -997,7 +1022,7 @@ json_decode_item(js_read_T *reader, typval_T *res, int options)
         retval = OK;
         break;
     }
-    if (STRNICMP((char *)p, "null", 4) == 0)
+    if (MATCH_KW(p, "null", 4))
     {
         reader->js_used += 4;
         if (cur_item != NULL)
@@ -1008,7 +1033,7 @@ json_decode_item(js_read_T *reader, typval_T *res, int options)
         retval = OK;
         break;
     }
-    if (STRNICMP((char *)p, "NaN", 3) == 0)
+    if (MATCH_KW(p, "NaN", 3))
     {
         reader->js_used += 3;
         if (cur_item != NULL)
@@ -1019,7 +1044,7 @@ json_decode_item(js_read_T *reader, typval_T *res, int options)
         retval = OK;
         break;
     }
-    if (STRNICMP((char *)p, "-Infinity", 9) == 0)
+    if (MATCH_KW(p, "-Infinity", 9))
     {
         reader->js_used += 9;
         if (cur_item != NULL)
@@ -1030,7 +1055,7 @@ json_decode_item(js_read_T *reader, typval_T *res, int options)
         retval = OK;
         break;
     }
-    if (STRNICMP((char *)p, "Infinity", 8) == 0)
+    if (MATCH_KW(p, "Infinity", 8))
     {
         reader->js_used += 8;
         if (cur_item != NULL)
@@ -1041,6 +1066,7 @@ json_decode_item(js_read_T *reader, typval_T *res, int options)
         retval = OK;
         break;
     }
+#undef MATCH_KW
     // check for truncated name
     len = (int)(reader->js_end
             - (reader->js_buf + reader->js_used));
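
The MATCH_KW macro above selects the comparison by mode. The sketch below shows the same selection in portable C, under stated assumptions: SKETCH_JS_MODE stands in for Vim's JSON_JS option bit, and equal_ignore_case is a hypothetical replacement for STRNICMP that only folds ASCII case.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag standing in for Vim's JSON_JS option bit. */
#define SKETCH_JS_MODE 1

/* Compare the first `len` bytes of `p` and `kw`, ignoring ASCII case. */
static int equal_ignore_case(const char *p, const char *kw, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
    {
        if (p[i] == '\0'
                || tolower((unsigned char)p[i])
                                    != tolower((unsigned char)kw[i]))
            return 0;
    }
    return 1;
}

/*
 * Return non-zero when the text at `p` starts with keyword `kw`.
 * In JS mode the comparison ignores case; otherwise it is exact.
 */
static int match_keyword(const char *p, const char *kw, int options)
{
    size_t len = strlen(kw);

    if (options & SKETCH_JS_MODE)
        return equal_ignore_case(p, kw, len);
    return strncmp(p, kw, len) == 0;
}
```

Note that the strict branch is an exact match rather than a lowercase check, so "NaN" and "Infinity" keep their capitals in strict mode while "nan" and "INFINITY" are rejected, as the new tests expect. Like the macro, this is a prefix match and does not look at what follows the keyword.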

src/testdir/test_json.vim

Lines changed: 35 additions & 3 deletions
@@ -14,8 +14,8 @@ let s:var5 = "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
 " surrogate pair
 let s:jsonsp1 = '"\ud83c\udf63"'
 let s:varsp1 = "\xf0\x9f\x8d\xa3"
+" high surrogate followed by non-surrogate is invalid (lone surrogate)
 let s:jsonsp2 = '"\ud83c\u00a0"'
-let s:varsp2 = "\ud83c\u00a0"
 
 let s:jsonmb = '"s¢cĴgё"'
 let s:varmb = "s¢cĴgё"
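
The expected value s:varsp1 above is the byte sequence F0 9F 8D A3, which is the UTF-8 form of U+1F363, the code point that the pair \ud83c\udf63 combines to. The encoder below is a generic sketch to make that arithmetic checkable. utf8_encode is a hypothetical name and this is not Vim's own encoding routine. It assumes the caller has already rejected surrogates and values above 0x10FFFF.

```c
#include <stddef.h>

/*
 * Hypothetical sketch, not Vim code: encode code point `cp` as UTF-8 into
 * `out` (at least 4 bytes) and return the number of bytes written.
 * No validation is done; surrogates and out-of-range values are assumed
 * to have been rejected by the caller.
 */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80)
    {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = (unsigned char)(0xc0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3f));
        return 2;
    }
    if (cp < 0x10000)
    {
        out[0] = (unsigned char)(0xe0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[2] = (unsigned char)(0x80 | (cp & 0x3f));
        return 3;
    }
    out[0] = (unsigned char)(0xf0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
    out[3] = (unsigned char)(0x80 | (cp & 0x3f));
    return 4;
}
```

A lone surrogate such as 0xD83C has no valid UTF-8 form, which is why the s:varsp2 expectation is removed and the input is now expected to fail.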
@@ -126,7 +126,7 @@ func Test_json_decode()
 
   call assert_equal(s:varmb, json_decode(s:jsonmb))
   call assert_equal(s:varsp1, json_decode(s:jsonsp1))
-  call assert_equal(s:varsp2, json_decode(s:jsonsp2))
+  call assert_fails('call json_decode(s:jsonsp2)', 'E491:')
 
   call assert_equal(s:varnr, json_decode(s:jsonnr))
   call assert_equal(s:varfl, json_decode(s:jsonfl))
@@ -151,6 +151,18 @@ func Test_json_decode()
   call assert_equal(type(v:none), type(json_decode('')))
   call assert_equal("", json_decode('""'))
 
+  " json_decode() requires lowercase keywords (RFC 7159)
+  call assert_fails('call json_decode("True")', 'E491:')
+  call assert_fails('call json_decode("FALSE")', 'E491:')
+  call assert_fails('call json_decode("Null")', 'E491:')
+  call assert_fails('call json_decode("NULL")', 'E491:')
+  call assert_fails('call json_decode("nan")', 'E491:')
+  call assert_fails('call json_decode("NAN")', 'E491:')
+  call assert_fails('call json_decode("infinity")', 'E491:')
+  call assert_fails('call json_decode("INFINITY")', 'E491:')
+  call assert_fails('call json_decode("-infinity")', 'E491:')
+  call assert_fails('call json_decode("-INFINITY")', 'E491:')
+
   " Character in string after \ is ignored if not special.
   call assert_equal("x", json_decode('"\x"'))

@@ -165,6 +177,12 @@ func Test_json_decode()
   " but not twice
   call assert_fails("call json_decode('{\"\": \"ok\", \"\": \"bad\"}')", 'E938:')
 
+  " lone surrogate is invalid
+  call assert_fails('call json_decode("\"\\uD800\"")', 'E491:')
+  call assert_fails('call json_decode("\"\\uDC00\"")', 'E491:')
+  call assert_fails('call json_decode("\"\\uD800\\uD800\"")', 'E491:')
+  call assert_fails('call json_decode("\"\\uDC00\\uDC00\"")', 'E491:')
+
   call assert_equal({'n': 1}, json_decode('{"n":1,}'))
   call assert_fails("call json_decode(\"{'n':'1',}\")", 'E491:')
   call assert_fails("call json_decode(\"'n'\")", 'E491:')
@@ -257,7 +275,7 @@ func Test_js_decode()
 
   call assert_equal(s:varmb, js_decode(s:jsonmb))
   call assert_equal(s:varsp1, js_decode(s:jsonsp1))
-  call assert_equal(s:varsp2, js_decode(s:jsonsp2))
+  call assert_fails('call js_decode(s:jsonsp2)', 'E491:')
 
   call assert_equal(s:varnr, js_decode(s:jsonnr))
   call assert_equal(s:varfl, js_decode(s:jsonfl))
@@ -293,6 +311,20 @@ func Test_js_decode()
   call assert_equal("", js_decode("''"))
 
   call assert_equal('n', js_decode("'n'"))
+
+  " js_decode() accepts keywords case-insensitively
+  call assert_equal(v:true, js_decode('True'))
+  call assert_equal(v:true, js_decode('TRUE'))
+  call assert_equal(v:false, js_decode('False'))
+  call assert_equal(v:false, js_decode('FALSE'))
+  call assert_equal(v:null, js_decode('Null'))
+  call assert_equal(v:null, js_decode('NULL'))
+  call assert_true(isnan(js_decode('nan')))
+  call assert_equal(s:varposinf, js_decode('infinity'))
+  call assert_equal(s:varneginf, js_decode('-infinity'))
+  call assert_equal(s:varposinf, js_decode('INFINITY'))
+  call assert_equal(s:varneginf, js_decode('-INFINITY'))
+
   call assert_equal({'n': 1}, js_decode('{"n":1,}'))
   call assert_equal({'n': '1'}, js_decode("{'n':'1',}"))

src/version.c

Lines changed: 2 additions & 0 deletions
@@ -734,6 +734,8 @@ static char *(features[]) =
 
 static int included_patches[] =
 {   /* Add new patch number below this line */
+/**/
+    248,
 /**/
     247,
 /**/
