Commit 1edd0c9cf59c80ec46bc9b75df9669e08da160c1

Martin Mitas 2019-03-26T11:49:25

test/spec.txt: Update to current upstream HEAD.

diff --git a/test/spec.txt b/test/spec.txt
index 9fd5841..2f01257 100644
--- a/test/spec.txt
+++ b/test/spec.txt
@@ -514,8 +514,8 @@ one block element does not affect the inline parsing of any other.
 ## Container blocks and leaf blocks
 
 We can divide blocks into two types:
-[container block](@)s,
-which can contain other blocks, and [leaf block](@)s,
+[container blocks](@),
+which can contain other blocks, and [leaf blocks](@),
 which cannot.
 
 # Leaf blocks
@@ -527,7 +527,7 @@ Markdown document.
 
 A line consisting of 0-3 spaces of indentation, followed by a sequence
 of three or more matching `-`, `_`, or `*` characters, each followed
-optionally by any number of spaces, forms a
+optionally by any number of spaces or tabs, forms a
 [thematic break](@).
 
 ```````````````````````````````` example
@@ -1584,8 +1584,8 @@ begins with a code fence, indented no more than three spaces.
 
 The line with the opening code fence may optionally contain some text
 following the code fence; this is trimmed of leading and trailing
-spaces and called the [info string](@).
-The [info string] may not contain any backtick
+whitespace and called the [info string](@). If the [info string] comes
+after a backtick fence, it may not contain any backtick
 characters.  (The reason for this restriction is that otherwise
 some inline code would be incorrectly interpreted as the
 beginning of a fenced code block.)
@@ -1973,6 +1973,18 @@ foo</p>
 ````````````````````````````````
 
 
+[Info strings] for tilde code blocks can contain backticks and tildes:
+
+```````````````````````````````` example
+~~~ aa ``` ~~~
+foo
+~~~
+.
+<pre><code class="language-aa">foo
+</code></pre>
+````````````````````````````````
+
+
 Closing code fences cannot have [info strings]:
 
 ```````````````````````````````` example
@@ -1996,9 +2008,10 @@ by their start and end conditions.  The block begins with a line that
 meets a [start condition](@) (after up to three spaces
 optional indentation).  It ends with the first subsequent line that
 meets a matching [end condition](@), or the last line of
-the document or other [container block]), if no line is encountered that meets the
-[end condition].  If the first line meets both the [start condition]
-and the [end condition], the block will contain just that line.
+the document or other [container block](#container-blocks)), if no
+line is encountered that meets the [end condition].  If the first line
+meets both the [start condition] and the [end condition], the block
+will contain just that line.
 
 1.  **Start condition:**  line begins with the string `<script`,
 `<pre`, or `<style` (case-insensitive), followed by whitespace,
@@ -2029,7 +2042,7 @@ followed by one of the strings (case-insensitive) `address`,
 `footer`, `form`, `frame`, `frameset`,
 `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `head`, `header`, `hr`,
 `html`, `iframe`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`,
-`meta`, `nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`,
+`nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`,
 `section`, `source`, `summary`, `table`, `tbody`, `td`,
 `tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed
 by [whitespace], the end of the line, the string `>`, or
@@ -2043,10 +2056,11 @@ or the end of the line.\
 **End condition:** line is followed by a [blank line].
 
 HTML blocks continue until they are closed by their appropriate
-[end condition], or the last line of the document or other [container block].
-This means any HTML **within an HTML block** that might otherwise be recognised
-as a start condition will be ignored by the parser and passed through as-is,
-without changing the parser's state.
+[end condition], or the last line of the document or other [container
+block](#container-blocks).  This means any HTML **within an HTML
+block** that might otherwise be recognised as a start condition will
+be ignored by the parser and passed through as-is, without changing
+the parser's state.
 
 For instance, `<pre>` within a HTML block started by `<table>` will not affect
 the parser state; as the HTML block was started in by start condition 6, it
@@ -2069,7 +2083,7 @@ _world_.
 </td></tr></table>
 ````````````````````````````````
 
-In this case, the HTML block is terminated by the newline — the `**hello**`
+In this case, the HTML block is terminated by the newline — the `**Hello**`
 text remains verbatim — and regular parsing resumes, with a paragraph,
 emphasised `world` and inline and block HTML following.
 
@@ -2612,7 +2626,8 @@ bar
 
 
 However, a following blank line is needed, except at the end of
-a document, and except for blocks of types 1--5, above:
+a document, and except for blocks of types 1--5, [above][HTML
+block]:
 
 ```````````````````````````````` example
 <div>
@@ -2758,8 +2773,8 @@ an indented code block:
 
 Fortunately, blank lines are usually not necessary and can be
 deleted.  The exception is inside `<pre>` tags, but as described
-above, raw HTML blocks starting with `<pre>` *can* contain blank
-lines.
+[above][HTML blocks], raw HTML blocks starting with `<pre>`
+*can* contain blank lines.
 
 ## Link reference definitions
 
@@ -2811,7 +2826,7 @@ them.
 
 ```````````````````````````````` example
 [Foo bar]:
-<my%20url>
+<my url>
 'title'
 
 [Foo bar]
@@ -2877,6 +2892,18 @@ The link destination may not be omitted:
 <p>[foo]</p>
 ````````````````````````````````
 
+The title must be separated from the link destination by
+whitespace:
+
+```````````````````````````````` example
+[foo]: <bar>(baz)
+
+[foo]
+.
+<p>[foo]: <bar>(baz)</p>
+<p>[foo]</p>
+````````````````````````````````
+
 
 Both title and destination can contain backslash escapes
 and literal backslashes:
@@ -3207,7 +3234,7 @@ aaa
 
 # Container blocks
 
-A [container block] is a block that has other
+A [container block](#container-blocks) is a block that has other
 blocks as its contents.  There are two basic kinds of container blocks:
 [block quotes] and [list items].
 [Lists] are meta-containers for [list items].
@@ -3669,9 +3696,8 @@ in some browsers.)
 The following rules define [list items]:
 
 1.  **Basic case.**  If a sequence of lines *Ls* constitute a sequence of
-    blocks *Bs* starting with a [non-whitespace character] and not separated
-    from each other by more than one blank line, and *M* is a list
-    marker of width *W* followed by 1 ≤ *N* ≤ 4 spaces, then the result
+    blocks *Bs* starting with a [non-whitespace character], and *M* is a
+    list marker of width *W* followed by 1 ≤ *N* ≤ 4 spaces, then the result
     of prepending *M* and the following spaces to the first line of
     *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a
     list item with *Bs* as its contents.  The type of the list item
@@ -3981,8 +4007,7 @@ A start number may not be negative:
 
 2.  **Item starting with indented code.**  If a sequence of lines *Ls*
     constitute a sequence of blocks *Bs* starting with an indented code
-    block and not separated from each other by more than one blank line,
-    and *M* is a list marker of width *W* followed by
+    block, and *M* is a list marker of width *W* followed by
     one space, then the result of prepending *M* and the following
     space to the first line of *Ls*, and indenting subsequent lines of
     *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents.
@@ -4458,9 +4483,10 @@ continued here.</p>
 6.  **That's all.** Nothing that is not counted as a list item by rules
     #1--5 counts as a [list item](#list-items).
 
-The rules for sublists follow from the general rules above.  A sublist
-must be indented the same number of spaces a paragraph would need to be
-in order to be included in the list item.
+The rules for sublists follow from the general rules
+[above][List items].  A sublist must be indented the same number
+of spaces a paragraph would need to be in order to be included
+in the list item.
 
 So, in this case we need two spaces indent:
 
@@ -5049,11 +5075,9 @@ item:
  - b
   - c
    - d
-    - e
-   - f
-  - g
- - h
-- i
+  - e
+ - f
+- g
 .
 <ul>
 <li>a</li>
@@ -5063,8 +5087,6 @@ item:
 <li>e</li>
 <li>f</li>
 <li>g</li>
-<li>h</li>
-<li>i</li>
 </ul>
 ````````````````````````````````
 
@@ -5074,7 +5096,7 @@ item:
 
   2. b
 
-    3. c
+   3. c
 .
 <ol>
 <li>
@@ -5089,6 +5111,49 @@ item:
 </ol>
 ````````````````````````````````
 
+Note, however, that list items may not be indented more than
+three spaces.  Here `- e` is treated as a paragraph continuation
+line, because it is indented more than three spaces:
+
+```````````````````````````````` example
+- a
+ - b
+  - c
+   - d
+    - e
+.
+<ul>
+<li>a</li>
+<li>b</li>
+<li>c</li>
+<li>d
+- e</li>
+</ul>
+````````````````````````````````
+
+And here, `3. c` is treated as in indented code block,
+because it is indented four spaces and preceded by a
+blank line.
+
+```````````````````````````````` example
+1. a
+
+  2. b
+
+    3. c
+.
+<ol>
+<li>
+<p>a</p>
+</li>
+<li>
+<p>b</p>
+</li>
+</ol>
+<pre><code>3. c
+</code></pre>
+````````````````````````````````
+
 
 This is a loose list, because there is a blank line between
 two of the list items:
@@ -5522,7 +5587,7 @@ foo
 ## Entity and numeric character references
 
 All valid HTML entity references and numeric character
-references, except those occuring in code blocks and code spans,
+references, except those occurring in code blocks and code spans,
 are recognized as such and treated as equivalent to the
 corresponding Unicode characters.  Conforming CommonMark parsers
 need not store information about whether a particular character
@@ -5548,22 +5613,22 @@ references and their corresponding code points.
 
 [Decimal numeric character
 references](@)
-consist of `&#` + a string of 1--8 arabic digits + `;`. A
+consist of `&#` + a string of 1--7 arabic digits + `;`. A
 numeric character reference is parsed as the corresponding
 Unicode character. Invalid Unicode code points will be replaced by
 the REPLACEMENT CHARACTER (`U+FFFD`).  For security reasons,
 the code point `U+0000` will also be replaced by `U+FFFD`.
 
 ```````````````````````````````` example
-&#35; &#1234; &#992; &#98765432; &#0;
+&#35; &#1234; &#992; &#0;
 .
-<p># Ӓ Ϡ � �</p>
+<p># Ӓ Ϡ �</p>
 ````````````````````````````````
 
 
 [Hexadecimal numeric character
 references](@) consist of `&#` +
-either `X` or `x` + a string of 1-8 hexadecimal digits + `;`.
+either `X` or `x` + a string of 1-6 hexadecimal digits + `;`.
 They too are parsed as the corresponding Unicode character (this
 time specified with a hexadecimal numeral instead of decimal).
 
@@ -5578,9 +5643,13 @@ Here are some nonentities:
 
 ```````````````````````````````` example
 &nbsp &x; &#; &#x;
+&#987654321;
+&#abcdef0;
 &ThisIsNotDefined; &hi?;
 .
 <p>&amp;nbsp &amp;x; &amp;#; &amp;#x;
+&amp;#987654321;
+&amp;#abcdef0;
 &amp;ThisIsNotDefined; &amp;hi?;</p>
 ````````````````````````````````
 
@@ -5669,9 +5738,15 @@ preceded nor followed by a backtick.
 
 A [code span](@) begins with a backtick string and ends with
 a backtick string of equal length.  The contents of the code span are
-the characters between the two backtick strings, with leading and
-trailing spaces and [line endings] removed, and
-[whitespace] collapsed to single spaces.
+the characters between the two backtick strings, normalized in the
+following ways:
+
+- First, [line endings] are converted to [spaces].
+- If the resulting string both begins *and* ends with a [space]
+  character, a single [space] character is removed from the
+  front and back.  This allows you to include code that begins
+  or ends with backtick characters, which must be separated by
+  whitespace from the opening or closing backtick strings.
 
 This is a simple code span:
 
@@ -5683,10 +5758,11 @@ This is a simple code span:
 
 
 Here two backticks are used, because the code contains a backtick.
-This example also illustrates stripping of leading and trailing spaces:
+This example also illustrates stripping of a single leading and
+trailing space:
 
 ```````````````````````````````` example
-`` foo ` bar  ``
+`` foo ` bar ``
 .
 <p><code>foo ` bar</code></p>
 ````````````````````````````````
@@ -5701,58 +5777,69 @@ spaces:
 <p><code>``</code></p>
 ````````````````````````````````
 
-
-[Line endings] are treated like spaces:
+Note that only *one* space is stripped:
 
 ```````````````````````````````` example
-``
-foo
-``
+`  ``  `
 .
-<p><code>foo</code></p>
+<p><code> `` </code></p>
 ````````````````````````````````
 
+The stripping only happens if the space is on both
+sides of the string:
+
+```````````````````````````````` example
+` a`
+.
+<p><code> a</code></p>
+````````````````````````````````
 
-Interior spaces and [line endings] are collapsed into
-single spaces, just as they would be by a browser:
+Only [spaces], and not [unicode whitespace] in general, are
+stripped in this way:
 
 ```````````````````````````````` example
-`foo   bar
-  baz`
+` b `
 .
-<p><code>foo bar baz</code></p>
+<p><code> b </code></p>
 ````````````````````````````````
 
 
-Not all [Unicode whitespace] (for instance, non-breaking space) is
-collapsed, however:
+[Line endings] are treated like spaces:
 
 ```````````````````````````````` example
-`a  b`
+``
+foo
+bar  
+baz
+``
 .
-<p><code>a  b</code></p>
+<p><code>foo bar   baz</code></p>
 ````````````````````````````````
 
+```````````````````````````````` example
+``
+foo 
+``
+.
+<p><code>foo </code></p>
+````````````````````````````````
 
-Q: Why not just leave the spaces, since browsers will collapse them
-anyway?  A:  Because we might be targeting a non-HTML format, and we
-shouldn't rely on HTML-specific rendering assumptions.
 
-(Existing implementations differ in their treatment of internal
-spaces and [line endings].  Some, including `Markdown.pl` and
-`showdown`, convert an internal [line ending] into a
-`<br />` tag.  But this makes things difficult for those who like to
-hard-wrap their paragraphs, since a line break in the midst of a code
-span will cause an unintended line break in the output.  Others just
-leave internal spaces as they are, which is fine if only HTML is being
-targeted.)
+Interior spaces are not collapsed:
 
 ```````````````````````````````` example
-`foo `` bar`
+`foo   bar 
+baz`
 .
-<p><code>foo `` bar</code></p>
+<p><code>foo   bar  baz</code></p>
 ````````````````````````````````
 
+Note that browsers will typically collapse consecutive spaces
+when rendering `<code>` elements, so it is recommended that
+the following CSS be used:
+
+    code{white-space: pre-wrap;}
+
 
 Note that backslash escapes do not work in code spans. All backslashes
 are treated literally:
@@ -5768,6 +5855,19 @@ Backslash escapes are never needed, because one can always choose a
 string of *n* backtick characters as delimiters, where the code does
 not contain any strings of exactly *n* backtick characters.
 
+```````````````````````````````` example
+``foo`bar``
+.
+<p><code>foo`bar</code></p>
+````````````````````````````````
+
+```````````````````````````````` example
+` foo `` bar `
+.
+<p><code>foo `` bar</code></p>
+````````````````````````````````
+
+
 Code span backticks have higher precedence than any other inline
 constructs except HTML tags and autolinks.  Thus, for example, this is
 not parsed as emphasized text, since the second `*` is part of a code
@@ -5905,15 +6005,17 @@ of one or more `_` characters that is not preceded or followed by
 a non-backslash-escaped `_` character.
 
 A [left-flanking delimiter run](@) is
-a [delimiter run] that is (a) not followed by [Unicode whitespace],
-and (b) not followed by a [punctuation character], or
+a [delimiter run] that is (1) not followed by [Unicode whitespace],
+and either (2a) not followed by a [punctuation character], or
+(2b) followed by a [punctuation character] and
 preceded by [Unicode whitespace] or a [punctuation character].
 For purposes of this definition, the beginning and the end of
 the line count as Unicode whitespace.
 
 A [right-flanking delimiter run](@) is
-a [delimiter run] that is (a) not preceded by [Unicode whitespace],
-and (b) not preceded by a [punctuation character], or
+a [delimiter run] that is (1) not preceded by [Unicode whitespace],
+and either (2a) not preceded by a [punctuation character], or
+(2b) preceded by a [punctuation character] and
 followed by [Unicode whitespace] or a [punctuation character].
 For purposes of this definition, the beginning and the end of
 the line count as Unicode whitespace.
@@ -6636,6 +6738,17 @@ cannot form emphasis if the sum of the lengths of
 the delimiter runs containing the opening and
 closing delimiters is a multiple of 3.
 
+
+For the same reason, we don't get two consecutive
+emphasis sections in this example:
+
+```````````````````````````````` example
+*foo**bar*
+.
+<p><em>foo**bar</em></p>
+````````````````````````````````
+
+
 The same condition ensures that the following
 cases are all strong emphasis nested inside
 emphasis, even when the interior spaces are
@@ -7198,7 +7311,7 @@ following rules apply:
 A [link destination](@) consists of either
 
 - a sequence of zero or more characters between an opening `<` and a
-  closing `>` that contains no spaces, line breaks, or unescaped
+  closing `>` that contains no line breaks or unescaped
   `<` or `>` characters, or
 
 - a nonempty sequence of characters that does not include
@@ -7269,9 +7382,8 @@ Both the title and the destination may be omitted:
 <p><a href="">link</a></p>
 ````````````````````````````````
 
-
-The destination cannot contain spaces or line breaks,
-even if enclosed in pointy brackets:
+The destination can only contain spaces if it is
+enclosed in pointy brackets:
 
 ```````````````````````````````` example
 [link](/my uri)
@@ -7279,13 +7391,14 @@ even if enclosed in pointy brackets:
 <p>[link](/my uri)</p>
 ````````````````````````````````
 
-
 ```````````````````````````````` example
 [link](</my uri>)
 .
-<p>[link](&lt;/my uri&gt;)</p>
+<p><a href="/my%20uri">link</a></p>
 ````````````````````````````````
 
+The destination cannot contain line breaks,
+even if enclosed in pointy brackets:
 
 ```````````````````````````````` example
 [link](foo
@@ -7295,7 +7408,6 @@ bar)
 bar)</p>
 ````````````````````````````````
 
-
 ```````````````````````````````` example
 [link](<foo
 bar>)
@@ -8624,7 +8736,7 @@ a [single-quoted attribute value], or a [double-quoted attribute value].
 
 An [unquoted attribute value](@)
 is a nonempty string of characters not
-including spaces, `"`, `'`, `=`, `<`, `>`, or `` ` ``.
+including [whitespace], `"`, `'`, `=`, `<`, `>`, or `` ` ``.
 
 A [single-quoted attribute value](@)
 consists of `'`, zero or more
@@ -8745,9 +8857,13 @@ Illegal [whitespace]:
 ```````````````````````````````` example
 < a><
 foo><bar/ >
+<foo bar=baz
+bim!bop />
 .
 <p>&lt; a&gt;&lt;
-foo&gt;&lt;bar/ &gt;</p>
+foo&gt;&lt;bar/ &gt;
+&lt;foo bar=baz
+bim!bop /&gt;</p>
 ````````````````````````````````
 
 
@@ -8944,10 +9060,10 @@ bar</em></p>
 Line breaks do not occur inside code spans
 
 ```````````````````````````````` example
-`code  
+`code 
 span`
 .
-<p><code>code span</code></p>
+<p><code>code  span</code></p>
 ````````````````````````````````
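
The code span hunks above replace whitespace collapsing with a two-step normalization: line endings become spaces, then one space is removed from each side only when both sides have one. The following is a minimal Python sketch of that rule as worded in this diff, not code from this repository or from any CommonMark implementation. The function name `normalize_code_span` is hypothetical, and the length guard for a lone space is an assumption, since the diff text does not cover that case.

```python
def normalize_code_span(raw: str) -> str:
    """Normalize the text found between the opening and closing backtick strings.

    Hypothetical helper that follows the wording of the updated spec text:
    1. line endings are converted to spaces;
    2. if the result both begins and ends with a space, exactly one space
       is removed from the front and one from the back.
    Interior spaces are left as they are.
    """
    # Step 1: convert every line ending (CRLF, CR, or LF) to a single space.
    text = raw.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")

    # Step 2: strip one ASCII space per side, only if both sides have one.
    # Assumption: the length guard keeps a lone " " from being stripped twice.
    if len(text) >= 2 and text[0] == " " and text[-1] == " ":
        text = text[1:-1]
    return text


if __name__ == "__main__":
    # Inputs are the raw contents of code spans taken from the examples above.
    for raw in [" foo ` bar ", "  ``  ", " a", "\nfoo\nbar  \nbaz\n", "code \nspan"]:
        print(repr(raw), "->", repr(normalize_code_span(raw)))
```

Only U+0020 is checked in step 2, so a span padded with non-breaking spaces is left unchanged, in line with the "Only [spaces], and not [unicode whitespace]" example in the diff.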