Branch
Hash :
64f36805
Author :
Date :
2024-02-25T16:24:50
Fixes:
#236:
Fix quadratic time behavior caused by one-by-one walking over block lines
instead of calling md_lookup_line()
.
#238: Fix quadratic time and output size behavior caused by malicious misuse of link reference definitions.
#242:
The strike-through extension (with flag MD_FLAG_STRIKETHROUGH
) now follows
same logic as other emphasis spans in respect to punctuation character and
word boundaries.
#248: Fix handling tab when removing trailing whitespace, especially in connection with ATX headers.
Changes:
Changes mandated by CommonMark specification 0.31:
The specification expands set of Unicode characters seen by Markdown parser as a punctuation. Namely all Unicode general categories P (punctuation) and S (symbols) are now seen as such.
The definition of HTML comment has been changed so that <!-->
and
<!--->
are also recognized as HTML comments.
HTML tags recognized as HTML block starting condition of type 4 has been
updated, namely a tag <source>
has been removed, whereas <search>
added.
Refer to CommonMark 0.31.2 for full specification.
Fixes:
#230: The fix #223 in 0.5.1 release was incomplete and one corner case remained unfixed. This is now addressed.
#231:
md2html --full-html
now emits <meta charset="UTF-8">
in the HTML header.
Changes:
LaTeX math extension (MD_FLAG_LATEXMATHSPANS
) now requires that opener
mark is not immediately preceded with alpha-numeric character and similarly
that closer mark is not immediately followed with alpha-numeric character.
So for example foo$ x + y = z $
is not recognized as LaTeX equation
anymore because there is no space between foo
and the opening $
.
Table extension (MD_FLAG_TABLES
) now recognizes only tables with no more
than 128 columns. This limit has been imposed to prevent a pathological
case of quadratic output size explosion which could be used as DoS attack
vector.
We are now more strict with MD_FLAG_PERMISSIVExxxAUTOLINKS
family of
extensions with respect to non-alphanumeric characters, with the aim to
mitigate false positive detections.
Only relatively few selected non-alphanumeric are now allowed in permissive
e-mail auto-links (MD_FLAG_PERMISSIVEEMAILAUTOLINKS
):
.
, -
, _
, +
in user name part of e-mail address; and .
, -
, _
in host part of the e-mail address.
Similarly for URL and e-mail auto-links (MD_FLAG_PERMISSIVEURLAUTOLINKS
and
MD_FLAG_PERMISSIVEWWWAUTOLINKS
):
.
, -
, _
in host part of the URL; /
, .
, -
, _
in path part of the URL; &
, .
, -
, +
, _
, =
, (
, )
in the query part of the URL
(additionally, if present, (
and )
must form balanced pairs); and .
, -
, +
, _
in the fragment part of the URL.
Furthermore these characters (with some exceptions like where they serve as
delimiter characters, e.g. /
for paths) are generally accepted only when
an alphanumeric character both precedes and follows them (i.e. these cannot
be “stacked” together).
Fixes:
Fix several bugs where we haven’t properly respected already resolved spans
of higher precedence level in handling of permissive auto-links extensions
(family of MD_FLAG_PERMISSIVExxxAUTOLINKS
flags), LaTeX math extension
(MD_FLAG_LATEXMATHSPANS
) and wiki-links extension (MD_FLAG_WIKILINKS
)
of the form [[label|text]]
(with pipe |
). In some complex cases this
could lead to invalid internal parser state and memory corruption.
Identified with OSS-Fuzz.
#222:
Fix strike-through extension (MD_FLAG_STRIKETHROUGH
) which did not respect
same rules for pairing opener and closer marks as other emphasis spans.
#223: Fix incorrect handling of new-line character just at the beginning and/or end of a code span where we were not following CommonMark specification requirements correctly.
Changes:
Changes mandated by CommonMark specification 0.30.
Actually there are only very minor changes to recognition of HTML blocks:
The tag <textarea>
now triggers HTML block (of type 1 as per the
specification).
HTML declaration (HTML block type 4) is not required to begin with an
upper-case ASCII character after the <!
. Any ASCII character is now
allowed. Also it now doesn’t require a whitespace before the closing >
.
Other than that, the newest specification mainly improves test coverage and clarifies its wording in some cases, without affecting the implementation.
Refer to CommonMark 0.30 for full specification.
Make Unicode-specific code compliant to Unicode 15.1.
Update list of entities known to the HTML renderer from https://html.spec.whatwg.org/entities.json.
New Features:
Add extension allowing to treat all soft break as hard ones. It has to be
explicitly enabled with MD_FLAG_HARD_SOFT_BREAKS
.
Contributed by l-m.
Structure MD_SPAN_A_DETAIL
now has a new member is_autolink
.
Contributed by Jens Alfke.
md2html
utility now supports command line options --html-title
and
--html-css
.
Contributed by Andreas Baumann.
Fixes:
#163:
Make HTML renderer to emit '\n'
after the root tag when in the XHTML mode.
#165:
Make HTML renderer not to percent-encode '~'
in URLs. Although it does
work, it’s not needed, and it can actually be confusing with URLs such as
http://www.example.com/~johndoe/
.
#167, #168: Fix multiple instances of various buffer overflow bugs, found mostly using a fuzz testing. Contributed by dtldarek and Thierry Coppey.
#169:
Table underline now does not require 3 characters per table column anymore.
One dash (optionally with a leading or tailing :
appended or prepended)
is now sufficient. This improves compatibility with the GFM.
#172: Fix quadratic time behavior caused by unnecessary lookup for link reference definition even if the potential label contains nested brackets.
#173, #174, #212, #213: Multiple bugs identified with OSS-Fuzz were fixed.
#190, #200, #201: Multiple fixes of incorrect interactions of indented code block with a preceding block.
#202:
We were not correctly calling enter_block()
and leave_block()
callbacks
if multiple HTML blocks followed one after another; instead previously
such blocks were merged into one.
(This may likely impact only applications interested in Markdown’s AST, and not just converting Markdown to other formats like HTML.)
#210:
The md2html
utility now handles nested images with optional titles
correctly.
#214:
Tags <h2>
… <h6>
incorrectly did not trigger HTML block.
#215: The parser incorrectly did not accept optional tabs after setext header underline.
#217: The parser incorrectly resolved emphasis in some situations, if the emphasis marks were enclosed by punctuation characters.
Fixes:
#149: A HTML block started in a container block (and not explicitly finished in the block) could eat 1 line of actual contents.
#150:
Fix md2html
to output proper DOCTYPE and HTML tags when --full-html
command line options is used, accordingly to the expected output format
(HTML or XHTML).
#152: Suppress recognition of a permissive autolink if it would otherwise form a complete body of an outer inline link.
#153,
#154:
Set MD_BLOCK_UL_DETAIL::mark
and MD_BLOCK_OL_DETAIL::mark_delimiter
correctly, even when the blocks are nested at the same line in a complicated
ways.
#155: Avoid reading 1 character beyond the input size in some complex cases.
Changes:
MD_TABLE_DETAIL
structure into the API. The structure describes column
count and row count of the table, and pointer to it is passed into the
application-provided block callback with the MD_BLOCK_TABLE
block type. Fixes:
#131: Fix handling of a reference image nested in a reference link.
#135: Handle unmatched parenthesis pairs inside a permissive URL and WWW auto-links in a way more compatible with the GFM.
#138:
The tag <tbody></tbody>
is now suppressed whenever the table has zero body
rows.
#139: Recognize a list item mark even when EOF follows it.
#142: Fix reference link definition label matching in a case when the label ends with a Unicode character with non-trivial case folding mapping.
Fixes:
#130:
Fix ISANYOF
macro, which could provide unexpected results when encountering
zero byte in the input text; in some cases leading to broken internal state
of the parser.
The bug could result in denial of service and possibly also to other security implications. Applications are advised to update to 0.4.6.
Fixes:
#118:
Fix HTML renderer’s MD_HTML_FLAG_VERBATIM_ENTITIES
flag, exposed in the
md2html
utility via --fverbatim-entities
.
#124: Fix handling of indentation of 16 or more spaces in the fenced code blocks.
Changes:
New features:
The HTML renderer, developed originally as the heart of the md2html
utility, is now built as a standalone library, in order to simplify its
reuse in applications.
With MD_HTML_FLAG_SKIP_UTF8_BOM
, the HTML renderer now skips UTF-8 byte
order mark (BOM) if the input begins with it, before passing to the Markdown
parser.
md2html
utility automatically enables the flag (unless it is custom-built
with -DMD4C_USE_ASCII
).
With MD_HTML_FLAG_XHTML
, The HTML renderer generates XHTML instead of
HTML.
This effectively means <br />
instead of <br>
, <hr />
instead of
<hr>
, and <img ... />
instead of <img ...>
.
md2html
utility now understands the command line option -x
or --xhtml
enabling the XHTML mode.
Fixes:
#113:
Add missing folding info data for the following Unicode characters:
U+0184
, U+018a
, U+01b2
, U+01b5
, U+01f4
, U+0372
, U+038f
,
U+1c84
, U+1fb9
, U+1fbb
, U+1fd9
, U+1fdb
, U+1fe9
, U+1feb
,
U+1ff9
, U+1ffb
, U+2c7f
, U+2ced
, U+a77b
, U+a792
, U+a7c9
.
Due the bug, the link definition label matching did not work in the case insensitive way for these characters.
New features:
MD_FLAG_UNDERLINE
, spans enclosed in underscore (_foo_
) are seen
as underline (MD_SPAN_UNDERLINE
) rather than an ordinary emphasis or
strong emphasis. Changes:
The implementation of wiki-links extension (with MD_FLAG_WIKILINKS
) has
been simplified.
Recognition of strike-through spans (with the flag MD_FLAG_STRIKETHROUGH
)
has become much stricter and, arguably, reasonable.
~
) and double tildes (~~
) are recognized as
strike-through marks. Longer ones are not anymore. This change follows the changes of behavior in cmark-gfm some time ago, so it is also beneficial from compatibility point of view.
When building MD4C by hand instead of using its CMake-based build, the UTF-8
support was by default disabled, unless explicitly asked for by defining
a preprocessor macro MD4C_USE_UTF8
.
This has been changed and the UTF-8 mode now becomes the default, no matter
how md4c.c
is compiled. If you need to disable it and use the ASCII-only
mode, you have explicitly define macro MD4C_USE_ASCII
when compiling it.
(The CMake-based build as provided in our repository explicitly asked for
the UTF-8 support with -DMD4C_USE_UTF8
. I.e. if you are using MD4C library
built with our vanilla CMakeLists.txt
files, this change should not affect
you.)
Fixes:
Fixed some string length handling in the special MD4C_USE_UTF16
build.
(This does not affect you unless you are on Windows and explicitly define the macro when building MD4C.)
#100: Fixed an off-by-one error in the maximal length limit of some segments of e-mail addresses used in autolinks.
#107:
Fix mis-detection of asterisk-encoded emphasis in some corner cases when
length of the opener and closer differs, as in ***foo *bar baz***
.
Fixes:
**a *b c** d*
. Unfortunately, 0.4.0 has been released with badly updated ChangeLog. Fixing this is the only change on 0.4.1.
New features:
With MD_FLAG_LATEXMATHSPANS
, LaTeX math spans ($...$
) and LaTeX display
math spans ($$...$$
) are now recognized. (Note though that the HTML
renderer outputs them verbatim in a custom <x-equation>
tag.)
Contributed by Tilman Roeder.
With MD_FLAG_WIKILINKS
, Wiki-style links ([[...]]
) are now recognized.
(Note though that the HTML renderer renders them as a custom <x-wikilink>
tag.)
Contributed by Nils Blomqvist.
Changes:
Parsing of tables (with MD_FLAG_TABLES
) is now closer to the way how
cmark-gfm parses tables as we do not require every row of the table to
contain a pipe |
anymore.
As a consequence, paragraphs now cannot interrupt tables. A paragraph which follows the table has to be delimited with a blank line.
Fixes:
#94:
md_build_ref_def_hashtable()
: Do not allocate more memory than strictly
needed.
#95:
md_is_container_mark()
: Ordered list mark requires at least one digit.
#96: Some fixes for link label comparison.
Changes:
Make Unicode-specific code compliant to Unicode 12.1.
Structure MD_BLOCK_CODE_DETAIL
got new member fenced_char
. Application
can use it to detect character used to form the block fences (`
or
~
). In the case of indented code block, it is set to zero.
Fixes:
#77: Fix maximal count of digits for numerical character references, as requested by CommonMark specification 0.29.
#78:
Fix link reference definition label matching for Unicode characters where
the folding mapping leads to multiple codepoints, as e.g. in ẞ
-> SS
.
#83: Fix recognition of an empty blockquote which interrupts a paragraph.
Changes:
Make permissive URL autolink and permissive WWW autolink extensions stricter.
This brings the behavior closer to GFM and mitigates risk of false positives.
In particular, the domain has to contain at least one dot and parenthesis
can be part of the link destination only if (
and )
are balanced.
Fixes:
#73: Some raw HTML inputs could lead to quadratic parsing times.
#74: Fix input leading to a crash. Found by fuzzing.
#76: Fix handling of parenthesis in some corner cases of permissive URL autolink and permissive WWW autolink extensions.
Changes:
Changes mandated by CommonMark specification 0.29.
Most importantly, the white-space trimming rules for code spans have changed. At most one space/newline is trimmed from beginning/end of the code span (if the code span contains some non-space contents, and if it begins and ends with space at the same time). In all other cases the spaces in the code span are now left intact.
Other changes in behavior are in corner cases only. Refer to CommonMark 0.29 notes for more info.
Fixes:
#68: Some specific HTML blocks were not recognized when EOF follows without any end-of-line character.
#69: Strike-through span not working correctly when its opener mark is directly followed by other opener mark; or when other closer mark directly precedes its closer mark.
Fixes:
#58, #59, #60, #63, #66: Some inputs could lead to quadratic parsing times. Thanks to Anders Kaseorg for finding all those issues.
#61:
Flag MD_FLAG_NOHTMLSPANS
erroneously affected also recognition of
CommonMark autolinks.
New features:
Add extension for GitHub-style task lists:
* [x] foo
* [x] bar
* [ ] baz
(It has to be explicitly enabled with MD_FLAG_TASKLISTS
.)
Added support for building as a shared library. On non-Windows platforms,
this is now default behavior; on Windows static library is still the default.
The CMake option BUILD_SHARED_LIBS
can be used to request one or the other
explicitly.
Contributed by Lisandro Damián Nicanor Pérez Meyer.
Renamed structure MD_RENDERER
to MD_PARSER
and refactorize its contents
a little bit. Note this is source-level incompatible and initialization code
in apps may need to be updated.
The aim of the change is to be more friendly for long-term ABI compatibility we shall maintain, starting with this release.
Added CHANGELOG.md
(this file).
Make sure md_process_table_row()
reports the same count of table cells for
all table rows, no matter how broken the input is. The cell count is derived
from table underline line. Bogus cells in other rows are silently ignored.
Missing cells in other rows are reported as empty ones.
Fixes:
CID 1475544:
Calling md_free_attribute()
on uninitialized data.
#47:
Using bad offsets in md_is_entity_str()
, in some cases leading to buffer
overflow.
#51:
Segfault in md_process_table_cell()
.
#53:
With MD_FLAG_PERMISSIVEURLAUTOLINKS
or MD_FLAG_PERMISSIVEWWWAUTOLINKS
we could generate bad output for ordinary Markdown links, if a non-space
character immediately follows like e.g. in [link](http://github.com)X
.
This was the last version before the changelog has been added.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604
# MD4C Change Log
## Next Version (Work in Progress)
Fixes:
- [#236](https://github.com/mity/md4c/issues/236):
Fix quadratic time behavior caused by one-by-one walking over block lines
instead of calling `md_lookup_line()`.
- [#238](https://github.com/mity/md4c/issues/238):
Fix quadratic time and output size behavior caused by malicious misuse of
link reference definitions.
- [#242](https://github.com/mity/md4c/issues/242):
The strike-through extension (with flag `MD_FLAG_STRIKETHROUGH`) now follows
same logic as other emphasis spans in respect to punctuation character and
word boundaries.
- [#248](https://github.com/mity/md4c/issues/248):
Fix handling tab when removing trailing whitespace, especially in connection
with ATX headers.
## Version 0.5.2
Changes:
* Changes mandated by CommonMark specification 0.31:
- The specification expands set of Unicode characters seen by Markdown
parser as a punctuation. Namely all Unicode general categories P
(punctuation) and S (symbols) are now seen as such.
- The definition of HTML comment has been changed so that `<!-->` and
`<!--->` are also recognized as HTML comments.
- HTML tags recognized as HTML block starting condition of type 4 has been
updated, namely a tag `<source>` has been removed, whereas `<search>`
added.
Refer to [CommonMark 0.31.2](https://spec.commonmark.org/0.31.2/) for full
specification.
Fixes:
- [#230](https://github.com/mity/md4c/issues/230):
The fix [#223](https://github.com/mity/md4c/issues/223) in 0.5.1 release
was incomplete and one corner case remained unfixed. This is now addressed.
- [#231](https://github.com/mity/md4c/issues/231):
`md2html --full-html` now emits `<meta charset="UTF-8">` in the HTML header.
## Version 0.5.1
Changes:
* LaTeX math extension (`MD_FLAG_LATEXMATHSPANS`) now requires that opener
mark is not immediately preceded with alpha-numeric character and similarly
that closer mark is not immediately followed with alpha-numeric character.
So for example `foo$ x + y = z $` is not recognized as LaTeX equation
anymore because there is no space between `foo` and the opening `$`.
* Table extension (`MD_FLAG_TABLES`) now recognizes only tables with no more
than 128 columns. This limit has been imposed to prevent a pathological
case of quadratic output size explosion which could be used as DoS attack
vector.
* We are now more strict with `MD_FLAG_PERMISSIVExxxAUTOLINKS` family of
extensions with respect to non-alphanumeric characters, with the aim to
mitigate false positive detections.
Only relatively few selected non-alphanumeric are now allowed in permissive
e-mail auto-links (`MD_FLAG_PERMISSIVEEMAILAUTOLINKS`):
- `.`, `-`, `_`, `+` in user name part of e-mail address; and
- `.`, `-`, `_` in host part of the e-mail address.
Similarly for URL and e-mail auto-links (`MD_FLAG_PERMISSIVEURLAUTOLINKS` and
`MD_FLAG_PERMISSIVEWWWAUTOLINKS`):
- `.`, `-`, `_` in host part of the URL;
- `/`, `.`, `-`, `_` in path part of the URL;
- `&`, `.`, `-`, `+`, `_`, `=`, `(`, `)` in the query part of the URL
(additionally, if present, `(` and `)` must form balanced pairs); and
- `.`, `-`, `+`, `_` in the fragment part of the URL.
Furthermore these characters (with some exceptions like where they serve as
delimiter characters, e.g. `/` for paths) are generally accepted only when
an alphanumeric character both precedes and follows them (i.e. these cannot
be "stacked" together).
Fixes:
* Fix several bugs where we haven't properly respected already resolved spans
of higher precedence level in handling of permissive auto-links extensions
(family of `MD_FLAG_PERMISSIVExxxAUTOLINKS` flags), LaTeX math extension
(`MD_FLAG_LATEXMATHSPANS`) and wiki-links extension (`MD_FLAG_WIKILINKS`)
of the form `[[label|text]]` (with pipe `|`). In some complex cases this
could lead to invalid internal parser state and memory corruption.
Identified with [OSS-Fuzz](https://github.com/google/oss-fuzz).
* [#222](https://github.com/mity/md4c/issues/222):
Fix strike-through extension (`MD_FLAG_STRIKETHROUGH`) which did not respect
same rules for pairing opener and closer marks as other emphasis spans.
* [#223](https://github.com/mity/md4c/issues/223):
Fix incorrect handling of new-line character just at the beginning and/or
end of a code span where we were not following CommonMark specification
requirements correctly.
## Version 0.5.0
Changes:
* Changes mandated by CommonMark specification 0.30.
Actually there are only very minor changes to recognition of HTML blocks:
- The tag `<textarea>` now triggers HTML block (of type 1 as per the
specification).
- HTML declaration (HTML block type 4) is not required to begin with an
upper-case ASCII character after the `<!`. Any ASCII character is now
allowed. Also it now doesn't require a whitespace before the closing `>`.
Other than that, the newest specification mainly improves test coverage and
clarifies its wording in some cases, without affecting the implementation.
Refer to [CommonMark 0.30](https://spec.commonmark.org/0.30/) for full
specification.
* Make Unicode-specific code compliant to Unicode 15.1.
* Update list of entities known to the HTML renderer from
https://html.spec.whatwg.org/entities.json.
New Features:
* Add extension allowing to treat all soft break as hard ones. It has to be
explicitly enabled with `MD_FLAG_HARD_SOFT_BREAKS`.
Contributed by [l-m](https://github.com/l1mey112).
* Structure `MD_SPAN_A_DETAIL` now has a new member `is_autolink`.
Contributed by [Jens Alfke](https://github.com/snej).
* `md2html` utility now supports command line options `--html-title` and
`--html-css`.
Contributed by [Andreas Baumann](https://github.com/andreasbaumann).
Fixes:
* [#163](https://github.com/mity/md4c/issues/163):
Make HTML renderer to emit `'\n'` after the root tag when in the XHTML mode.
* [#165](https://github.com/mity/md4c/issues/165):
Make HTML renderer not to percent-encode `'~'` in URLs. Although it does
work, it's not needed, and it can actually be confusing with URLs such as
`http://www.example.com/~johndoe/`.
* [#167](https://github.com/mity/md4c/issues/167),
[#168](https://github.com/mity/md4c/issues/168):
Fix multiple instances of various buffer overflow bugs, found mostly using
a fuzz testing. Contributed by [dtldarek](https://github.com/dtldarek) and
[Thierry Coppey](https://github.com/TCKnet).
* [#169](https://github.com/mity/md4c/issues/169):
Table underline now does not require 3 characters per table column anymore.
One dash (optionally with a leading or tailing `:` appended or prepended)
is now sufficient. This improves compatibility with the GFM.
* [#172](https://github.com/mity/md4c/issues/172):
Fix quadratic time behavior caused by unnecessary lookup for link reference
definition even if the potential label contains nested brackets.
* [#173](https://github.com/mity/md4c/issues/173),
[#174](https://github.com/mity/md4c/issues/174),
[#212](https://github.com/mity/md4c/issues/212),
[#213](https://github.com/mity/md4c/issues/213):
Multiple bugs identified with [OSS-Fuzz](https://github.com/google/oss-fuzz)
were fixed.
* [#190](https://github.com/mity/md4c/issues/190),
[#200](https://github.com/mity/md4c/issues/200),
[#201](https://github.com/mity/md4c/issues/201):
Multiple fixes of incorrect interactions of indented code block with a
preceding block.
* [#202](https://github.com/mity/md4c/issues/202):
We were not correctly calling `enter_block()` and `leave_block()` callbacks
if multiple HTML blocks followed one after another; instead previously
such blocks were merged into one.
(This may likely impact only applications interested in Markdown's AST,
and not just converting Markdown to other formats like HTML.)
* [#210](https://github.com/mity/md4c/issues/210):
The `md2html` utility now handles nested images with optional titles
correctly.
* [#214](https://github.com/mity/md4c/issues/214):
Tags `<h2>` ... `<h6>` incorrectly did not trigger HTML block.
* [#215](https://github.com/mity/md4c/issues/215):
The parser incorrectly did not accept optional tabs after setext header
underline.
* [#217](https://github.com/mity/md4c/issues/217):
The parser incorrectly resolved emphasis in some situations, if the emphasis
marks were enclosed by punctuation characters.
## Version 0.4.8
Fixes:
* [#149](https://github.com/mity/md4c/issues/149):
A HTML block started in a container block (and not explicitly finished in
the block) could eat 1 line of actual contents.
* [#150](https://github.com/mity/md4c/issues/150):
Fix `md2html` to output proper DOCTYPE and HTML tags when `--full-html`
command line options is used, accordingly to the expected output format
(HTML or XHTML).
* [#152](https://github.com/mity/md4c/issues/152):
Suppress recognition of a permissive autolink if it would otherwise form a
complete body of an outer inline link.
* [#153](https://github.com/mity/md4c/issues/153),
[#154](https://github.com/mity/md4c/issues/154):
Set `MD_BLOCK_UL_DETAIL::mark` and `MD_BLOCK_OL_DETAIL::mark_delimiter`
correctly, even when the blocks are nested at the same line in a complicated
ways.
* [#155](https://github.com/mity/md4c/issues/155):
Avoid reading 1 character beyond the input size in some complex cases.
## Version 0.4.7
Changes:
* Add `MD_TABLE_DETAIL` structure into the API. The structure describes column
count and row count of the table, and pointer to it is passed into the
application-provided block callback with the `MD_BLOCK_TABLE` block type.
Fixes:
* [#131](https://github.com/mity/md4c/issues/131):
Fix handling of a reference image nested in a reference link.
* [#135](https://github.com/mity/md4c/issues/135):
Handle unmatched parenthesis pairs inside a permissive URL and WWW auto-links
in a way more compatible with the GFM.
* [#138](https://github.com/mity/md4c/issues/138):
The tag `<tbody></tbody>` is now suppressed whenever the table has zero body
rows.
* [#139](https://github.com/mity/md4c/issues/139):
Recognize a list item mark even when EOF follows it.
* [#142](https://github.com/mity/md4c/issues/142):
Fix reference link definition label matching in a case when the label ends
with a Unicode character with non-trivial case folding mapping.
## Version 0.4.6
Fixes:
* [#130](https://github.com/mity/md4c/issues/130):
Fix `ISANYOF` macro, which could provide unexpected results when encountering
zero byte in the input text; in some cases leading to broken internal state
of the parser.
The bug could result in denial of service and possibly also to other security
implications. Applications are advised to update to 0.4.6.
## Version 0.4.5
Fixes:
* [#118](https://github.com/mity/md4c/issues/118):
Fix HTML renderer's `MD_HTML_FLAG_VERBATIM_ENTITIES` flag, exposed in the
`md2html` utility via `--fverbatim-entities`.
* [#124](https://github.com/mity/md4c/issues/124):
Fix handling of indentation of 16 or more spaces in the fenced code blocks.
## Version 0.4.4
Changes:
* Make Unicode-specific code compliant to Unicode 13.0.
New features:
* The HTML renderer, developed originally as the heart of the `md2html`
utility, is now built as a standalone library, in order to simplify its
reuse in applications.
* With `MD_HTML_FLAG_SKIP_UTF8_BOM`, the HTML renderer now skips UTF-8 byte
order mark (BOM) if the input begins with it, before passing to the Markdown
parser.
`md2html` utility automatically enables the flag (unless it is custom-built
with `-DMD4C_USE_ASCII`).
* With `MD_HTML_FLAG_XHTML`, The HTML renderer generates XHTML instead of
HTML.
This effectively means `<br />` instead of `<br>`, `<hr />` instead of
`<hr>`, and `<img ... />` instead of `<img ...>`.
`md2html` utility now understands the command line option `-x` or `--xhtml`
enabling the XHTML mode.
Fixes:
* [#113](https://github.com/mity/md4c/issues/113):
Add missing folding info data for the following Unicode characters:
`U+0184`, `U+018a`, `U+01b2`, `U+01b5`, `U+01f4`, `U+0372`, `U+038f`,
`U+1c84`, `U+1fb9`, `U+1fbb`, `U+1fd9`, `U+1fdb`, `U+1fe9`, `U+1feb`,
`U+1ff9`, `U+1ffb`, `U+2c7f`, `U+2ced`, `U+a77b`, `U+a792`, `U+a7c9`.
Due the bug, the link definition label matching did not work in the case
insensitive way for these characters.
## Version 0.4.3
New features:
* With `MD_FLAG_UNDERLINE`, spans enclosed in underscore (`_foo_`) are seen
as underline (`MD_SPAN_UNDERLINE`) rather than an ordinary emphasis or
strong emphasis.
Changes:
* The implementation of wiki-links extension (with `MD_FLAG_WIKILINKS`) has
been simplified.
- A noticeable increase of MD4C's memory footprint introduced by the
extension implementation in 0.4.0 has been removed.
- The priority handling towards other inline elements have been unified.
(This affects an obscure case where syntax of an image was in place of
wiki-link destination made the wiki-link invalid. Now *all* inline spans
in the wiki-link destination, including the images, is suppressed.)
- The length limitation of 100 characters now always applies to wiki-link
destination.
* Recognition of strike-through spans (with the flag `MD_FLAG_STRIKETHROUGH`)
has become much stricter and, arguably, reasonable.
- Only single tildes (`~`) and double tildes (`~~`) are recognized as
strike-through marks. Longer ones are not anymore.
- The length of the opener and closer marks have to be the same.
- The tildes cannot open a strike-through span if a whitespace follows.
- The tildes cannot close a strike-through span if a whitespace precedes.
This change follows the changes of behavior in cmark-gfm some time ago, so
it is also beneficial from compatibility point of view.
* When building MD4C by hand instead of using its CMake-based build, the UTF-8
support was by default disabled, unless explicitly asked for by defining
a preprocessor macro `MD4C_USE_UTF8`.
This has been changed and the UTF-8 mode now becomes the default, no matter
how `md4c.c` is compiled. If you need to disable it and use the ASCII-only
mode, you have explicitly define macro `MD4C_USE_ASCII` when compiling it.
(The CMake-based build as provided in our repository explicitly asked for
the UTF-8 support with `-DMD4C_USE_UTF8`. I.e. if you are using MD4C library
built with our vanilla `CMakeLists.txt` files, this change should not affect
you.)
Fixes:
* Fixed some string length handling in the special `MD4C_USE_UTF16` build.
(This does not affect you unless you are on Windows and explicitly define
the macro when building MD4C.)
* [#100](https://github.com/mity/md4c/issues/100):
Fixed an off-by-one error in the maximal length limit of some segments
of e-mail addresses used in autolinks.
* [#107](https://github.com/mity/md4c/issues/107):
Fix mis-detection of asterisk-encoded emphasis in some corner cases when
length of the opener and closer differs, as in `***foo *bar baz***`.
## Version 0.4.2
Fixes:
* [#98](https://github.com/mity/md4c/issues/98):
Fix mis-detection of asterisk-encoded emphasis in some corner cases when
length of the opener and closer differs, as in `**a *b c** d*`.
## Version 0.4.1
Unfortunately, 0.4.0 has been released with badly updated ChangeLog. Fixing
this is the only change on 0.4.1.
## Version 0.4.0
New features:
* With `MD_FLAG_LATEXMATHSPANS`, LaTeX math spans (`$...$`) and LaTeX display
math spans (`$$...$$`) are now recognized. (Note though that the HTML
renderer outputs them verbatim in a custom `<x-equation>` tag.)
Contributed by [Tilman Roeder](https://github.com/dyedgreen).
* With `MD_FLAG_WIKILINKS`, Wiki-style links (`[[...]]`) are now recognized.
(Note though that the HTML renderer renders them as a custom `<x-wikilink>`
tag.)
Contributed by [Nils Blomqvist](https://github.com/niblo).
Changes:
* Parsing of tables (with `MD_FLAG_TABLES`) is now closer to the way how
cmark-gfm parses tables as we do not require every row of the table to
contain a pipe `|` anymore.
As a consequence, paragraphs now cannot interrupt tables. A paragraph which
follows the table has to be delimited with a blank line.
Fixes:
* [#94](https://github.com/mity/md4c/issues/94):
`md_build_ref_def_hashtable()`: Do not allocate more memory than strictly
needed.
* [#95](https://github.com/mity/md4c/issues/95):
`md_is_container_mark()`: Ordered list mark requires at least one digit.
* [#96](https://github.com/mity/md4c/issues/96):
Some fixes for link label comparison.
## Version 0.3.4
Changes:
* Make Unicode-specific code compliant to Unicode 12.1.
* Structure `MD_BLOCK_CODE_DETAIL` got new member `fenced_char`. Application
can use it to detect character used to form the block fences (`` ` `` or
`~`). In the case of indented code block, it is set to zero.
Fixes:
* [#77](https://github.com/mity/md4c/issues/77):
Fix maximal count of digits for numerical character references, as requested
by CommonMark specification 0.29.
* [#78](https://github.com/mity/md4c/issues/78):
Fix link reference definition label matching for Unicode characters where
the folding mapping leads to multiple codepoints, as e.g. in `ẞ` -> `SS`.
* [#83](https://github.com/mity/md4c/issues/83):
Fix recognition of an empty blockquote which interrupts a paragraph.
## Version 0.3.3
Changes:
* Make permissive URL autolink and permissive WWW autolink extensions stricter.
This brings the behavior closer to GFM and mitigates risk of false positives.
In particular, the domain has to contain at least one dot and parenthesis
can be part of the link destination only if `(` and `)` are balanced.
Fixes:
* [#73](https://github.com/mity/md4c/issues/73):
Some raw HTML inputs could lead to quadratic parsing times.
* [#74](https://github.com/mity/md4c/issues/74):
Fix input leading to a crash. Found by fuzzing.
* [#76](https://github.com/mity/md4c/issues/76):
Fix handling of parenthesis in some corner cases of permissive URL autolink
and permissive WWW autolink extensions.
## Version 0.3.2
Changes:
* Changes mandated by CommonMark specification 0.29.
Most importantly, the white-space trimming rules for code spans have changed.
At most one space/newline is trimmed from beginning/end of the code span
(if the code span contains some non-space contents, and if it begins and
ends with space at the same time). In all other cases the spaces in the code
span are now left intact.
Other changes in behavior are in corner cases only. Refer to [CommonMark
0.29 notes](https://github.com/commonmark/commonmark-spec/releases/tag/0.29)
for more info.
Fixes:
* [#68](https://github.com/mity/md4c/issues/68):
Some specific HTML blocks were not recognized when EOF follows without any
end-of-line character.
* [#69](https://github.com/mity/md4c/issues/69):
Strike-through span not working correctly when its opener mark is directly
followed by other opener mark; or when other closer mark directly precedes
its closer mark.
## Version 0.3.1
Fixes:
* [#58](https://github.com/mity/md4c/issues/58),
[#59](https://github.com/mity/md4c/issues/59),
[#60](https://github.com/mity/md4c/issues/60),
[#63](https://github.com/mity/md4c/issues/63),
[#66](https://github.com/mity/md4c/issues/66):
Some inputs could lead to quadratic parsing times. Thanks to Anders Kaseorg
for finding all those issues.
* [#61](https://github.com/mity/md4c/issues/59):
Flag `MD_FLAG_NOHTMLSPANS` erroneously affected also recognition of
CommonMark autolinks.
## Version 0.3.0
New features:
* Add extension for GitHub-style task lists:
```
* [x] foo
* [x] bar
* [ ] baz
```
(It has to be explicitly enabled with `MD_FLAG_TASKLISTS`.)
* Added support for building as a shared library. On non-Windows platforms,
this is now default behavior; on Windows static library is still the default.
The CMake option `BUILD_SHARED_LIBS` can be used to request one or the other
explicitly.
Contributed by Lisandro Damián Nicanor Pérez Meyer.
* Renamed structure `MD_RENDERER` to `MD_PARSER` and refactorize its contents
a little bit. Note this is source-level incompatible and initialization code
in apps may need to be updated.
The aim of the change is to be more friendly for long-term ABI compatibility
we shall maintain, starting with this release.
* Added `CHANGELOG.md` (this file).
* Make sure `md_process_table_row()` reports the same count of table cells for
all table rows, no matter how broken the input is. The cell count is derived
from table underline line. Bogus cells in other rows are silently ignored.
Missing cells in other rows are reported as empty ones.
Fixes:
* CID 1475544:
Calling `md_free_attribute()` on uninitialized data.
* [#47](https://github.com/mity/md4c/issues/47):
Using bad offsets in `md_is_entity_str()`, in some cases leading to buffer
overflow.
* [#51](https://github.com/mity/md4c/issues/51):
Segfault in `md_process_table_cell()`.
* [#53](https://github.com/mity/md4c/issues/53):
With `MD_FLAG_PERMISSIVEURLAUTOLINKS` or `MD_FLAG_PERMISSIVEWWWAUTOLINKS`
we could generate bad output for ordinary Markdown links, if a non-space
character immediately follows like e.g. in `[link](http://github.com)X`.
## Version 0.2.7
This was the last version before the changelog has been added.