src/md4c.c


Log

Author Commit Date CI Message
Martin Mitas f46000c7 2024-01-24T09:49:59 Use UTF-8 in copyright notes.
Martin Mitas 2cb4f23f 2024-01-22T09:14:58 md_collect_marks: Improve pre-test for '.'.
Martin Mitas 23e7929b 2024-01-22T09:10:25 md_analyze_permissive_autolink: Check left boundary asap.
Martin Mitas fcd3ca13 2024-01-21T15:20:49 Fix source indentation.
Martin Mitas 83e093fb 2024-01-21T11:50:18 md_opener_stack: Mark the default branch of switch as unreachable. We were returning NULL previously, but that would lead to a crash anyway; all callsites expect to get their respective stack anyway and anything else would mean we are internally broken.
Martin Mitas 0672f27c 2024-01-21T11:45:02 md_process_table_row: Remove not needed freeing of ptr_stack. This is already handled universally in md_process_normal_block_contents() which is called from md_process_table_row() via md_process_table_cell().
Martin Mitas faf39849 2024-01-21T11:42:30 md_is_html_cdata: Remove not needed max_end shrinking. md_scan_for_html_closer() handles that internally.
Martin Mitas 65957f53 2024-01-19T10:37:33 Limit number of table columns to prevent explosion of output... with the input pattern in the form generated by this one-liner: $ python3 -c 'N=1000; print("x|" * N + "\n" + "-|" * N + "\n" + "x\n" * N)' Here the amount of HTML output grows with N^2.
Martin Mitas 70b247cf 2024-01-19T13:59:45 md_analyze_permissive_autolink: Accept path ending with '/'. Fixes #226.
Martin Mitas bbb43fe0 2024-01-18T17:30:44 Rename PUSH_MARK() to ADD_MARK(). This is to prevent confusion with opener stack operations.
Martin Mitáš 246e105d 2024-01-18T17:22:54 Refactor mark chains. (#224) * Rename MD_MARKCHAIN to MD_MARKSTACK to indicate its semantics much clearer. * Simplify its implementation (single-linked list instead of double-linked one). * Where it was reused (misused?) for other, unrelated stuff, with other semantics, it's now done explicitly. (i.e. got rid of TABLECELLBOUNDARIES). * PTR_CHAIN still uses the stack (we don't care about order there), but it got separated from the array of ordinary opener stacks at least.
Martin Mitas 601ff053 2024-01-18T16:28:16 Fix handling new line at beginning/end of a code span. Fixes #223.
Martin Mitas c076698a 2024-01-18T16:10:46 md_collect_marks: Get rid of helper vars line_beg, line_end.
Martin Mitas 08728831 2024-01-18T13:39:48 md_rollback: Update outdated comment.
Martin Mitas d40458b5 2024-01-18T12:39:36 md_rollback: Simplify the function. We assume the provided opener_index and closer_index do not cross boundaries of already resolved ranges. Previously the function tried to deal with such a situation but this code should not be needed, it was very complex and, most importantly, broken anyway.
Martin Mitas a08f6a05 2024-01-18T12:29:31 Improve/fix latex math extension. To mitigate false positives: * We accept $ and $$ as a potential opener only if it's not preceded with alnum char. * Similarly closer cannot be followed with alnum char. * We now also match closer with last preceding potential opener, not the first one. (And to avoid nesting, any previous openers are ignored.) * Also revert an unintended change in 3fc207affaba313cc1f4ef3b4e9e57df89b0e028 which allowed keeping nested resolved marks in it.
Martin Mitas 3fc207af 2024-01-18T10:56:12 Handle e-mail autolinks in a safer way. For standard e-mail autolinks <user@host> we internally transformed '<' into '@' (permissive e-mail autolink) to unify handling of missing "mailto:" needed into the destination attribute. This is now not true anymore and we handle that specially. It is actually what has bitten us in https://oss-fuzz.com/testcase-detail/4815193402048512. Even though this isn't the root cause of the issue, this change makes the code safer and easier to understand.
Martin Mitas 4728cd98 2024-01-17T16:04:14 md_analyze_tilde: Pop from chain tail like other emphasis. The function incorrectly used header from the head, leading to wrong result (incompatible with e.g. GFM) but even worse to bad internal state md_rollback() is then potentially unable to solve. Fixes #222.
Martin Mitas 006611b9 2024-01-17T15:03:00 md_analyze_dollar: Call md_rollback() only when resolving. Fixes #221.
Martin Mitáš d955c495 2024-01-17T02:48:57 Rework permissive autolinks. (#220) * We now have a dedicated run over the inline marks for them. * We check more thoroughly whether it really looks like a URL or e-mail address. The old implementation recognized even heavily broken ones. * This allows us to be much more careful in order not to cross already resolved marks. * Share substantial parts of the code between all three types of the permissive autolinks (URL, WWW, e-mail). * Merge their tests into one file, spec-permissive-autolinks.txt. * Add one pathological case which triggered quadratic behavior in the old implementation.
Martin Mitas 0ac9f35d 2024-01-16T09:53:41 md_analyze_marks: Skip analyzing marks if... they fall into range of previously analyzed mark. That can happen if the previous mark has been expanded. That typically happens for permissive auto-links. This fixes one case of pathologic input leading to quadratic behavior.
Martin Mitas b6777d78 2024-01-16T01:30:59 Wiki-links extension: Search for '|' only outside resolved ranges.
Martin Mitas afeece29 2024-01-15T23:03:21 Fix line indentation calculation when interrupting list... due to the "list item cannot begin with two blank lines" rule.
Martin Mitas 78829427 2024-01-13T02:59:35 Fix some emphasis parsing issues. * We incorrectly applied the infamous rule of three only to asterisk-encoded emphasis, it has to be applied to underscore as well. * We incorrectly applied the rule of three only if the opener and/or closer was inside a word. It has also to be applied if the mark is both preceded and followed by punctuation. Fixes #217.
Martin Mitas 5592352f 2024-01-13T00:30:08 HTML declaration doesn't require whitespace before the closer. Fixes #216.
Martin Mitas 7497ea92 2024-01-13T00:17:08 Allow tabs after setext header underline. Fixes #215.
Martin Mitas 2750d9fa 2024-01-13T00:02:12 Add tags <h2>...<h6> as triggers for HTML block type 6. Fixes #214.
Martin Mitas 4a64fee2 2024-01-11T13:12:55 Bump copyright years.
Martin Mitas 5204c30d 2024-01-11T12:41:40 md_is_html_block_end_condition: Fix return value.
Martin Mitas f32a861e 2024-01-11T12:20:23 md_end_current_block: Fix EOL handling.
Martin Mitas 76abc636 2024-01-11T12:09:22 md_is_html_block_end_condition: Fix EOF handling.
Martin Mitas 4a7246de 2024-01-11T11:55:38 md_is_inline_link_spec: Fix EOL checking.
Martin Mitas c6535ff3 2024-01-10T21:39:24 Fix eof handling in a middle of task list item.
Martin Mitas ebbb12e5 2024-01-10T20:29:02 Revert most of PR #168, i.e. of the commit f436c3029850c138e54a0de055d61db45130409e. It added a bunch of checks all over the place, but most of them shouldn't be needed: If they are true, our internal state is already broken. In other words, those checks are hiding real bugs and making debugging harder. Hopefully the underlying bugs are already fixed in some of the previous commits addressing some fuzzing issues, like these: * d775b5103ee130edbd808e21d1da6ca75f76a558 * c6942ef03ed46a67bd9b3af8ce1eefd781622777
Martin Mitas d775b510 2024-01-10T18:33:32 More fixes of TABLECELLBOUNDARIES chain handling. Fixes #213.
Martin Mitas c6942ef0 2024-01-10T17:31:55 Treat TABLECELLBOUNDARIES chain as special one. It's not an ordinary openers chain as (most of) the others, and md_rollback() must not touch it. Fixes #212.
Jens Alfke efcfd7e7 2024-01-09T02:32:17 Added MD_SPAN_A_DETAIL.is_autolink (#181) This allows the processor to tell whether an <A> tag is the result of an autolink, and customize its output. For example, I want to emit an autolink of an image URL as an <IMG> tag, and an autolink of a YouTube URL as a video embed.
Martin Mitas 61949ee9 2024-01-09T02:08:48 Update to Unicode 15.1.
Martin Mitas 38303af3 2024-01-09T00:01:35 Make md_is_html_block_end_condition() reuse the same data... ... as md_is_html_block_start_condition() for the type 1 so that all tags are used consistently there. Fixes #207.
Martin Mitas 319631f6 2024-01-08T21:52:30 Don't merge multiple HTML blocks together. Fixes #202.
l-m 6ef3be6e 2024-01-08T20:09:57 `MD_FLAG_HARD_SOFT_BREAKS` (#193)
step f554bf11 2024-01-08T20:55:54 Don't trim HTML block lines (MD_LINE_HTML) (#206) Markdown 0.30 doesn't mandate right-trimming the contents of HTML lines. Doing so is more work and breaks output compatibility with cmark, tested with https://github.com/commonmark/cmark/commit/9393560.
Martin Mitas 132c29dc 2024-01-08T19:31:37 Allow indented code block to follow any block except paragraph without a blank line. Fixes #200.
Martin Mitas 601c8ab7 2024-01-08T19:06:04 Restore parent's block indentation when interrupting a list item with double blank line. Fixes #190.
Martin Mitas 28f253d7 2024-01-08T18:18:51 Fix some gcc warnings with -pedantic. Fixes #187.
Martin Mitas f7c8db75 2022-01-14T11:04:02 md_rollback: Fix dummization of virtual closers. Fixes #173.
Martin Mitas 6abb7789 2022-01-14T10:13:28 Remove debug messages left by mistake in the previous commit.
Martin Mitas 62b60979 2022-01-14T10:00:09 Reset TABLECELLBOUNDARIES with ordinary opener chains. This is needed because special handling of '|' is now done also if the wiki-links extension is enabled so the chain is populated even with that extension. Fixes #174.
Martin Mitas db9ab417 2022-01-12T16:16:00 Improve wiki-link parsing. * md_rollback: Restore dummy marks changed to virtual zero-length closers. * md_analyze_links: Be more careful in how we roll back contents of a full wiki link (`[[destination|label]]`). The destination has to be rolled back completely (MD_ROLLBACK_ALL) while the label only with MD_ROLLBACK_CROSSING. Fixes #173.
Martin Mitas 8dd35762 2022-01-11T20:53:04 md_analyze_dollar: Simplify the function.
Martin Mitas 4358c40a 2022-01-11T10:28:06 md_lookup_line: Advance to the next line even if the offset... falls into a gap between two lines, instead of returning NULL. Fixes NULL dereference in md_is_link_reference(). This was a regression in 2e9b13cc512b5984b010a7934253702a6763f4f7.
Martin Mitas c058e82c 2022-01-10T12:34:57 md_is_table_underline: Fix detection by the end of file. This was a regression in a8bb4d3020eb1cfa07f01241c2aa668d91011cb5.
Martin Mitas b42e7f5c 2022-01-10T11:41:25 md_resolve_links: Avoid link ref. def. lookup if... we know that the bracket pair contains nested brackets. That makes the label invalid, therefore we know that there is no link ref. def. to be found anyway. In case of heavily nested bracket pairs, the lookup could lead to quadratic parsing times. Fixes #172.
Martin Mitas 2e9b13cc 2022-01-10T03:10:43 md_lookup_line: New function. The function performs a binary search over the array of MD_LINE structs to find the line the given offset lives on. Replaced a few linear scans for such lines with a call to this function.
Thierry Coppey f436c302 2022-01-06T16:21:51 Fix buffer overflows and other errors found with fuzzing. (#168) Fix multiple buffer overflows on input found with fuzzing.
Martin Mitáš eeb32ecc 2022-01-06T16:16:45 Merge pull request #167 from dtldarek/master Two buffer overflow fixes.
Martin Mitas a8bb4d30 2022-01-06T16:01:55 md_is_table_underline: Remove requirement for minimal length of a cell underline. Fixes #169.
dtldarek 260cd339 2021-08-25T15:02:38 Fix buffer overflow on input found with fuzzing (in c-string format): "\n# h1\nc hh##e2ked\n\n A | rong__ ___strong \u0000\u0000\u0000\u0000\u0000\u0000\a\u0000\u0000\u0000\u0000\n# h1\nh# #2\n### h3\n#### h4\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\\\n##### h5\n#*#####\u0000\n6"
dtldarek 933388a6 2021-08-25T14:41:49 This is a fix for a buffer overflow that happens on input found with fuzzing (in c-string format): "\xA9##r[](r[](".
Martin Mitas 82b226ff 2021-06-27T18:42:09 md_is_html_block_start_condition: Accept lower-case HTML declaration. The change is mandated by the spec v. 0.30.
Martin Mitas d50a0142 2021-06-27T18:31:02 md_is_html_block_start_condition: Update for 0.30. The spec. 0.30 adds the tag <textarea> into the list of HTML block start condition type 1.
Kai Koehne e8285942 2021-06-14T09:47:17 Fix MSVC compiler level 3 warnings (#162) Fix various C4244 warnings with the MSVC compiler for 64 bit
Martin Mitas b2ee4b19 2021-04-14T18:27:19 md_resolve_links: Fix the test for the nested autolink covering whole link text. This fixes the fix for #152.
Martin Mitas bcb55d0d 2021-04-14T09:18:09 md_resolve_links: Suppress bogus nested permissive autolink. Fixes #152.
Martin Mitas 4fc808d8 2021-03-29T12:51:48 md_analyze_line: Avoid reading 1 byte beyond the input size. Fixes #155.
Martin Mitas aa654230 2021-03-22T14:00:35 md_enter_child_containers: Propagate list mark character properly. Fixes #153, #154.
Martin Mitas fe2f2427 2021-02-11T11:35:54 Fix copy&paste error in a comment.
Martin Mitas fd7b5fe0 2021-02-05T21:40:47 md_analyze_line: Fix implicit ending of HTML blocks... ... when the HTML block is not explicitly ended (before the enclosing container block ends). Fixes #149.
Martin Mitas 9ba57ccb 2020-12-14T19:53:58 md_link_label_cmp_load_fold_info: Remove bogus code. The input into the function is already guaranteed not to contain new line characters. (And handling of them in the function was broken anyway.)
Martin Mitas 5a44e327 2020-12-14T18:59:56 md_link_label_cmp: Fix the loop end condition. The old version likely could stop prematurely in a corner case when there was a Unicode character at the end of either string, which maps into multiple fold info codepoints. Fixes #142.
Martin Mitas d4a78622 2020-12-14T18:49:35 Minor cleanup.
Martin Mitas 701a0626 2020-12-14T18:45:54 Make MD_UNICODE_FOLD_INFO::n_codepoints unsigned.
Giuseppe D'Angelo a45f839b 2020-12-14T12:21:50 Fix mixed signed/unsigned comparisons Force both operands to unsigned. n_codepoints does not seem to ever contain negative offsets anyhow, should it actually be unsigned?
Giuseppe D'Angelo 6dd64346 2020-12-14T01:40:40 Silence "unused parameter" warnings Merely added a suitable macro. Didn't refactor any code to actually figure out why the parameters were not used.
Giuseppe D'Angelo 569defae 2020-12-14T01:25:26 Silence -Wimplicit-fallthrough warnings Use a macro that dispatches to the compiler-specific magic to silence implicit fallthrough warnings when the fallthrough was actually intended. The code already featured comments, so these are actually safe to place. (Unfortunately, Clang does not recognize any comment as "fall through" comment, and GCC only recognizes some variations of "fall through", not "pass through". Moreover, one of the comments replaced here had a typo...)
Martin Mitas 26003b88 2020-12-04T20:42:22 md_is_container_mark: Recognize list item marks just before EOF. We were recognizing the list item marks when a new line or a blank character follows. However, given that end-of-file implicitly also means an end-of-line, we should recognize them in that situation too. Fixes #139.
Martin Mitas 3254b7cb 2020-11-13T12:02:39 md_process_table_block_contents: Suppress empty TBODY block generation. When the table has no body rows, do not call the callback with MD_BLOCK_TBODY events. Fixes #138.
Martin Mitas a997cb21 2020-10-18T09:34:10 Add MD_BLOCK_TABLE_DETAIL. This allows renderers to have the info about table dimension (table column and row count) in advance and e.g. simplify their memory allocation strategy.
Martin Mitas 4585088a 2020-11-13T10:16:34 md_analyze_permissive_url_autolink: Better GFM compatibility. The autolinks now allow unmatched parenthesis, only the trailing parenthesis closers are handled specially to deal with the situation the autolink is all inside an outer parenthesis. Somehow our tests were broken and avoided the cases with unmatched parenthesis pairs inside the auto-link. That's now fixed and in sync with GFM specs too. Fixes #135.
Martin Mitas c3a18d55 2020-11-13T09:27:10 md_collect_marks: continue -> break Does not cause any change in behavior: we just avoid needless loop iterations now.
Martin Mitas baa1dd06 2020-11-09T16:02:06 Fix some English wording in comments.
Rasmus Andersson 125e8e03 2020-10-18T10:18:11 Initializes an uninitialized variable in md_analyze_emph Fixes the following, reported by clang analysis: src/md4c.c:3729:61: warning: variable 'opener_index' may be uninitialized when used here [-Wconditional-uninitialized] MD_MARKCHAIN* opener_chain = md_mark_chain(ctx, opener_index); ^~~~~~~~~~~~ src/md4c.c:3686:25: note: initialize the variable 'opener_index' to silence this warning int opener_index; ^ = 0
Rasmus Andersson 1a2f4816 2020-10-18T10:56:49 Adds missing field initializers (undefined behavior) src/md4c.c:5667:72: warning: missing field 'beg' initializer [-Wmissing-field-initializers] static const MD_LINE_ANALYSIS md_dummy_blank_line = { MD_LINE_BLANK, 0 };
Martin Mitas 002f76c9 2020-10-18T09:37:45 md_resolve_links: Skip [...] used as a reference link/image label. Fixes #131.
Martin Mitas 22ca89a3 2020-09-29T21:33:43 Fix ISANYOF encountering a zero byte in the input. When it happened, it could lead to unexpected results, including broken internal state of the parser. Fixes #130.
Martin Mitas 67214417 2020-08-05T10:53:33 Make mark_chain[] helper macro definitions safer.
Martin Mitas 70d0ef7c 2020-08-05T09:18:41 Avoid simple {0} to initialize a more complex object. Should fix #125.
Martin Mitas c501c891 2020-07-30T10:13:05 Fix spelling of "than" in many occurrences. I often spell it erroneously as "then". Doing this mistake way too often when typing fast.
Martin Mitas c595c2ed 2020-07-30T08:38:19 md_process_verbatim_block_contents: Fix off by 1 error. This caused outputting wrong indentation inside fenced code blocks for lines indented with more than 16 spaces. Fixes #124.
Martin Mitas 72dad97e 2020-05-20T16:44:07 scripts/build_folding_map.py: Handle properly "ranges" of length 2. Update the data structures in md_get_unicode_fold_info() to reflect the update in the script and handle the previously omitted characters. Fixes #113.
Dmitry Atamanov 3d64d6be 2020-05-08T02:13:55 Update to Unicode 13.0 (#111)
Martin Mitas 7f2d880f 2019-08-09T09:50:24 Refactor dir structure. We place all the sources in a single directory in order not to have many dirs with too few sources.
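
The pathological table input quoted in commit 65957f53 can be reproduced with a small script. The sketch below wraps the one-liner from that commit message in a function and compares input size with a rough cell count. The helper names `pathological_table` and `padded_cell_count` are hypothetical, and the cell-count model (every body row padded out to N columns) is an assumption about where the N^2 growth comes from, not something taken from the md4c sources.

```python
def pathological_table(n: int) -> str:
    """Build the input from the commit message: n columns, n one-cell body rows.

    Unlike the original print() one-liner, no extra trailing newline is added.
    """
    return "x|" * n + "\n" + "-|" * n + "\n" + "x\n" * n


def padded_cell_count(n: int) -> int:
    """Estimate the emitted cells, ASSUMING each body row is padded to n columns."""
    header_cells = n
    body_cells = n * n  # n body rows, each padded to n cells (assumption)
    return header_cells + body_cells


if __name__ == "__main__":
    for n in (10, 100, 1000):
        text = pathological_table(n)
        # Input length grows linearly with n, the estimated cell count quadratically.
        print(n, len(text), padded_cell_count(n))
```

Under this model the input length is 6N + 2 characters while the estimated cell count is N + N^2, which is consistent with the quadratic output growth the commit describes.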