Hash :
4585088a
Author :
Date :
2020-11-13T10:16:34
md_analyze_permissive_url_autolink: Better GFM compatibility. The autolinks now allow unmatched parenthesis, only the trailing parenthesis closers are handled specially to deal with the situation the autolink is all inside an outer parenthesis. Somehow our tests were broken and avoided the cases with unmatched parenthesis pairs inside the auto-link. That's now fixed and in sync with GFM specs too. Fixes #135.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107
# Permissive WWW Autolinks
With the flag `MD_FLAG_PERMISSIVEWWWAUTOLINKS`, MD4C enables recognition of
autolinks starting with `www.`, even if they do not exactly follow the syntax
of autolink as specified in CommonMark specification.
These do not have to be enclosed in `<` and `>`, and they even do not need
any preceding scheme specification.
The WWW autolink will be recognized when the text `www.` is found followed by a
valid domain. A valid domain consists of segments of alphanumeric characters,
underscores (`_`) and hyphens (`-`) separated by periods (`.`). There must be
at least one period, and no underscores may be present in the last two segments
of the domain.
The scheme `http` will be inserted automatically:
```````````````````````````````` example
www.commonmark.org
.
<p><a href="http://www.commonmark.org">www.commonmark.org</a></p>
````````````````````````````````
After a valid domain, zero or more non-space non-`<` characters may follow:
```````````````````````````````` example
Visit www.commonmark.org/help for more information.
.
<p>Visit <a href="http://www.commonmark.org/help">www.commonmark.org/help</a> for more information.</p>
````````````````````````````````
We then apply extended autolink path validation as follows:
Trailing punctuation (specifically, `?`, `!`, `.`, `,`, `:`, `*`, `_`, and `~`)
will not be considered part of the autolink, though they may be included in the
interior of the link:
```````````````````````````````` example
Visit www.commonmark.org.
Visit www.commonmark.org/a.b.
.
<p>Visit <a href="http://www.commonmark.org">www.commonmark.org</a>.</p>
<p>Visit <a href="http://www.commonmark.org/a.b">www.commonmark.org/a.b</a>.</p>
````````````````````````````````
When an autolink ends in `)`, we scan the entire autolink for the total number
of parentheses. If there is a greater number of closing parentheses than
opening ones, we don't consider the last character part of the autolink, in
order to facilitate including an autolink inside a parenthesis:
```````````````````````````````` example
www.google.com/search?q=Markup+(business)
(www.google.com/search?q=Markup+(business))
.
<p><a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a></p>
<p>(<a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a>)</p>
````````````````````````````````
This check is only done when the link ends in a closing parentheses `)`, so if
the only parentheses are in the interior of the autolink, no special rules are
applied:
```````````````````````````````` example
www.google.com/search?q=(business))+ok
.
<p><a href="http://www.google.com/search?q=(business))+ok">www.google.com/search?q=(business))+ok</a></p>
````````````````````````````````
If an autolink ends in a semicolon (`;`), we check to see if it appears to
resemble an [entity reference][entity references]; if the preceding text is `&`
followed by one or more alphanumeric characters. If so, it is excluded from
the autolink:
```````````````````````````````` example
www.google.com/search?q=commonmark&hl=en
www.google.com/search?q=commonmark&hl;
.
<p><a href="http://www.google.com/search?q=commonmark&hl=en">www.google.com/search?q=commonmark&hl=en</a></p>
<p><a href="http://www.google.com/search?q=commonmark">www.google.com/search?q=commonmark</a>&hl;</p>
````````````````````````````````
`<` immediately ends an autolink.
```````````````````````````````` example
www.commonmark.org/he<lp
.
<p><a href="http://www.commonmark.org/he">www.commonmark.org/he</a><lp</p>
````````````````````````````````
## GitHub Issues
### [Issue 53](https://github.com/mity/md4c/issues/53)
```````````````````````````````` example
This is [link](www.github.com/).
.
<p>This is <a href="www.github.com/">link</a>.</p>
````````````````````````````````
```````````````````````````````` example
This is [link](www.github.com/)X
.
<p>This is <a href="www.github.com/">link</a>X</p>
````````````````````````````````