md_analyze_permissive_email_autolink: Make it compatible with CMark-gfm.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
diff --git a/md4c/md4c.c b/md4c/md4c.c
index ce7cd8d..5163b53 100644
--- a/md4c/md4c.c
+++ b/md4c/md4c.c
@@ -3381,45 +3381,28 @@ md_analyze_permissive_email_autolink(MD_CTX* ctx, int mark_index)
MD_MARK* closer;
OFF beg = opener->beg;
OFF end = opener->end;
- int right_dot_count = 0;
+ int dot_count = 0;
MD_ASSERT(CH(beg) == _T('@'));
- /* Accept any alphanumeric sequences delimited with dot before the '@'.
- * There must be a whitespace or start of line before it. */
- while(1) {
- while(beg > 0 && ISALNUM(beg-1))
- beg--;
+ /* Scan for name before '@'. */
+ while(beg > 0 && (ISALNUM(beg-1) || ISANYOF(beg-1, _T(".-_+"))))
+ beg--;
- if(beg > 1 && CH(beg-1) == _T('.') && ISALNUM(beg-2))
- beg -= 2;
- else if(beg == 0 || ISWHITESPACE(beg-1) || ISNEWLINE(beg-1))
- break;
- else
- return;
+ /* Scan for domain after '@'. */
+ while(end < ctx->size && (ISALNUM(end) || ISANYOF(end, _T(".-_")))) {
+ if(CH(end) == _T('.'))
+ dot_count++;
+ end++;
}
-
- /* Accept any alphanumeric sequences delimited with dot after the '@',
- * limiting the sequences length by 64 characters. */
- while(1) {
- OFF label_start = end;
- while(end + 1 < ctx->size && ISALNUM(end))
- end++;
- if(end - label_start > 63)
- return;
-
- if(end + 1 < ctx->size && CH(end) == _T('.') && ISALNUM(end+1)) {
- right_dot_count++;
- end += 2;
- } else if(right_dot_count > 0) {
- /* Although "user@machine" is technically correct e-mail address,
- * we request at least one dot, as in e.g. "user@machine.com" to
- * prevent some false positives with this very loose format. */
- break;
- } else {
- return;
- }
+ if(CH(end-1) == _T('.')) { /* Final '.' not part of it. */
+ dot_count--;
+ end--;
}
+ else if(ISANYOF2(end-1, _T('-'), _T('_'))) /* These are forbidden at the end. */
+ return;
+ if(CH(end-1) == _T('@') || dot_count == 0)
+ return;
/* Ok. Lets call it auto-link. Adapt opener and create closer to zero
* length so all the contents becomes the link text. */
diff --git a/test/permissive-email-autolinks.txt b/test/permissive-email-autolinks.txt
index 7d7f572..12e8786 100644
--- a/test/permissive-email-autolinks.txt
+++ b/test/permissive-email-autolinks.txt
@@ -1,9 +1,10 @@
# Permissive E-mail Autolinks
-With the flag `MD_FLAG_PERMISSIVEEMAILAUTOLINKS`, MD4C enables more permissive recognition
-of e-mail addresses and transforms them to autolinks, even if they do not exactly follow
-the syntax of autolink as specified in CommonMark specification.
+With the flag `MD_FLAG_PERMISSIVEEMAILAUTOLINKS`, MD4C enables more permissive
+recognition of e-mail addresses and transforms them to autolinks, even if they
+do not exactly follow the syntax of autolink as specified in CommonMark
+specification.
This is standard CommonMark e-mail autolink:
@@ -20,3 +21,30 @@ E-mail: john.doe@gmail.com
.
<p>E-mail: <a href="mailto:john.doe@gmail.com">john.doe@gmail.com</a></p>
````````````````````````````````
+
+`+` can occur before the `@`, but not after.
+
+```````````````````````````````` example
+hello@mail+xyz.example isn't valid, but hello+xyz@mail.example is.
+.
+<p>hello@mail+xyz.example isn't valid, but <a href="mailto:hello+xyz@mail.example">hello+xyz@mail.example</a> is.</p>
+````````````````````````````````
+
+`.`, `-`, and `_` can occur on both sides of the `@`, but only `.` may occur at
+the end of the email address, in which case it will not be considered part of
+the address:
+
+```````````````````````````````` example
+a.b-c_d@a.b
+
+a.b-c_d@a.b.
+
+a.b-c_d@a.b-
+
+a.b-c_d@a.b_
+.
+<p><a href="mailto:a.b-c_d@a.b">a.b-c_d@a.b</a></p>
+<p><a href="mailto:a.b-c_d@a.b">a.b-c_d@a.b</a>.</p>
+<p>a.b-c_d@a.b-</p>
+<p>a.b-c_d@a.b_</p>
+````````````````````````````````