Improve coverage testing of UTF-8 routines.
diff --git a/test/coverage.txt b/test/coverage.txt
index 4b51ef6..146210c 100644
--- a/test/coverage.txt
+++ b/test/coverage.txt
@@ -41,12 +41,97 @@ Ditto for Unicode punctuation (here U+00A1).
## `md_decode_utf8__()` and `md_decode_utf8_before__()`
+### Alphanumerical Character (i.e. not whitespace, not punctuation)
+
+Non-whitespace & non-punctuation characters below suppress `_` from being
+recognized as an emphasis because `_` should be seen as in-word character:
+
+Example of 1-byte UTF-8 sequence (U+0058):
+```````````````````````````````` example
+X__foo__X
+.
+<p>X__foo__X</p>
+````````````````````````````````
+
+Example of 2-byte UTF-8 sequence (U+0158):
+```````````````````````````````` example
+Ř__foo__Ř
+.
+<p>Ř__foo__Ř</p>
+````````````````````````````````
+
+Example of 3-byte UTF-8 sequence (U+0BA3):
+```````````````````````````````` example
+ண__foo__ண
+.
+<p>ண__foo__ண</p>
+````````````````````````````````
+
+Example of 4-byte UTF-8 sequence (U+13142):
+```````````````````````````````` example
+𓅂__foo__𓅂
+.
+<p>𓅂__foo__𓅂</p>
+````````````````````````````````
+
+### Whitespace character
+
+Whitespace on the other hand should not suppress `_`:
+
+Example of 1-byte UTF-8 sequence (U+0009):
+```````````````````````````````` example
+x→__foo__→
+.
+<p>x <strong>foo</strong></p>
+````````````````````````````````
+(The initial `x` to suppress indented code block.)
+
+Example of 2-byte UTF-8 sequence (U+00A0):
+```````````````````````````````` example
+ __foo__
+.
+<p> <strong>foo</strong> </p>
+````````````````````````````````
+
+Example of 3-byte UTF-8 sequence (U+2000):
+```````````````````````````````` example
+ __foo__
+.
+<p> <strong>foo</strong> </p>
+````````````````````````````````
+
+(AFAIK, there is no 4-byte UTF-8 whitespace.)
+
+### Punctuation character
+
+Punctuation also should not suppress `_`:
+
+Example of 1-byte UTF-8 sequence (U+002E):
+```````````````````````````````` example
+.__foo__.
+.
+<p>.<strong>foo</strong>.</p>
+````````````````````````````````
+
+Example of 2-byte UTF-8 sequence (U+00B7):
+```````````````````````````````` example
+·__foo__·
+.
+<p>·<strong>foo</strong>·</p>
+````````````````````````````````
+
+Example of 3-byte UTF-8 sequence (U+0C84):
+```````````````````````````````` example
+಄__foo__಄
+.
+<p>಄<strong>foo</strong>಄</p>
+````````````````````````````````
+
+Example of 4-byte UTF-8 sequence (U+1039F):
```````````````````````````````` example
-á*Á (U+00E1, i.e. two byte UTF-8 sequence)
- * (U+2000, i.e. three byte UTF-8 sequence)
+𐎟__foo__𐎟
.
-<p>á*Á (U+00E1, i.e. two byte UTF-8 sequence)
- * (U+2000, i.e. three byte UTF-8 sequence)</p>
+<p>𐎟<strong>foo</strong>𐎟</p>
````````````````````````````````
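
The headings in the added tests each claim a particular UTF-8 sequence length for the code point they exercise. As a rough cross-check that sits outside the test suite itself, the sketch below recomputes those lengths with Python's built-in encoder. The `CASES` table and the helper names `utf8_len` and `mismatches` are hypothetical and exist only for this illustration; they are not part of the project.

```python
# Code points copied from the test headings in the diff above, mapped to the
# UTF-8 sequence length each heading claims. The table is an illustration only.
CASES = {
    0x0058: 1, 0x0158: 2, 0x0BA3: 3, 0x13142: 4,   # alphanumerical
    0x0009: 1, 0x00A0: 2, 0x2000: 3,               # whitespace
    0x002E: 1, 0x00B7: 2, 0x0C84: 3, 0x1039F: 4,   # punctuation
}


def utf8_len(cp: int) -> int:
    """Return the number of bytes in the UTF-8 encoding of code point cp."""
    return len(chr(cp).encode("utf-8"))


def mismatches(cases: dict) -> list:
    """Return (code point, claimed, actual) for every entry whose length differs."""
    return [
        (cp, claimed, utf8_len(cp))
        for cp, claimed in cases.items()
        if utf8_len(cp) != claimed
    ]


if __name__ == "__main__":
    for cp, claimed in CASES.items():
        print(f"U+{cp:04X}: {utf8_len(cp)} byte(s), heading claims {claimed}")
    # An empty list means every heading matches the real encoding length.
    print("mismatches:", mismatches(CASES))
```

This only confirms the byte lengths. Whether a given code point counts as whitespace or punctuation is decided by the parser's own Unicode tables, which this sketch does not attempt to reproduce.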