html: Add some comments regarding HTML5 serialization

It seems that the specification of the HTML output method in XSLT 1.0
had a lot of influence on how the HTML serializer in libxml2 ended up:

https://www.w3.org/TR/xslt-10/#section-HTML-Output-Method

There are two remaining behaviors suggested by XSLT 1.0 that don't
match the HTML5 fragment serialization algorithm:

- We escape non-ASCII characters in URI attributes (the list of URI
  attributes is probably outdated). This was originally recommended in
  appendix B of the HTML 4.01 spec, but only for user agents:

  https://www.w3.org/TR/html401/appendix/notes.html#h-B.2.1

  In my experience, any tool that processes HTML should escape as
  little as possible. For example, we used to escape many more
  characters which are invalid in URIs but often used in template
  languages. (Note that we still escape whitespace and control chars.)
  Nevertheless, I guess that some libxslt users continue to expect
  this behavior from libxml2.

- We collapse boolean attributes using an outdated list. This is mostly
  a cosmetic issue, but a somewhat important one for libxslt users.

We probably need a serialization option for the xmlsave module that
enables fully HTML5-conformant output.
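
For illustration only, here is a rough sketch of the URI attribute
escaping described above. It is not the libxml2 implementation:
sketchEscapeUriAttr is a hypothetical helper, and the exact set of
escaped bytes (anything up to and including space, DEL, and all
non-ASCII bytes) is an assumption based on the description in this
message. Characters such as '{' and '}' pass through unchanged, which
is the point about template languages.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libxml2.
 *
 * Percent-escape a UTF-8 URI attribute value: whitespace, control
 * characters and non-ASCII bytes become %XX, everything else is
 * copied unchanged. Returns a malloc'ed string that the caller must
 * free, or NULL if allocation fails.
 */
static char *
sketchEscapeUriAttr(const char *value) {
    static const char hex[] = "0123456789ABCDEF";
    size_t len, i;
    char *out, *p;

    assert(value != NULL);

    len = strlen(value);
    /* Worst case: every byte expands to three characters. */
    out = malloc(len * 3 + 1);
    if (out == NULL)
        return NULL;

    p = out;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) value[i];

        if (c <= 0x20 || c >= 0x7F) {
            /* Control char, space, DEL or non-ASCII byte. */
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0xF];
        } else {
            /* Printable ASCII, including '{', '}' and '|'. */
            *p++ = (char) c;
        }
    }
    *p = '\0';

    return out;
}
```

Under these assumptions, a value like "caf\xC3\xA9 {{x}}" would come
out as "caf%C3%A9%20{{x}}". An HTML5-conformant serializer would leave
the non-ASCII bytes in such a value untouched.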