kc3-lang/libxml2/doc/encoding.html

    Commit

  • Author : Daniel Veillard
    Date : 2000-07-14 12:10:59
    Hash : be40c8b2
    Message : First version of the encoding doc, Daniel.

  • doc/encoding.html
  • <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
                          "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html>
    <head>
      <title>Libxml Internationalization support</title>
      <meta name="GENERATOR" content="amaya V3.2">
      <meta http-equiv="Content-Type" content="text/html">
    </head>
    
    <body bgcolor="#ffffff">
    <h1 align="center">Libxml Internationalization support</h1>
    
    <p>Location: <a
    href="http://xmlsoft.org/encoding.html">http://xmlsoft.org/encoding.html</a></p>
    
    <p>Libxml home page: <a href="http://xmlsoft.org/">http://xmlsoft.org/</a></p>
    
    <p>Mailing-list archive:  <a
    href="http://xmlsoft.org/messages/">http://xmlsoft.org/messages/</a></p>
    
    <p>Version: $Revision$</p>
    
    <p>Table of Contents:</p>
    <ol>
      <li><a href="#What">What does internationalization support mean?</a></li>
      <li><a href="#internal">The internal encoding, how and why</a></li>
      <li><a href="#implemente">How is it implemented?</a></li>
      <li><a href="#Default">Default supported encodings</a></li>
      <li><a href="#extend">How to extend the existing support</a></li>
    </ol>
    
    <h2><a name="What">What does internationalization support mean?</a></h2>
    
    <p>XML was designed from the start to support any character set by using
    Unicode. Any conformant XML parser has to support the UTF-8 and UTF-16
    default encodings, which can both express the full Unicode range. UTF-8 is
    a variable-length encoding whose greatest strengths are that it reuses the
    same encoding for ASCII and saves space for Western languages, but it is a
    bit more complex to handle in practice. UTF-16 uses 2 bytes per character
    (and sometimes combines two such 2-byte units into a surrogate pair); it
    makes implementation easier, but looks a bit like overkill for encoding
    Western languages. Moreover, the XML specification allows documents to be
    encoded in other encodings, on the condition that they are clearly labelled
    as such. For example, the following is a well-formed XML document encoded
    in ISO Latin-1 and using the accented letters that we French like, for both
    markup and content:</p>
    <pre>&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
    &lt;tr