Branch
Hash :
7c11da2d
Author :
Date :
2024-06-27T12:47:47
tests: Clarify licence of test/intsubset2.xml
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295
SAX.setDocumentLocator()
SAX.startDocument()
SAX.comment(
Copyright (C) Electronic Dictionary Research and Development Group
Released under Creative Commons Attribution-ShareAlike Licence (V4.0)
This file only contains the kanjidic2 DTD without the actual database.
http://nihongo.monash.edu/kanjidic2/index.html
http://www.edrdg.org/edrdg/licence.html
)
SAX.internalSubset(kanjidic2, , )
SAX.comment( Version 1.3
This is the DTD of the XML-format kanji file combining information from
the KANJIDIC and KANJD212 files. It is intended to be largely self-
documenting, with each field being accompanied by an explanatory
comment.
The file covers the following kanji:
(a) the 6,355 kanji from JIS X 0208;
(b) the 5,801 kanji from JIS X 0212;
(c) the 3,625 kanji from JIS X 0213 as follows:
(i) the 2,741 kanji which are also in JIS X 0212 have
JIS X 0213 code-points (kuten) added to the existing entry;
(ii) the 884 "new" kanji have new entries.
At the end of the explanation for a number of fields there is a tag
with the format [N]. This indicates the leading letter(s) of the
equivalent field in the KANJIDIC and KANJD212 files.
The KANJIDIC documentation should also be read for additional
information about the information in the file.
)
SAX.elementDecl(kanjidic2, 4, ...)
SAX.elementDecl(header, 4, ...)
SAX.comment(
The single header element will contain identification information
about the version of the file
)
SAX.elementDecl(file_version, 3, ...)
SAX.comment(
This field denotes the version of kanjidic2 structure, as more
than one version may exist.
)
SAX.elementDecl(database_version, 3, ...)
SAX.comment(
The version of the file, in the format YYYY-NN, where NN will be
a number starting with 01 for the first version released in a
calendar year, then increasing for each version in that year.
)
SAX.elementDecl(date_of_creation, 3, ...)
SAX.comment(
The date the file was created in international format (YYYY-MM-DD).
)
SAX.elementDecl(character, 4, ...)
SAX.elementDecl(literal, 3, ...)
SAX.comment(
The character itself in UTF8 coding.
)
SAX.elementDecl(codepoint, 4, ...)
SAX.comment(
The codepoint element states the code of the character in the various
character set standards.
)
SAX.elementDecl(cp_value, 3, ...)
SAX.comment(
The cp_value contains the codepoint of the character in a particular
standard. The standard will be identified in the cp_type attribute.
)
SAX.attributeDecl(cp_value, cp_type, 1, 2, NULL, ...)
SAX.comment(
The cp_type attribute states the coding standard applying to the
element. The values assigned so far are:
jis208 - JIS X 0208-1997 - kuten coding (nn-nn)
jis212 - JIS X 0212-1990 - kuten coding (nn-nn)
jis213 - JIS X 0213-2000 - kuten coding (p-nn-nn)
ucs - Unicode 4.0 - hex coding (4 or 5 hexadecimal digits)
)
SAX.elementDecl(radical, 4, ...)
SAX.elementDecl(rad_value, 3, ...)
SAX.comment(
The radical number, in the range 1 to 214. The particular
classification type is stated in the rad_type attribute.
)
SAX.attributeDecl(rad_value, rad_type, 1, 2, NULL, ...)
SAX.comment(
The rad_type attribute states the type of radical classification.
classical - as recorded in the KangXi Zidian.
nelson - as used in the Nelson "Modern Japanese-English
Character Dictionary" (i.e. the Classic, not the New Nelson).
This will only be used where Nelson reclassified the kanji.
)
SAX.elementDecl(misc, 4, ...)
SAX.elementDecl(grade, 3, ...)
SAX.comment(
The Jouyou Kanji grade level. 1 through 6 indicate the grade in which
the kanji is taught in Japanese schools. 8 indicates it is one of the
remaining Jouyou Kanji to be learned in junior high school, and 9
indicates it is a Jinmeiyou (for use in names) kanji. [G]
)
SAX.elementDecl(stroke_count, 3, ...)
SAX.comment(
The stroke count of the kanji, including the radical. If more than
one, the first is considered the accepted count, while subsequent ones
are common miscounts. (See Appendix E. of the KANJIDIC documentation
for some of the rules applied when counting strokes in some of the
radicals.) [S]
)
SAX.elementDecl(variant, 3, ...)
SAX.comment(
A cross-reference code to another kanji, usually regarded as a variant.
The type of cross-reference is given in the var_type attribute.
)
SAX.attributeDecl(variant, var_type, 1, 2, NULL, ...)
SAX.comment(
The var_type attribute indicates the type of variant code. The current
values are:
jis208 - in JIS X 0208 - kuten coding
jis212 - in JIS X 0212 - kuten coding
jis213 - in JIS X 0213 - kuten coding
deroo - De Roo number - numeric
njecd - Halpern NJECD index number - numeric
s_h - The Kanji Dictionary (Spahn & Hadamitzky) - descriptor
nelson - "Classic" Nelson - numeric
oneill - Japanese Names (O'Neill) - numeric
)
SAX.elementDecl(freq, 3, ...)
SAX.comment(
A frequency-of-use ranking. The 2,500 most-used characters have a
ranking; those characters that lack this field are not ranked. The
frequency is a number from 1 to 2,500 that expresses the relative
frequency of occurrence of a character in modern Japanese. This is
based on a survey in newspapers, so it is biassed towards kanji
used in newspaper articles. The discrimination between the less
frequently used kanji is not strong.
)
SAX.elementDecl(rad_name, 3, ...)
SAX.comment(
When the kanji is itself a radical and has a name, this element
contains the name (in hiragana.) [T2]
)
SAX.elementDecl(dic_number, 4, ...)
SAX.comment(
This element contains the index numbers and similar unstructured
information such as page numbers in a number of published dictionaries,
and instructional books on kanji.
)
SAX.elementDecl(dic_ref, 3, ...)
SAX.comment(
Each dic_ref contains an index number. The particular dictionary,
etc. is defined by the dr_type attribute.
)
SAX.attributeDecl(dic_ref, dr_type, 1, 2, NULL, ...)
SAX.comment(
The dr_type defines the dictionary or reference book, etc. to which
dic_ref element applies. The initial allocation is:
nelson_c - "Modern Reader's Japanese-English Character Dictionary",
edited by Andrew Nelson (now published as the "Classic"
Nelson).
nelson_n - "The New Nelson Japanese-English Character Dictionary",
edited by John Haig.
halpern_njecd - "New Japanese-English Character Dictionary",
edited by Jack Halpern.
halpern_kkld - "Kanji Learners Dictionary" (Kodansha) edited by
Jack Halpern.
heisig - "Remembering The Kanji" by James Heisig.
gakken - "A New Dictionary of Kanji Usage" (Gakken)
oneill_names - "Japanese Names", by P.G. O'Neill.
oneill_kk - "Essential Kanji" by P.G. O'Neill.
moro - "Daikanwajiten" compiled by Morohashi. For some kanji two
additional attributes are used: m_vol: the volume of the
dictionary in which the kanji is found, and m_page: the page
number in the volume.
henshall - "A Guide To Remembering Japanese Characters" by
Kenneth G. Henshall.
sh_kk - "Kanji and Kana" by Spahn and Hadamitzky.
sakade - "A Guide To Reading and Writing Japanese" edited by
Florence Sakade.
henshall3 - "A Guide To Reading and Writing Japanese" 3rd
edition, edited by Henshall, Seeley and De Groot.
tutt_cards - Tuttle Kanji Cards, compiled by Alexander Kask.
crowley - "The Kanji Way to Japanese Language Power" by
Dale Crowley.
kanji_in_context - "Kanji in Context" by Nishiguchi and Kono.
busy_people - "Japanese For Busy People" vols I-III, published
by the AJLT. The codes are the volume.chapter.
kodansha_compact - the "Kodansha Compact Kanji Guide".
)
SAX.attributeDecl(dic_ref, m_vol, 1, 3, NULL, ...)
SAX.comment(
See above under "moro".
)
SAX.attributeDecl(dic_ref, m_page, 1, 3, NULL, ...)
SAX.comment(
See above under "moro".
)
SAX.elementDecl(query_code, 4, ...)
SAX.comment(
These codes contain information relating to the glyph, and can be used
for finding a required kanji. The type of code is defined by the
qc_type attribute.
)
SAX.elementDecl(q_code, 3, ...)
SAX.comment(
The q_code contains the actual query-code value, according to the
qc_type attribute.
)
SAX.attributeDecl(q_code, qc_type, 1, 2, NULL, ...)
SAX.comment(
The q_code attribute defines the type of query code. The current values
are:
skip - Halpern's SKIP (System of Kanji Indexing by Patterns)
code. The format is n-nn-nn. See the KANJIDIC documentation
for a description of the code and restrictions on the
commercial use of this data. [P]
sh_desc - the descriptor codes for The Kanji Dictionary (Tuttle
1996) by Spahn and Hadamitzky. They are in the form nxnn.n,
e.g. 3k11.2, where the kanji has 3 strokes in the
identifying radical, it is radical "k" in the SH
classification system, there are 11 other strokes, and it is
the 2nd kanji in the 3k11 sequence. (I am very grateful to
Mark Spahn for providing the list of these descriptor codes
for the kanji in this file.) [I]
four_corner - the "Four Corner" code for the kanji. This is a code
invented by Wang Chen in 1928. See the KANJIDIC documentation
for an overview of the Four Corner System. [Q]
deroo - the codes developed by the late Father Joseph De Roo, and
published in his book "2001 Kanji" (Bojinsha). Fr De Roo
gave his permission for these codes to be included. [DR]
misclass - a possible misclassification of the kanji according
to one of the code types. (See the "Z" codes in the KANJIDIC
documentation for more details.)
)
SAX.elementDecl(reading_meaning, 4, ...)
SAX.comment(
The readings for the kanji in several languages, and the meanings, also
in several languages. The readings and meanings are grouped to enable
the handling of the situation where the meaning is differentiated by
reading. [T1]
)
SAX.elementDecl(nanori, 3, ...)
SAX.comment(
Japanese readings that are now only associated with names.
)
SAX.elementDecl(rmgroup, 4, ...)
SAX.elementDecl(reading, 3, ...)
SAX.comment(
The reading element contains the reading or pronunciation
of the kanji.
)
SAX.attributeDecl(reading, r_type, 1, 2, NULL, ...)
SAX.comment(
The r_type attribute defines the type of reading in the reading
element. The current values are:
pinyin - the modern PinYin romanization of the Chinese reading
of the kanji. The tones are represented by a concluding
digit. [Y]
korean_r - the romanized form of the Korean reading(s) of the
kanji. The readings are in the (Republic of Korea) Ministry
of Education style of romanization. [W]
korean_h - the Korean reading(s) of the kanji in hangul.
ja_on - the "on" Japanese reading of the kanji, in katakana. A
second attribute r_status, if present, will indicate with
a value of "jy" whether the reading is approved for a
"Jouyou kanji".
ja_kun - the "kun" Japanese reading of the kanji, in hiragana.
Where relevant the okurigana is also included separated by a
".". Readings associated with prefixes and suffixes are
marked with a "-". A second attribute r_status, if present,
will indicate with a value of "jy" whether the reading is
approved for a "Jouyou kanji".
)
SAX.attributeDecl(reading, r_status, 1, 3, NULL, ...)
SAX.comment(
See under ja_on and ja_kun above.
)
SAX.elementDecl(meaning, 3, ...)
SAX.comment(
The meaning associated with the kanji.
)
SAX.attributeDecl(meaning, m_lang, 1, 3, NULL, ...)
SAX.comment(
The m_lang attribute defines the target language of the meaning. It
will be coded using the two-letter language code from the ISO 639
standard. When absent, the value "en" (i.e. English) is implied. [{}]
)
SAX.externalSubset(kanjidic2, , )
SAX.startElementNs(kanjidic2, NULL, NULL, 0, 0, 0)
SAX.characters(
, 1)
SAX.endElementNs(kanjidic2, NULL, NULL)
SAX.endDocument()