KanjiConfudal

From TORI
Revision as of 21:16, 7 July 2021 by T (talk | contribs) (Confusion)

KanjiConfudal is the set of Unicode characters that are easy to confuse with those of KanjiRadical and KanjiLiberal.


The number (code point) \(n\) of a KanjiConfudal character lies in the range

\( \mathrm{0xF900} \le n < \mathrm{0xFA70} \)
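The range above can be checked in code. The following is a minimal sketch; the function name is_kanji_confudal is hypothetical and is not part of the original generator.

```php
<?php
// Hypothetical helper: true when the number $n satisfies 0xF900 <= $n < 0xFA70.
function is_kanji_confudal(int $n): bool {
    return 0xF900 <= $n && $n < 0xFA70;
}

// The range is half-open, so it holds 0xFA70 - 0xF900 = 368 numbers,
// that is, 23 rows of 16 characters.
printf("%d\n", 0xFA70 - 0xF900);       // prints 368
var_dump(is_kanji_confudal(0xF981));   // bool(true)
var_dump(is_kanji_confudal(0x5973));   // bool(false)
```

The upper bound is exclusive, so 0xFA6F is the last number accepted.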

Generator

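<?php
// unichr.t is expected to define unichr(), which converts a number to its character.
include "unichr.t";
// Walk the range 0xF900 <= $i < 0xFA70.
for ($i = 0xF900; $i < 0xFA70; $i++) {
    $a = unichr($i);
    // Start a new row every 16 characters, labelled with the hexadecimal number.
    if ($i % 16 == 0) printf("<br>\nX%04X", $i);
    // Print the character as a wiki link.
    printf(" [[%s]]", $a);
}
echo "\n";
?>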

To execute the code above, the file unichr.t must also be loaded.
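The content of unichr.t is not shown here. As a rough stand-in, and assuming PHP 7.2 or later with the mbstring extension, a function with the same name could be sketched as follows; this is an assumption about what unichr does, not the original file.

```php
<?php
// Hypothetical stand-in for the unichr() function that unichr.t provides.
// Assumption: unichr($n) returns the UTF-8 string for the number $n.
if (!function_exists('unichr')) {
    function unichr(int $codepoint): string {
        // mb_chr returns false for an invalid number; map that to "".
        $s = mb_chr($codepoint, 'UTF-8');
        return $s === false ? '' : $s;
    }
}

// XF981 is encoded in UTF-8 as the three bytes EF A6 81.
echo strtoupper(bin2hex(unichr(0xF981))), "\n";   // prints EFA681
```

With such a definition in place of the include, the generator loop should run unchanged.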

Table

XF900
XF910
XF920
XF930 錄
XF940
XF950
XF960
XF970 勵
XF980
XF990
XF9A0
XF9B0 樂
XF9C0
XF9D0
XF9E0
XF9F0 刺
XFA00
XFA10
XFA20
XFA30 憎
XFA40
XFA50
XFA60

Confusion

Note that most software does not distinguish between KanjiLiberal and KanjiConfudal characters, so it is in vain to copy and paste them anywhere. This problem is described in [1].

To get the intended character, it should be specified by its hexadecimal number.
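One way to specify a character by its hexadecimal number is an HTML numeric character reference. The sketch below converts a label in the notation of this article, such as XF981, into such a reference; the function name label_to_entity is hypothetical, and whether a given program keeps the referenced character distinct from its partner is not guaranteed.

```php
<?php
// Hypothetical helper: turn a label such as "XF981" into "&#xF981;".
function label_to_entity(string $label): string {
    // Drop the leading X (or x), then read the rest as a hexadecimal number.
    $n = hexdec(ltrim($label, 'Xx'));
    return sprintf('&#x%04X;', $n);
}

echo label_to_entity('XF981'), "\n";   // prints &#xF981;
```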

With the default MediaWiki settings, article names containing KanjiConfudal characters are not allowed. Clicking on a KanjiConfudal character in the table above leads to its partner in the KanjiLiberal set. In particular,
click on (XF902) leads to (X8ECA)
click on (XF90A) leads to (X91D1)
click on (XF963) leads to (X5317)
click on (XF980) leads to (X5442) ("ryo", spine or backbone)
click on (XF981) leads to (X5973)
click on (XF98E) leads to (X5E74) ("nen", year)
click on (XF9D1) leads to (X516D) ("roku", six)
click on (XF9E0) leads to (X6613) ("yasu", easy)
click on (XF9F4) leads to (X6797) ("hayashi", wood)
click on (XF9F7) leads to (X7ACB) ("tatsuru", stand)
click on (XF9FD) leads to (X4EC0) ("jū", equipment)
click on (XFA0A) leads to (X898B) ("miru", look)
click on (XFA3C) leads to (X5C6E)
click on (XFA5E) leads to (X8279)
click on (XFA66) leads to (X8FB6)
..

No simple relation is found between the number of a KanjiConfudal and the number of the corresponding KanjiLiberal.
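The absence of a simple relation can be illustrated with the pairs listed above. The sketch below stores them in a lookup array (the variable names are hypothetical) and prints the difference of the two numbers for each pair. The last part is an optional cross-check that runs only if the intl and mbstring extensions are loaded; it assumes that Unicode normalization form C yields the same partners, which appears to hold for these pairs.

```php
<?php
// Pairs copied from the list above: KanjiConfudal number => KanjiLiberal number.
$partner = [
    0xF902 => 0x8ECA, 0xF90A => 0x91D1, 0xF963 => 0x5317,
    0xF980 => 0x5442, 0xF981 => 0x5973, 0xF98E => 0x5E74,
    0xF9D1 => 0x516D, 0xF9E0 => 0x6613, 0xF9F4 => 0x6797,
    0xF9F7 => 0x7ACB, 0xF9FD => 0x4EC0, 0xFA0A => 0x898B,
    0xFA3C => 0x5C6E, 0xFA5E => 0x8279, 0xFA66 => 0x8FB6,
];

// Print each pair together with the difference of the two numbers.
$offsets = [];
foreach ($partner as $from => $to) {
    $offsets[] = $from - $to;
    printf("X%04X -> X%04X  difference X%04X\n", $from, $to, $from - $to);
}
// If a constant shift related the two sets, all differences would coincide.
printf("%d pairs, %d distinct differences\n",
       count($partner), count(array_unique($offsets)));

// Optional cross-check with Unicode normalization form C.
if (class_exists('Normalizer') && function_exists('mb_chr')) {
    foreach ($partner as $from => $to) {
        $nfc = Normalizer::normalize(mb_chr($from, 'UTF-8'), Normalizer::FORM_C);
        printf("X%04X normalizes to X%04X\n", $from, mb_ord($nfc, 'UTF-8'));
    }
}
```

Even the neighbours XF980 and XF981 give different differences (XA53E and XA00E), so a lookup table is used here rather than a formula.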

References

  1. https://www.unicode.org/charts/PDF/UF900.pdf CJK Compatibility Ideographs (2021) // This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 13.0 .. Range: F900–FAFF .. A thorough understanding of the information contained in these additional sources is required for a successful implementation. Copying characters from the character code tables or list of character names is not recommended, because for production reasons the PDF files for the code charts cannot guarantee that the correct character codes will always be copied. ..

http://www.unicode.org/Public/security/8.0.0/confusables.txt (2021) .. F981 ; 5973 ; MA # ( 女 → 女 ) CJK COMPATIBILITY IDEOGRAPH-F981 → CJK UNIFIED IDEOGRAPH-5973 # 2F25 ; 5973 ; MA #* ( ⼥ → 女 ) KANGXI RADICAL WOMAN → CJK UNIFIED IDEOGRAPH-5973 # .. F90A ; 91D1 ; MA # ( 金 → 金 ) CJK COMPATIBILITY IDEOGRAPH-F90A → CJK UNIFIED IDEOGRAPH-91D1 # 2FA6 ; 91D1 ; MA #* ( ⾦ → 金 ) KANGXI RADICAL GOLD → CJK UNIFIED IDEOGRAPH-91D1 # ..

https://en.wikipedia.org/wiki/CJK_Compatibility_Ideographs CJK Compatibility Ideographs is a Unicode block created to contain Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain round-trip compatibility between Unicode and those encodings. Such encodings include the South Korean KS X 1001:1998 (U+F900–U+FA0B, 268 characters), Taiwanese Big5 (U+FA0C–U+FA0D, 2 characters), Japanese IBM 32 (CP932 variant; U+FA0E–U+FA2D, 32 characters), South Korean KS X 1001:2004 (U+FA2E–U+FA2F, 2 characters), Japanese JIS X 0213 (U+FA30–U+FA6A, 59 characters), Japanese ARIB STD-B24 (U+FA6B–U+FA6D, 3 characters) and the North Korean KPS 10721-2000 (U+FA70–U+FAD9, 106 characters) source standards.

https://en.wikipedia.org/wiki/ARIB_STD_B24_character_set Volume 1 of the Association of Radio Industries and Businesses (ARIB) STD-B24 standard for Broadcast Markup Language specifies, amongst other details, a character encoding for use in Japanese-language broadcasting. It was introduced on 1999-10-26. The latest revision is version 6.3 as of 2016-07-06. // It includes a number of ARIB extended characters (ARIB外字, ARIB gaiji) not found in the base standards (JIS X 0208 and JIS X 0201). It was the source standard for many symbol characters which were added to Unicode, including portions of the Miscellaneous Symbols, Enclosed Alphanumeric Supplement and Enclosed Ideographic Supplement blocks. Its contributions partially overlap the Unicode emoji, but were added a year earlier, in Unicode 5.2.

Keywords

Chinese, Confusion, Japanese, Kanji, KanjiConfudal, KanjiLiberal, KanjiRadical, PHP, Unicode