X5973

From TORI
Revision as of 21:49, 25 May 2021 by T (talk | contribs) (→‎Phonetic)

X5973 refers to Unicode character number 22899.

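In your browser, X5973 is shown as 女. It can be written as the decimal reference & # 2 2 8 9 9 ; or as the hexadecimal reference & # x 5 9 7 3 ; (spaced out here so that they are not rendered).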
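The two notations are the decimal and hexadecimal forms of the same number. As a small sketch, assuming standard PHP with html_entity_decode, both references decode to the same UTF-8 string (the variable names are illustrative):

```php
<?php
// Both numeric character references name code point 22899 = x5973.
$fromDecimal = html_entity_decode('&#22899;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
$fromHex     = html_entity_decode('&#x5973;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($fromDecimal === $fromHex);  // compares the decoded byte strings
echo bin2hex($fromDecimal), "\n";     // UTF-8 bytes of the decoded character, in hexadecimal
echo hexdec('5973'), "\n";            // the hexadecimal number 5973 written in decimal
```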

In Chinese and Japanese languages, this character is used to denote a woman.

Woman227px.png

Encoding

In UTF-8, character X5973 is encoded with three bytes. These bytes are
xE5 xA5 xB3 in the hexadecimal representation and
229 165 179 in the decimal representation.
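These bytes follow from the general UTF-8 rule for code points from x0800 to xFFFF: the 16 bits of the number are split into groups of 4, 6 and 6 bits and placed into the pattern 1110xxxx 10xxxxxx 10xxxxxx. Below is a minimal sketch of this arithmetic; the function name utf8_three_bytes is chosen here for illustration and is not part of PHP or of TORI.

```php
<?php
// Hypothetical helper: the three UTF-8 bytes of a code point in x0800..xFFFF,
// returned as an array of integers. The surrogate range is not checked.
function utf8_three_bytes(int $code): array
{
    if ($code < 0x0800 || $code > 0xFFFF) {
        throw new InvalidArgumentException("code point outside the three-byte range");
    }
    return [
        0xE0 | ($code >> 12),          // 1110xxxx: top 4 bits
        0x80 | (($code >> 6) & 0x3F),  // 10xxxxxx: middle 6 bits
        0x80 | ($code & 0x3F),         // 10xxxxxx: low 6 bits
    ];
}

foreach (utf8_three_bytes(0x5973) as $byte) {
    printf("x%02X = %d\n", $byte, $byte);
}
```

For x5973 the three groups are 0101, 100101 and 110011, which give xE5, xA5 and xB3.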

Phonetic

Character X5973 and its picture may be pronounced おんな (onna) [1].

However, this pronunciation has several meanings, and different kanji are used to distinguish between them.

Semantic

Onna1282725fragment.png
⼥ or 女 or 女

Usually, in Chinese and Japanese, character X5973 denotes a woman, a female human being.

Synonyms: ⼥, 女, 女

In various software, including the command line, text editors and browsers, the three characters
⼥ (X2F25), denoted also as (& # 1 2 0 6 9 ;),
女 (X5973), denoted also as (& # 2 2 8 9 9 ;),
女 (XF981), denoted also as (& # 6 3 8 7 3 ;)
have similar images, similar pronunciation (in Japanese, "onna") and similar meaning ("woman"). In this sense, these characters can be regarded as synonyms. However, whether they are identified as the same character depends on the software used.

For example, the default settings of MediaWiki do not allow creating separate articles entitled 女 (X5973) and 女 (XF981). The same applies to some operating systems: they do not allow different directories (folders) to be named 女 (X5973) and 女 (XF981).
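One plausible reason for such merging is Unicode normalization. In the Unicode data, XF981 has a canonical decomposition to X5973 and X2F25 has a compatibility decomposition to X5973, so software that normalizes names can map them to one character. The sketch below assumes PHP with the intl extension (class Normalizer); the helper name normalized_hex is hypothetical, and which form (NFC, NFKC or none) a given wiki or file system applies must be checked for each system.

```php
<?php
// Sketch only: requires the intl extension for the Normalizer class.
// normalized_hex() is a hypothetical helper name, not part of PHP or TORI.
function normalized_hex(string $char, string $form): ?string
{
    if (!class_exists('Normalizer')) {
        return null; // intl is not installed; nothing can be shown
    }
    $mode = ($form === 'NFKC') ? Normalizer::FORM_KC : Normalizer::FORM_C;
    return strtoupper(bin2hex(Normalizer::normalize($char, $mode)));
}

$chars = [
    'X2F25' => "\u{2F25}", // KANGXI RADICAL WOMAN
    'X5973' => "\u{5973}", // CJK UNIFIED IDEOGRAPH-5973
    'XF981' => "\u{F981}", // CJK COMPATIBILITY IDEOGRAPH-F981
];

foreach ($chars as $name => $char) {
    printf("%s: NFC %s, NFKC %s\n", $name,
        normalized_hex($char, 'NFC') ?? 'n/a',
        normalized_hex($char, 'NFKC') ?? 'n/a');
}
```

With intl available, XF981 is expected to give E5A5B3 (the bytes of X5973) under both forms, while X2F25 changes only under NFKC.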

In the 20th century, similar confusion occurred even for ASCII characters: some operating systems did not distinguish between upper-case and lower-case letters. For such a degenerate operating system, folder A, for example, is the same as folder a.
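The ASCII analogue can be sketched with the standard PHP function strcasecmp, which compares strings while ignoring ASCII case. A system that treats names this way cannot distinguish folder A from folder a. The variable names are illustrative.

```php
<?php
$first  = 'A';
$second = 'a';

var_dump($first === $second);                 // bool(false): the bytes differ (x41 and x61)
var_dump(strcasecmp($first, $second) === 0);  // bool(true): equal when case is ignored
```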

Character XF981 (女, & # 6 3 8 7 3 ;) appears as recommended kanji number 952 in the jōyō kanji table (2021) [2][3].

The similarity of characters X2F25, X5973 and XF981 can be revealed with the PHP program below:

<?php 
function unichr($dec) {
  if ($dec < 128) {
    $utf = chr($dec);
  } else if ($dec < 2048) {
    $utf = chr(192 + (($dec - ($dec % 64)) / 64));
    $utf .= chr(128 + ($dec % 64));
  } else {
    $utf = chr(224 + (($dec - ($dec % 4096)) / 4096));
    $utf .= chr(128 + ((($dec % 4096) - ($dec % 64)) / 64));
    $utf .= chr(128 + ($dec % 64));
  }
  return $utf;
} 

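// PHP 7.4 and later provide mb_str_split() in the mbstring extension;
// define the fallback only when the built-in function is missing,
// otherwise PHP stops with "Cannot redeclare mb_str_split()".
if (!function_exists('mb_str_split')) {
  function mb_str_split($str) {
    // split multibyte string in characters
    // at all positions except the start: ^
    // and the end: $
    $pattern = '/(?<!^)(?!$)/u';
    return preg_split($pattern,$str);
  }
}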

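function uniord($a) 
{
  // Decode one UTF-8 character of 1 to 3 bytes into its code point;
  // 4-byte characters (above xFFFF) are not handled.
  // The constants 194 and 2050 fold the removal of the lead-byte and
  // continuation-byte markers into a running subtraction.
  $M=strlen($a);
  $p=ord($a[0]);                    if($M==1) return $p;
  $p-=194;  $p*=64; $p+=ord($a[1]); if($M==2) return $p;
  $p-=2050; $p*=64; $p+=ord($a[2]);           return $p;
}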

$a=unichr(0x2f25);
echo "$a\n";
$a.=unichr(0x5973);
echo "$a\n";
$a.=unichr(0xF981);
echo "$a\n";

//$a='⼤,大;⼩,小'; # two pairs of different unicode characters separated with "," and ";"
//$a='⼥,女;AАᎪᗅA'; # different unicode characters separated with "," and ";"
//$a='⼥,女'; # pair of different unicode characters separated with comma

$N=strlen($a);
echo "The array has $N bytes; here is its splitting:\n";

for($n=0;$n<$N;$n++)
{
printf("%02x ",ord($a[$n]) );
}
echo "\n";

$b = mb_str_split($a);

var_dump($b);
$M=count($b);

#mb_internal_encoding("UTF-8");

for($m=0;$m<$M;$m++)
{
printf("\n");
$c=$b[$m];
$u=uniord($c);
printf("Unicode character number %05d id est, x%04X\n",$u,$u);
$d=strlen($c);
echo "Picture: $c uses $d bytes. These bytes are:\n";
for($n=0;$n<$d;$n++) printf("x%2X ",ord($c[$n]));
printf("in the hexadecimal representation and\n");
for($n=0;$n<$d;$n++) printf("%3d ",ord($c[$n]));
printf("in the decimal representation\n");
}
?>

This program uses the portable PHP functions unichr.t, mb_str_split.t and uniord.t. The output is:

⼥
⼥女
⼥女女
The array has 9 bytes; here is its splitting:
e2 bc a5 e5 a5 b3 ef a6 81 
array(3) {
  [0]=>
  string(3) "⼥"
  [1]=>
  string(3) "女"
  [2]=>
  string(3) "女"
}

Unicode character number 12069 id est, x2F25
Picture: ⼥ uses 3 bytes. These bytes are:
xE2 xBC xA5 in the hexadecimal representation and
226 188 165 in the decimal representation

Unicode character number 22899 id est, x5973
Picture: 女 uses 3 bytes. These bytes are:
xE5 xA5 xB3 in the hexadecimal representation and
229 165 179 in the decimal representation

Unicode character number 63873 id est, xF981
Picture: 女 uses 3 bytes. These bytes are:
xEF xA6 x81 in the hexadecimal representation and
239 166 129 in the decimal representation

Confusions

Characters
X2F25 (⼥, & # x 2 f 2 5 ;) [4],
X5973 (女, & # x 5 9 7 3 ;) [5] and
XF981 (女, & # x F 9 8 1 ;) [6]
are easy to confuse. All three appear with pictures similar to 女.

Not only humans but also the default MediaWiki software confuses X5973 and XF981, redirecting from one to the other. The same applies to various text editors, which also confuse these characters. Such a case can be expressed with the sentence: "If something is wrong, Cherchez la 女" [7][8][9].

For this reason, the names of articles should include neither character X5973 nor XF981, at least until these bugs in the software are corrected and the computers on which the old, uncorrected software is installed become unusable. This may take from 10 to 100 years.

There are many confusable characters in Unicode [10]. For this reason, ASCII characters should be preferred in serious documents.
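Such a check is simple to automate. The sketch below uses only preg_match from standard PHP and reports whether a string contains any byte above x7F; the function name is_pure_ascii is chosen here for illustration.

```php
<?php
// Hypothetical helper: true when every byte of $text is in the range x00..x7F.
function is_pure_ascii(string $text): bool
{
    // Without the /u modifier the pattern is matched byte by byte;
    // every byte of a multi-byte UTF-8 character is x80 or above.
    return preg_match('/[^\x00-\x7F]/', $text) === 0;
}

var_dump(is_pure_ascii('Woman'));      // bool(true)
var_dump(is_pure_ascii("A\u{0410}"));  // bool(false): U+0410 is the Cyrillic letter that looks like A
```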

References

  1. https://ja.wikipedia.org/wiki/おんな
  2. https://ja.wikipedia.org/wiki/%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E4%B8%80%E8%A6%A7
    https://ja.wikipedia.org/wiki/常用漢字一覧 常用漢字一覧(じょうようかんじいちらん) 常用漢字は2136字。下表の配列は常用漢字表(平成22年内閣告示第2号)に準じる。
  3. https://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji
    https://en.wikipedia.org/wiki/List_of_jōyō_kanji The jōyō kanji system of representing written Japanese consists of 2,136 characters.
  4. https://util.unicode.org/UnicodeJsps/character.jsp?a=2F25 2F25 KANGXI RADICAL WOMAN Han Script id: allowed confuse: 女 , 女
  5. https://util.unicode.org/UnicodeJsps/character.jsp?a=5973 女 5973 CJK UNIFIED IDEOGRAPH-5973 Han Script id: restricted confuse: 女 , ⼥
  6. https://util.unicode.org/UnicodeJsps/character.jsp?a=F981 F981 CJK COMPATIBILITY IDEOGRAPH-F981 Han Script id: allowed confuse: 女 , ⼥
  7. https://archive.org/details/lesmohicansdepa02dumagoog/page/n243/mode/2up?view=theater Alexandre Dumas. Les Mohicans de Paris. 1874, p.332. .. L'huissier disparut par une porte, et revint presque aussitôt. — Dans deux minutes, M. Jackal est à vous. Effectivement, un instant après, la porte se rouvrit, et, avant que l'on vit encore personne, on entendit une voix qui criait : — Cherchez la femme, pardieu! cherchez la femme! Puis parut l'homme dont on venait d'entendre la voix. Essayons de tracer le portrait de M. Jackal...
  8. https://fr.wikipedia.org/wiki/Cherchez_la_femme « Cherchez la femme » est une expression connue sous sa forme française dans des ouvrages écrits en anglais, en italien et dans plusieurs autres langues. ..
  9. https://en.wikipedia.org/wiki/Cherchez_la_femme Cherchez la femme (French: [ʃɛʁʃe la fam]) is a French phrase which literally means 'look for the woman'. .. Il y a une femme dans toutes les affaires; aussitôt qu'on me fait un rapport, je dis: « Cherchez la femme ! »
  10. http://www.unicode.org/Public/security/revision-03/confusablesSummary.txt
    # Summary: Recommended confusable mapping for IDN
    # File: confusablesSummary.txt
    # Version: 2.1-draft
    # Generated: 2010-04-13, 01:33:25 GMT
    # Checkin: $Revision: 1.29 $
    # For documentation and usage, see http://www.unicode.org/reports/tr39/

Keywords

Chinese, Female, Japanese, Kanji, Onna, PHP, SomeU, SomeUtf8, Unicode, Utf8, UtfH, Utf8table, Woman, X2F25, X5973, XF981