X2F25
Latest revision as of 10:13, 17 June 2021
X2F25 is the name of Unicode character number 12069 (KANGXI RADICAL WOMAN).
Utf8 character number 12069 (X2F25) is shown by your browser as ⼥ (⼥, ⼥).
HTML input
Character number 12069 can be generated in html with Ascii input in either of the following two ways:
⼥ & # 1 2 0 6 9 ; in the decimal representation or
⼥ & # x 2 F 2 5 ; in the hexadecimal representation;
both inputs above generate the same Unicode character, number 12069.
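The two Ascii inputs can also be produced and checked programmatically. The following is a minimal PHP sketch, in the same language as the program further down this page; it assumes PHP 7 or later and uses the standard function html_entity_decode() to turn each entity back into Utf8. The variable names are chosen only for this example.

```php
<?php
// Sketch: build both Ascii-only HTML inputs for character number 12069
// and check that they decode to the same Utf8 character.
$cp  = 12069;                    // example code point (X2F25)
$dec = sprintf('&#%d;', $cp);    // decimal form
$hex = sprintf('&#x%X;', $cp);   // hexadecimal form

// html_entity_decode() converts numeric entities into Utf8 bytes
$fromDec = html_entity_decode($dec, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$fromHex = html_entity_decode($hex, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo "$dec -> $fromDec\n";
echo "$hex -> $fromHex\n";
echo $fromDec === $fromHex ? "same character\n" : "different characters\n";
```

Run from the command line, it prints `&#12069; -> ⼥`, then `&#x2F25; -> ⼥`, then `same character`.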
Encoding
In Utf8, character ⼥ (⼥, & # 1 2 0 6 9 ;)
is encoded with 3 bytes. These bytes are
xE2 xBC xA5 in the hexadecimal representation and
226 188 165 in the decimal representation.
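These three bytes can be checked by hand. A short worked computation, assuming the standard Utf8 three-byte template 1110xxxx 10xxxxxx 10xxxxxx: write 12069 in binary, split its 16 bits into groups of 4, 6 and 6, and fill them into the template. The last line reverses the computation, which is what the uniord function in the program below does.

```latex
\begin{aligned}
12069 &= \mathtt{2F25}_{16} = 0010\;111100\;100101_{2}\\
&\Rightarrow\; \underline{1110}\,0010\;\;\underline{10}\,111100\;\;\underline{10}\,100101\\
&= \mathtt{E2}\;\;\mathtt{BC}\;\;\mathtt{A5} = 226\;\;188\;\;165,\\[4pt]
(226-224)\cdot 4096 + (188-128)\cdot 64 + (165-128) &= 8192 + 3840 + 37 = 12069.
\end{aligned}
```

The underlined bits are the fixed markers of the template; the remaining bits are those of the character number.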
Phonetic
⼥ may be pronounced おんな (onna) [1].
However, this reading has many meanings, and different kanji are used depending on the case.
Semantic
In both Chinese [2] and Japanese [3], character ⼥ may mean a woman, a human female.
Synonyms: 女 and 女
Character ⼥ (⼥, & # 1 2 0 6 9 ;)
looks similar to, and has a similar meaning to, both
character 女 (女, & # 2 2 8 9 9 ;)
and character 女 (女, & # 6 3 8 7 3 ;)
[4].
Character 女 (& # 6 3 8 7 3 ;) appears as recommended kanji number 952 in the jōyō kanji table (2021) [5][6].
Close synonyms are common among Unicode characters.
Characters with similar pictures and similar meanings often appear in Unicode;
for example,
A (A, & # 0 0 6 5 ;),
А (А, & # 1 0 4 0 ;),
Ꭺ (Ꭺ, & # 5 0 3 4 ;),
ᗅ (ᗅ, & # 5 5 7 3 ;) and
Ａ (Ａ, & # 6 5 3 1 3 ;)
look very similar in some graphical interfaces.
These similarities (and the encoding of the characters mentioned) can be revealed with the PHP program below:
<?php
// unichr: Utf8 byte string of code point $dec (valid only for $dec < 65536)
function unichr($dec)
{
    if ($dec < 128) {                  // 1 byte:  0xxxxxxx
        $utf = chr($dec);
    } else if ($dec < 2048) {          // 2 bytes: 110xxxxx 10xxxxxx
        $utf  = chr(192 + (($dec - ($dec % 64)) / 64));
        $utf .= chr(128 + ($dec % 64));
    } else {                           // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        $utf  = chr(224 + (($dec - ($dec % 4096)) / 4096));
        $utf .= chr(128 + ((($dec % 4096) - ($dec % 64)) / 64));
        $utf .= chr(128 + ($dec % 64));
    }
    return $utf;
}

// PHP 7.4 and later already provide mb_str_split (mbstring extension);
// redeclaring it is a fatal error, so define it only if it is missing.
if (!function_exists('mb_str_split')) {
    function mb_str_split($str)
    {
        // split a multibyte string into characters
        // at all positions except the start: ^
        // and the end: $
        $pattern = '/(?<!^)(?!$)/u';
        return preg_split($pattern, $str);
    }
}

// uniord: code point of a 1-, 2- or 3-byte Utf8 character (inverse of unichr);
// b0, b1, b2 below denote the byte values ord($a[0]), ord($a[1]), ord($a[2])
function uniord($a)
{
    $M = strlen($a);
    $p = ord($a[0]);
    if ($M == 1) return $p;
    $p -= 194; $p *= 64; $p += ord($a[1]);    // (b0-192)*64 + (b1-128)
    if ($M == 2) return $p;
    $p -= 2050; $p *= 64; $p += ord($a[2]);   // (b0-224)*4096 + (b1-128)*64 + (b2-128)
    return $p;
}

$a  = unichr(0x2f25); echo "$a\n";
$a .= unichr(0x5973); echo "$a\n";
$a .= unichr(0xF981); echo "$a\n";
//$a='⼤,大;⼩,小'; # two pairs of different unicode characters separated with "," and ";"
//$a='⼥,女;AАᎪᗅA'; # different unicode characters separated with "," and ";"
//$a='⼥,女'; # pair of different unicode characters separated with comma

$N = strlen($a);
echo "The array has $N bytes; here is its splitting:\n";
for ($n = 0; $n < $N; $n++) {
    printf("%02x ", ord($a[$n]));
}
echo "\n";

$b = mb_str_split($a);
var_dump($b);
$M = count($b);
#mb_internal_encoding("UTF-8");
for ($m = 0; $m < $M; $m++) {
    printf("\n");
    $c = $b[$m];
    $u = uniord($c);
    printf("Unicode character number %05d id est, x%04X\n", $u, $u);
    $d = strlen($c);
    echo "Picture: $c uses $d bytes. These bytes are:\n";
    for ($n = 0; $n < $d; $n++) printf("x%02X ", ord($c[$n]));
    printf("in the hexadecimal representation and\n");
    for ($n = 0; $n < $d; $n++) printf("%3d ", ord($c[$n]));
    printf("in the decimal representation\n");
}
?>
This program uses portable PHP functions unichr.t, mb_str_split.t and uniord.t. The output is
⼥
⼥女
⼥女女
The array has 9 bytes; here is its splitting:
e2 bc a5 e5 a5 b3 ef a6 81
array(3) {
  [0]=>
  string(3) "⼥"
  [1]=>
  string(3) "女"
  [2]=>
  string(3) "女"
}

Unicode character number 12069 id est, x2F25
Picture: ⼥ uses 3 bytes. These bytes are:
xE2 xBC xA5 in the hexadecimal representation and
226 188 165 in the decimal representation

Unicode character number 22899 id est, x5973
Picture: 女 uses 3 bytes. These bytes are:
xE5 xA5 xB3 in the hexadecimal representation and
229 165 179 in the decimal representation

Unicode character number 63873 id est, xF981
Picture: 女 uses 3 bytes. These bytes are:
xEF xA6 x81 in the hexadecimal representation and
239 166 129 in the decimal representation
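The functions unichr and uniord in the program above handle only character numbers below 65536 (at most three Utf8 bytes); characters beyond that range, such as x1F600, take four bytes. Below is a sketch of an encoder/decoder pair for the whole range 0 to x10FFFF, using bit shifts instead of the modulo arithmetic of the original. The names utf8_chr and utf8_ord are hypothetical, and the sketch assumes valid input: it does not reject surrogates, overlong forms or truncated sequences.

```php
<?php
// Sketch: Utf8 encoder/decoder covering 1- to 4-byte sequences.
// utf8_chr / utf8_ord are hypothetical names; input is assumed to be valid.
function utf8_chr(int $cp): string
{
    if ($cp < 0x80) {                 // 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {                // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {              // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

function utf8_ord(string $s): int
{
    $b0 = ord($s[0]);
    if ($b0 < 0x80) return $b0;       // single Ascii byte
    if ($b0 < 0xE0) $len = 2;         // lead byte 110xxxxx
    elseif ($b0 < 0xF0) $len = 3;     // lead byte 1110xxxx
    else $len = 4;                    // lead byte 11110xxx
    $cp = $b0 & (0xFF >> ($len + 1)); // keep only the payload bits of the lead byte
    for ($i = 1; $i < $len; $i++) {
        $cp = ($cp << 6) | (ord($s[$i]) & 0x3F);  // 6 payload bits per continuation byte
    }
    return $cp;
}

foreach ([0x2F25, 0x5973, 0xF981, 0x1F600] as $cp) {
    $s = utf8_chr($cp);
    printf("x%04X -> %s -> x%04X\n", $cp, bin2hex($s), utf8_ord($s));
}
```

For the three characters of this page the output bytes agree with those printed by the program above (e2bca5, e5a5b3, efa681); x1F600 gives the four bytes f09f9880.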
Confusions
Characters
X2F25 (⼥, & # x 2 f 2 5 ;)
[7],
X5973 (女, & # x 5 9 7 3 ;)
[8] and
XF981 (女, & # x F 9 8 1 ;)
[9]
are easy to confuse.
All three are displayed with pictures similar to ⼥.
Not only humans but also the default MediaWiki software confuses X5973 and XF981, redirecting from one to the other. The same applies to various text editors, which also confuse these characters. Such a case can be summed up by the sentence "If something is wrong, Cherchez la ⼥", a play on the French phrase "Cherchez la femme" ("look for the woman") [10][11][12].
For this reason, article names should include neither X5973 nor XF981, at least until these bugs in the software are corrected and the computers still running the old, uncorrected software have gone out of use. This may take from 10 to 100 years.
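One plausible mechanism behind such redirects (an assumption on our part, not stated in the sources above) is Unicode normalization. XF981 is a CJK compatibility ideograph whose canonical decomposition is X5973, so software that normalizes text to NFC or NFD replaces XF981 with X5973; X2F25 has only a compatibility decomposition to X5973, so it changes only under NFKC or NFKD. The sketch below checks this with PHP's Normalizer class, which belongs to the optional intl extension; if intl is not installed, it just prints a notice.

```php
<?php
// Sketch: how Unicode normalization treats the three look-alike characters.
// Assumes the optional intl extension, which provides the Normalizer class.
$chars = [
    'X2F25' => "\u{2F25}",  // KANGXI RADICAL WOMAN
    'X5973' => "\u{5973}",  // CJK UNIFIED IDEOGRAPH-5973
    'XF981' => "\u{F981}",  // CJK COMPATIBILITY IDEOGRAPH-F981
];

if (!class_exists('Normalizer')) {
    echo "The intl extension (class Normalizer) is not available.\n";
} else {
    foreach ($chars as $name => $c) {
        $nfc  = Normalizer::normalize($c, Normalizer::FORM_C);   // canonical composition
        $nfkc = Normalizer::normalize($c, Normalizer::FORM_KC);  // compatibility composition
        printf("%s: bytes %s, NFC %s, NFKC %s\n",
            $name, bin2hex($c), bin2hex($nfc), bin2hex($nfkc));
    }
}
```

With intl available, XF981 already becomes e5a5b3 (X5973) under NFC, while X2F25 keeps its bytes e2bca5 under NFC and becomes e5a5b3 only under NFKC.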
There are many confusable characters in Unicode [13]. For this reason, Ascii characters should be preferred in serious documents.
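Following this recommendation, a planned article name can be checked for non-Ascii characters before it is used. A minimal PHP sketch, assuming the input is valid Utf8; the function name non_ascii_report is hypothetical. With the u modifier, the pattern [^\x00-\x7F] matches one whole multibyte character rather than a single byte, so each look-alike is reported with its own bytes.

```php
<?php
// Sketch: list the non-Ascii characters of a string with their Utf8 bytes,
// so that look-alikes such as X5973 and XF981 can be told apart.
// non_ascii_report is a hypothetical helper; the input must be valid Utf8.
function non_ascii_report(string $s): array
{
    // The /u modifier makes [^\x00-\x7F] match one whole character, not one byte.
    preg_match_all('/[^\x00-\x7F]/u', $s, $m);
    $report = [];
    foreach ($m[0] as $ch) {
        $hex = str_split(strtoupper(bin2hex($ch)), 2);  // e.g. ['E5', 'A5', 'B3']
        $report[] = $ch . ': x' . implode(' x', $hex);
    }
    return $report;
}

print_r(non_ascii_report('X2F25'));                // pure Ascii: empty array
print_r(non_ascii_report("\u{5973} \u{F981}"));    // two different characters
```

The second call reports xE5 xA5 xB3 and xEF xA6 x81, the same bytes as in the program output above, even though the two characters may look identical on screen.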
References
1. https://ja.wikipedia.org/wiki/おんな
2. https://zh.wikipedia.org/wiki/⼥ (https://zh.wikipedia.org/wiki/%E5%A5%B3) (translated) "Woman (女性) refers to a female human being. The term can denote the biological division of the sexes, and can also refer to a gender role recognized by society or identified with oneself; it is generally used only for humans, while other living beings are usually called 'female' (雌性 or 母的)." ..
3. https://ja.wikipedia.org/wiki/女 or https://ja.wikipedia.org/wiki/女性 or https://ja.wikipedia.org/wiki/%E5%A5%B3 (translated) "Josei (女性; Greek: γυναίκα; English: woman) is the sex of humans that is contrasted with men. It is generally synonymous with the biological female, but diverse views exist, based on social and individual values and orientations."
4. https://en.wiktionary.org/wiki/%E5%A5%B3 https://en.wiktionary.org/wiki/女
5. https://ja.wikipedia.org/wiki/%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E4%B8%80%E8%A6%A7 https://ja.wikipedia.org/wiki/常用漢字一覧 (translated) "List of jōyō kanji (常用漢字一覧, jōyō kanji ichiran). There are 2136 jōyō kanji. The order of the table follows the Jōyō Kanji Table (Cabinet Notice No. 2 of Heisei 22, i.e. 2010)."
6. https://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji https://en.wikipedia.org/wiki/List_of_jōyō_kanji "The jōyō kanji system of representing written Japanese consists of 2,136 characters."
7. https://util.unicode.org/UnicodeJsps/character.jsp?a=2F25 ⼥ 2F25 KANGXI RADICAL WOMAN; Han script; id: allowed; confuse: 女, 女
8. https://util.unicode.org/UnicodeJsps/character.jsp?a=5973 女 5973 CJK UNIFIED IDEOGRAPH-5973; Han script; id: restricted; confuse: 女, ⼥
9. https://util.unicode.org/UnicodeJsps/character.jsp?a=F981 女 F981 CJK COMPATIBILITY IDEOGRAPH-F981; Han script; id: allowed; confuse: 女, ⼥
10. https://archive.org/details/lesmohicansdepa02dumagoog/page/n243/mode/2up?view=theater Alexandre Dumas. Les Mohicans de Paris. 1874, p. 332. (translated) ".. The usher disappeared through a door and came back almost at once. 'In two minutes, M. Jackal will be with you.' Indeed, a moment later the door opened again and, before anyone could yet be seen, a voice was heard shouting: 'Cherchez la femme, pardieu! cherchez la femme!' Then appeared the man whose voice had just been heard. Let us try to sketch the portrait of M. Jackal..."
11. https://fr.wikipedia.org/wiki/Cherchez_la_femme (translated) "'Cherchez la femme' is an expression known in its French form in works written in English, in Italian and in several other languages." ..
12. https://en.wikipedia.org/wiki/Cherchez_la_femme "Cherchez la femme (French: [ʃɛʁʃe la fam]) is a French phrase which literally means 'look for the woman'. .. There is a woman in every case; as soon as someone brings me a report, I say: 'Cherchez la femme!'" (quotation translated from French)
13. http://www.unicode.org/Public/security/revision-03/confusablesSummary.txt
   - Summary: Recommended confusable mapping for IDN
   - File: confusablesSummary.txt
   - Version: 2.1-draft
   - Generated: 2010-04-13, 01:33:25 GMT
   - Checkin: $Revision: 1.29 $
   - For documentation and usage, see http://www.unicode.org/reports/tr39/
Keywords
Chinese, Female, Japanese, Kanji, PHP, SomeUtf8, Unicode, Utf8, UtfH, Utf8table, Woman