Difference between revisions of "⼥"
Revision as of 17:36, 24 May 2021
Unicode character number 12069 is shown by your browser as ⼥ (⼥).
HTML input
Character number 12069 can be generated in HTML in either of the following two ways:
⼥ & # 1 2 0 6 9 ; in the decimal representation or
⼥ & # x 2 F 2 5 ; in the hexadecimal representation;
both notations above generate the same Unicode character number 12069.
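The equivalence of the two notations can be checked with a short PHP sketch. It assumes that the mbstring extension is available; the variable names are illustrative only. The references are built with sprintf so that no literal reference appears in the source.

```php
<?php
// Code point of the character discussed above: 12069 decimal = 2F25 hexadecimal.
$codePoint = 12069;

// Build both numeric character references from the same number.
$decimalRef = sprintf('&#%d;', $codePoint);   // decimal notation
$hexRef     = sprintf('&#x%X;', $codePoint);  // hexadecimal notation

// Decode each reference back to a UTF-8 string.
$fromDecimal = html_entity_decode($decimalRef, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$fromHex     = html_entity_decode($hexRef, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// The character built directly from the code point, for comparison.
$expected = mb_chr($codePoint, 'UTF-8');

// Both comparisons should report true if the notations are equivalent.
var_dump($fromDecimal === $expected, $fromHex === $expected);
```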
Encoding
In Utf8, character ⼥ (⼥, (& # 1 2 0 6 9 ;))
is encoded with 3 bytes. These bytes are
xE2 xBC xA5 in the hexadecimal representation and
226 188 165 in the decimal representation.
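The three bytes follow from the UTF-8 bit layout: the 16 bits of 12069 (x2F25) are split into groups of 4, 6 and 6 bits and placed into the pattern 1110xxxx 10xxxxxx 10xxxxxx. The sketch below illustrates this; encodeUtf8 is a hypothetical helper written for this page, not a built-in PHP function, and it does not validate its input.

```php
<?php
// Hypothetical helper: encode one Unicode code point as a UTF-8 byte string.
// No validation is done (surrogates and values above x10FFFF are not rejected).
function encodeUtf8(int $cp): string {
    if ($cp < 0x80) {            // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Print each byte of character 12069 in hexadecimal and decimal.
$bytes = encodeUtf8(12069);
foreach (str_split($bytes) as $byte) {
    printf("x%02X = %d\n", ord($byte), ord($byte));
}
```

For 12069 the loop should print xE2 = 226, xBC = 188 and xA5 = 165, in agreement with the values listed above.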
Phonetic
⼥ may be pronounced おんな (onna) [1].
However, this pronunciation corresponds to several meanings, and different kanji are used to distinguish them.
Semantic
In both Chinese [2] and Japanese [3], character ⼥ may mean a woman, a human female.
Synonym: 女
Character ⼥ (& # 1 2 0 6 9 ;) looks similar to character 女 (& # 6 3 8 7 3 ;) and has a similar meaning.
The existence of close synonyms is typical of Unicode characters.
Often, characters with similar pictures and similar meaning appear in Unicode;
for example,
A (A, & # 0 0 6 5 ;),
А (А, & # 1 0 4 0 ;),
Ꭺ (Ꭺ, & # 5 0 3 4 ;),
A (A, & # 6 5 3 1 3 ;)
look very similar in some graphical interfaces.
These similarities (and the encoding of the characters mentioned) can be revealed with the PHP program below:
<?php
// mb_str_split is built in since PHP 7.4; define it only when it is missing,
// otherwise PHP stops with a "cannot redeclare" error.
if (!function_exists('mb_str_split')) {
    function mb_str_split($str) {
        // split multibyte string in characters
        // at all positions except the start: ^
        // and the end: $
        $pattern = '/(?<!^)(?!$)/u';
        return preg_split($pattern, $str);
    }
}

// Return the Unicode number of a UTF-8 character of 1, 2 or 3 bytes
// (4-byte characters are not handled).
function uniord($a) {
    $M = strlen($a);
    $p = ord($a[0]);
    if ($M == 1) return $p;
    $p -= 194; $p *= 64; $p += ord($a[1]);
    if ($M == 2) return $p;
    $p -= 2050; $p *= 64; $p += ord($a[2]);
    return $p;
}

//$a='⼤,大;⼩,小'; # two pairs of different unicode characters separated with "," and ";"
$a = '⼥,女;AАᎪᗅA'; # different unicode characters separated with "," and ";"
//$a='⼥,女'; # pair of different unicode characters separated with comma

// Dump the whole string byte by byte.
$N = strlen($a);
echo "The array has $N bytes; here is its splitting:\n";
for ($n = 0; $n < $N; $n++) {
    printf("%02x ", ord($a[$n]));
}
echo "\n";

// Split into characters and describe each one.
$b = mb_str_split($a);
var_dump($b);
$M = count($b);
#mb_internal_encoding("UTF-8");
for ($m = 0; $m < $M; $m++) {
    printf("\n");
    $c = $b[$m];
    $u = uniord($c);
    printf("Unicode character number %05d id est, x%04X\n", $u, $u);
    $d = strlen($c);
    echo "Picture: $c uses $d bytes. These bytes are:\n";
    for ($n = 0; $n < $d; $n++) printf("x%2X ", ord($c[$n]));
    printf("in the hexadecimal representation and\n");
    for ($n = 0; $n < $d; $n++) printf("%3d ", ord($c[$n]));
    printf("in the decimal representation\n");
}
?>
This program uses the portable PHP functions mb_str_split.t and uniord.t. Its output is copied below:
The array has 20 bytes; here is its splitting:
e2 bc a5 2c e5 a5 b3 3b 41 d0 90 e1 8e aa e1 97 85 ef bc a1
array(9) {
  [0]=>
  string(3) "⼥"
  [1]=>
  string(1) ","
  [2]=>
  string(3) "女"
  [3]=>
  string(1) ";"
  [4]=>
  string(1) "A"
  [5]=>
  string(2) "А"
  [6]=>
  string(3) "Ꭺ"
  [7]=>
  string(3) "ᗅ"
  [8]=>
  string(3) "A"
}

Unicode character number 12069 id est, x2F25
Picture: ⼥ uses 3 bytes. These bytes are:
xE2 xBC xA5 in the hexadecimal representation and
226 188 165 in the decimal representation

Unicode character number 00044 id est, x002C
Picture: , uses 1 bytes. These bytes are:
x2C in the hexadecimal representation and
44 in the decimal representation

Unicode character number 22899 id est, x5973
Picture: 女 uses 3 bytes. These bytes are:
xE5 xA5 xB3 in the hexadecimal representation and
229 165 179 in the decimal representation

Unicode character number 00059 id est, x003B
Picture: ; uses 1 bytes. These bytes are:
x3B in the hexadecimal representation and
59 in the decimal representation

Unicode character number 00065 id est, x0041
Picture: A uses 1 bytes. These bytes are:
x41 in the hexadecimal representation and
65 in the decimal representation

Unicode character number 01040 id est, x0410
Picture: А uses 2 bytes. These bytes are:
xD0 x90 in the hexadecimal representation and
208 144 in the decimal representation

Unicode character number 05034 id est, x13AA
Picture: Ꭺ uses 3 bytes. These bytes are:
xE1 x8E xAA in the hexadecimal representation and
225 142 170 in the decimal representation

Unicode character number 05573 id est, x15C5
Picture: ᗅ uses 3 bytes. These bytes are:
xE1 x97 x85 in the hexadecimal representation and
225 151 133 in the decimal representation

Unicode character number 65313 id est, xFF21
Picture: A uses 3 bytes. These bytes are:
xEF xBC xA1 in the hexadecimal representation and
239 188 161 in the decimal representation
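The numbers reported by the hand-written uniord can be cross-checked against the built-in mb_ord (available since PHP 7.2 with the mbstring extension), which also handles 4-byte characters. The sketch below is only an illustration: the characters are written as raw bytes taken from the output above, and the variable names are arbitrary.

```php
<?php
// UTF-8 bytes from the output above, paired with the reported character numbers.
$expected = [
    "\xE2\xBC\xA5" => 12069, // x2F25
    "\xE5\xA5\xB3" => 22899, // x5973
    "\x41"         => 65,    // x0041
    "\xD0\x90"     => 1040,  // x0410
    "\xE1\x8E\xAA" => 5034,  // x13AA
    "\xE1\x97\x85" => 5573,  // x15C5
    "\xEF\xBC\xA1" => 65313, // xFF21
];

// Count how many characters decode to a number other than the reported one.
$mismatches = 0;
foreach ($expected as $char => $codePoint) {
    $actual = mb_ord($char, 'UTF-8');
    if ($actual !== $codePoint) {
        $mismatches++;
    }
    printf("%s -> %d (x%04X)\n", bin2hex($char), $actual, $actual);
}
echo "Mismatches: $mismatches\n";
```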
References
- ↑ https://ja.wikipedia.org/wiki/おんな
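- ↑ https://zh.wikipedia.org/wiki/⼥ (https://zh.wikipedia.org/wiki/%E5%A5%B3) "Woman" (女性) refers to a female human being. The term can denote the biological sex category and can also refer to a socially recognized or self-identified gender role; it is generally applied only to humans, while other organisms are usually called "female" (雌性 or 母的). ..
- ↑ https://ja.wikipedia.org/wiki/女 or https://ja.wikipedia.org/wiki/女性 or https://ja.wikipedia.org/wiki/%E5%A5%B3 A woman (女性, じょせい; Greek: γυναίκα; English: woman) is the sex of a human being as contrasted with a man. The term is generally synonymous with the biological female, but diverse views exist based on social and individual values and inclinations.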
Keywords
Chinese, Female, Japanese, PHP, SomeUtf8, Utf8, UtfH, Utf8table, Woman