⼥
Unicode character number 12069 (hexadecimal 2F25) is shown by your browser as ⼥.
HTML input
Character number 12069 can be generated in HTML in either of the following two ways:
⼥ & # 1 2 0 6 9 ; in the decimal representation or
⼥ & # x 2 F 2 5 ; in the hexadecimal representation;
both the scripts above generate the same Unicode character number 12069.
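The two notations can be checked with a short PHP sketch. It builds both numeric character references for a code point and decodes them back to the UTF-8 character. The helper name html_refs() is hypothetical and serves only this illustration; PHP 7 or later is assumed.

```php
<?php
// Sketch: build both HTML numeric character references for a code point
// and decode them back. The helper name html_refs() is hypothetical.
function html_refs(int $codepoint): array
{
    return [
        'decimal'     => '&#' . $codepoint . ';',
        'hexadecimal' => '&#x' . strtoupper(dechex($codepoint)) . ';',
    ];
}

foreach (html_refs(12069) as $kind => $ref) {
    // html_entity_decode() turns the reference into the UTF-8 character.
    $char = html_entity_decode($ref, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    printf("%-11s %-9s decodes to %s (bytes: %s)\n",
        $kind, $ref, $char, strtoupper(bin2hex($char)));
}
```

Both references should decode to the same three bytes, which is what "the same Unicode character" means at the byte level.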
Encoding
In Utf8, character ⼥ (& # 1 2 0 6 9 ;)
is encoded with 3 bytes. These bytes are
xE2 xBC xA5 in the hexadecimal representation and
226 188 165 in the decimal representation.
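The three bytes follow from the bit layout that UTF-8 uses for code points from 2048 to 65535 (1110xxxx 10xxxxxx 10xxxxxx). The sketch below is an illustration under that assumption only; the function name utf8_three_bytes() is hypothetical, and surrogate code points are not excluded.

```php
<?php
// Sketch: encode a code point from the range U+0800..U+FFFF into its
// three UTF-8 bytes. The function name utf8_three_bytes() is hypothetical.
function utf8_three_bytes(int $cp): array
{
    if ($cp < 0x800 || $cp > 0xFFFF) {
        throw new InvalidArgumentException('code point needs a different byte count');
    }
    return [
        0xE0 | ($cp >> 12),          // lead byte 1110xxxx: top 4 bits
        0x80 | (($cp >> 6) & 0x3F),  // continuation 10xxxxxx: middle 6 bits
        0x80 | ($cp & 0x3F),         // continuation 10xxxxxx: low 6 bits
    ];
}

$bytes = utf8_three_bytes(12069);
foreach ($bytes as $b) printf("x%02X ", $b);   // hexadecimal representation
echo "\n";
foreach ($bytes as $b) printf("%3d ", $b);     // decimal representation
echo "\n";
```

For 12069 this gives xE2 xBC xA5, that is, 226 188 165.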
Phonetic
⼥ may be pronounced おんな (onna) [1].
However, this pronunciation has several meanings, and different kanji are used to write it depending on the meaning.
Semantic
Both in Chinese [2] and in Japanese [3], character ⼥ may mean a woman, a human female.
Synonym: 女
Character ⼥ ( & # 1 2 0 6 9 ;)
looks similar to
character 女 ( & # 6 3 8 7 3 ;)
and has similar meaning.
Close synonyms are typical of Unicode characters.
Characters with similar pictures and similar meanings often appear in Unicode;
for example,
A (A, & # 0 0 6 5 ;),
А (А, & # 1 0 4 0 ;),
Ꭺ (Ꭺ, & # 5 0 3 4 ;),
Ａ (Ａ, & # 6 5 3 1 3 ;)
look very similar in some graphical interfaces.
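Some of these look-alikes are recorded in Unicode itself as compatibility variants, while others are unrelated characters that merely look alike. A hedged way to tell them apart is compatibility normalization (NFKC). The sketch below assumes the optional intl extension, which provides the Normalizer class, and PHP 7 or later for the "\u{...}" escapes.

```php
<?php
// Sketch: check which look-alike characters Unicode itself treats as
// compatibility variants. Requires the optional intl extension.
$pairs = [
    ["\u{2F25}", "\u{5973}"],  // Kangxi radical ⼥ vs unified ideograph 女
    ["\u{FF21}", "A"],         // fullwidth Ａ vs Latin A
    ["\u{0410}", "A"],         // Cyrillic А vs Latin A
    ["\u{13AA}", "A"],         // Cherokee Ꭺ vs Latin A
];
if (!class_exists('Normalizer')) {
    echo "intl extension not available\n";
} else {
    foreach ($pairs as [$x, $y]) {
        // NFKC replaces compatibility variants with their plain counterparts.
        $same = Normalizer::normalize($x, Normalizer::FORM_KC) === $y;
        printf("%s -> %s : %s\n", $x, $y, $same ? 'same after NFKC' : 'distinct');
    }
}
```

The first two pairs should fold together under NFKC; the Cyrillic and Cherokee letters stay distinct, because they belong to different scripts and are not variants of the Latin letter.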
These similarities (and the encoding of the characters mentioned) can be revealed with the PHP program below:
<?php
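if (!function_exists('mb_str_split')) { // built in since PHP 7.4; redeclaring it is a fatal error
function mb_str_split($str) {
// split multibyte string in characters
// at all positions except the start: ^
// and the end: $
$pattern = '/(?<!^)(?!$)/u';
return preg_split($pattern,$str);
}
}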
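function uniord($a)
{
// Unicode number of one UTF-8 character of 1 to 3 bytes
// (4-byte sequences, above xFFFF, are not handled)
$M=strlen($a);
$p=ord($a[0]); if($M==1) return $p;
$p-=194; $p*=64; $p+=ord($a[1]); if($M==2) return $p;
$p-=2050; $p*=64; $p+=ord($a[2]); return $p;
}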
//$a='⼤,大;⼩,小'; # two pairs of different unicode characters separated with "," and ";"
$a='⼥,女;AАᎪᗅＡ'; # different unicode characters separated with "," and ";"
//$a='⼥,女'; # pair of different unicode characters separated with comma
$N=strlen($a);
echo "The array has $N bytes; here is its splitting:\n";
for($n=0;$n<$N;$n++)
{
printf("%02x ",ord($a[$n]) );
}
echo "\n";
$b = mb_str_split($a);
var_dump($b);
$M=count($b);
#mb_internal_encoding("UTF-8");
for($m=0;$m<$M;$m++)
{
printf("\n");
$c=$b[$m];
$u=uniord($c);
printf("Unicode character number %05d id est, x%04X\n",$u,$u);
$d=strlen($c);
echo "Picture: $c uses $d bytes. These bytes are:\n";
for($n=0;$n<$d;$n++) printf("x%2X ",ord($c[$n]));
printf("in the hexadecimal representation and\n");
for($n=0;$n<$d;$n++) printf("%3d ",ord($c[$n]));
printf("in the decimal representation\n");
}
?>
This program uses the portable PHP functions mb_str_split.t and uniord.t. The output is copied below:
The array has 20 bytes; here is its splitting:
e2 bc a5 2c e5 a5 b3 3b 41 d0 90 e1 8e aa e1 97 85 ef bc a1
array(9) {
[0]=>
string(3) "⼥"
[1]=>
string(1) ","
[2]=>
string(3) "女"
[3]=>
string(1) ";"
[4]=>
string(1) "A"
[5]=>
string(2) "А"
[6]=>
string(3) "Ꭺ"
[7]=>
string(3) "ᗅ"
[8]=>
string(3) "Ａ"
}
Unicode character number 12069 id est, x2F25
Picture: ⼥ uses 3 bytes. These bytes are:
xE2 xBC xA5 in the hexadecimal representation and
226 188 165 in the decimal representation
Unicode character number 00044 id est, x002C
Picture: , uses 1 bytes. These bytes are:
x2C in the hexadecimal representation and
44 in the decimal representation
Unicode character number 22899 id est, x5973
Picture: 女 uses 3 bytes. These bytes are:
xE5 xA5 xB3 in the hexadecimal representation and
229 165 179 in the decimal representation
Unicode character number 00059 id est, x003B
Picture: ; uses 1 bytes. These bytes are:
x3B in the hexadecimal representation and
59 in the decimal representation
Unicode character number 00065 id est, x0041
Picture: A uses 1 bytes. These bytes are:
x41 in the hexadecimal representation and
65 in the decimal representation
Unicode character number 01040 id est, x0410
Picture: А uses 2 bytes. These bytes are:
xD0 x90 in the hexadecimal representation and
208 144 in the decimal representation
Unicode character number 05034 id est, x13AA
Picture: Ꭺ uses 3 bytes. These bytes are:
xE1 x8E xAA in the hexadecimal representation and
225 142 170 in the decimal representation
Unicode character number 05573 id est, x15C5
Picture: ᗅ uses 3 bytes. These bytes are:
xE1 x97 x85 in the hexadecimal representation and
225 151 133 in the decimal representation
Unicode character number 65313 id est, xFF21
Picture: Ａ uses 3 bytes. These bytes are:
xEF xBC xA1 in the hexadecimal representation and
239 188 161 in the decimal representation
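On recent PHP versions the same report can be obtained from built-in mbstring functions instead of the hand-written helpers. The sketch below assumes PHP 7.4 or later with the mbstring extension (mb_str_split() appeared in 7.4, mb_ord() in 7.2); the helper name describe_chars() is hypothetical.

```php
<?php
// Sketch: list characters, code points and UTF-8 bytes with built-in
// mbstring functions. The helper name describe_chars() is hypothetical.
function describe_chars(string $text): array
{
    $rows = [];
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        $rows[] = [
            'char'  => $char,
            'code'  => mb_ord($char, 'UTF-8'),      // Unicode character number
            'bytes' => strtoupper(bin2hex($char)),  // UTF-8 bytes in hexadecimal
        ];
    }
    return $rows;
}

foreach (describe_chars("\u{2F25},\u{5973}") as $row) {
    printf("%s  %05d  x%04X  %s\n",
        $row['char'], $row['code'], $row['code'], $row['bytes']);
}
```

Unlike uniord, mb_ord() also handles characters encoded with 4 bytes.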
References
[1] https://ja.wikipedia.org/wiki/おんな
[2] https://zh.wikipedia.org/wiki/⼥ (https://zh.wikipedia.org/wiki/%E5%A5%B3) 女性,是指雌性的人類。女性這個名詞可以用來表示生物學上的性別劃分,同時亦可指社會認定或自我認同的性別角色,一般只適用於稱呼人類,其他生物通常說是「雌性」或「母的」。 ..
[3] https://ja.wikipedia.org/wiki/女 or https://ja.wikipedia.org/wiki/女性 or https://ja.wikipedia.org/wiki/%E5%A5%B3 女性(じょせい、希: γυναίκα、英: woman)は、男性と対比されるヒト(人間)の性別のこと。一般には生物学のメスと同義だが、社会・個人の価値観や性向に基づいた多様な見方が存在する。
Keywords
Chinese, Female, Japanese, PHP, SomeUtf8, Utf8, UtfH, Utf8table, Woman