Difference between revisions of "⼥"

From TORI

Revision as of 17:36, 24 May 2021

Woman227px.png

Utf8 character number 12069 is shown by your browser as (⼥).

HTML input

Character number 12069 can be generated in HTML in either of the two following ways:
  & # 1 2 0 6 9 ; in the decimal representation or
  & # x 2 F 2 5 ; in the hexadecimal representation.
Both notations above generate the same Unicode character, number 12069.
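
As an illustration, the two notations can also be built and checked programmatically. The following is a minimal sketch, assuming PHP 7 or later; the helper name html_entities_for is hypothetical and was introduced here only for the example.

```php
<?php
// Hypothetical helper: build both HTML numeric character references
// for a given Unicode code point.
function html_entities_for(int $codepoint): array
{
    return [
        'decimal'     => sprintf('&#%d;', $codepoint),   // decimal notation
        'hexadecimal' => sprintf('&#x%X;', $codepoint),  // hexadecimal notation
    ];
}

$refs = html_entities_for(12069);
echo $refs['decimal'], "\n";
echo $refs['hexadecimal'], "\n";

// Decode both references to UTF-8 and compare the results byte by byte.
$fromDec = html_entity_decode($refs['decimal'], ENT_QUOTES | ENT_HTML5, 'UTF-8');
$fromHex = html_entity_decode($refs['hexadecimal'], ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($fromDec === $fromHex);   // true when both notations give the same character
echo bin2hex($fromDec), "\n";      // raw UTF-8 bytes of the decoded character
```

The comparison is done on the decoded strings rather than on the rendered glyphs, so it does not depend on the font or the browser.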

Encoding

In Utf8, character ⼥ (& # 1 2 0 6 9 ;) is encoded with 3 bytes. These bytes are
xE2 xBC xA5 in the hexadecimal representation and
226 188 165 in the decimal representation.
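
These three bytes follow from the general UTF-8 layout for code points between x0800 and xFFFF, which is 1110xxxx 10xxxxxx 10xxxxxx. The sketch below derives them with bit operations. The function name utf8_three_bytes is an assumption made for this example, and the sketch covers only the three-byte range.

```php
<?php
// Hypothetical helper: encode a code point in the range 0x0800..0xFFFF
// as three UTF-8 bytes (layout 1110xxxx 10xxxxxx 10xxxxxx).
// Surrogate code points (0xD800..0xDFFF) are not rejected in this sketch.
function utf8_three_bytes(int $cp): array
{
    if ($cp < 0x0800 || $cp > 0xFFFF) {
        throw new InvalidArgumentException('code point outside the 3-byte range');
    }
    return [
        0xE0 | ($cp >> 12),          // leading byte carries the top 4 bits
        0x80 | (($cp >> 6) & 0x3F),  // continuation byte carries the middle 6 bits
        0x80 | ($cp & 0x3F),         // continuation byte carries the low 6 bits
    ];
}

$bytes = utf8_three_bytes(12069);
foreach ($bytes as $b) printf("x%02X ", $b);   // hexadecimal representation
echo "\n";
foreach ($bytes as $b) printf("%3d ", $b);     // decimal representation
echo "\n";
```

For 12069 (x2F25) the top 4 bits are 0010, the middle 6 bits are 111100 and the low 6 bits are 100101, which gives the bytes listed above.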

Phonetic

⼥ may be pronounced おんな (onna) [1].

However, this pronunciation has several meanings, and different kanji are used to distinguish between them.

Semantic

OnnaPic0.jpg

Both in Chinese [2] and in Japanese [3], the character ⼥ may mean a woman, a human female.

Synonym:

Character ⼥ (& # 1 2 0 6 9 ;) looks similar to
character number 63873 (& # 6 3 8 7 3 ;) and has a similar meaning.

Close synonyms are typical of Unicode. Characters with similar pictures and similar meanings appear often; for example, A (& # 0 0 6 5 ;), А (& # 1 0 4 0 ;), Ꭺ (& # 5 0 3 4 ;) and A (& # 6 5 3 1 3 ;)
look very similar in some graphical interfaces.

These similarities (and the encoding of the characters mentioned) can be revealed with the PHP program below:

<?php
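if (!function_exists('mb_str_split')) {
   // PHP 7.4 and later provide a built-in mb_str_split();
   // redeclaring it would be a fatal error, so this fallback
   // is defined only when the built-in is missing.
   function mb_str_split($str) {
      // split multibyte string in characters
      // at all positions except the start: ^
      // and the end: $
      $pattern = '/(?<!^)(?!$)/u';
      return preg_split($pattern,$str);
   }
}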

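function uniord($a) 
{
  // Unicode number of a single UTF-8 character of 1, 2 or 3 bytes
  // (4-byte characters are not handled).
  // The constants fold the removal of the UTF-8 marker bits into one step:
  // 194 = 192 + 2 and 2050 = 1920 + 128 + 2.
  $M=strlen($a);
  $p=ord($a[0]);                    if($M==1) return $p;
  $p-=194;  $p*=64; $p+=ord($a[1]); if($M==2) return $p;
  $p-=2050; $p*=64; $p+=ord($a[2]);           return $p;
}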

//$a='⼤,大;⼩,小'; # two pairs of different unicode characters separated with "," and ";"
$a='⼥,女;AАᎪᗅA'; # different unicode characters separated with "," and ";"
//$a='⼥,女'; # pair of different unicode characters separated with a comma

$N=strlen($a);
echo "The array has $N bytes; here is its splitting:\n";

for($n=0;$n<$N;$n++)
{
printf("%02x ",ord($a[$n]) );
}
echo "\n";

$b = mb_str_split($a);

var_dump($b);
$M=count($b);

#mb_internal_encoding("UTF-8");

for($m=0;$m<$M;$m++)
{
printf("\n");
$c=$b[$m];
$u=uniord($c);
printf("Unicode character number %05d id est, x%04X\n",$u,$u);
$d=strlen($c);
echo "Picture: $c uses $d bytes. These bytes are:\n";
for($n=0;$n<$d;$n++) printf("x%2X ",ord($c[$n]));
printf("in the hexadecimal representation and\n");
for($n=0;$n<$d;$n++) printf("%3d ",ord($c[$n]));
printf("in the decimal representation\n");
}
?>

This program uses the portable PHP functions mb_str_split.t and uniord.t. The output is copy-pasted below:


The array has 20 bytes; here is its splitting:
e2 bc a5 2c e5 a5 b3 3b 41 d0 90 e1 8e aa e1 97 85 ef bc a1 
array(9) {
  [0]=>
  string(3) "⼥"
  [1]=>
  string(1) ","
  [2]=>
  string(3) "女"
  [3]=>
  string(1) ";"
  [4]=>
  string(1) "A"
  [5]=>
  string(2) "А"
  [6]=>
  string(3) "Ꭺ"
  [7]=>
  string(3) "ᗅ"
  [8]=>
  string(3) "A"
}

Unicode character number 12069 id est, x2F25
Picture: ⼥ uses 3 bytes. These bytes are:
xE2 xBC xA5 in the hexadecimal representation and
226 188 165 in the decimal representation

Unicode character number 00044 id est, x002C
Picture: , uses 1 bytes. These bytes are:
x2C in the hexadecimal representation and
 44 in the decimal representation

Unicode character number 22899 id est, x5973
Picture: 女 uses 3 bytes. These bytes are:
xE5 xA5 xB3 in the hexadecimal representation and
229 165 179 in the decimal representation

Unicode character number 00059 id est, x003B
Picture: ; uses 1 bytes. These bytes are:
x3B in the hexadecimal representation and
 59 in the decimal representation

Unicode character number 00065 id est, x0041
Picture: A uses 1 bytes. These bytes are:
x41 in the hexadecimal representation and
 65 in the decimal representation

Unicode character number 01040 id est, x0410
Picture: А uses 2 bytes. These bytes are:
xD0 x90 in the hexadecimal representation and
208 144 in the decimal representation

Unicode character number 05034 id est, x13AA
Picture: Ꭺ uses 3 bytes. These bytes are:
xE1 x8E xAA in the hexadecimal representation and
225 142 170 in the decimal representation

Unicode character number 05573 id est, x15C5
Picture: ᗅ uses 3 bytes. These bytes are:
xE1 x97 x85 in the hexadecimal representation and
225 151 133 in the decimal representation

Unicode character number 65313 id est, xFF21
Picture: A uses 3 bytes. These bytes are:
xEF xBC xA1 in the hexadecimal representation and
239 188 161 in the decimal representation
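
The uniord function in the program above handles sequences of up to three bytes, which is enough for every character in this example. As a hedged sketch, a variant that also covers four-byte sequences could look as follows. The name uniord4 is hypothetical, and the function assumes a single well-formed, non-empty UTF-8 character without validating it.

```php
<?php
// Hypothetical variant of uniord(): decodes one well-formed UTF-8
// character of 1 to 4 bytes into its Unicode number.
// Malformed, overlong or empty input is not detected.
function uniord4(string $c): int
{
    $n  = strlen($c);
    $b0 = ord($c[0]);
    if ($n === 1) return $b0;                       // single byte: 0xxxxxxx
    // Keep only the payload bits of the leading byte:
    // 5 bits for 2-byte, 4 bits for 3-byte, 3 bits for 4-byte sequences.
    $cp = $b0 & (0xFF >> ($n + 1));
    for ($i = 1; $i < $n; $i++) {
        $cp = ($cp << 6) | (ord($c[$i]) & 0x3F);    // append 6 payload bits
    }
    return $cp;
}

// Characters are written as code point escapes (PHP 7+).
printf("%d\n", uniord4("\u{2F25}"));    // three-byte character from the example
printf("%d\n", uniord4("\u{0410}"));    // two-byte character from the example
printf("x%X\n", uniord4("\u{1F469}"));  // a four-byte character
```

Where the mbstring extension is available, the built-in mb_ord (PHP 7.2 and later) can serve the same purpose; the sketch avoids it so that it runs without extensions.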

References

  1. https://ja.wikipedia.org/wiki/おんな
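  2. https://zh.wikipedia.org/wiki/⼥ (https://zh.wikipedia.org/wiki/%E5%A5%B3) "A woman is a female human. The term can denote the biological division of the sexes and can also refer to a socially assigned or self-identified gender role; it is generally applied only to humans, while for other organisms one usually says 「雌性」 or 「母的」. .."
  3. https://ja.wikipedia.org/wiki/女 or https://ja.wikipedia.org/wiki/女性 or https://ja.wikipedia.org/wiki/%E5%A5%B3 "Woman (女性, josei; Greek: γυναίκα; English: woman) is the human sex contrasted with man. It is generally synonymous with the biological female, but diverse views exist, based on the values and inclinations of societies and individuals."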

Keywords

Chinese, Female, Japanese, PHP, SomeUtf8, Utf8, UtfH, Utf8table, Woman
