From TORI
Revision as of 17:36, 24 May 2021 by T (talk | contribs) (→‎Encoding)
Woman227px.png

Unicode character number 12069 (U+2F25) is shown by your browser as ⼥.

HTML input

Character number 12069 can be generated in HTML in either of the following two ways:
  & # 1 2 0 6 9 ; in the decimal representation or
  & # x 2 F 2 5 ; in the hexadecimal representation;
both forms above generate the same Unicode character, number 12069.
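
As a minimal sketch of the two forms above, the PHP fragment below builds the decimal and hexadecimal references for a given character number and checks that both decode to the same character. The helper name html_refs is hypothetical and is not part of this page's program; html_entity_decode is the standard PHP function. In real HTML the references are written without spaces.

```php
<?php
// Hypothetical helper (not from the original page): build the decimal and
// hexadecimal HTML numeric character references for a character number.
function html_refs(int $cp): array {
    return [sprintf('&#%d;', $cp), sprintf('&#x%X;', $cp)];
}

[$dec, $hex] = html_refs(12069);
echo "$dec $hex\n";   // prints the two references, without spaces

// Both references decode to the same UTF-8 string.
$a = html_entity_decode($dec, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$b = html_entity_decode($hex, ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($a === $b);  // bool(true)
```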

Encoding

In UTF-8, character ⼥ (& # 1 2 0 6 9 ;) is encoded with 3 bytes. These bytes are
xE2 xBC xA5 in the hexadecimal representation and
226 188 165 in the decimal representation.
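
The three bytes follow from the general UTF-8 bit layout: a 3-byte character is stored as 1110xxxx 10xxxxxx 10xxxxxx, where the x bits hold the character number. The sketch below illustrates this; the function name utf8_bytes is hypothetical, and surrogate numbers are not rejected, so it should be read as an illustration rather than a validating encoder.

```php
<?php
// Hypothetical helper (not from the original page): return the UTF-8 byte
// values of a Unicode character number. Surrogates are not checked.
function utf8_bytes(int $cp): array {
    if ($cp < 0x80) {                       // 0xxxxxxx
        return [$cp];
    }
    if ($cp < 0x800) {                      // 110xxxxx 10xxxxxx
        return [0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F)];
    }
    if ($cp < 0x10000) {                    // 1110xxxx 10xxxxxx 10xxxxxx
        return [0xE0 | ($cp >> 12),
                0x80 | (($cp >> 6) & 0x3F),
                0x80 | ($cp & 0x3F)];
    }
    return [0xF0 | ($cp >> 18),             // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            0x80 | (($cp >> 12) & 0x3F),
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F)];
}

$bytes = utf8_bytes(12069);
vprintf("x%02X x%02X x%02X\n", $bytes);     // xE2 xBC xA5
echo implode(' ', $bytes), "\n";            // 226 188 165
```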

Phonetic

⼥ may be pronounced おんな (onna) [1].

However, this pronunciation has several meanings, and different kanji are used to distinguish them.

Semantic

OnnaPic0.jpg

or

In both Chinese [2] and Japanese [3], character ⼥ may mean a woman, a human female.

Synonym:

Character ⼥ (& # 1 2 0 6 9 ;) looks similar to
character number 63873 (& # 6 3 8 7 3 ;) and has a similar meaning.

The existence of close synonyms is typical for Unicode. Characters with similar pictures and similar meanings often appear in Unicode; for example, A (& # 0 0 6 5 ;), А (& # 1 0 4 0 ;), Ꭺ (& # 5 0 3 4 ;) and A (& # 6 5 3 1 3 ;)
look very similar in some graphical interfaces.

These similarities (and the encoding of the characters mentioned) can be revealed with the PHP program below:

<?php
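// PHP 7.4 and later provide mb_str_split() natively;
// define the portable version only if it is missing.
if (!function_exists('mb_str_split')) {
  function mb_str_split($str) {
     // split a multibyte (UTF-8) string into characters by splitting
     // at all positions except the start: ^
     // and the end: $
     $pattern = '/(?<!^)(?!$)/u';
     return preg_split($pattern,$str);
  }
}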

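function uniord($a) 
{
  // Return the Unicode number of a single UTF-8 encoded character $a.
  // Only sequences of 1 to 3 bytes (numbers up to 65535) are handled.
  $M=strlen($a);
  $p=ord($a[0]);                    if($M==1) return $p;
  // 2 bytes: (b0-0xC0)*64 + (b1-0x80); 194 = 0xC0 + 2 absorbs the 0x80 of b1
  $p-=194;  $p*=64; $p+=ord($a[1]); if($M==2) return $p;
  // 3 bytes: 2050 = 0x20*64 + 2 moves the lead offset from 0xC0 to 0xE0
  // and absorbs the 0x80 of b2
  $p-=2050; $p*=64; $p+=ord($a[2]);           return $p;
}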

//$a='⼤,大;⼩,小'; # two pairs of different Unicode characters separated with "," and ";"
$a='⼥,女;AАᎪᗅA'; # different Unicode characters separated with "," and ";"
//$a='⼥,女'; # pair of different Unicode characters separated with a comma

$N=strlen($a);
echo "The array has $N bytes; here is its splitting:\n";

for($n=0;$n<$N;$n++)
{
printf("%02x ",ord($a[$n]) );
}
echo "\n";

$b = mb_str_split($a);

var_dump($b);
$M=count($b);

#mb_internal_encoding("UTF-8");

for($m=0;$m<$M;$m++)
{
printf("\n");
$c=$b[$m];
$u=uniord($c);
printf("Unicode character number %05d id est, x%04X\n",$u,$u);
$d=strlen($c);
echo "Picture: $c uses $d bytes. These bytes are:\n";
for($n=0;$n<$d;$n++) printf("x%02X ",ord($c[$n])); // zero-pad bytes below x10
printf("in the hexadecimal representation and\n");
for($n=0;$n<$d;$n++) printf("%3d ",ord($c[$n]));
printf("in the decimal representation\n");
}
?>

This program uses the portable PHP functions mb_str_split.t and uniord.t. The output is copied below:


The array has 20 bytes; here is its splitting:
e2 bc a5 2c e5 a5 b3 3b 41 d0 90 e1 8e aa e1 97 85 ef bc a1 
array(9) {
  [0]=>
  string(3) "⼥"
  [1]=>
  string(1) ","
  [2]=>
  string(3) "女"
  [3]=>
  string(1) ";"
  [4]=>
  string(1) "A"
  [5]=>
  string(2) "А"
  [6]=>
  string(3) "Ꭺ"
  [7]=>
  string(3) "ᗅ"
  [8]=>
  string(3) "A"
}

Unicode character number 12069 id est, x2F25
Picture: ⼥ uses 3 bytes. These bytes are:
xE2 xBC xA5 in the hexadecimal representation and
226 188 165 in the decimal representation

Unicode character number 00044 id est, x002C
Picture: , uses 1 bytes. These bytes are:
x2C in the hexadecimal representation and
 44 in the decimal representation

Unicode character number 22899 id est, x5973
Picture: 女 uses 3 bytes. These bytes are:
xE5 xA5 xB3 in the hexadecimal representation and
229 165 179 in the decimal representation

Unicode character number 00059 id est, x003B
Picture: ; uses 1 bytes. These bytes are:
x3B in the hexadecimal representation and
 59 in the decimal representation

Unicode character number 00065 id est, x0041
Picture: A uses 1 bytes. These bytes are:
x41 in the hexadecimal representation and
 65 in the decimal representation

Unicode character number 01040 id est, x0410
Picture: А uses 2 bytes. These bytes are:
xD0 x90 in the hexadecimal representation and
208 144 in the decimal representation

Unicode character number 05034 id est, x13AA
Picture: Ꭺ uses 3 bytes. These bytes are:
xE1 x8E xAA in the hexadecimal representation and
225 142 170 in the decimal representation

Unicode character number 05573 id est, x15C5
Picture: ᗅ uses 3 bytes. These bytes are:
xE1 x97 x85 in the hexadecimal representation and
225 151 133 in the decimal representation

Unicode character number 65313 id est, xFF21
Picture: A uses 3 bytes. These bytes are:
xEF xBC xA1 in the hexadecimal representation and
239 188 161 in the decimal representation
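
The function uniord above handles characters of up to 3 bytes, which is sufficient for the examples on this page. As a hedged sketch, the same decoding can be written with bit masks so that 4-byte characters (numbers above 65535) are covered as well. The name utf8_codepoint is hypothetical, and the input is assumed to be one well-formed UTF-8 character; no validation is performed.

```php
<?php
// Hypothetical helper (not part of the original program): decode one
// well-formed UTF-8 character of 1 to 4 bytes into its Unicode number.
function utf8_codepoint(string $c): int {
    $n  = strlen($c);
    $b0 = ord($c[0]);
    if ($n === 1) {
        return $b0;                  // 0xxxxxxx
    }
    if ($n === 2) {
        $p = $b0 & 0x1F;             // 110xxxxx
    } elseif ($n === 3) {
        $p = $b0 & 0x0F;             // 1110xxxx
    } else {
        $p = $b0 & 0x07;             // 11110xxx
    }
    for ($i = 1; $i < $n; $i++) {
        // each continuation byte 10xxxxxx contributes 6 more bits
        $p = ($p << 6) | (ord($c[$i]) & 0x3F);
    }
    return $p;
}

printf("%d\n", utf8_codepoint("\xE2\xBC\xA5"));      // 12069
printf("%d\n", utf8_codepoint("\xF0\x9F\x98\x80"));  // 128512
```

With PHP 7.2 or later and the mbstring extension loaded, the built-in mb_ord can be used for the same purpose.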

References

  1. https://ja.wikipedia.org/wiki/おんな
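  2. https://zh.wikipedia.org/wiki/⼥ (https://zh.wikipedia.org/wiki/%E5%A5%B3) "A woman is a female human. The term can denote the biological sex category and can also refer to a socially assigned or self-identified gender role; it is generally applied only to humans, while other organisms are usually called 'female'." (translated from Chinese)
  3. https://ja.wikipedia.org/wiki/女 or https://ja.wikipedia.org/wiki/女性 or https://ja.wikipedia.org/wiki/%E5%A5%B3 "Woman (Japanese: josei; Greek: γυναίκα) is the sex of humans that is contrasted with man. It is generally synonymous with the biological female, but diverse views exist, based on social and individual values and inclinations." (translated from Japanese)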

Keywords

Chinese, Female, Japanese, PHP, SomeUtf8, Utf8, UtfH, Utf8table, Woman
