From TORI
Lilliput2.jpg
森の間の小 (小 amid the forest)

小 is Unicode character number 23567 (U+5C0F; see Utf8table) [1]

Phonetic

In Japanese, 小 can be pronounced [2] as
"ショウ", "ちい-さい", "こ", "お";
"shō", "chii-sai", "ko", "o"

Semantic

小 may mean "small", "tiny", "young" [3]

小 is often followed by the two hiragana characters さな (giving 小さな);
these two hiragana characters show that 小 refers to the (small) size of an object.

Synonym:

Character ⼩ (&#12073;) is a synonym of 小 (&#23567;).
In most software, these characters are drawn with similar glyphs (which may cause confusion);
in Japanese, these characters have similar meaning ("small") and similar pronunciation ("chiisai").
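
Because the two glyphs look alike, a byte-level comparison is one way to tell them apart. The lines below are a minimal sketch, assuming PHP 7.0 or later (for the `\u{...}` escape); the variable names are illustrative only.

```php
<?php
// Sketch: compare two look-alike characters byte by byte.
// The \u{...} escapes make the code points unambiguous even when
// a font draws both characters with the same picture.
$radical = "\u{2F29}";  // code point 12073
$kanji   = "\u{5C0F}";  // code point 23567

var_dump($radical === $kanji);   // bool(false): the strings differ
echo bin2hex($radical), "\n";    // e2bca9
echo bin2hex($kanji), "\n";      // e5b08f
```

A plain string comparison therefore treats the two characters as different, even though they may be indistinguishable on screen.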

Antonyms: ⼤ and 大

Gulliver2.jpg
岩の間の大 (大 amid the rocks)

Unicode characters
⼤ (&#12068;),
大 (&#22823;)
can be interpreted as antonyms of 小. In Japanese and in Chinese, these characters may mean "big", "large", "huge".

Encoding

In UTF-8, character 小 is encoded with 3 bytes:
xe5 xb0 x8f in the hexadecimal representation and
229 176 143 in the decimal representation.
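
These three bytes follow from the number 23567 by the usual UTF-8 bit layout for code points from U+0800 to U+FFFF. The sketch below illustrates the arithmetic; `utf8_encode_3byte` is a hypothetical helper name, not a PHP built-in, and it does not reject the surrogate range.

```php
<?php
// Sketch: encode one code point in U+0800..U+FFFF as three UTF-8 bytes.
// utf8_encode_3byte() is a hypothetical helper, not part of PHP.
function utf8_encode_3byte(int $cp): string
{
    if ($cp < 0x0800 || $cp > 0xFFFF) {
        throw new InvalidArgumentException('code point outside the 3-byte range');
    }
    $b0 = 0xE0 | ($cp >> 12);          // 1110xxxx: top 4 bits
    $b1 = 0x80 | (($cp >> 6) & 0x3F);  // 10xxxxxx: middle 6 bits
    $b2 = 0x80 | ($cp & 0x3F);         // 10xxxxxx: low 6 bits
    return chr($b0) . chr($b1) . chr($b2);
}

$s = utf8_encode_3byte(23567);
echo implode(' ', str_split(bin2hex($s), 2)), "\n";  // e5 b0 8f
```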

HTML input:
&#23567;
&#x5c0f;
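
As a quick check that the decimal and hexadecimal forms name the same character, both can be decoded with PHP's `html_entity_decode`. This is a minimal sketch; the flags and variable names are one possible choice.

```php
<?php
// Sketch: both numeric character references decode to the same UTF-8 string.
$dec = html_entity_decode('&#23567;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
$hex = html_entity_decode('&#x5c0f;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($dec === $hex);     // bool(true)
echo bin2hex($dec), "\n";    // e5b08f
```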

The encoding of 小 and related characters can be seen with the PHP program below:

<?php
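if (!function_exists('mb_str_split')) {
   // PHP 7.4+ with mbstring already provides mb_str_split();
   // define this fallback only when it is missing
   function mb_str_split($str) {
      // split a multibyte (UTF-8) string into characters:
      // split at all positions except the start: ^
      // and the end: $
      $pattern = '/(?<!^)(?!$)/u';
      return preg_split($pattern,$str);
   }
}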

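function uniord($a)
{
  // Unicode code point of one UTF-8 character of 1, 2 or 3 bytes
  // (4-byte characters are not handled).
  // The constants 194 and 2050 fold in the UTF-8 prefix bits:
  //   2 bytes: (b0-192)*64 + (b1-128)
  //   3 bytes: (b0-224)*4096 + (b1-128)*64 + (b2-128)
  $M=strlen($a);
  $p=ord($a[0]);                    if($M==1) return $p;
  $p-=194;  $p*=64; $p+=ord($a[1]); if($M==2) return $p;
  $p-=2050; $p*=64; $p+=ord($a[2]);           return $p;
}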

$a='⼤,大;⼩,小'; # two pairs of different unicode characters separated with "," and ";"

$N=strlen($a);
echo "The array has $N bytes; here is its splitting:\n";

for($n=0;$n<$N;$n++)
{
printf("%02x ",ord($a[$n]) );
}
echo "\n";

$b = mb_str_split($a);

var_dump($b);
$M=count($b);

#mb_internal_encoding("UTF-8");

for($m=0;$m<$M;$m++)
{
printf("\n");
$c=$b[$m];
$u=uniord($c);
printf("Unicode character number %05d id est, x%04x\n",$u,$u);
$d=strlen($c);
echo "Picture: $c uses $d bytes. These bytes are:\n";
for($n=0;$n<$d;$n++) printf("x%02x ",ord($c[$n]));
printf("in the hexadecimal representation and\n");
for($n=0;$n<$d;$n++) printf("%3d ",ord($c[$n]));
printf("in the decimal representation\n");
}
?>

The output is


The array has 15 bytes; here is its splitting:
e2 bc a4 2c e5 a4 a7 3b e2 bc a9 2c e5 b0 8f 
array(7) {
  [0]=>
  string(3) "⼤"
  [1]=>
  string(1) ","
  [2]=>
  string(3) "大"
  [3]=>
  string(1) ";"
  [4]=>
  string(3) "⼩"
  [5]=>
  string(1) ","
  [6]=>
  string(3) "小"
}

Unicode character number 12068 id est, x2f24
Picture: ⼤ uses 3 bytes. These bytes are:
xe2 xbc xa4 in the hexadecimal representation and
226 188 164 in the decimal representation

Unicode character number 00044 id est, x002c
Picture: , uses 1 bytes. These bytes are:
x2c in the hexadecimal representation and
 44 in the decimal representation

Unicode character number 22823 id est, x5927
Picture: 大 uses 3 bytes. These bytes are:
xe5 xa4 xa7 in the hexadecimal representation and
229 164 167 in the decimal representation

Unicode character number 00059 id est, x003b
Picture: ; uses 1 bytes. These bytes are:
x3b in the hexadecimal representation and
 59 in the decimal representation

Unicode character number 12073 id est, x2f29
Picture: ⼩ uses 3 bytes. These bytes are:
xe2 xbc xa9 in the hexadecimal representation and
226 188 169 in the decimal representation

Unicode character number 00044 id est, x002c
Picture: , uses 1 bytes. These bytes are:
x2c in the hexadecimal representation and
 44 in the decimal representation

Unicode character number 23567 id est, x5c0f
Picture: 小 uses 3 bytes. These bytes are:
xe5 xb0 x8f in the hexadecimal representation and
229 176 143 in the decimal representation

The program above uses the PHP functions mb_str_split (mb_str_split.t) and uniord (uniord.t); they are defined in the program itself because older versions of the standard PHP package do not provide them (since PHP 7.4, the mbstring extension includes mb_str_split).
The program reveals the encoding of the four related Unicode characters:
⼤ (&#12068;),
大 (&#22823;),
⼩ (&#12073;),
小 (&#23567;)

References

  1. https://0g0.org/unicode/5C0F/ 小 U+5C0F Unicode character
  2. https://en.wikipedia.org/wiki/List_of_jōyō_kanji The jōyō kanji system of representing written Japanese consists of 2,136 characters.
  3. https://en.wiktionary.org/wiki/小 (https://en.wiktionary.org/wiki/%E5%B0%8F) Han character 小 (radical 42, 小+0, 3 strokes, cangjie input 弓金 (NC), four-corner 90000, composition ⿻亅八). Kangxi radical #42, ⼩. Shuowen Jiezi radical №15.

Keywords

Japanese, Kanji, SomeU, Unicode, Utf8, UtfH, Utf8table

⼤ (&#12068;), 大 (&#22823;), ⼩ (&#12073;), 小 (&#23567;)