From TORI
Figure Gulliver2.jpg: 岩の間の⼤ (⼤ among the rocks)

⼤ is Unicode character number 12068 (U+2F24, KANGXI RADICAL BIG; see Utf8table).

HTML input:
&#12068;
&#x2F24;
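
The decimal and hexadecimal forms of the HTML input name the same code point. A minimal PHP sketch of that equivalence, assuming only the standard html_entity_decode() function and UTF-8 output:

```php
<?php
// Decode the decimal and the hexadecimal numeric character reference.
$dec = html_entity_decode('&#12068;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
$hex = html_entity_decode('&#x2F24;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Both references name code point 12068, so the decoded strings are identical.
var_dump($dec === $hex);      // bool(true)
echo bin2hex($dec), "\n";     // e2bca4

// The other direction: build both references from the code point number.
$cp = 12068;
printf("&#%d; &#x%X;\n", $cp, $cp);   // prints: &#12068; &#x2F24;
```

The same pattern applies to the other characters mentioned in this article; only the number changes.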

Phonetic

⼤ may be pronounced as ダイ ("dai").

Semantic

⼤ may have the sense "big" or "large", especially in the combination 大きい.

大学 literally means "big school", that is, a college or university; it is pronounced だいがく (daigaku). [1]

Antonyms

Figure Lilliput2.jpg: 森の間の⼩ (⼩ among the trees)

⼩ is Unicode character number 12073 (U+2F29, KANGXI RADICAL SMALL). HTML input:
&#12073;
&#x2F29;

⼩ may have the opposite meaning: "little", "small", "petite", "pequeno", "klein".

An example is shown in the figure at right.

In Japanese, this character is often followed by the two hiragana さい, as in 小さい ("small").

小 is Unicode character number 23567 (U+5C0F). HTML input:
&#23567;
&#x5C0F;
It can also be considered an antonym of ⼤ [2].

Characters ⼩ (&#12073;) and 小 (&#23567;) are easy to confuse.

Encoding

In UTF-8, character ⼤ is encoded with 3 bytes:
226 188 164 (hexadecimal E2 BC A4)
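
These three bytes follow from the UTF-8 bit layout for code points between U+0800 and U+FFFF, 1110xxxx 10xxxxxx 10xxxxxx. The sketch below works through that arithmetic. The helper name utf8_bytes() is hypothetical and not part of PHP, and it does not reject invalid code points such as surrogates.

```php
<?php
// Hypothetical helper: UTF-8 bytes of a code point, as an array of integers.
// No validation is done (surrogates and values above U+10FFFF are not rejected).
function utf8_bytes(int $cp): array
{
    if ($cp < 0x80) {
        return [$cp];                                   // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return [0xC0 | ($cp >> 6),                      // 110xxxxx
                0x80 | ($cp & 0x3F)];                   // 10xxxxxx
    }
    if ($cp < 0x10000) {
        return [0xE0 | ($cp >> 12),                     // 1110xxxx
                0x80 | (($cp >> 6) & 0x3F),             // 10xxxxxx
                0x80 | ($cp & 0x3F)];                   // 10xxxxxx
    }
    return [0xF0 | ($cp >> 18),                         // 11110xxx
            0x80 | (($cp >> 12) & 0x3F),
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F)];
}

// 12068 = 0x2F24 = binary 0010 111100 100100
echo implode(' ', utf8_bytes(12068)), "\n";   // 226 188 164
```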

The encoding of ⼤ and related characters can be seen with the PHP code below:

<?php
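// mb_str_split() is built in since PHP 7.4; define it only when it is missing,
// otherwise PHP stops with "Cannot redeclare mb_str_split()".
if (!function_exists('mb_str_split')) {
   function mb_str_split($str) {
      // split multibyte string in characters
      // Split at all positions, not after the start: ^
      // and not before the end: $
      $pattern = '/(?<!^)(?!$)/u';
      return preg_split($pattern,$str);
   }
}

// Return the Unicode code point of one UTF-8 character of 1, 2 or 3 bytes
// (4-byte sequences, above U+FFFF, are not handled).
function uniord($a) 
 {
   $M=strlen($a);
   $p=ord($a[0]);                    if($M==1) return $p;
   // 194 = 192 + 2: drops the 110xxxxx lead marker and, after *64, the 128 marker of byte 2
   $p-=194;  $p*=64; $p+=ord($a[1]); if($M==2) return $p;
   // 2050 = 32*64 + 2: corrects the lead marker to 1110xxxx and drops the 128 marker of byte 3
   $p-=2050; $p*=64; $p+=ord($a[2]);           return $p;
 }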

$a='⼤ 大 ⼩ 小'; /* two pairs of look-alike but different Unicode characters, separated by spaces */

$N=strlen($a);
echo "The array has $N bytes; here is its splitting:\n";

for($n=0;$n<$N;$n++)
{
printf("%02x ",ord($a[$n]) );
}
echo "\n";

$b = mb_str_split($a);

var_dump($b);
$M=count($b);

#mb_internal_encoding("UTF-8");

for($m=0;$m<$M;$m++)
{
printf("\n");
$c=$b[$m];
$u=uniord($c);
printf("Unicode character number %05d id est, x%04x\n",$u,$u);
$d=strlen($c);
echo "Picture: $c uses $d bytes. These bytes are:\n";
for($n=0;$n<$d;$n++) printf("x%2x ",ord($c[$n]));
printf("in the hexadecimal representation and\n");
for($n=0;$n<$d;$n++) printf("%3d ",ord($c[$n]));
printf("in the decimal representation\n");
}
?>

The output is:


The array has 15 bytes; here is its splitting:
e2 bc a4 20 e5 a4 a7 20 e2 bc a9 20 e5 b0 8f 
array(7) {
  [0]=>
  string(3) "⼤"
  [1]=>
  string(1) " "
  [2]=>
  string(3) "大"
  [3]=>
  string(1) " "
  [4]=>
  string(3) "⼩"
  [5]=>
  string(1) " "
  [6]=>
  string(3) "小"
}

Unicode character number 12068 id est, x2f24
Picture: ⼤ uses 3 bytes. These bytes are:
xe2 xbc xa4 in the hexadecimal representation and
226 188 164 in the decimal representation

Unicode character number 00032 id est, x0020
Picture:   uses 1 bytes. These bytes are:
x20 in the hexadecimal representation and
 32 in the decimal representation

Unicode character number 22823 id est, x5927
Picture: 大 uses 3 bytes. These bytes are:
xe5 xa4 xa7 in the hexadecimal representation and
229 164 167 in the decimal representation

Unicode character number 00032 id est, x0020
Picture:   uses 1 bytes. These bytes are:
x20 in the hexadecimal representation and
 32 in the decimal representation

Unicode character number 12073 id est, x2f29
Picture: ⼩ uses 3 bytes. These bytes are:
xe2 xbc xa9 in the hexadecimal representation and
226 188 169 in the decimal representation

Unicode character number 00032 id est, x0020
Picture:   uses 1 bytes. These bytes are:
x20 in the hexadecimal representation and
 32 in the decimal representation

Unicode character number 23567 id est, x5c0f
Picture: 小 uses 3 bytes. These bytes are:
xe5 xb0 x8f in the hexadecimal representation and
229 176 143 in the decimal representation
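
The uniord() function in the listing covers sequences of up to three bytes by subtracting constants. For comparison, the same decoding can be written with bit masks, which also extends to four-byte sequences. This is a minimal sketch: the helper name utf8_codepoint() is hypothetical, and the input is assumed to be a single well-formed UTF-8 character.

```php
<?php
// Hypothetical helper: code point of one UTF-8 character (1 to 4 bytes).
// It assumes well-formed input and performs no validation.
function utf8_codepoint(string $ch): int
{
    $b = array_values(unpack('C*', $ch));   // the bytes as integers
    $n = count($b);
    if ($n === 1) {
        return $b[0];
    }
    // Lead byte: keep 5, 4 or 3 payload bits for 2-, 3- or 4-byte sequences.
    $cp = $b[0] & (0xFF >> ($n + 1));
    // Each continuation byte contributes its low 6 bits.
    for ($i = 1; $i < $n; $i++) {
        $cp = ($cp << 6) | ($b[$i] & 0x3F);
    }
    return $cp;
}

// The four characters of the listing, written as byte escapes.
$chars = ["\xE2\xBC\xA4", "\xE5\xA4\xA7", "\xE2\xBC\xA9", "\xE5\xB0\x8F"];
foreach ($chars as $ch) {
    $cp = utf8_codepoint($ch);
    printf("%s U+%04X %d\n", $ch, $cp, $cp);
}
```

For these four characters the sketch yields the same numbers as the output above: 12068, 22823, 12073 and 23567.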

Confusion

In some software, character number 12068 (⼤) looks similar to character number 22823 (大) [3][4].
HTML input for the latter:
&#22823;
&#x5927;

The similarity in the graphical representations of characters ⼤ and 大 may cause confusion.

The main difference is that 大 (&#22823;) is a CJK unified ideograph, the ordinary character used in running Japanese and Chinese text, while ⼤ (&#12068;) belongs to the Kangxi Radicals block and denotes the radical "big".
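
Because the two characters look alike but consist of different bytes, a plain string comparison or search treats them as different. One way to guard against this is to fold the radicals into their look-alike ideographs before comparing. The sketch below does so for the two pairs discussed in this article only. The names RADICAL_TO_IDEOGRAPH and fold_radicals() are hypothetical, and the table is an assumption of this example, not a complete mapping.

```php
<?php
// Map each Kangxi radical discussed here to its look-alike unified ideograph.
// Byte escapes are used so the example does not depend on the file encoding.
const RADICAL_TO_IDEOGRAPH = [
    "\xE2\xBC\xA4" => "\xE5\xA4\xA7",   // U+2F24 -> U+5927 ("big")
    "\xE2\xBC\xA9" => "\xE5\xB0\x8F",   // U+2F29 -> U+5C0F ("small")
];

// Hypothetical helper: replace the listed radicals, leave everything else as is.
function fold_radicals(string $s): string
{
    return strtr($s, RADICAL_TO_IDEOGRAPH);
}

$a = "\xE2\xBC\xA4\xE5\xAD\xA6";   // radical U+2F24 followed by U+5B66
$b = "\xE5\xA4\xA7\xE5\xAD\xA6";   // ideograph U+5927 followed by U+5B66 (大学)

var_dump($a === $b);                                 // bool(false)
var_dump(fold_radicals($a) === fold_radicals($b));   // bool(true)
```

Unicode assigns these radicals compatibility mappings to the corresponding ideographs, so where the PHP intl extension is available, NFKC normalization with its Normalizer class is likely a more general alternative to a hand-written table.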

References

This article can be cited as https://mizugadro.mydns.jp/t/index.php/%E2%BC%A4

  1. https://ja.wikipedia.org/wiki/大学 大学 (だいがく; English: college, university).
  2. https://en.wiktionary.org/wiki/%E5%B0%8F
  3. https://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji The jōyō kanji system of representing written Japanese consists of 2,136 characters. Entry: 大, 3 strokes, grade 1, "large"; ダイ、タイ、おお、おお-きい、おお-いに (dai, tai, oo, oo-kii, oo-ini).
  4. https://0g0.org/unicode/5927/ U+5927 大: CJK UNIFIED IDEOGRAPH-5927, block CJK Unified Ideographs, general category Letter, Other; numeric character references &#22823; and &#x5927;; URL encoding %E5%A4%A7 (UTF-8), %C2%E7 (EUC-JP), %91%E5 (Shift_JIS); Base64 5aSn.

Keywords

Japanese, Kanji, SomeU, Unicode, UtfH, Utf8table

⼤ (&#12068;), 大 (&#22823;), ⼩ (&#12073;), 小 (&#23567;)