⼤
Latest revision as of 21:01, 21 May 2021
⼤ is Unicode character number 12068 (hexadecimal 2F24, U+2F24 KANGXI RADICAL BIG; see Utf8table).
HTML input:
⼤ (&#12068;)
⼤ (&#x2F24;)
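The two HTML inputs are the decimal and hexadecimal forms of the same numeric character reference. The minimal PHP sketch below assumes PHP 7 or later, which is needed for the "\u{...}" string escape. It builds both references from the code point and checks that they decode to the same character. The helper names ncr_dec and ncr_hex are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helpers: decimal and hexadecimal numeric character
// references for a Unicode code point.
function ncr_dec(int $cp): string
{
    return sprintf('&#%d;', $cp);
}

function ncr_hex(int $cp): string
{
    return sprintf('&#x%X;', $cp);
}

$cp  = 12068;          // ⼤, U+2F24
$dec = ncr_dec($cp);   // "&#12068;"
$hex = ncr_hex($cp);   // "&#x2F24;"

// Both references decode to the same UTF-8 string (bytes E2 BC A4).
$fromDec = html_entity_decode($dec, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$fromHex = html_entity_decode($hex, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo "$dec $hex\n";                  // &#12068; &#x2F24;
var_dump($fromDec === $fromHex);     // bool(true)
var_dump($fromDec === "\u{2F24}");   // bool(true)
```

The same two helpers produce the inputs for the other characters on this page. For example, ncr_hex(23567) gives &#x5C0F; for 小.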
Phonetic
⼤ may be pronounced as ダイ, "dai".
Semantic
⼤ carries the sense "big" or "large", especially in the adjective 大きい (ookii, "big").
大学 (だいがく, daigaku), literally "big school", means college or university [1].
Antonyms
Unicode character 12073 (U+2F29, KANGXI RADICAL SMALL), ⼩, may have the opposite meaning: "little", "small", "petite", "pequeno", "klein".
HTML input:
⼩ (&#12073;)
⼩ (&#x2F29;)
An example is shown in the figure at right (Lilliput2.jpg, captioned 森の間の⼩男, "a little man among the woods").
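In Japanese, this character is often followed by two hiragana, as in 小さい (chiisai, "small"). In ordinary text, the kanji in that word is 小 (U+5C0F), not the radical ⼩.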
Unicode character 23567 (U+5C0F), 小, can also be considered an antonym of ⼤ [2].
HTML input:
小 (&#23567;)
小 (&#x5C0F;)
The characters ⼩ (&#12073;) and 小 (&#23567;) are easy to confuse.
Encoding
In UTF-8, character ⼤ is encoded with 3 bytes: 226 188 164 (hexadecimal E2 BC A4).
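These three bytes follow from the UTF-8 rule for code points U+0800 to U+FFFF. The 16 bits of 12068 = 0x2F24 = 0010 1111 0010 0100 are split 4 + 6 + 6 and placed after the prefixes 1110, 10 and 10. This gives 1110 0010, 10 111100, 10 100100, that is, E2 BC A4 = 226 188 164.

The PHP sketch below implements the general rule by hand for illustration. The function name utf8_bytes is hypothetical. Real code would normally rely on PHP's UTF-8 strings directly, or on mb_chr from the mbstring extension.

```php
<?php
// Hypothetical helper: encode one Unicode code point (0..0x10FFFF) as a
// list of UTF-8 byte values, following the 1- to 4-byte UTF-8 layout.
function utf8_bytes(int $cp): array
{
    if ($cp < 0x80) {                    // 0xxxxxxx
        return [$cp];
    }
    if ($cp < 0x800) {                   // 110xxxxx 10xxxxxx
        return [0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F)];
    }
    if ($cp < 0x10000) {                 // 1110xxxx 10xxxxxx 10xxxxxx
        return [0xE0 | ($cp >> 12),
                0x80 | (($cp >> 6) & 0x3F),
                0x80 | ($cp & 0x3F)];
    }
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return [0xF0 | ($cp >> 18),
            0x80 | (($cp >> 12) & 0x3F),
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F)];
}

foreach ([12068, 22823, 12073, 23567] as $cp) {   // ⼤ 大 ⼩ 小
    $bytes = utf8_bytes($cp);
    // Rebuild the character from its bytes so it can be printed.
    $char = implode('', array_map('chr', $bytes));
    printf("%s U+%04X: %s\n", $char, $cp, implode(' ', $bytes));
}
```

For the four characters on this page, the loop prints 226 188 164 for ⼤ (U+2F24), 229 164 167 for 大 (U+5927), 226 188 169 for ⼩ (U+2F29) and 229 176 143 for 小 (U+5C0F). These values agree with the decimal bytes in the program output below.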
The encoding of ⼤ and related characters can be seen with the PHP code below:
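```php
<?php
// Split a UTF-8 string into characters.
// Renamed from mb_str_split: PHP 7.4 and later ship a built-in
// mb_str_split() (mbstring), and redeclaring it is a fatal error.
function utf8_str_split($str) {
    // Split at all positions, except after the start (^)
    // and before the end ($); the /u flag keeps multibyte characters whole.
    $pattern = '/(?<!^)(?!$)/u';
    return preg_split($pattern, $str);
}

// Code point of one UTF-8 character (1 to 4 bytes). Each step removes the
// UTF-8 prefix bits of the bytes read so far and shifts by 6 bits.
function uniord($a)
{
    $M = strlen($a);
    $p = ord($a[0]); if ($M == 1) return $p;
    $p -= 194;   $p *= 64; $p += ord($a[1]); if ($M == 2) return $p;
    $p -= 2050;  $p *= 64; $p += ord($a[2]); if ($M == 3) return $p;
    $p -= 65538; $p *= 64; $p += ord($a[3]); return $p;
}

$a = '⼤ 大 ⼩ 小'; /* two pairs of different Unicode characters separated by spaces */

// Print every byte of the string in hexadecimal.
$N = strlen($a);
echo "The array has $N bytes; here is its splitting:\n";
for ($n = 0; $n < $N; $n++) {
    printf("%02x ", ord($a[$n]));
}
echo "\n";

$b = utf8_str_split($a);
var_dump($b);
$M = count($b);

// For each character: code point, then its bytes in hex and decimal.
for ($m = 0; $m < $M; $m++) {
    printf("\n");
    $c = $b[$m];
    $u = uniord($c);
    printf("Unicode character number %05d id est, x%04x\n", $u, $u);
    $d = strlen($c);
    echo "Picture: $c uses $d bytes. These bytes are:\n";
    for ($n = 0; $n < $d; $n++) printf("x%02x ", ord($c[$n]));
    printf("in the hexadecimal representation and\n");
    for ($n = 0; $n < $d; $n++) printf("%3d ", ord($c[$n]));
    printf("in the decimal representation\n");
}
?>
```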
The output is:
```
The array has 15 bytes; here is its splitting:
e2 bc a4 20 e5 a4 a7 20 e2 bc a9 20 e5 b0 8f
array(7) {
  [0]=>
  string(3) "⼤"
  [1]=>
  string(1) " "
  [2]=>
  string(3) "大"
  [3]=>
  string(1) " "
  [4]=>
  string(3) "⼩"
  [5]=>
  string(1) " "
  [6]=>
  string(3) "小"
}

Unicode character number 12068 id est, x2f24
Picture: ⼤ uses 3 bytes. These bytes are:
xe2 xbc xa4 in the hexadecimal representation and
226 188 164 in the decimal representation

Unicode character number 00032 id est, x0020
Picture:   uses 1 bytes. These bytes are:
x20 in the hexadecimal representation and
 32 in the decimal representation

Unicode character number 22823 id est, x5927
Picture: 大 uses 3 bytes. These bytes are:
xe5 xa4 xa7 in the hexadecimal representation and
229 164 167 in the decimal representation

Unicode character number 00032 id est, x0020
Picture:   uses 1 bytes. These bytes are:
x20 in the hexadecimal representation and
 32 in the decimal representation

Unicode character number 12073 id est, x2f29
Picture: ⼩ uses 3 bytes. These bytes are:
xe2 xbc xa9 in the hexadecimal representation and
226 188 169 in the decimal representation

Unicode character number 00032 id est, x0020
Picture:   uses 1 bytes. These bytes are:
x20 in the hexadecimal representation and
 32 in the decimal representation

Unicode character number 23567 id est, x5c0f
Picture: 小 uses 3 bytes. These bytes are:
xe5 xb0 x8f in the hexadecimal representation and
229 176 143 in the decimal representation
```
Confusion
In some software and fonts, character number 12068 (⼤) looks almost identical to character number 22823 (大) [3][4].
HTML input:
大 (&#22823;)
大 (&#x5927;)
The similarity in the graphical representations of characters ⼤ and 大 may cause confusion.
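The main difference is in how Unicode classifies the two characters. 大 (&#22823;, U+5927) is a CJK Unified Ideograph: it is the ordinary kanji listed among the jōyō kanji [3] and is used in running Chinese and Japanese text. ⼤ (&#12068;, U+2F24) belongs to the Kangxi Radicals block and is meant to denote the radical itself. Unicode gives ⼤ a compatibility decomposition to 大, so NFKC normalization maps ⼤ to 大.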
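One way to guard against this confusion is to map radical forms back to the ordinary ideographs before comparing or searching text. The general tool for this is NFKC normalization, for example Normalizer::normalize($s, Normalizer::FORM_KC) from PHP's intl extension.

The sketch below avoids that dependency. It uses a hypothetical two-entry table that covers only the pairs discussed on this page, ⼤ to 大 and ⼩ to 小. It also includes a check for any character from the Kangxi Radicals block (U+2F00 to U+2FDF). The names RADICAL_TO_IDEOGRAPH, has_kangxi_radical and unify_radicals are illustrative assumptions.

```php
<?php
// Hypothetical mapping covering only the two radicals discussed here;
// a complete solution would apply Unicode NFKC normalization instead.
const RADICAL_TO_IDEOGRAPH = [
    "\u{2F24}" => "\u{5927}",   // ⼤ KANGXI RADICAL BIG   to 大
    "\u{2F29}" => "\u{5C0F}",   // ⼩ KANGXI RADICAL SMALL to 小
];

// True if the UTF-8 string contains any Kangxi radical (U+2F00..U+2FDF).
function has_kangxi_radical(string $s): bool
{
    return preg_match('/[\x{2F00}-\x{2FDF}]/u', $s) === 1;
}

// Replace the listed radicals by the ordinary CJK ideographs.
function unify_radicals(string $s): string
{
    return strtr($s, RADICAL_TO_IDEOGRAPH);
}

$text  = "\u{2F24}学";            // "college" typed with the radical ⼤
$fixed = unify_radicals($text);   // same word with 大 (U+5927)

var_dump(has_kangxi_radical($text));      // bool(true)
var_dump(has_kangxi_radical($fixed));     // bool(false)
var_dump($fixed === "\u{5927}学");        // bool(true)
```

A table like this is easy to audit, but it silently misses every other radical. The range check is a cheap way to flag text that still needs full normalization.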
References
This article can be cited as https://mizugadro.mydns.jp/t/index.php/%E2%BC%A4
[1] https://ja.wikipedia.org/wiki/大学 : 大学 (だいがく; English: college, university).
[2] https://en.wiktionary.org/wiki/%E5%B0%8F
[3] https://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji (https://en.wikipedia.org/wiki/List_of_jōyō_kanji); entry for 大: 3 strokes, grade 1, meaning "large", readings ダイ, タイ, おお, おお-きい, おお-いに (dai, tai, oo, oo-kii, oo-ini).
[4] https://0g0.org/unicode/5927/ : U+5927 Unicode character 大. Block: CJK Unified Ideographs. Numeric character references: &#22823; &#x5927;. URL encoding (UTF-8): %E5%A4%A7. URL encoding (EUC-JP): %C2%E7. URL encoding (Shift_JIS): %91%E5. Unicode name: CJK UNIFIED IDEOGRAPH-5927. General category: Letter, Other. Possible mojibake renderings: UTF-16: ꓥ�, Shift_JIS: 螟ァ, CP932: 螟ァ, EUC-JP: 紊�. Base64 encoding: 5aSn.
The jōyō kanji system of representing written Japanese consists of 2,136 characters (https://en.wikipedia.org/wiki/List_of_jōyō_kanji).
Keywords
Japanese, Kanji, SomeU, Unicode, UtfH, Utf8table, ⼤ (&#12068;), 大 (&#22823;), ⼩ (&#12073;), 小 (&#23567;)