Unicode is a unified system for encoding the characters used in human and algorithmic languages.
The first 128 Unicode characters (code points 0 to 127) coincide with ASCII. In the UTF-8 encoding each of them is also stored as the same single byte as in ASCII. ASCII characters are recommended in all doubtful cases: they are represented well in every encoding system, and their graphical representation is almost the same in all software. The English abc is abc everywhere, even in Africa.
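The byte-level compatibility between ASCII and UTF-8 can be checked directly. A minimal Python sketch using only the built-in `str.encode`; the helper name `utf8_length` is made up for illustration.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes that the single character ch takes in UTF-8."""
    return len(ch.encode("utf-8"))


# Every code point 0..127 is stored in UTF-8 as the single byte with the same value,
# so pure-ASCII text is byte-for-byte identical in ASCII and UTF-8.
assert all(chr(i).encode("utf-8") == bytes([i]) for i in range(128))
text = "The English abc is abc everywhere"
assert text.encode("utf-8") == text.encode("ascii")

# From X0080 on, UTF-8 needs more than one byte per character.
for cp in (0x41, 0x7F, 0x80, 0xB1):
    print(f"X{cp:04X}: {utf8_length(chr(cp))} byte(s)")
# X0041: 1 byte(s)
# X007F: 1 byte(s)
# X0080: 2 byte(s)
# X00B1: 2 byte(s)
```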
php du.t ±⼟⼠土士
The file du.t must be downloaded before the command above can be executed. The output is:
The array has 14 bytes; here is its splitting:
c2 b1 e2 bc 9f e2 bc a0 e5 9c 9f e5 a3 ab
Unicode character number 00177 id est, X00B1
Picture: ± ; uses 2 bytes. These bytes are:
XC2 XB1 in the hexadecimal representation and
194 177 in the decimal representation
Unicode character number 12063 id est, X2F1F
Picture: ⼟ ; uses 3 bytes. These bytes are:
XE2 XBC X9F in the hexadecimal representation and
226 188 159 in the decimal representation
Unicode character number 12064 id est, X2F20
Picture: ⼠ ; uses 3 bytes. These bytes are:
XE2 XBC XA0 in the hexadecimal representation and
226 188 160 in the decimal representation
Unicode character number 22303 id est, X571F
Picture: 土 ; uses 3 bytes. These bytes are:
XE5 X9C X9F in the hexadecimal representation and
229 156 159 in the decimal representation
Unicode character number 22763 id est, X58EB
Picture: 士 ; uses 3 bytes. These bytes are:
XE5 XA3 XAB in the hexadecimal representation and
229 163 171 in the decimal representation
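The source of du.t is not shown here, so the following Python sketch is a hypothetical re-creation, not the author's script. It prints a report in the same layout and computes each character's bytes by hand from the UTF-8 bit patterns: 0xxxxxxx below X0080, 110xxxxx 10xxxxxx below X0800, 1110xxxx 10xxxxxx 10xxxxxx below X10000. This is why ± takes 2 bytes while the four CJK characters take 3. The names `utf8_bytes` and `describe` are invented for illustration, and input validation (for example, rejecting surrogate code points) is omitted.

```python
def utf8_bytes(cp: int) -> bytes:
    """Hand-made UTF-8 encoder for one code point (surrogates are not rejected)."""
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def describe(text: str) -> str:
    """Hypothetical re-creation of the du.t report for the string text."""
    data = b"".join(utf8_bytes(ord(ch)) for ch in text)
    lines = [f"The array has {len(data)} bytes; here is its splitting:",
             " ".join(f"{b:02x}" for b in data)]
    for ch in text:
        cp = ord(ch)
        chunk = utf8_bytes(cp)
        lines += [
            f"Unicode character number {cp:05d} id est, X{cp:04X}",
            f"Picture: {ch} ; uses {len(chunk)} bytes. These bytes are:",
            " ".join(f"X{b:02X}" for b in chunk) + " in the hexadecimal representation and",
            " ".join(str(b) for b in chunk) + " in the decimal representation",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "\u00B1\u2F1F\u2F20\u571F\u58EB"  # the string passed to du.t above
    # The hand-made encoder agrees with Python's built-in UTF-8 codec.
    assert b"".join(utf8_bytes(ord(c)) for c in sample) == sample.encode("utf-8")
    print(describe(sample))
```

Writing the code points as escapes keeps the sample unambiguous even where look-alike glyphs could be confused.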
In case of confusion, Unicode characters should be specified by an ASCII representation of their code, for example, in hexadecimal form. Thus,
X2F1F should be written instead of ⼟;
X2F25 should be written instead of ⼥, and so on.
Even native Japanese speakers, looking at the characters ⼥ and 女, cannot tell which of them is X2F25 and which is X5973.
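The hexadecimal notation recommended above is easy to produce and to read back mechanically. A minimal Python sketch; the helper names `hex_name` and `from_hex_name` are hypothetical. The four-digit zero padding mirrors the du.t output, and code points above XFFFF simply get more digits.

```python
def hex_name(ch: str) -> str:
    """Write one character as 'X' plus its code point in hex, e.g. 'X2F25'."""
    return f"X{ord(ch):04X}"


def from_hex_name(name: str) -> str:
    """Inverse of hex_name: 'X2F25' -> the character with code point 2F25."""
    if not name.startswith("X"):
        raise ValueError(f"expected a name like 'X2F25', got {name!r}")
    return chr(int(name[1:], 16))


# The two look-alike "woman" characters, given by escapes so the source is unambiguous.
radical, ideograph = "\u2F25", "\u5973"
print(hex_name(radical), hex_name(ideograph))  # X2F25 X5973
print(radical == ideograph)                    # False: different characters
print(from_hex_name("X2F1F") == "\u2F1F")      # True
```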
Synonyms and confusions
Many Unicode characters have established pictures (glyphs). Often the same or a similar picture corresponds to different Unicode characters. For example, the Unicode characters X2F25, X5973 and XF981 have the pictures ⼥, 女, 女, which look very similar; they have similar meanings and may be considered synonyms.
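Unicode itself records some of these synonymies: the Kangxi radical X2F25 has a compatibility decomposition to X5973, and the compatibility ideograph XF981 has a canonical decomposition to X5973. The Python sketch below shows this with the standard `unicodedata` module. Treating NFKC normalization as a "synonym test" is an assumption of this sketch. It only catches pairs that Unicode maps together, not arbitrary look-alikes such as X571F and X58EB. The helper name `synonym_key` is hypothetical.

```python
import unicodedata


def synonym_key(ch: str) -> str:
    """Fold a character to its NFKC form; pairs with a Unicode decomposition coincide."""
    return unicodedata.normalize("NFKC", ch)


# The three "woman" characters from the text, written as escapes.
for ch in ("\u2F25", "\u5973", "\uF981"):
    nfc = unicodedata.normalize("NFC", ch)
    print(f"X{ord(ch):04X} {unicodedata.name(ch)}: "
          f"NFC -> X{ord(nfc):04X}, NFKC -> X{ord(synonym_key(ch)):04X}")

# The radicals X2F1F and X2F20 fold to X571F and X58EB, but X571F and X58EB
# stay distinct: normalization does not merge characters that merely look alike.
print(synonym_key("\u2F1F") == "\u571F", synonym_key("\u2F20") == "\u58EB")  # True True
print(synonym_key("\u571F") == synonym_key("\u58EB"))                        # False
```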
- https://util.unicode.org/UnicodeJsps/character.jsp?a=58EB Unicode Utilities: Character Properties. 58EB CJK UNIFIED IDEOGRAPH-58EB; Han script; id: restricted; confuse: ⼠, 土, ⼟.
- https://unicode-table.com/en/blocks/cjk-unified-ideographs-extension-a/ CJK Unified Ideographs Extension A. Range: 3400–4DBF; quantity of characters: 6592.
- https://unicode-table.com/en/blocks/cjk-unified-ideographs/ CJK Unified Ideographs. Range: 4E00–9FFF; quantity of characters: 20992.