Glyph
A glyph is a simple picture used to represent a character (usually, a Unicode character) in a Human-recognizable form. Often, it is a letter of some alphabet or a Kanji, shown on a screen or printed on paper.
In century 21, a Glyph appears as a visible picture, a shape that represents one or more characters.
Each glyph has its own visual form, while a character is a numerical code - an integer, usually expressed in one, two, or three bytes (in the UTF-8 encoding, at most four).
Glyphs have formed the basis of written Human civilization for thousands of years.
In a consistent writing system, every visible glyph should correspond to one and only one character, and vice versa.
In practice, in the standard table of characters (Unicode), this one-to-one mapping is often violated, producing confusion and errors in text processing, education, and information exchange.
The purpose of this article is to help programmers, teachers, and students (and also the Top Editor of TORI) avoid misuse of the relevant terms and to promote a consistent, bijective mapping between glyphs and characters in digital systems.
In order to show that the problem is serious, the technical language Tarja is suggested as a simplified version of Japanese that avoids any glyph not related to a single character.
Overview
Historically, over thousands of years, the written traces of various Human civilizations show that very similar small laconic pictures appear again and again.
These pictures are denoted with term glyphs. The classification and ordering of these pictures is denoted with term «Alphabet».
In the concept of alphabet, similar pictures of the same language, drawn, written, printed or displayed in various texts, by various writers, by various software, are considered as equivalent.
In a similar way, in arithmetic, sets with an equal number of elements are qualified as equivalent. The corresponding class of equivalence is denoted with term «number».
By analogy, in the description and analysis of languages, the class of equivalence of similar letters written by various writers is considered as a glyph.
In this concept, similar letters, written in various ways - on stone, on clay, on wood, on paper - are considered as equivalent; they belong to the same class of equivalence.
This concept excludes the doubtful cases, when the letter is poorly drawn and cannot be certainly identified as element of any of the classes of equivalence mentioned.
The invention of printing reduced the amount of ambiguous terms in the writings.
With printing, the classification of glyphs gets a subclassification; in many cases, the pictures of the same Glyph are sub-classified into subsets; the new class of equivalence involves the specification of not only the glyph, but also the mode in which it is drawn, id est, the font.
The abuse of various exotic fonts can make the glyphs barely distinguishable, if at all.
The standardization of glyphs advances even further with the appearance of computers.
In century 20, the glyphs are considered as primary items, while the numeration (encoding) appears as something secondary and less important. There are many ways of encoding glyphs; this caused confusion.
In century 21, with the automatic treatment of texts, a unified way to numerate glyphs becomes standard; it is the so-called «Unicode». Then, the Unicode number of an encoded glyph is denoted with term «character», and it is the character that becomes the primary item.
However, the Glyphs remain an important tool in the exchange of information between Humans and between a Human and a computer or any other gadget.
Terminology
Distinction between glyph, character, letter, symbol, and byte is shown in table below:
| Term | TORI interpretation | Typical misuse / ambiguity |
|---|---|---|
| Byte | The smallest addressable unit of computer memory, 8 bits. | Often confused with “character” in one-byte encodings such as ASCII. |
| Character | Numerical code (usually 1-3 bytes; in UTF-8, at most 4) representing a symbolic item of writing. | Sometimes called “letter” or “symbol.” |
| Glyph | Visible image of a character in some font or handwriting. | Commonly confused with “character,” especially in Unicode terminology. |
| Letter | Concept of a particular alphabet; one letter may correspond to several characters (uppercase, lowercase, variants). | Used where “glyph” or “character” would be more precise. |
| Symbol | General sign that may represent a concept, not necessarily linguistic. A symbol may combine many characters. Each name can be qualified as a symbol. | Used broadly in many senses. |
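
The difference between a character (a number) and the bytes that carry it can be checked directly. The sketch below is written in Python, which this article does not otherwise use; the function name `describe` and the sample characters are chosen only for illustration. For each sample character, it reports the Unicode number and the number of bytes that the character occupies in the UTF-8 encoding.

```python
def describe(ch):
    """Return (Unicode number, number of UTF-8 bytes) for one character."""
    return ord(ch), len(ch.encode("utf-8"))


# Hypothetical samples: an ascii letter, a Cyrillic letter,
# a Hiragana character, a Kanji, and a character beyond XFFFF.
samples = ["A", "\u042F", "\u3042", "\u53E3", "\U00020000"]

for ch in samples:
    number, size = describe(ch)
    print(f"X{number:04X}: {size} byte(s) in UTF-8")
```

The byte count depends on the encoding; the same characters would have other sizes in UTF-16 or in a one-byte legacy encoding.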
In TORI, the additional concept «Kanji» («漢字») is used.
Kanji may refer to a 3-byte character used in the Chinese, Japanese and/or Korean language, but also to a glyph that corresponds to at least one of such characters.
Term «Kanji» may also refer to the set of such characters and/or their glyphs.
Kanjis with numbers from X4E00 to X9FAF are especially popular in writing; they are denoted with term «CJK»[1] or «KanjiLiberal» (named in analogy with the KanjiRadical).
Characters with numbers X3041 - X3096 are qualified as Hiragana, and
characters with numbers X30A1 - X30F6 are qualified as Katakana.
The same refers to their glyphs; they also are denoted with terms «Hiragana» and «Katakana».
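
The ranges above can be turned into a small classifier. This is a minimal sketch in Python, not a part of TORI; the function name `classify` and the labels "ASCII" and "other" are assumptions made for illustration. The upper bound X9FAF of the CJK range is taken from the table cited as [1].

```python
def classify(ch):
    """Classify one character by its Unicode number, using the ranges of this section."""
    number = ord(ch)
    if number < 0x80:
        return "ASCII"
    if 0x3041 <= number <= 0x3096:
        return "Hiragana"
    if 0x30A1 <= number <= 0x30F6:
        return "Katakana"
    if 0x4E00 <= number <= 0x9FAF:  # upper bound as in reference [1]
        return "CJK"
    return "other"


for ch in ["a", "\u3042", "\u30ED", "\u53E3", "\u2F1D"]:
    print(f"X{ord(ch):04X}: {classify(ch)}")
```

A KanjiRadical such as X2F1D falls into none of these ranges, so the sketch reports it as "other", although its glyph may look the same as that of X53E3.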
History
Since the beginning of the development of writing systems, attempts at standardization have taken place.
The goal is to provide a one-to-one relation between a symbol (for example, a Unicode character of one or a few bytes) and the various pictures, the various versions of the corresponding glyph.
Then, a text, written by some writer, can be recognized, identified by another writer.
The style of writing, that does not allow this, is denoted with term «illegible handwriting» («escritura ilegible», «неразборчивый почерк», «判読できない筆跡», «Handoku dekinai hisseki»).
A similar problem exists with the relation between a character and the corresponding glyph.
English
At least for the English Alphabet, this problem had been resolved in century 20 with the invention of Ascii.
Since then, each character of the English language is encoded with a single byte.
The AsciiTable suggests the default glyph for each ascii character and default character, default encoding for each English glyph.
However, due to some sabotage on the side of the font designers, some different ascii characters may have similar Glyphs.
For example, letter «O» may look similar to decimal digit 0; letter «I» may look similar to the lowercase «l» or to vertical bar «|»; spacebar (character number 32) may look similar to some of the special characters (from byte number 0 to byte number 31) or to some multibyte characters appearing as empty space.
Fortunately, such crazy fonts do not often appear as a default.
In most cases, just looking at a glyph of an ascii character, one can identify this character.
Often, versions of a glyph in various fonts, in various software, look similar and allow one to recognize that they correspond to the same glyph.
Other languages are not so lucky as English.
European languages
Many European languages use accents (accent grave, accent aigu, accent circonflexe), Umlauts and even additional letters. The glyphs for these symbols were encoded with bytes from 128 to 255.
Then it happened that most computers could write in only two languages, English and one other. The encodings were not compatible with each other and caused confusion.
This problem was partially resolved only with the invention and spreading of Unicode; practically, since century 21.
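
The incompatibility of the one-byte encodings can be illustrated with a single byte. The following Python sketch decodes byte number 233 with three legacy code pages; the choice of these three code pages and the function name `decode_with` are assumptions made for illustration only.

```python
def decode_with(raw, codecs):
    """Decode the same bytes with several one-byte encodings."""
    return {name: raw.decode(name) for name in codecs}


# Byte 233 (hexadecimal E9) lies in the upper half, 128 to 255,
# where the legacy code pages disagree.
table = decode_with(bytes([0xE9]), ["latin-1", "cp1251", "koi8-r"])

for name, ch in table.items():
    print(f"byte 233 in {name}: {ch} (X{ord(ch):04X})")
```

The same byte gives a Latin letter with an accent in one code page and two different Cyrillic letters in the other two; without knowing the code page, the byte does not determine the character.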
Asian languages
The problem with standardization of glyphs happens to be even worse for languages that use the Cyrillic alphabet or Kanji.
In particular, this touched Russia ("РСФСР") and Ukraine (at that time, both these countries were under occupation of the USSR, as were many other "Soviet" republics).
During the Soviet period, the Soviet offees used to withdraw the support of Russian characters from the software and the printers. Many Russian and Ukrainian enthusiasts used to make the support of Russian and Ukrainian characters by themselves. They were persecuted (see «Большевики убили почти всех», in Russian). It was one of the elements of the so-called «Iron Curtain» («Железный занавес», in Russian). No standard for representation of Russian glyphs had been established in the USSR.
After the collapse of the USSR (practically, already in century 21), the problem with the computer support of the Russian alphabet had been almost resolved with Unicode.
However, in the UTF-8 encoding of Unicode, each Russian character is encoded with 2 bytes instead of the single byte of an English character.
In such a way, the Soviet offees injured the Russian language: any text in Russian happens to be almost twice as long (in bytes) as its English version. Due to such a drastic difference in the performance, the Russian language has few chances to survive in competition with English. (In the sci-fi utopia by Aleksandr Rozov about Meganesia, Russia already does not exist.)
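
The difference in length can be measured. Below is a minimal Python sketch; the function name `byte_lengths` and the two sample words are arbitrary choices for illustration. It counts the characters of a word and the bytes of its UTF-8 encoding.

```python
def byte_lengths(text):
    """Return the number of characters and the number of UTF-8 bytes of a text."""
    return {"characters": len(text), "utf-8": len(text.encode("utf-8"))}


for word in ["hello", "привет"]:
    print(word, byte_lengths(word))
```

Spaces, digits and punctuation remain one-byte characters in UTF-8, so for a whole Russian text the ratio of bytes to characters is somewhat below two.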
Kanjis
Many Unicode characters are supplied with glyphs.
Some of them refer to Kanjis.
These glyphs look similar to the basic elements of writing languages that were used before computers and even before the invention of printing.
At the beginning, one saw no big deal in the fact that the same glyph appears several times in the Unicode alphabet.
The problem becomes serious with the automatic treatment of texts.
A similar problem took place before the invention of the grammar rules that fix the orthography of written words.
Usually, the teachers of Japanese and the manuals of Japanese just ignore the ambiguity of glyphs; they do not mention that the same glyph may refer to different characters, causing confusion.
This ignorance leads to a serious problem: the pupils meet this confusion unprepared for it. This causes mistakes, errors, accidents, catastrophes, disasters.
One (a little bit invented and a little bit exaggerated) example of such a disaster is described in article «EarthquakesAndTsunamis» [2].
An attempt to generate the Japanese version of that example using ChatGPT has been made; the result is loaded as «Unicode と つなみ».
The Editor leaves it to the colleagues to estimate how successful this attempt is, and to qualify the readability and the usefulness of article «Unicode と つなみ».
Confusions
In Unicode, several Glyphs of Kanjis do not have a unique Unicode number, do not have a unique encoding; this causes confusion.
In other words, for a confusing glyph, no certain Unicode character is specified.
A few examples are considered in this section.
Rectangles
The five characters ⼝,⼞,ロ,口,ロ have similar glyphs or even the same glyph, depending on the software.
Even a native Japanese speaker, looking at characters
⼝,
⼞,
ロ,
口,
ロ,
is unlikely to guess:
which of them is X2F1D (KanjiRadical)?
which of them is X2F1E (KanjiRadical)?
which of them is X30ED (Katakana "ro")?
which of them is X53E3 (KanjiLiberal)?
which of them is XFF9B (character from the UnicodeBottom table)?
Some programming is necessary to identify each of these characters.
Their Unicode numbers and their UTF-8 encodings can be revealed with the PHP program du.t,
with command
php du.t ⼝⼞ロ口ロ
In such a way, the glyph "square" is not yet assigned a unique exclusive Unicode number.
For this reason, such a glyph and the corresponding characters are better avoided in technical texts.
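
The code of du.t is not shown in this article. For readers who prefer Python, the sketch below performs a similar job under stated assumptions: the function name `dump` and the output format are invented here and need not agree with those of du.t. For each character of the input, it returns the Unicode number, the bytes of the UTF-8 encoding, and the official Unicode name.

```python
import unicodedata


def dump(text):
    """Return (Unicode number, UTF-8 bytes in hex, Unicode name) for each character."""
    rows = []
    for ch in text:
        number = f"X{ord(ch):04X}"
        utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
        name = unicodedata.name(ch, "(no name)")
        rows.append((number, utf8, name))
    return rows


# The five "rectangles" of this section, written as escapes
# so that the source code itself stays unambiguous.
rectangles = "\u2F1D\u2F1E\u30ED\u53E3\uFF9B"

for number, utf8, name in dump(rectangles):
    print(number, utf8, name)
```

Writing the characters as escapes such as `\u2F1D` follows the advice of this section: the program text then does not depend on how a font draws the glyph.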
Nichi
In TORI, term «Nichi»
denotes the set of the following Unicode characters:
⽇ X2F47 [3], KanjiRadical
⽈ X2F48 [4], KanjiRadical
日 X65E5 [5], KanjiLiberal, CJK
曰 X66F0 [6], KanjiLiberal, CJK
Also, term «Nichi» may refer to any of these four characters.
These characters correspond to the same Glyph; this Glyph is also denoted with term «Nichi».
Even a native Japanese speaker, looking at pictures
«⽇»,
«⽈»,
«日»,
«曰»,
is unlikely to guess:
Which of them corresponds to Unicode character X2F47?
Which of them corresponds to Unicode character X2F48?
Which of them corresponds to Unicode character X65E5?
Which of them corresponds to Unicode character X66F0?
In this sense, these pictures are the same glyph denoted with term «Nichi».
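
Unicode itself records a part of this relation: the compatibility normalization (form NFKC, not discussed in this article) replaces each KanjiRadical with the corresponding CJK character. The Python sketch below groups the four Nichi characters by the result of this normalization; the function name `group_by_nfkc` is an assumption made for illustration.

```python
import unicodedata
from collections import defaultdict

# The four characters of set Nichi, written as escapes.
NICHI = ["\u2F47", "\u2F48", "\u65E5", "\u66F0"]


def group_by_nfkc(chars):
    """Group characters by the Unicode number of their NFKC-normalized form."""
    groups = defaultdict(list)
    for ch in chars:
        target = unicodedata.normalize("NFKC", ch)
        groups[f"X{ord(target):04X}"].append(f"X{ord(ch):04X}")
    return dict(groups)


print(group_by_nfkc(NICHI))
```

The normalization merges X2F47 with X65E5 and X2F48 with X66F0, but it keeps X65E5 and X66F0 apart; Unicode treats these two as different characters even when their glyphs are hard to distinguish.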
Chikara
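In TORI, term «Chikara» denotes the set of the following characters:
⼒ X2F12 [7], KanjiRadical
カ X30AB [8], Katakana
力 X529B [9], KanjiLiberal, CJK
力 XF98A [10], KanjiConfudal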
Also, term Chikara may refer to the sound denoted with any of the characters カ, ⼒, 力, 力; in most cases, it is either "ka" or "Chikara".
Also, term Chikara may refer to an object denoted with any of the characters カ, ⼒, 力, 力. Term Chikara has many meanings [11][12][13][14][15][16][17]. Some of them refer to Energy, Force or Power. The four possible meanings of Glyph «Chikara» and term «Chikara» are illustrated with pictures below: energy, force, power, mosquito.
It could be a good idea to assign the four different meanings to the four different characters of set Chikara.
However, in this case, each of the four characters of the set «Chikara» should have its unique glyph that is easy to distinguish from each of the three other Chikara characters. Some simplifications of the pictures above could be used as prototypes of these glyphs - however, only if the software designers of many countries agree to take these as elements of a new default standard font for various software.
While this has not yet happened, it may make sense to avoid glyph Chikara, to avoid any of the characters of set Chikara, replacing them with the phonetic transliteration or with a word borrowed from another language - for example, «energy», «force», «power» or «mosquito», depending on the desired meaning.
Onna
In the example above, the same Glyph «Chikara» refers to four different characters and may have four different meanings (although the one-to-one relation between these characters and these meanings is not yet established).
In this subsection, the example is considered where the same Glyph
corresponds to 3 different characters, that have the same meanings:
Woman, Female, Hembra, Femme, as it is shown in the figure at right.
In TORI, this Glyph and the set of these three characters are denoted with term «Onna». These characters are:
12069 (X2F25, ⼥), KanjiRadical
22899 (X5973, 女), KanjiLiberal, CJK
63873 (XF981, 女), KanjiConfudal
Even a native Japanese speaker, watching pictures
⼥,
女,
女
is unlikely to guess:
Which of these pictures refers to character X2F25?
Which of these pictures refers to character X5973?
Which of these pictures refers to character XF981?
However, the Japanese speaker, watching any of these pictures,
can easily guess which glyph it corresponds to,
and perhaps immediately says: «これ は おんな です!»
(Kore ha onna desu!), even without opening the table of the Unicode characters.
For this reason, all the three pictures are qualified as the same Glyph, denoted with term «Onna».
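
The Unicode Character Database records how the three Onna characters are related. The Python sketch below reads this record with the standard function `unicodedata.decomposition`; the function name `relation` and the three labels it returns are assumptions made for illustration.

```python
import unicodedata


def relation(ch):
    """Report how Unicode relates a character to another one, if at all."""
    record = unicodedata.decomposition(ch)
    if not record:
        return ("none", None)
    parts = record.split()
    if parts[0].startswith("<"):
        # A tag such as <compat> marks a compatibility mapping.
        return ("compatibility", "X" + parts[1])
    return ("canonical", "X" + parts[0])


for ch in ["\u2F25", "\u5973", "\uF981"]:
    print(f"X{ord(ch):04X}:", relation(ch))
```

Both X2F25 and XF981 point to X5973. The mapping of XF981 is canonical, so any software that normalizes text to form NFC silently replaces XF981 with X5973; this may be one reason why the two are so hard to tell apart on a screen.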
Character X5BB6 «いえ»
The examples above show cases where the same glyph corresponds to various characters. In this section, the opposite case is shown: the same character X5BB6 «いえ» has at least two different glyphs, and the software shows one of them without any warning about existence of another glyph for the same character.
In Japanese, word いえ (Ie) may denote a home, a house - for example, that shown in figure at right.
The same can be expressed with either of the two glyphs shown in the figure at right. Both the glyphs correspond to the same Unicode character X5BB6 家.
The first picture in the figure shows the glyph that corresponds to character X5BB6 in the Macintosh default font; the second one shows the glyph that appears on Linux.
In such a way, two glyphs «Ie» correspond to the same character X5BB6: glyph «Ie Macintosh» and glyph «Ie Linux».
Unicode character X5BB6 has no unique default glyph.
Problem
The examples above show that there is no one-to-one correspondence between the glyphs and the characters.
It may happen that one character corresponds to several glyphs, and it may happen that the single glyph corresponds to several characters. Often this refers to confusion of KanjiRadicals with KanjiLiberals and sometimes also with KanjiConfudals.
The problem mentioned above coexists with other ambiguities: the same glyph may have different pronunciations, different transliterations, and the same Hiragana word (sequence of Hiragana characters) may correspond to various different glyphs.
The apparent confusion raises the question: is it possible to use only those characters that have only one unique and exclusive default glyph in various software?
This question is considered in the next section.
Tarja
The special language «Tarja» is under construction in order to handle the problem of ambiguity of Japanese Glyphs.
However, the designers of the software have their right to assign many characters to the same Glyph.
On the other hand, the researchers, who use this software, have their right to avoid, to boycott any ambiguous Glyph, that is not yet supplied with the unique encoding and does not yet correspond to some unique Unicode character.
In TORI, the attempt to avoid the confusion mentioned is denoted with term «Tarja».
Tarja is an artificial, technical language designed for learning and interpretation of Japanese.
Tarja uses ascii characters and also those Japanese glyphs that are assigned a unique Unicode character by default in the free software popular at the beginning of century 21.
The ambiguous glyphs (those that do not yet have a unique exclusive Unicode number) are replaced with Hiragana, with Romaji transliterations and/or with words borrowed from other languages.
The rest of the grammar of Tarja is borrowed from Japanese - at least in cases that cause no ambiguity.
Pronunciation and Semantics of Tarja are kept close to Japanese - as close as possible.
In order to simplify the reading in Tarja, the words are separated with a spacebar.
The Tarja-Japanese dictionary is under construction.
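
A first step toward such a boycott can be automated. The Python sketch below is not the definition of Tarja; it uses one possible criterion, assumed here for illustration: a character is flagged if the Unicode normalization NFKC replaces it with something else. The function name `flagged` and the sample text are hypothetical.

```python
import unicodedata


def flagged(text):
    """Return the characters of text that NFKC normalization would replace."""
    return [ch for ch in text if unicodedata.normalize("NFKC", ch) != ch]


# Hypothetical sample: ascii words, a KanjiRadical (X2F25),
# a CJK character (X5973) and a character XFF9B.
sample = "kore ha \u2F25 \u5973 \uFF9B desu"

for ch in flagged(sample):
    print(f"X{ord(ch):04X} is better replaced")
```

This criterion catches the KanjiRadicals and the KanjiConfudals, but it does not catch pairs such as X65E5 and X66F0, or X30AB and X529B, which Unicode treats as unrelated characters; for those, a table of confusable glyphs would be needed in addition.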
Warning
This article is loaded at TORI with scientific goals.
The interpretation suggested is not an appeal to the Extrajudicial punishment of the font designers who draw very similar (and sometimes identical) glyphs for different characters.
A more civilized way to deal with such a sabotage is the boycott of glyphs that are not yet supplied with a standard default encoding, avoiding characters that are not yet supplied with a unique exclusive glyph that allows one to identify the character (without a need to refer to the context for the identification).
For Japanese Kanjis, the technical language Tarja is under construction; it is expected to avoid at least some of the confusion in representing texts as a sequence of glyphs - as takes place when showing a text in a printed book or newspaper, as it appears on the screen of a computer or another gadget, etc.
References
- ↑ http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml CJK unified ideographs - Common and uncommon kanji (4e00 - 9faf)
- ↑ In TORI the spacebars in the name of an article are omitted, if the article refers to a case specific for TORI. This allows to reserve the name with spacebars for an article about commonly accepted term usual through many sites. Absence of spacebars in the name of an article indicates that the term described is specific for TORI and, perhaps, is a little bit invented.
- ↑ https://util.unicode.org/UnicodeJsps/character.jsp?a=2F47 ⽇ 2F47 KANGXI RADICAL SUN Han Script id: allowed confuse: 日
- ↑ https://util.unicode.org/UnicodeJsps/character.jsp?a=2F48 ⽈ 2F48 KANGXI RADICAL SAY Han Script id: allowed confuse: 曰
- ↑ https://util.unicode.org/UnicodeJsps/character.jsp?a=65E5 日 65E5 CJK UNIFIED IDEOGRAPH-65E5 Han Script id: restricted confuse: ⽇
- ↑ https://util.unicode.org/UnicodeJsps/character.jsp?a=66F0 曰 66F0 CJK UNIFIED IDEOGRAPH-66F0 Han Script id: restricted confuse: ⽈
- ↑ https://util.unicode.org/UnicodeJsps/character.jsp?a=2F12 ⼒ 2F12 KANGXI RADICAL POWER Han Script id: allowed confuse: 力 , 力 , カ
- ↑ https://util.unicode.org/UnicodeJsps/character.jsp?a=30AB カ 30AB KATAKANA LETTER KA Katakana Script id: restricted confuse: ⼒, 力, 力
- ↑ https://util.unicode.org/UnicodeJsps/character.jsp?a=529B 力 529B CJK UNIFIED IDEOGRAPH-529B Han Script id: restricted confuse: ⼒, 力, カ
- ↑ https://util.unicode.org/UnicodeJsps/character.jsp?a=F98A 力 F98A CJK COMPATIBILITY IDEOGRAPH-F98A Han Script id: allowed confuse: ⼒, 力, カ
- ↑ https://en.wikipedia.org/wiki/Chikara 力, (Chikara), the Japanese word meaning power, capability, or influence The Four-horned Antelope, Tetraceros quadricornis Chikara (given name) Chikara (instrument), a stringed instrument from India. Chikara-mizu (力水), a ritual at the beginning of a sumo match Chikara (album), a compilation album by rock band Kiss Chikara (professional wrestling), a professional wrestling organization
- ↑ https://en.wiktionary.org/wiki/力 See also: カ, 九 and 丸 Wikipedia has articles on: 力 (Written Standard Chinese?) 力 (Cantonese) 力 (Gan) li̍t (Hakka) la̍t (Min Nan) 力 (Wu)
- ↑ https://zh.wikipedia.org/wiki/力 力 [编辑] 维基百科,自由的百科全书 ..
- ↑ https://zh-yue.wikipedia.org/wiki/力 力 出自維基百科,自由嘅百科全書 跳去導覽跳去搵嘢 Disambig.svg 想搵第個意思,請睇「力 (搞清楚)」。 唔同種類嘅力 力(Force)係一種物理概念,係物體同物體之間嘅相互作用。傳統物理觀念認為,令到有質量物體加速(或者減速)嘅一種影響,就係力。我們無論對一件物體拉開或者推開,都有力加諸喺物體上。
- ↑ https://gan.wikipedia.org/wiki/力 力 跳至導覽跳至搜尋 物理來話,傳統上讓有質量嗰物體加速(或者減速)嗰一種影響就係力。力係矢量,定義係動量改變嗰速率,佢咁有方向跟到佢。國際標準單位係牛頓。
- ↑ https://wuu.wikipedia.org/wiki/力 力 吴语维基百科,自由个百科全书 跳到导航跳到搜索 弗同種類个力 力(Force)是一種物理概念,是物體搭物體之間个相互作用。傳統物理觀念認爲,令到有質量物體加速(或者減速)个一種影響,就是力。我等無論對一件物體拉開或者推開,儕有力加垃物體丄。
- ↑ https://www.chikaracrossfit.com Chikara Intramural CrossFit Open (2021)
- ↑ https://commons.wikimedia.org/wiki/File:Human_female.jpg English: Naked female human body. Русский: Обнаженная женщина. English: Model name: (preferred not to be stated) At time of photograph: Age: 40 Height: 166 cm Weight: 47 kg BMI: 17.1 Ornaments: Ear piercing, ring on left ring finger (not in retouched images), nail polish on toe nails. There is some tilting of the upper trunk towards the left of the body, which may be positional or anatomical. Date 29 September 2011 Source Own work Author Taken at City Studios in Stockholm (www.stockholmsfotografen.se), September 29, 2011, with assistance from KYO (The organisation of life models) in Stockholm. Both models have consented to the licence of the image, and its usage in Wikipedia. Image uploaded by Mikael Häggström.
- ↑ https://jisho.org/search/%23kanji%20%E5%A5%B3 https://jisho.org/search/%23kanji_女 woman, female Kun: おんな、 め On: ジョ、 ニョ、 ニョウ Jōyō kanji, taught in grade 1 JLPT level N5 151 of 2500 most used kanji in newspapers On reading compounds 女 【ジョ】 woman, girl, daughter, Chinese "Girl" constellation (one of the 28 mansions) 女王 【ジョオウ】 queen, female champion 処女 【ショジョ】 virgin, maiden 一女 【イチジョ】 one daughter, eldest daughter, first-born daughter 女王 【ジョオウ】 queen, female champion 女房 【ニョウボウ】 wife (esp. one's own wife), court lady, female court attache, woman who served at the imperial palace, woman (esp. as a love interest) 老若男女 【ロウニャクナンニョ】 men and women of all ages 天女 【テンニョ】 heavenly nymph, celestial maiden, beautiful and kind woman 女房 【ニョウボウ】 wife (esp. one's own wife), court lady, female court attache, woman who served at the imperial palace, woman (esp. as a love interest) 女官 【ジョカン】 court lady, lady-in-waiting Kun reading compounds 女 【おんな】 female, woman, female sex, female lover, girlfriend, mistress, (someone's) woman 女形 【おんながた】 onnagata, male actor in female kabuki roles, female partner (in a relationship) 醜女 【しゅうじょ】 homely woman, plain-looking woman, female demon 囲い女 【かこいおんな】 mistress 雌 【め】 female, smaller (of the two), weaker, woman, wife 女神 【めがみ】 goddess, female deity 早乙女 【さおとめ】 young female rice planter, young girl 醜女 【しゅうじょ】 homely woman, plain-looking woman, female demon
- ↑ http://www.genebuil.co.jp/gallery/よこすかのいえ/ 施工事例 よこすかのいえ (2021)
- ↑ https://util.unicode.org/UnicodeJsps/character.jsp?a=5BB6 家 5BB6 CJK UNIFIED IDEOGRAPH-5BB6 Han Script id: restricted confuse: none ..
https://en.wikipedia.org/wiki/Glyph A glyph (/ɡlɪf/ GLIF) is any kind of purposeful mark. In typography, a glyph is "the specific shape, design, or representation of a character".[1] It is a particular graphical representation, in a particular typeface, of an element of written language. A grapheme, or part of a grapheme (such as a diacritic), or sometimes several graphemes in combination (a composed glyph)[a] can be represented by a glyph. ..
https://en.citizendium.org/wiki/Letter_(alphabet) Letter (alphabet) A letter is an element of a writing system. Writing systems include alphabets, abjads, abugidas and syllabaries. As components of writing systems, letters are associated with symbols — also called signs, characters, glyphs and letterforms. A broader term than "letter" is grapheme; it is used for the "atomic units of writing", which include other marks in addition to letters such as punctuation and numerals.
https://en.wikipedia.org/wiki/Kanji Kanji (/ˈkændʒi, ˈkɑːn-/;[1] Japanese: 漢字, pronounced [kaɲ.dʑi] ⓘ ,'Chinese characters'[2][3]) are logographic Chinese characters, adapted from Chinese script, used in the writing of Japanese.[4] //
https://en.wikipedia.org/wiki/List_of_jōyō_kanji The jōyō kanji (常用漢字; Japanese pronunciation: [dʑoːjoːkaꜜɲdʑi], lit. "regular-use kanji") system of representing written Japanese currently consists of 2,136 characters.
https://glyphwiki.org/wiki/GlyphWiki:メインページ グリフウィキ(GlyphWiki)は、明朝体の漢字グリフ(漢字字形)を登録・管理し、皆で自由に共有することを目的としたウィキです。..
http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/CJK.html
https://www.edrdg.org/~jwb/paperdir/kanjicomp.html Kanji and the Computer A Brief History of Japanese Character Set Standards // James Breen, Monash University // (2025)