Bijective graphical representation

From TORI
Revision as of 15:55, 21 October 2025 by T

Bijective graphical representation is a property of a character (especially a Unicode character): the character is assigned a well-established default picture, and this picture is unique. The character looks the same on different computers, and no other Unicode character has the same picture.

Many Unicode characters have no Bijective graphical representation.

Such characters are called ConfusiveCharacters.

Overview

Sometimes the picture of a character depends on the software used, and it is difficult to guess that the same character is shown.

Sometimes the same picture is assigned to different characters. This happens with some Japanese Kanji.

It seems that the professional programmers who develop fonts for Unicode characters do not care that some pictures they assign to characters reproduce pictures already assigned to other characters.

The problem is grave: teachers of Japanese, and manuals and textbooks of Japanese, ignore the confusion with Kanji. They also do not care that several Unicode characters often have similar pictures, and that pupils get confused.

The special language Tarja is suggested to avoid such confusion. Only those characters are allowed in Tarja that have a well-established default bijective graphical representation: only one picture corresponds to the character, and this picture does not refer to any other Unicode character.
The ConfusiveCharacters are excluded from Tarja; words with these characters are replaced with their synonyms: representations in Hiragana, transliterations to ASCII, or translations to other languages, mainly English (if the translation has no homonym in Japanese).
In this way, Tarja appears as a kind of surjik, or pidgin. This surjik helps to boycott characters that are not yet (as of the 21st century) supplied with a Bijective graphical representation.
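The admission rule above can be sketched in code. The following Python sketch assumes, for illustration only, that the allowed set is printable ASCII plus the Hiragana letters (X3041 to X3096); the text does not give an exact list, and the names `ALLOWED_RANGES`, `is_allowed` and `tarja_violations` are hypothetical.

```python
# Hypothetical sketch: the allowed ranges below are an assumption,
# not an official definition of Tarja.
ALLOWED_RANGES = [
    (0x0020, 0x007E),  # printable ASCII
    (0x3041, 0x3096),  # Hiragana letters
]

def is_allowed(ch: str) -> bool:
    """Return True if the character falls in one of the assumed allowed ranges."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in ALLOWED_RANGES)

def tarja_violations(text: str) -> list[str]:
    """List, as X-codes, the characters of text outside the assumed allowed ranges."""
    return [f"X{ord(ch):04X}" for ch in text if not is_allowed(ch)]

print(tarja_violations("Ie いえ"))   # ASCII and Hiragana only
print(tarja_violations("\u5BB6"))    # a Kanji, code X5BB6
```

A checker of this kind only reports the offending code points; choosing the synonym that replaces each of them remains a manual step.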

One of the goals of Tarja is to attract the attention of professionals to the problem.

Some examples of avoiding ConfusiveCharacters are suggested in this article.

Broken bijectivity

The bijectivity can be broken in two ways:

1. The same character looks different on different computers.

2. Different characters look the same on the same computer.

Examples of each of the two cases are given below.

家 can be pronounced as Ie (いえ) and may refer to home, house.

家 (X5BB6) is a ConfusiveCharacter: it has no bijective graphical representation.

No unique picture is established for character X5BB6;
its appearance depends on the default font of the operating system.

Here are the examples; the pictures are zoomed in:

家 is how the character appears on your computer.

X5BB6m.png is how it appears on a Macintosh.

X5BB6L.png is how it appears on Linux.

In addition, on the same computer, the appearance of the character may depend on the software used (and even on the settings of the software).

It is difficult for someone learning Japanese to guess that the pictures above all refer to the same character, number X5BB6.

In Tarja, character 家 is replaced with one of its synonyms:

«いえ», Hiragana representation

«Ie», the Romaji representation (ASCII transliteration)

«Home», the English translation.
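The replacement can be sketched as a lookup table. This is a minimal Python sketch; the table `SYNONYMS` and the function `to_tarja` are hypothetical names, and only the entry for X5BB6 is taken from the text above.

```python
# Hypothetical replacement table; only the entry for X5BB6 comes from the text.
SYNONYMS = {
    "\u5BB6": {"hiragana": "いえ", "romaji": "Ie", "english": "Home"},
}

def to_tarja(text: str, style: str = "hiragana") -> str:
    """Replace each character found in SYNONYMS with its synonym in the chosen style."""
    return "".join(SYNONYMS[ch][style] if ch in SYNONYMS else ch for ch in text)

print(to_tarja("\u5BB6"))             # Hiragana representation
print(to_tarja("\u5BB6", "romaji"))   # ASCII transliteration
print(to_tarja("\u5BB6", "english"))  # English translation
```

Characters absent from the table pass through unchanged, so a real table would need an entry for every ConfusiveCharacter to be excluded.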

JapanTa

Characters ⼣, タ, 夕 look similar. Even a native Japanese speaker, looking at them, is unlikely to guess:
Which of them is Unicode character number X2F23?
Which of them is Unicode character number X30BF?
Which of them is Unicode character number X5915?
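Software can answer these questions even when the eye cannot. The following Python sketch uses the standard `unicodedata` module to print the number and the official Unicode name of each of the three characters; the helper name `describe` is hypothetical.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the X-code and the official Unicode name of a character."""
    return f"X{ord(ch):04X} {unicodedata.name(ch)}"

# The three look-alikes, written as escapes so they cannot be confused.
for ch in ("\u2F23", "\u30BF", "\u5915"):
    print(describe(ch))
# X2F23 KANGXI RADICAL EVENING
# X30BF KATAKANA LETTER TA
# X5915 CJK UNIFIED IDEOGRAPH-5915
```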

Many Katakana characters have look-alikes: some Kanji look very similar to them. For this reason, Katakana characters are not allowed in Tarja.

Onna

The term Onna (おんな), or "onna", may refer to any of the following three Unicode characters:
12069 (X2F25, ⼥), KanjiRadical
22899 (X5973, 女), KanjiLiberal
63873 (XF981, 女), KanjiConfudal

Even a native Japanese speaker, looking at characters ⼥, 女, 女, is unlikely to guess:
Which of them is character number X2F25?
Which of them is character number X5973?
Which of them is character number XF981?

For Tarja, each of these characters should be replaced with a synonym that already (as of the 21st century) has a Bijective graphical representation, for example, おんな or Onna or onna.
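The three code points can be told apart, and also folded into one, by software. The Python sketch below uses Unicode compatibility normalization (NFKC) from the standard `unicodedata` module; using NFKC as a detector of look-alikes is an assumption of this sketch, not a rule of Tarja, and the names `ONNA` and `fold` are hypothetical.

```python
import unicodedata

# The three code points listed above, written as escapes.
ONNA = ["\u2F25", "\u5973", "\uF981"]

def fold(ch: str) -> str:
    """Apply NFKC normalization, which maps compatibility variants to a base character."""
    return unicodedata.normalize("NFKC", ch)

for ch in ONNA:
    print(ord(ch), f"X{ord(ch):04X}", "->", f"X{ord(fold(ch)):04X}")
# 12069 X2F25 -> X5973
# 22899 X5973 -> X5973
# 63873 XF981 -> X5973
```

Normalization covers only those pairs that Unicode itself declares equivalent; look-alikes from unrelated scripts are not folded this way.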

Combinations of Kanji

Applying the simple rules of Tarja above is not always straightforward.

The pronunciation of a Kanji may depend on the context.

In this case, the whole word is replaced either with its phonetic transliteration (Hiragana or Romaji) or with its translation to another language (preferably English) in pure ASCII. A single word is preferred; a compound word may be used if it allows straightforward translation back from Tarja to Japanese.

Not only Kanji

Some 2-byte characters may look similar to ASCII characters.

Some 3-byte characters may look similar to 2-byte characters.

A few examples are suggested below.

Latin and Cyrillic

On some computers, the Cyrillic letters look similar to their ASCII analogues.

Even a native Russian speaker, looking at characters A, А, is unlikely to guess:
Which of them is character number X0041?
Which of them is character number X0410?

The appearance of these characters may depend on the software, but the two characters look very similar.
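A program can reveal which letter is which, and can flag words that mix the two alphabets. The Python sketch below takes the first word of the official Unicode name as a rough script label; this shortcut is an assumption of the sketch, not an official script property, and the names `script_of` and `scripts_in` are hypothetical.

```python
import unicodedata

def script_of(ch: str) -> str:
    """Return the first word of the Unicode name, used here as a rough script label."""
    return unicodedata.name(ch).split()[0]

def scripts_in(word: str) -> set[str]:
    """Collect the rough script labels of all letters in a word."""
    return {script_of(ch) for ch in word if ch.isalpha()}

print(script_of("\u0041"))            # LATIN
print(script_of("\u0410"))            # CYRILLIC
print(sorted(scripts_in("P\u0410RIS")))  # a word with one Cyrillic letter hidden in it
```

A word whose set of labels has more than one element deserves a closer look.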

CYRILLIC CAPITAL LETTER IO

The combination of characters
X07F3 (a combining mark, two dots above)
X0045 (ASCII E)

looks as follows: ߳E

Character X0401 (CYRILLIC CAPITAL LETTER IO)

Looks as follows: Ё

These two symbols are difficult to distinguish from one another.

Even a native Russian speaker, looking at characters ߳E and Ё,

is unlikely to guess:
Which of them is the combination of 3 bytes xDF xB3 x45?
Which of them is the combination of 2 bytes xD0 x81?
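The byte sequences can be checked directly. The Python sketch below encodes both symbols in UTF-8 and prints the bytes in the notation used above; the helper name `utf8_hex` is hypothetical.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text in the xNN notation, separated by spaces."""
    return " ".join(f"x{b:02X}" for b in text.encode("utf-8"))

two_dots_and_e = "\u07F3" + "E"   # X07F3 followed by ASCII E
cyrillic_io = "\u0401"            # CYRILLIC CAPITAL LETTER IO

print(utf8_hex(two_dots_and_e))   # xDF xB3 x45
print(utf8_hex(cyrillic_io))      # xD0 x81
```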

Exclude

There are many confusions with Cyrillic characters.

Many Cyrillic characters have no Bijective graphical representation.

Therefore, Cyrillic characters are not allowed in Tarja.

Conclusion

As of the 21st century, no Bijective graphical representation has yet been developed for the Unicode characters that are encoded with more than 1 byte.

Until this common bug is fixed,
it may make sense to avoid characters that have no default, well-established Bijective graphical representation.

The technical language Tarja is an attempt to approach this goal.
