Japanese

From TORI
Revision as of 13:36, 4 October 2025 by T (talk | contribs) (→‎Ambiguity and Tarja)

Japanese is the main and official language of Japan.

Japanese uses 4 writing systems:

Hiragana (characters X3041 - X3096, a phonetic syllabary used to write native Japanese words and to indicate pronunciation)

Katakana (characters X30A1 - X30F6, a phonetic syllabary used mainly for words borrowed from other languages)

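Kanji (logographic characters of Chinese origin; the main block is X4E00 - X9FFF, with radicals starting at X2F00 and compatibility characters extending to XFA6D)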

Romaji (characters from X0020 (space) to X007E (tilde); in practice, the printable ASCII characters).

Some Japanese characters are collected in the article SomeU; in UTF-8, most of them are encoded with 3 bytes.
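
The ranges above can be turned into a small classifier. The following is a minimal sketch in Python, not code from TORI; the names script_of and utf8_length are hypothetical, the Katakana range is assumed to start at X30A1 as in the Unicode Katakana block, and the Kanji range is assumed to cover only the main CJK Unified Ideographs block.

```python
# Code point ranges are assumptions based on the Unicode block layout,
# not taken from any TORI source code.
SCRIPT_RANGES = [
    ("Romaji", 0x0020, 0x007E),    # printable ASCII
    ("Hiragana", 0x3041, 0x3096),
    ("Katakana", 0x30A1, 0x30F6),
    ("Kanji", 0x4E00, 0x9FFF),     # main CJK Unified Ideographs block only
]

def script_of(ch: str) -> str:
    """Return a coarse script label for one character, or 'other'."""
    cp = ord(ch)
    for name, first, last in SCRIPT_RANGES:
        if first <= cp <= last:
            return name
    return "other"

def utf8_length(ch: str) -> int:
    """Return the number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# Print the code point, the script label and the UTF-8 length of a few samples.
for ch in "あア漢a":
    print(ch, f"X{ord(ch):04X}", script_of(ch), utf8_length(ch))
```

Each of the three Japanese samples occupies 3 bytes in UTF-8, while the ASCII letter occupies 1 byte.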

There is no one-to-one mapping between words written in Kanji and their equivalents in Hiragana.
In this sense, there are two Japanese languages, an ideographic one and a phonetic one. Conversion from one to the other creates problems for foreigners and may confuse even native Japanese speakers.

Hiragana and Katakana

Here is the phonetic table of Hiragana and Katakana characters:

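Columns give the consonant ("-" in the header marks the bare vowels), rows give the vowel; each cell shows Hiragana followed by Katakana. The syllabic n stands alone in the last row.

     w     r     y     m     h     n     t     s     k     -
a    わワ  らラ  やヤ  まマ  はハ  なナ  たタ  さサ  かカ  あア
i    -     りリ  -     みミ  ひヒ  にニ  ちチ  しシ  きキ  いイ
u    -     るル  ゆユ  むム  ふフ  ぬヌ  つツ  すス  くク  うウ
e    -     れレ  -     めメ  へヘ  ねネ  てテ  せセ  けケ  えエ
o    をヲ  ろロ  よヨ  もモ  ほホ  のノ  とト  そソ  こコ  おオ
n    んン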
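
In Unicode, each Katakana letter of the table sits exactly X0060 positions after its Hiragana partner, so the pairing can be computed rather than looked up. The sketch below assumes the Hiragana range X3041 - X3096; the function name hiragana_to_katakana is hypothetical.

```python
HIRAGANA_FIRST = 0x3041
HIRAGANA_LAST = 0x3096
KANA_OFFSET = 0x60  # distance from a Hiragana letter to its Katakana partner

def hiragana_to_katakana(text: str) -> str:
    """Shift every Hiragana letter to Katakana; leave other characters as-is."""
    converted = []
    for ch in text:
        if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST:
            converted.append(chr(ord(ch) + KANA_OFFSET))
        else:
            converted.append(ch)
    return "".join(converted)

print(hiragana_to_katakana("ひらがな"))  # ヒラガナ
```

The reverse conversion subtracts the same offset. Neither direction says anything about Kanji, for which no such arithmetic mapping exists.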

Unicode and confusions

Many Japanese Kanji have no unique glyph. As of the 21st century, various software often shows several distinct characters with the same glyph, the same meaning and the same pronunciation.

The ambiguous characters are classified as KanjiRadical, KanjiLiberal (almost the same as CJK characters) or KanjiConfudal.

Some software (mainly on Macintosh) uses the same glyphs for KanjiRadical and KanjiLiberal characters, causing confusion.

Some software automatically and silently (without any warning) replaces KanjiConfudal characters with KanjiLiberal ones, making the confusion even worse.
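
One well-known mechanism that produces such silent replacement is Unicode normalization. The sketch below is only an illustration and rests on assumptions not stated in this article: that KanjiRadical corresponds to the Kangxi Radicals block (starting at X2F00), KanjiConfudal to the CJK Compatibility Ideographs (starting at XF900), and KanjiLiberal to the CJK Unified Ideographs. The helper names are hypothetical.

```python
import unicodedata

def normalized(ch: str, form: str) -> str:
    """Apply a Unicode normalization form ('NFC', 'NFKC', ...) to one character."""
    return unicodedata.normalize(form, ch)

def label(ch: str) -> str:
    """Return the code point in X-notation together with the Unicode name."""
    return f"X{ord(ch):04X} {unicodedata.name(ch)}"

radical = "\u2F00"  # Kangxi radical "one" (assumed to be a KanjiRadical)
compat = "\uF900"   # a CJK compatibility ideograph (assumed to be a KanjiConfudal)

# Compatibility normalization (NFKC) maps the radical to a unified ideograph.
print(label(radical), "->", label(normalized(radical, "NFKC")))

# Even canonical normalization (NFC) replaces the compatibility ideograph.
print(label(compat), "->", label(normalized(compat, "NFC")))
```

A program that normalizes its input therefore changes the code points without changing what the reader sees on the screen.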

The PHP code du.t identifies characters, returning their Unicode numbers and their encoding (assuming UTF-8). Typically, each Japanese character is encoded with 3 bytes, so a text in Japanese tends to be somewhat longer in bytes than its English version, which uses a single byte per character.
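
The kind of report described here can be sketched in a few lines. This is not the du.t code and does not reproduce its output format; it is a Python approximation under the same UTF-8 assumption, and the name describe_chars is hypothetical.

```python
def describe_chars(text: str):
    """Return one (character, code point, UTF-8 bytes in hex, byte count) row per character."""
    rows = []
    for ch in text:
        encoded = ch.encode("utf-8")
        hex_bytes = " ".join(f"{b:02X}" for b in encoded)
        rows.append((ch, f"X{ord(ch):04X}", hex_bytes, len(encoded)))
    return rows

for row in describe_chars("日本 A"):
    print(*row)

# Total length in bytes versus length in characters.
sample = "日本語"
print(len(sample), "characters,", len(sample.encode("utf-8")), "bytes")
```

Comparing the code points reported for two characters that look alike is the simplest way to tell whether they are really the same character.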

Ambiguity and Tarja

Many Japanese Kanji have no unique encoding.

In TORI, the technical language Tarja is under development. It collects the Japanese characters that have a unique 3-byte encoding, together with both Hiragana and Romaji, which replace the characters that have no unique encoding.

Characters that have no unique encoding are replaced with Hiragana or Romaji; the latter is either a transliteration into ASCII or a translation of the whole word into English. The grammar is kept as close to Japanese as possible.

The ambiguity and confusion of Japanese Kanji have analogies in other languages.
ASCII characters may be confused in a similar way; for example, most humans looking at
word (1) PABEHCTBO cannot distinguish it from
word (2) РАВЕНСТВО,
although word (1) is written in ASCII characters and takes 9 bytes, while word (2) is written in Russian (Cyrillic) and takes 18 bytes in UTF-8.
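
The byte counts can be checked directly. In the sketch below the Cyrillic word is written with escape sequences so that the two strings cannot be mixed up in an editor; the variable names are hypothetical.

```python
latin = "PABEHCTBO"  # word (1): nine ASCII letters

# word (2): the Cyrillic letters R, A, V, E, N, S, T, V, O
cyrillic = "\u0420\u0410\u0412\u0415\u041D\u0421\u0422\u0412\u041E"

print(latin, len(latin.encode("utf-8")), "bytes")
print(cyrillic, len(cyrillic.encode("utf-8")), "bytes")
print("equal as strings:", latin == cyrillic)

# List the code points side by side to expose the difference.
for a, b in zip(latin, cyrillic):
    print(f"{a} X{ord(a):04X}    {b} X{ord(b):04X}")
```

The two words compare as different strings although they may be rendered identically, which is the same effect as with the ambiguous Kanji.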

Warning

The interpretation of Japanese in terms of Tarja is an attempt to simplify the use of Japanese by English-speaking foreigners.

It is neither an attempt to replace Japanese with a surrogate nor a suggestion to modify the current form of Japanese.

Keywords

«Du.t», «Hiragana», «Japan», «Japanese», «Kanji», «KanjiConfudal», «KanjiLiberal», «KanjiRadical», «Katakana», «Tarja», «Unicode»