Japanese

From TORI
Revision as of 13:15, 22 February 2026 by T (add link)

Japanese is the main language of Japan and its de facto official language.

Japanese uses 4 writing systems:

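Hiragana (characters X3041 - X3096; a phonetic syllabary used for native Japanese words and to indicate pronunciation)

Katakana (characters X30A1 - X30F6; used mainly for words borrowed from other languages)

Kanji (mainly characters X4E00 - X9FFF, the CJK Unified Ideographs, plus further ranges such as X3400 - X4DBF and the compatibility ideographs XF900 - XFAFF; logographic characters of Chinese origin)

Romaji (characters from X0020 (space) to X007E (tilde); in practice the same as printable Ascii, apart from macrons such as ō used in some romanizations).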
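As a minimal sketch, the following Python function (the name script_of is ours, for illustration only) classifies a single character by the standard Unicode ranges of the main letters: Hiragana X3041 - X3096, Katakana X30A1 - X30F6, Kanji in the CJK Unified Ideographs X4E00 - X9FFF, Extension A X3400 - X4DBF and the compatibility ideographs XF900 - XFAFF, and Romaji in X0020 - X007E. Marks such as the long-vowel sign ー (X30FC) fall outside these ranges and are reported as Other; a complete classifier would need more ranges.

```python
# A simplified classifier: only the main letter ranges are checked.
def script_of(ch: str) -> str:
    """Return the Japanese writing system of a single character, or "Other"."""
    cp = ord(ch)
    if 0x3041 <= cp <= 0x3096:
        return "Hiragana"
    if 0x30A1 <= cp <= 0x30F6:
        return "Katakana"
    if (0x4E00 <= cp <= 0x9FFF            # CJK Unified Ideographs
            or 0x3400 <= cp <= 0x4DBF     # CJK Unified Ideographs Extension A
            or 0xF900 <= cp <= 0xFAFF):   # CJK Compatibility Ideographs
        return "Kanji"
    if 0x0020 <= cp <= 0x007E:            # printable Ascii, as used for Romaji
        return "Romaji"
    return "Other"


if __name__ == "__main__":
    for ch in "にほんご ニホンゴ 日本語 nihongo ー":
        print(f"U+{ord(ch):04X} {ch} {script_of(ch)}")
```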

Some Japanese characters are collected in the article SomeU; most of them are encoded with 3 bytes in UTF-8.

There is no one-to-one mapping between words written in Kanji and their Hiragana spellings: one Kanji may have several readings, and different Kanji words may be pronounced, and so written in Hiragana, the same way.
In this sense, there are two Japanese languages, ideographic and phonetic. Converting from one to the other causes problems for foreigners and may confuse even native Japanese speakers.

Multitude

In practice, the term Japanese may refer to any of 3 languages, one oral (spoken) and two written (printable).

1. Romaji, which can be written with Latin characters (roughly, Ascii)

2. Hiragana, which represents sounds with a special phonetic syllabary

3. Kanji.

4. In addition, there is a special phonetic syllabary, Katakana, for writing foreign words with the sounds that are allowed in spoken Japanese.

Hiragana and Katakana

Here is a phonetic table of Hiragana and Katakana characters:

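  | w    | r    | y    | m    | h    | n    | t    | s    | k    | (none)
a | わ ワ | ら ラ | や ヤ | ま マ | は ハ | な ナ | た タ | さ サ | か カ | あ ア
i | -    | り リ | -    | み ミ | ひ ヒ | に ニ | ち チ | し シ | き キ | い イ
u | -    | る ル | ゆ ユ | む ム | ふ フ | ぬ ヌ | つ ツ | す ス | く ク | う ウ
e | -    | れ レ | -    | め メ | へ ヘ | ね ネ | て テ | せ セ | け ケ | え エ
o | を ヲ | ろ ロ | よ ヨ | も モ | ほ ホ | の ノ | と ト | そ ソ | こ コ | お オ

Syllabic n: ん ン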
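The table pairs each Hiragana letter with a Katakana letter of the same sound. In Unicode, each Katakana letter in X30A1 - X30F6 lies exactly 0x60 code points after its Hiragana partner in X3041 - X3096, so a conversion can be sketched by shifting code points. The function names below are hypothetical; this handles only the letters, not marks such as ー.

```python
# In Unicode, each Katakana letter X30A1 - X30F6 sits exactly 0x60 code points
# after the Hiragana letter X3041 - X3096 with the same sound.
KANA_OFFSET = 0x30A1 - 0x3041  # == 0x60


def hira_to_kata(text: str) -> str:
    """Convert Hiragana letters to Katakana; other characters pass through."""
    return "".join(
        chr(ord(c) + KANA_OFFSET) if 0x3041 <= ord(c) <= 0x3096 else c
        for c in text
    )


def kata_to_hira(text: str) -> str:
    """Convert Katakana letters back to Hiragana; other characters pass through."""
    return "".join(
        chr(ord(c) - KANA_OFFSET) if 0x30A1 <= ord(c) <= 0x30F6 else c
        for c in text
    )


if __name__ == "__main__":
    print(hira_to_kata("わらやまはなたさかあ"))  # prints ワラヤマハナタサカア
    print(kata_to_hira("ニホンゴ"))              # prints にほんご
```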

Latex

By default, Japanese characters are not supported in Latex, even on computers made in Japan.

Special effort may be required to typeset Japanese; there seems to be no standard default way to type Japanese characters in Latex.

Two options are mentioned below.

CJK

Many Japanese characters can be printed with the CJK package. An example is given below; it is taken from this Stack Exchange thread:

https://tex.stackexchange.com/questions/223237/packages-cjk-versus-cjkutf8

% !TEX encoding = UTF-8
% !TEX program = pdflatex
\documentclass{article}
\usepackage{CJKutf8}
\usepackage[utf8]{inputenc} % optional
\usepackage[T1]{fontenc}

\begin{document}
% We always use CJK package globally to prevent some bugs.
\begin{CJK}{UTF8}{gbsn}
Without \texttt{CJKutf8} package, the result will be wrong.

Café: 咖啡厅

Gödel: 哥德尔

© 版权所有

\clearpage\end{CJK}
\end{document}

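The CJK setup above has a problem: some Kanji are not printed. Here are 4 examples:

結 過 長 論

With the CJK package and the gbsn font, these characters are silently dropped. The likely cause is the font rather than the package: gbsn is a Simplified Chinese font, and these are traditional forms, which Simplified Chinese writes differently (结 过 长 论). Selecting a Japanese font family instead, for example \begin{CJK}{UTF8}{min}, may print them.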
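The four characters can be checked against legacy character sets with Python. This sketch assumes (our assumption, not verified against the font files) that the coverage of the gbsn font used above is close to the GB2312 character set, and it approximates the coverage of a Japanese font by Shift_JIS. The helper name encodable is hypothetical.

```python
def encodable(text: str, codec: str) -> bool:
    """True if every character of text exists in the given legacy character set."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Japanese (traditional) forms from the text, paired with Simplified Chinese forms.
    for japanese, simplified in zip("結過長論", "结过长论"):
        print(
            japanese,
            "GB2312:", encodable(japanese, "gb2312"),
            "Shift_JIS:", encodable(japanese, "shift_jis"),
            "| simplified", simplified, "GB2312:", encodable(simplified, "gb2312"),
        )
```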

XeLaTeX

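Some Latex distributions (for example MacTeX on the Macintosh, and TeX Live in general) also provide the XeLaTeX engine.

XeLaTeX does not seem to be compatible with the CJK package (the separate xeCJK package serves a similar purpose there), but it accepts Unicode characters typed directly into the source.

The default XeLaTeX template of TeXShop is copied below: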

% XeLaTeX can use any Mac OS X font. See the setromanfont command below.
% Input to XeLaTeX is full Unicode, so Unicode characters can be typed directly into the source.

% The next lines tell TeXShop to typeset with xelatex, and to open and save the source with Unicode encoding.

%!TEX TS-program = xelatex
%!TEX encoding = UTF-8 Unicode

\documentclass[12pt]{article}
\usepackage{geometry}                % See geometry.pdf to learn the layout options. There are lots.
\geometry{letterpaper}                   % ... or a4paper or a5paper or ... 
%\geometry{landscape}                % Activate for rotated page geometry
%\usepackage[parfill]{parskip}    % Activate to begin paragraphs with an empty line rather than an indent
\usepackage{graphicx}
\usepackage{amssymb}

% Will Robertson's fontspec.sty can be used to simplify font choices.
% To experiment, open /Applications/Font Book to examine the fonts provided on Mac OS X,
% and change "Hoefler Text" to any of these choices.

\usepackage{fontspec,xltxtra,xunicode}
\defaultfontfeatures{Mapping=tex-text}
\setromanfont[Mapping=tex-text]{Hoefler Text}
\setsansfont[Scale=MatchLowercase,Mapping=tex-text]{Gill Sans}
\setmonofont[Scale=MatchLowercase]{Andale Mono}

\title{Brief Article}
\author{The Author}
%\date{}                                           % Activate to display a given date or no date

\begin{document}
\maketitle

% For many users, the previous commands will be enough.
% If you want to directly input Unicode, add an Input Menu or Keyboard to the menu bar 
% using the International Panel in System Preferences.
% Unicode must be typeset using a font containing the appropriate characters.
% Remove the comment signs below for examples.

% \newfontfamily{\A}{Geeza Pro}
% \newfontfamily{\H}[Scale=0.9]{Lucida Grande}
% \newfontfamily{\J}[Scale=0.85]{Osaka}

% Here are some multilingual Unicode fonts: this is Arabic text: {\A السلام عليكم}, this is Hebrew: {\H שלום}, 
% and here's some Japanese: {\J 今日は}.
\end{document}  

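Some of the comments can be removed from the document above and the example still seems to compile well, but not all of them: the %!TEX lines tell TeXShop to typeset with xelatex and to use UTF-8, so they act as settings rather than mere comments.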

Unicode and confusions

Many Japanese Kanji have no unique glyph. Even in the 21st century, various software often shows several different characters with the same glyph, the same meaning and the same pronunciation.

The ambiguous characters are classified as KanjiRadical, KanjiLiberal (almost the same as CJK characters) or KanjiConfudal.

Some software (mainly on Macintosh) uses the same glyphs for KanjiRadical and KanjiLiberal characters, causing confusion.

Some software automatically and silently (without any warning) replaces KanjiConfudal characters with KanjiLiberal ones, making the confusion even worse.
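The silent replacement described above resembles Unicode normalization. As an assumption (the TORI terms are not defined here in terms of Unicode blocks), suppose KanjiConfudal corresponds to the CJK Compatibility Ideographs (XF900 - XFAFF), KanjiLiberal to the CJK Unified Ideographs, and KanjiRadical to the Kangxi Radicals (X2F00 - X2FDF). Python's standard unicodedata module then shows that NFC, which some software applies automatically, replaces a compatibility ideograph with a unified one, while a Kangxi radical is changed only by the stronger NFKC form.

```python
import unicodedata


def show(ch: str) -> None:
    """Print a character's code point and name, and what NFC and NFKC turn it into."""
    nfc = unicodedata.normalize("NFC", ch)
    nfkc = unicodedata.normalize("NFKC", ch)
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"NFC -> U+{ord(nfc):04X}, NFKC -> U+{ord(nfkc):04X}"
    )


if __name__ == "__main__":
    show("\uF900")  # a CJK compatibility ideograph (assumed KanjiConfudal)
    show("\u2F00")  # KANGXI RADICAL ONE (assumed KanjiRadical), looks like 一
    show("\u4E00")  # CJK UNIFIED IDEOGRAPH-4E00 (assumed KanjiLiberal)
```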

The PHP code du.t identifies characters, returning their Unicode code points and their encoding (assuming UTF-8). Typically, each Japanese character is encoded with 3 bytes, while English text uses a single byte per character; so, measured in bytes, a Japanese text may be longer than its English version even when it has fewer characters.
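The source of du.t is not shown here, so the following Python function is only a hypothetical stand-in (the name describe and its output format are ours): for each character it reports the code point and the UTF-8 bytes, and it compares byte counts for a Japanese word and its English translation.

```python
def describe(text: str) -> list:
    """Return one line per character: the character, its code point and UTF-8 bytes."""
    lines = []
    for ch in text:
        data = ch.encode("utf-8")
        lines.append(f"{ch}  U+{ord(ch):04X}  {len(data)} bytes  {data.hex(' ').upper()}")
    return lines


if __name__ == "__main__":
    for line in describe("日本語"):
        print(line)
    # Character count versus UTF-8 byte count, Japanese versus English.
    for word in ("日本語", "Japanese"):
        print(word, len(word), "characters,", len(word.encode("utf-8")), "bytes")
```

For 日本語 this reports three characters of 3 bytes each, 9 bytes in total, while Japanese takes 8 bytes for 8 characters.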

Ambiguity and Tarja

Many Japanese Kanji have no unique encoding.

In TORI, the technical language Tarja is under development, with the goal of collecting Japanese characters that have a unique 3-byte encoding.

Characters that have no unique encoding are replaced with Hiragana or Romaji, that is, either transliterated into Ascii or, for the whole word, translated into English; the grammar is kept as close to Japanese as possible.
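How TORI tests for a "unique encoding" is not stated here. As one possible reading, and purely an assumption, a character could be kept only if it takes exactly 3 bytes in UTF-8 and is left unchanged by NFKC normalization, so that normalization cannot swap it for a look-alike. The function names below are hypothetical, and the test does not catch distinct unified Kanji that merely look alike.

```python
import unicodedata


def has_unique_encoding(ch: str) -> bool:
    """Hypothetical Tarja test: 3 bytes in UTF-8 and unchanged by NFKC normalization."""
    return len(ch.encode("utf-8")) == 3 and unicodedata.normalize("NFKC", ch) == ch


def flag_ambiguous(text: str) -> list:
    """Return (position, character) pairs of non-Ascii characters that fail the test,
    i.e. candidates for replacement with Hiragana or Romaji."""
    return [(i, ch) for i, ch in enumerate(text)
            if not ch.isascii() and not has_unique_encoding(ch)]


if __name__ == "__main__":
    sample = "日本\uF900\u2F00abc"  # two ordinary Kanji, two ambiguous characters, Ascii
    print(flag_ambiguous(sample))
```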

The ambiguity and confusion of Japanese Kanji have analogies in other languages.
Ascii characters may be confused in a similar way; for example, most people looking at
word (1) PABEHCTBO cannot distinguish it from
word (2) РАВЕНСТВО,
although word (1) is written in Ascii characters and takes 9 bytes, while word (2) is written in Cyrillic (Russian) letters and takes 18 bytes in UTF-8.
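The two words can be compared directly in Python. Here word (2) is written with escape codes, so that its Cyrillic letters cannot be mistaken for Latin ones in the source; the standard unicodedata module names each letter and shows that normalization does not unify them.

```python
import unicodedata

latin = "PABEHCTBO"
# The Cyrillic word, spelled with escapes to make its script explicit.
cyrillic = "\u0420\u0410\u0412\u0415\u041D\u0421\u0422\u0412\u041E"

if __name__ == "__main__":
    print("equal strings:", latin == cyrillic)
    print("UTF-8 bytes:", len(latin.encode("utf-8")), "vs", len(cyrillic.encode("utf-8")))
    # Normalization does not unify look-alike letters from different scripts.
    print("equal after NFKC:",
          unicodedata.normalize("NFKC", latin) == unicodedata.normalize("NFKC", cyrillic))
    for a, b in zip(latin, cyrillic):
        print(a, unicodedata.name(a), "|", b, unicodedata.name(b))
```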

Warning

Interpretation of Japanese in terms of Tarja is an attempt to simplify the use of Japanese for English-speaking foreigners.

It is neither an attempt to replace Japanese with a surrogate
nor a suggestion to modify the current version of Japanese.


Keywords

«Du.t», «Hiragana», «Japan», «Japanese», «Kanji», «KanjiConfudal», «KanjiLiberal», «KanjiRadical», «Katakana», «Romaji», «Tarja», «Unicode»