Difference between revisions of "Onna"

From TORI
(add sections)
 
(3 intermediate revisions by the same user not shown)
Line 55: Line 55:
 
Therefore different characters may share identical visual representations in many fonts. This causes a confusion: different characters look the same.
 
Therefore different characters may share identical visual representations in many fonts. This causes a confusion: different characters look the same.
   
Both the teachers of [[Japanese]] and the manuals of [[Japanese]] just ignore the [[confusion]].
+
Both teachers of [[Japanese]] and Japanese textbooks ignore this [[confusion]].
   
 
A newcomer encounters the problem and first attributes it to his or her own mistake.
 
A newcomer encounters the problem and first attributes it to his or her own mistake.
Then, the novice sees, that the error reproduces, again and again, and, en fin, recognizes, that it is not his/her mistake, but the [[bug]] of the software,
+
Then novice notices that the error appears repeatedly, again and again, and, en fin, recognizes, that it is not his/her mistake, but the [[bug]] of the software. <!--
the dirty trick on side of the font designers, committed in order to [[trump]] the newcomers, and both the teachers and the manuals make this difficulty harder, keeping it in secret from the pupils.
+
the dirty trick on side of the font designers, committed in order to [[trump]] the newcomers, and both the teachers and the manuals make this difficulty harder, keeping it in secret from the pupils.!-->
   
 
The problem is related not only to learning [[Japanese]].
 
The problem is related not only to learning [[Japanese]].
Line 78: Line 78:
   
 
The default [[Mediawiki]] software replaces character [[XF981]] to character [[X5973]] without any warning. <br>
 
The default [[Mediawiki]] software replaces character [[XF981]] to character [[X5973]] without any warning. <br>
This causes problems at automatic treat of data: in some cases the two objects are the same, and sometimes they are not. (Similar confusion takes place at the careless use of term «[[equality]]» applied to triangles in [[geometry]]).
+
This causes problems at automatic processing of data: in some cases the two objects are the same, and sometimes they are not. (Similar confusion takes place at the careless use of term «[[equality]]» applied to triangles in [[geometry]]).
   
In a text, that assumes any kind of citing, for example, copypasting to frame of a brouser or any search engine,
+
In a text, that assumes any kind of citing, for example, copy-pasting
  +
into the address bar of a brouser or any search engine,
 
the characters corresponding to the sound "[[onna]]"
 
the characters corresponding to the sound "[[onna]]"
 
should be specified as
 
should be specified as
Line 108: Line 109:
   
 
==Unicode==
 
==Unicode==
At least three Unicode characters are related to sound [[Onna]].<br>
+
At least three Unicode characters are qualified as «[[Onna]]».<br>
 
These characters are<!--
 
These characters are<!--
 
[[X2F25]],
 
[[X2F25]],
Line 195: Line 196:
 
In the sci-fi utopia «[[Tartaria]]», the advanced font [[Uniglif]] is mentioned.
 
In the sci-fi utopia «[[Tartaria]]», the advanced font [[Uniglif]] is mentioned.
 
Each [[glyph]] in that font is assigned the unique [[Unicode]] number, providing the
 
Each [[glyph]] in that font is assigned the unique [[Unicode]] number, providing the
bijective relation between the set of [[glyph]]s and the set of [[character]]s, at least for characters with number not exceeding XFFFF ([[ascii]], [[TwoByteCharacter]]s, [[ThrteeByteCharacter]]s). With font [[Uniglif]], no confusions similar to that above appear.
+
bijective relation between the set of [[glyph]]s and the set of [[character]]s, at least for characters with number not exceeding XFFFF ([[ascii]], [[TwoByteCharacter]]s, [[ThreeByteCharacter]]s). With font [[Uniglif]], no confusions similar to that above appear.
   
 
In the real life, while no analogy of [[Uniglif]] is available, the technical language [[Tarja]] can be used to avoid confusions.
 
In the real life, while no analogy of [[Uniglif]] is available, the technical language [[Tarja]] can be used to avoid confusions.
Line 280: Line 281:
   
 
==Reuse the [[glyph]]==
 
==Reuse the [[glyph]]==
  +
[[女]] appears as a component in various [[glyph]]s.
  +
Few examples are suggested in this section.
   
 
[[36B2]] [[&#X36B2;]] <ref>
 
[[36B2]] [[&#X36B2;]] <ref>
Line 378: Line 381:
 
Then, the lack of the unique encoding for a glyph becomes a problem; one needs some programming (see the example above) to patch the defects of the historic combination of the Unicode with existing fonts.
 
Then, the lack of the unique encoding for a glyph becomes a problem; one needs some programming (see the example above) to patch the defects of the historic combination of the Unicode with existing fonts.
   
ChatGPT indicates, that there was not bad will of the designers of the Unicode,
+
ChatGPT indicates, that there was no bad will of the designers of the Unicode,
 
nor that of teachers of Japanese and authors of the manuals.
 
nor that of teachers of Japanese and authors of the manuals.
 
They assumed, that their students never begin to write (nor analyze) [[character]]s in Japanese,
 
They assumed, that their students never begin to write (nor analyze) [[character]]s in Japanese,

Latest revision as of 10:15, 17 March 2026


OnnaDeaw.png

Drawing of X2F25, X5973 or XF981 [1]

Onna is a set of three visually identical Unicode characters that represent the kanji 女 (“woman”) and originate from different parts of the Unicode standard.

The term Onna (おんな) or "onna" may refer to one of the following three Unicode characters:
12069 (X2F25, ⼥)[2], KanjiRadical
22899 (X5973, 女)[3], KanjiLiberal
63873 (XF981, 女)[4], KanjiConfudal

Onna may also refer to the set of these three characters.
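The decimal numbers and the X-notation in the list above are two spellings of the same code points. The following minimal sketch checks the correspondence; the helper name x_label is hypothetical and is not part of any file mentioned in this article, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: formats a code point in the X-notation used in this
// article, for example 12069 -> "X2F25".
function x_label(int $codepoint): string
{
    return sprintf('X%04X', $codepoint);
}

// The three decimal numbers are taken from the list above.
foreach ([12069, 22899, 63873] as $decimal) {
    // mb_chr() builds the UTF-8 character for the given code point.
    printf("%5d = %s = %s\n", $decimal, x_label($decimal), mb_chr($decimal, 'UTF-8'));
}
```

The character is built from its number rather than pasted into the source, so the script does not depend on how an editor stores the three look-alike glyphs.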

These characters are often pronounced "Onna" and refer to a woman, a human female [1]; see the picture below.

Characters of the Onna set cause confusion.

Confusion

As of 2026, no unique Unicode number is assigned to the glyph 女. Unicode encodes abstract characters rather than graphical glyphs. Therefore different characters may share identical visual representations in many fonts. This causes confusion: different characters look the same.

Both teachers of Japanese and Japanese textbooks ignore this confusion.

A newcomer encounters the problem and first attributes it to his or her own mistake. Then the novice notices that the error appears again and again, and finally recognizes that it is not his or her mistake but a bug in the software.

The problem is not limited to learning Japanese.

Even specialists, even native Japanese speakers, looking at the characters ⼥, 女, 女, are unlikely to guess:
Which of them is character X2F25 [2]?
Which of them is character X5973 [3]?
Which of them is character XF981 [4]?

The term Onna is a synonym for the construction «⼥ or 女 or 女» for cases when only the appearance of the kanji is available and the character itself is difficult to identify.

Not only humans but also some software confuses characters that have similar or identical glyphs.

The default Mediawiki software replaces character XF981 with character X5973 without any warning.
This causes problems in the automatic processing of data: in some cases the two objects are treated as the same, and in other cases they are not. (A similar confusion arises from the careless use of the term «equality» applied to triangles in geometry.)
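The silent replacement described above matches what Unicode canonical normalization (form NFC) does: XF981 has a canonical decomposition to X5973, while X2F25 has only a compatibility decomposition and is left alone by NFC. That the wiki's behaviour is caused by this normalization is an assumption here, not something the article states. The sketch below assumes the PHP intl extension (class Normalizer) and mbstring; the function name nfc_codepoint is hypothetical.

```php
<?php
// Hypothetical helper: returns the code point that remains after NFC
// normalization of a single character. Requires the intl extension.
function nfc_codepoint(int $codepoint): int
{
    $char = mb_chr($codepoint, 'UTF-8');
    $normalized = Normalizer::normalize($char, Normalizer::FORM_C);
    return mb_ord($normalized, 'UTF-8');
}

foreach ([0x2F25, 0x5973, 0xF981] as $cp) {
    printf("X%04X -> X%04X after NFC\n", $cp, nfc_codepoint($cp));
}
```

A program that must keep XF981 distinct from X5973 therefore has to compare the raw bytes before any normalization step is applied.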

In a text that is meant to be cited, for example copy-pasted into the address bar of a browser or into a search engine, the characters corresponding to the sound "onna" should be specified as X2F25, X5973, XF981 rather than ⼥, 女, 女: the software confuses the last two characters.
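The recommendation above can be automated. The following is a minimal sketch, assuming the mbstring extension and PHP 7.4 or later; the function name onna_to_labels is hypothetical. It replaces each character of the Onna set with its explicit number and leaves all other characters unchanged.

```php
<?php
// Hypothetical helper: replaces every character of the Onna set in $text
// with its explicit X-notation, so that later copy-pasting cannot merge them.
function onna_to_labels(string $text): string
{
    $out = '';
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        $cp = mb_ord($char, 'UTF-8');
        $isOnna = in_array($cp, [0x2F25, 0x5973, 0xF981], true);
        $out .= $isOnna ? sprintf('X%04X', $cp) : $char;
    }
    return $out;
}

// The sample is built from code points, so the source file itself
// contains none of the ambiguous characters.
$sample = mb_chr(0x2F25, 'UTF-8') . ' ' . mb_chr(0x5973, 'UTF-8') . ' ' . mb_chr(0xF981, 'UTF-8');
echo onna_to_labels($sample), "\n";
```

The conversion has to run on the original bytes; once a text has passed through software that merges XF981 into X5973, the distinction cannot be recovered.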

Since at least 2021, this confusion has been recognized and described [2][3][4].

Confusions related to the apparent bugs with graphical representations of the Unicode characters are described in articles «Chikara», «Miru», «Onna», «Sakana», «StickPi», «TsukiGatsu».

Unicode

At least three Unicode characters are qualified as «Onna».
These characters are X2F25, X5973, XF981.

The UTF-8 encoding can be revealed with the PHP program onna.t, reproduced below.
The program includes the file uni.t, so that file must also be available.

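<?php
// onna.t: dump the UTF-8 bytes of the three Onna characters.
// uni.t is assumed to define unichr() (code point -> UTF-8 string)
// and uniord() (UTF-8 character -> code point).
include "uni.t";

// Build the string from explicit code points.
$a  = unichr(0x2F25);
$a .= unichr(0x5973);
$a .= unichr(0xF981);
echo "$a\n";

// strlen() counts bytes, not characters.
$N = strlen($a);
echo "The array has $N bytes; here is its splitting:\n";
for ($n = 0; $n < $N; $n++) { printf("%02x ", ord($a[$n])); }
echo "\n";

// mb_str_split() splits the string into characters (PHP 7.4 or later).
$b = mb_str_split($a);
var_dump($b);
$M = count($b);
for ($m = 0; $m < $M; $m++) {
    printf("\n");
    $c = $b[$m];
    $u = uniord($c);
    printf("Unicode character number %05d id est, x%04X\n", $u, $u);
    $d = strlen($c);   // number of bytes in this one character
    echo "Picture: $c uses $d bytes. These bytes are:\n";
    for ($n = 0; $n < $d; $n++) printf("x%2X ", ord($c[$n]));
    printf("in the hexadecimal representation and\n");
    for ($n = 0; $n < $d; $n++) printf("%3d ", ord($c[$n]));
    printf("in the decimal representation\n");
}
?>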
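The file uni.t is not reproduced in this article. For readers who do not have it, the following is a hypothetical stand-in, assuming that unichr converts a code point into a UTF-8 string and that uniord returns the code point of a single UTF-8 character, as the program above uses them. It relies on mb_chr and mb_ord from the mbstring extension (PHP 7.2 or later); the real uni.t may differ.

```php
<?php
// Hypothetical stand-in for uni.t; NOT the original file.

// Code point -> UTF-8 encoded character.
function unichr(int $codepoint): string
{
    return mb_chr($codepoint, 'UTF-8');
}

// First character of a UTF-8 string -> its code point.
function uniord(string $char): int
{
    return mb_ord($char, 'UTF-8');
}
```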

With the file uni.t available for inclusion, the command

php onna.t

produces the output below:

⼥女女
The array has 9 bytes; here is its splitting:
e2 bc a5 e5 a5 b3 ef a6 81
array(3) {
  [0]=>
  string(3) "⼥"
  [1]=>
  string(3) "女"
  [2]=>
  string(3) "女"
}

Unicode character number 12069 id est, x2F25
Picture: ⼥ uses 3 bytes. These bytes are:
xE2 xBC xA5 in the hexadecimal representation and
226 188 165 in the decimal representation

Unicode character number 22899 id est, x5973
Picture: 女 uses 3 bytes. These bytes are:
xE5 xA5 xB3 in the hexadecimal representation and
229 165 179 in the decimal representation

Unicode character number 63873 id est, xF981
Picture: 女 uses 3 bytes. These bytes are:
xEF xA6 x81 in the hexadecimal representation and
239 166 129 in the decimal representation
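The byte triples in the output above follow from the UTF-8 rule for code points in the range X0800 to XFFFF: the 16 bits of the number are split into groups of 4, 6 and 6 bits and placed into the pattern 1110xxxx 10xxxxxx 10xxxxxx. The sketch below performs this arithmetic by hand, without any multibyte library; the function name utf8_three_bytes is hypothetical.

```php
<?php
// Hypothetical helper: encodes a code point from 0x0800..0xFFFF
// (surrogates 0xD800..0xDFFF are not valid input) as three UTF-8 bytes.
function utf8_three_bytes(int $cp): array
{
    return [
        0xE0 | ($cp >> 12),          // 1110xxxx: top 4 bits
        0x80 | (($cp >> 6) & 0x3F),  // 10xxxxxx: middle 6 bits
        0x80 | ($cp & 0x3F),         // 10xxxxxx: low 6 bits
    ];
}

foreach ([0x2F25, 0x5973, 0xF981] as $cp) {
    printf("X%04X -> %02x %02x %02x\n", $cp, ...utf8_three_bytes($cp));
}
```

For X2F25 this gives e2 bc a5, for X5973 it gives e5 a5 b3, and for XF981 it gives ef a6 81, in agreement with the dump produced by onna.t.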

A similar analysis can be performed with the more universal dumping routine du.t; here is an example of its use:

php du.t "⼥女女"

In this way, historical encoding decisions in Unicode and legacy character sets, together with the fact that teachers and authors of manuals ignore the issue, force pupils to learn Unicode, UTF-8 and programming in order to distinguish characters used in the Japanese language.

Uniglif and Tarja

In the sci-fi utopia «Tartaria», the advanced font Uniglif is mentioned. Each glyph in that font is assigned a unique Unicode number, providing a bijective relation between the set of glyphs and the set of characters, at least for characters with numbers not exceeding XFFFF (ascii, TwoByteCharacters, ThreeByteCharacters). With the font Uniglif, no confusion of the kind described above arises.

In real life, while no analogue of Uniglif is available, the technical language Tarja can be used to avoid confusion. Tarja is a Japanese-based technical slang that avoids ambiguous glyphs, that is, characters that are not yet supplied with exclusive default glyphs.

When translating from Japanese to Tarja, the characters ⼥, 女 and 女 can be replaced by their explicit numbers X2F25, X5973 and XF981.

Alternatively, the transliterations «おんな» or «Onna» of «onna» can be used.

In addition, words borrowed from other languages («Female», «Mujer», ..) can be used in Tarja when it causes no confusion.

Examples

The dictionary Jisho suggests examples with the sound onna [6]:

おんなおや 女親 mother; female parent

おんなざか 女坂 the easier of two slopes

おんなきょうだい 女兄弟 (other form: 女姉妹) sisters; female siblings

おんなかぶき 女歌舞伎 girls' kabuki

Censorship and Vestism

Svetlana2.091.jpg Svetlana4.101a.jpg Svetlana6.110.jpg Svetlana8.120a.jpg Svetlana9.130b.jpg

Objects and subjects denoted by the term onna often become targets of aggression and/or censorship.
Vestists insist that the body should be hidden and punish those who do not obey [7].

Such a practice is described also in the sci-fi novel «Meganesia.Deportation».

The 5 pictures at right are designed to measure the hatred or tolerance of a religion.
Counting how many of the styles of dress shown are allowed by a religion gives a rating of its tolerance with respect to onna on a five-grade scale.

Reuse the glyph

女 appears as a component in various glyphs. A few examples are given in this section.

36B2 [8] セン, ショウ, テン, yan, ten, small, weak

597B [9] ダン, ナン, dan, nan, quarrel, dispute

597D [10] コウ, このむ, すく, よい, よし, yoi, good

5999 [11] ミョウ, ビョウ, たえ, myou, mysterious, strange

59B9 [12] マイ, バイ, メ, いもうと, いも, imouto, younger sister

59C9 [13] シ, あね, ねえさん, neesan, elder sister

59E6 [14] カン, ケン, かしましい, みだら, midara, adultery, debauchery; the glyph repeats the component three times

5B89 [15] アン, やすい, いずくに, いずくにか, いずくんぞ, やすんじる, yasui, cheap; peaceful, tranquil

Historic context

Unicode and many default fonts were designed in the 20th century, when computing was still underdeveloped. Printing techniques, by contrast, had already existed for centuries. This shaped the attitude of the designers toward encoding and fonts. The goal was to reproduce the required glyph on the screen or in print; it was assumed that nobody cared how it was encoded.

In the 21st century, the roles of the glyph and the character are swapped. The character becomes the principal part of textual information; glyphs are still needed for the human perception of characters.

The lack of a unique encoding for a glyph then becomes a problem; some programming (see the example above) is needed to patch the defects of the historic combination of Unicode with existing fonts.

ChatGPT indicates that there was no ill will on the part of the designers of Unicode, nor on the part of teachers of Japanese and authors of the manuals. They assumed that their students would never begin to write (or analyze) characters in Japanese and would never meet errors related to the confusing graphical representation of the characters.

Warning

Publications about characters of the Onna set are collected and analyzed in TORI with scientific goals.

The analysis and the interpretation above should not be interpreted as an appeal for the extrajudicial execution of the font/unicode designers who did not supply some popular glyphs with unique Unicode numbers.

A more civilized solution would be to convince them to develop a realistic default analogue of the fantastic Uniglif, a font with a bijective relation between glyphs and characters.

The description above may require correction(s) by a native Japanese speaker.

References

  1. 1.0 1.1 https://jisho.org/search/%23kanji%20%E5%A5%B3 https://jisho.org/search/%23kanji_女 woman, female Kun: おんな、 め On: ジョ、 ニョ、 ニョウ Jōyō kanji, taught in grade 1 JLPT level N5 151 of 2500 most used kanji in newspapers On reading compounds 【ジョ】 woman, girl, daughter, Chinese "Girl" constellation (one of the 28 mansions) 女王 【ジョオウ】 queen, female champion 処女 【ショジョ】 virgin, maiden 一女 【イチジョ】 one daughter, eldest daughter, first-born daughter 女王 【ジョオウ】 queen, female champion 女房 【ニョウボウ】 wife (esp. one's own wife), court lady, female court attache, woman who served at the imperial palace, woman (esp. as a love interest) 老若男女 【ロウニャクナンニョ】 men and women of all ages 天女 【テンニョ】 heavenly nymph, celestial maiden, beautiful and kind woman 女房 【ニョウボウ】 wife (esp. one's own wife), court lady, female court attache, woman who served at the imperial palace, woman (esp. as a love interest) 女官 【ジョカン】 court lady, lady-in-waiting Kun reading compounds 女 【おんな】 female, woman, female sex, female lover, girlfriend, mistress, (someone's) woman 女形 【おんながた】 onnagata, male actor in female kabuki roles, female partner (in a relationship) 醜女 【しゅうじょ】 homely woman, plain-looking woman, female demon 囲い女 【かこいおんな】 mistress 雌 【め】 female, smaller (of the two), weaker, woman, wife 女神 【めがみ】 goddess, female deity 早乙女 【さおとめ】 young female rice planter, young girl 醜女 【しゅうじょ】 homely woman, plain-looking woman, female demon
  2. 2.0 2.1 2.2 https://util.unicode.org/UnicodeJsps/character.jsp?a=2F25 2F25 KANGXI RADICAL WOMAN Han Script id: allowed confuse: 女 , 女
  3. 3.0 3.1 3.2 https://util.unicode.org/UnicodeJsps/character.jsp?a=5973 5973 CJK UNIFIED IDEOGRAPH-5973 Han Script id: restricted confuse: 女 , ⼥
  4. 4.0 4.1 4.2 https://util.unicode.org/UnicodeJsps/character.jsp?a=F981 女 F981 CJK COMPATIBILITY IDEOGRAPH-F981 Han Script id: allowed confuse: ,
  5. https://commons.wikimedia.org/wiki/File:Human_female.jpg English: Naked female human body. Русский: Обнаженная женщина. English: Model name: (preferred not to be stated) At time of photograph: Age: 40 Height: 166 cm Weight: 47 kg BMI: 17.1 Ornaments: Ear piercing, ring on left ring finger (not in retouched images), nail polish on toe nails. There is some tilting of the upper trunk towards the left of the body, which may be positional or anatomical. Date 29 September 2011 Source Own work Author Taken at City Studios in Stockholm (www.stockholmsfotografen.se), September 29, 2011, with assistance from KYO (The organisation of life models) in Stockholm. ..
  6. https://jisho.org/search/%E5%A5%B3%20%E3%81%8A%E3%82%93%E3%81%AA%20%23words?page=2 女 おんな #words .. Words — 107 found おんなおや 女親 Links Noun 1. mother; female parent​ Details ▸ おんなざか 女坂 Links Noun 1. the easier of two slopes​ Details ▸ おんなきょうだい 女兄弟 Links Noun 1. sisters; female siblings​ Other forms 女姉妹 【おんなきょうだい】 Details ▸ おんなかぶき 女歌舞伎 Links Noun 1. girls' kabuki​ Details ▸ ..
  7. https://edition.cnn.com/2023/09/21/middleeast/iran-hijab-law-parliament-jail-intl-hnk Iranian women face 10 years in jail for inappropriate dress after ‘hijab bill’ approved By Tara Subramaniam, Adam Pourahmadi and Mostafa Salem, CNN. Published 12:34 PM EDT, Thu September 21, 2023
  8. https://util.unicode.org/UnicodeJsps/character.jsp?a=36B2 36B2 CJK UNIFIED IDEOGRAPH-36B2 Han Script confuse: none .. (kDefinition) small and weak, used in girl's name, a woman's feature; lady's face .. (kJapanese) セン|⁠ショウ|⁠テン ..
  9. 597B CJK UNIFIED IDEOGRAPH-597B Han Script confuse: none .. (kJapanese) ダン|⁠ナン ..
  10. 597D CJK UNIFIED IDEOGRAPH-597D Han Script confuse: none .. (kDefinition) good, excellent, fine; well .. (kJapanese) コウ|⁠このむ|⁠すく|⁠よい|⁠よし ..
  11. https://util.unicode.org/UnicodeJsps/character.jsp?a=5999 5999 CJK UNIFIED IDEOGRAPH-5999 Han Script confuse: none .. (kDefinition) mysterious, subtle; exquisite (kJapanese) ミョウ|⁠ビョウ|⁠たえ
  12. https://util.unicode.org/UnicodeJsps/character.jsp?a=59B9 59B9 CJK UNIFIED IDEOGRAPH-59B9 Han Script confuse: none .. (kDefinition) younger sister .. (kJapanese) マイ|⁠バイ|⁠メ|⁠いもうと|⁠いも ..
  13. https://util.unicode.org/UnicodeJsps/character.jsp?a=59C9 59C9 CJK UNIFIED IDEOGRAPH-59C9 Han Script confuse: none .. (kDefinition) elder sister .. (kJapanese) シ|⁠あね|⁠ねえさん ..
  14. https://util.unicode.org/UnicodeJsps/character.jsp?a=59E6 59E6 CJK UNIFIED IDEOGRAPH-59E6 Han Script confuse: none .. (kDefinition) adultery, debauchery; debauch .. (kJapanese) カン|⁠ケン|⁠かしましい|⁠みだら
  15. https://util.unicode.org/UnicodeJsps/character.jsp?a=5B89 5B89 CJK UNIFIED IDEOGRAPH-5B89 Han Script confuse: none .. (kDefinition) peaceful, tranquil, quiet .. (kJapanese) アン|⁠やすい|⁠いずくに|⁠いずくにか|⁠いずくんぞ|⁠やすんじる ..

Keywords

«Bijective graphical representation», «Chinese», «Confusion», «Female», «Japanese», «Kanji», «KanjiConfudal», «KanjiLiberal», «KanjiRadical», «Onna», «SomeU», «Tarja», «Unicode», «Uniglif», «Utf8», «Utf8table», «UtfH», «Woman», «X2F25» «⼥», «X5973» «女», «XF981» «女»