Difference between revisions of "Onna"

From TORI
(add sections)
Line 1: Line 1:
 
{{top}}
 
{{top}}
  +
Thank you. I made some of the corrections you suggested.
<div style="float:right;margin:-70px 0px 0px 0px;">
 
  +
If you see more mistakes, let me know.
{{pic|OnnaDeaw.png|300px}}<br>
 
<center>Drawing of [[X2F25]], [[X5973]] or [[XF981]] <ref name="jisho">
 
https://jisho.org/search/%23kanji%20%E5%A5%B3
 
https://jisho.org/search/%23kanji_女
 
woman, female
 
Kun:
 
[[おんな]]、 め
 
On:
 
ジョ、 ニョ、 ニョウ
 
Jōyō kanji, taught in grade 1
 
JLPT level N5
 
151 of 2500 most used kanji in newspapers
 
On reading compounds
 
[[女]] 【ジョ】 woman, girl, daughter, Chinese "Girl" constellation (one of the 28 mansions)
 
女王 【ジョオウ】 queen, female champion
 
処女 【ショジョ】 virgin, maiden
 
一女 【イチジョ】 one daughter, eldest daughter, first-born daughter
 
女王 【ジョオウ】 queen, female champion
 
女房 【ニョウボウ】 wife (esp. one's own wife), court lady, female court attache, woman who served at the imperial palace, woman (esp. as a love interest)
 
老若男女 【ロウニャクナンニョ】 men and women of all ages
 
天女 【テンニョ】 heavenly nymph, celestial maiden, beautiful and kind woman
 
女房 【ニョウボウ】 wife (esp. one's own wife), court lady, female court attache, woman who served at the imperial palace, woman (esp. as a love interest)
 
女官 【ジョカン】 court lady, lady-in-waiting
 
Kun reading compounds
 
女 【おんな】 female, woman, female sex, female lover, girlfriend, mistress, (someone's) woman
 
女形 【おんながた】 onnagata, male actor in female kabuki roles, female partner (in a relationship)
 
醜女 【しゅうじょ】 homely woman, plain-looking woman, female demon
 
囲い女 【かこいおんな】 mistress
 
雌 【め】 female, smaller (of the two), weaker, woman, wife
 
女神 【めがみ】 goddess, female deity
 
早乙女 【さおとめ】 young female rice planter, young girl
 
醜女 【しゅうじょ】 homely woman, plain-looking woman, female demon
 
</ref></center>
 
</div>
 
[[Onna]] is a set of three visually identical Unicode characters representing the kanji [[女]] (“woman”) that originate from different parts of the Unicode standard.
 
   
  +
Following your sample, I have updated the article
The term [[Onna]] ([[おんな]]) or "onna" may refer to one of the following three [[Unicode]] characters:<br>
 
  +
https://mizugadro.mydns.jp/t/index.php/Onna
12069 ([[X2F25]], [[&#X2F25;]])<ref name="x2f25"/>, [[KanjiRadical]] <br>
 
22899 ([[X5973]], [[&#X5973;]])<ref name="x5973"/>, [[KanjiLiberal]] <br>
 
63873 ([[XF981]], [[&#XF981;]])<ref name="xf981"/>, [[KanjiConfudal]]
 
   
  +
Can you criticize it?
[[Onna]] may also refer to the set of these 3 characters.
 
  +
I copy-paste the source below.
   
  +
=r=
Often, these characters are pronounced as "[[Onna]]" and refer
 
  +
Your **[[Onna]]** article is interesting and mostly technically correct. The structure is good and the examples help. I will give **constructive criticism in four categories**:
to a [[woman]], Human female
 
<ref name="jisho"/>, see picture below.
 
{{fig|HumanFemale276.jpg|96|-12|0|-20|[[Onna]], [[おんな]] <ref>https://commons.wikimedia.org/wiki/File:Human_female.jpg English: Naked [[female]] human body. Русский: Обнаженная женщина. English: Model name: (preferred not to be stated) At time of photograph: Age: 40 Height: 166 cm Weight: 47 kg BMI: 17.1 Ornaments: Ear piercing, ring on left ring finger (not in retouched images), nail polish on toe nails. There is some tilting of the upper trunk towards the left of the body, which may be positional or anatomical. Date 29 September 2011 Source Own work Author Taken at City Studios in Stockholm (www.stockholmsfotografen.se), September 29, 2011, with assistance from KYO (The organisation of life models) in Stockholm. ..</ref>}}
 
   
  +
1. **technical accuracy (Unicode facts)**
[[Character]]s of set [[Onna]] cause [[confusion]]s.
 
==Confusion==
 
As of 2026, no unique Unicode number is assigned to the glyph [[&#X2F25;]].
 
Unicode encodes abstract characters rather than graphical glyphs.
 
Therefore different characters may share identical visual representations in many fonts. This causes a confusion: different characters look the same.
 
   
  +
2. **language/grammar improvements**
Both teachers of [[Japanese]] and textbooks of [[Japanese]] simply ignore the [[confusion]].
 
   
  +
3. **logic and tone**
A newcomer encounters the problem and first attributes it to his or her own mistake.
 
Then the novice notices that the error reproduces again and again and finally recognizes that it is not his or her mistake but a [[bug]] of the software,

a dirty trick on the part of the font designers, committed in order to [[trump]] the newcomers; both the teachers and the manuals make this difficulty worse by keeping it secret from the pupils.
 
   
  +
4. **structural suggestions**
The problem is related not only to learning [[Japanese]].
 
   
  +
I will be direct so the article becomes stronger.
Even specialists, even native Japanese speakers looking at characters
 
[[&#X2F25;]]
 
[[&#X5973;]],
 
[[&#XF981;]] are unlikely to guess:<br>
 
Which of them is character [[X2F25]] <ref name="x2f25"/>?<br>
 
Which of them is character [[X5973]] <ref name="x5973"/>?<br>
 
Which of them is character [[XF981]] <ref name="xf981"/>?
 
   
  +
== 1. Important technical clarification ==
The term [[Onna]] is a synonym of the construction

«[[&#X2F25;]] or [[&#X5973;]] or [[&#XF981;]]»

for cases when only the appearance of the kanji is available and it is difficult to identify it.
 
   
  +
Your core statement is almost correct, but needs **one correction**.
Not only humans but also some software confuses characters with similar or identical [[glyph]]s.
 
   
  +
You wrote that Unicode has **three characters for “onna”**:
The default [[Mediawiki]] software replaces character [[XF981]] with character [[X5973]] without any warning. <br>
 
  +
* X2F25 ⼥
This causes problems in automatic processing of data: in some cases the two objects are treated as the same, and in other cases they are not. (A similar confusion arises from careless use of the term «[[equality]]» applied to triangles in [[geometry]].)
 
  +
* X5973 女
  +
* XF981 女
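The effect on automatic data processing can be illustrated with a minimal PHP sketch. It assumes that, for comparison purposes, the radical X2F25 and the compatibility ideograph XF981 should be treated as the unified ideograph X5973. In the Unicode Character Database, XF981 has a canonical decomposition to X5973 and X2F25 a compatibility decomposition to X5973, so a wiki that normalizes its text would plausibly behave as described above. The function name `foldOnna` is hypothetical, and the hand-written table covers only these three characters; it is not a general Unicode normalizer.

```php
<?php
// Hypothetical helper: map the look-alike characters of the Onna set to X5973.
// The table is written by hand for this article; it is not a full normalizer.
function foldOnna(string $text): string
{
    return strtr($text, [
        "\u{2F25}" => "\u{5973}", // KANGXI RADICAL WOMAN
        "\u{F981}" => "\u{5973}", // CJK COMPATIBILITY IDEOGRAPH-F981
    ]);
}

$a = "\u{5973}\u{738B}"; // the word "queen" typed with X5973
$b = "\u{F981}\u{738B}"; // the same word typed with XF981

var_dump($a === $b);                     // bool(false): the bytes differ
var_dump(foldOnna($a) === foldOnna($b)); // bool(true): equal after folding
```

Whether such folding is appropriate depends on the task: it helps when searching or comparing strings, but it destroys the distinction that this article is about.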
   
  +
This is true **as code points**, but their roles are different.
In a text that may be cited in any way, for example copy-pasted into the address bar of a browser or into a search engine,
 
  +
<pre>
the characters corresponding to the sound "[[onna]]"
 
  +
| Code | Character | Role |
should be specified as
 
  +
| ----- | --------- | ----------------------------- |
[[X2F25]],
 
  +
| X5973 | 女 | real ideograph (normal kanji) |
[[X5973]],
 
  +
| X2F25 | ⼥ | Kangxi radical symbol |
[[XF981]] rather than
 
  +
| XF981 | 女 | compatibility ideograph |
[[&#X2F25;]],
 
  +
</pre>
[[&#X5973;]],
 
  +
So the **primary character** is:
[[&#XF981;]]: the software confuses the last two characters.
 
   
  +
X5973 女
Since at least 2021, this confusion has been recognized and described
 
<ref name="x2f25">
 
https://util.unicode.org/UnicodeJsps/character.jsp?a=2F25 [[⼥]] 2F25 KANGXI RADICAL WOMAN Han Script id: allowed confuse: 女 , 女
 
</ref><ref name="x5973">
 
https://util.unicode.org/UnicodeJsps/character.jsp?a=5973 [[女]] 5973 CJK UNIFIED IDEOGRAPH-5973 Han Script id: restricted confuse: 女 , ⼥
 
</ref><ref name="xf981">
 
https://util.unicode.org/UnicodeJsps/character.jsp?a=F981 女 F981 CJK COMPATIBILITY IDEOGRAPH-F981 Han Script id: allowed confuse: [[女]] , [[⼥]]
 
</ref>.
 
   
  +
The others exist for **compatibility with historical encodings**.
[[Confusion]]s related to the apparent bugs with graphical representations of the [[Unicode]] characters are described in articles
 
«[[Chikara]]»,
 
«[[Miru]]»,
 
«[[Onna]]»,
 
«[[Sakana]]»,
 
«[[StickPi]]»,
 
«[[TsukiGatsu]]».
 
   
  +
Unicode therefore does **not consider them separate characters in meaning**, only in encoding history.
==Unicode==
 
At least three Unicode characters correspond to the kanji 女, which can be pronounced [[Onna]].<br>
 
These characters are<!--
 
[[X2F25]],
 
[[X5973]],
 
[[XF981]]:
 
!-->
 
<big><big>[[&#X2F25;]]</big></big> [[X2F25]],
 
<big><big>[[&#X5973;]]</big></big> [[X5973]],
 
<big><big>[[&#XF981;]]</big></big> [[XF981]].
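The three numbers belong to three different Unicode blocks, which can be checked mechanically. The sketch below is only an illustration: the function name `onnaBlock` is hypothetical, and only the three blocks relevant to this article are listed (Kangxi Radicals 2F00 to 2FDF, CJK Unified Ideographs 4E00 to 9FFF, CJK Compatibility Ideographs F900 to FAFF).

```php
<?php
// Hypothetical helper: name the Unicode block of a code point.
// Only the three blocks relevant to this article are listed.
function onnaBlock(int $codePoint): string
{
    if ($codePoint >= 0x2F00 && $codePoint <= 0x2FDF) {
        return "Kangxi Radicals";
    }
    if ($codePoint >= 0x4E00 && $codePoint <= 0x9FFF) {
        return "CJK Unified Ideographs";
    }
    if ($codePoint >= 0xF900 && $codePoint <= 0xFAFF) {
        return "CJK Compatibility Ideographs";
    }
    return "other";
}

// Print the block of each character of the Onna set.
foreach ([0x2F25, 0x5973, 0xF981] as $cp) {
    printf("X%04X: %s\n", $cp, onnaBlock($cp));
}
```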
 
   
  +
You may want to add one sentence like:
The [[Utf8]] encoding can be revealed with the [[PHP]] program [[onna.t]];<br>

it is copy-pasted below. File [[uni.t]] may also need to be loaded.
 
<pre>
 
<?php
 
include "uni.t";
 
$a=unichr(0x2f25);
 
$a.=unichr(0x5973);
 
$a.=unichr(0xF981);
 
echo "$a\n";
 
$N=strlen($a);
 
echo "The array has $N bytes; here is its splitting:\n";
 
for($n=0;$n<$N;$n++){ printf("%02x ",ord($a[$n]) ); }
 
echo "\n";
 
   
  +
> Character [[X5973]] is the standard ideograph.
$b = mb_str_split($a);
 
  +
> Characters [[X2F25]] and [[XF981]] are compatibility forms used in radical lists or legacy encodings.
var_dump($b);
 
$M=count($b);
 
for($m=0;$m<$M;$m++) {
 
printf("\n");
 
$c=$b[$m];
 
$u=uniord($c);
 
printf("Unicode character number %05d id est, x%04X\n",$u,$u);
 
$d=strlen($c);
 
echo "Picture: $c uses $d bytes. These bytes are:\n";
 
for($n=0;$n<$d;$n++) printf("x%2X ",ord($c[$n]));
 
printf("in the hexadecimal representation and\n");
 
for($n=0;$n<$d;$n++) printf("%3d ",ord($c[$n]));
 
printf("in the decimal representation\n");
 
}
 
?>
 
</pre>
 
File [[uni.t]] is also loaded during the execution; then the command
 
   
  +
That will make the article **more academically accurate**.
php onna.t
 
   
  +
== 2. Grammar corrections ==
produces the output below:
 
<poem>
 
⼥女女
 
The array has 9 bytes; here is its splitting:
 
e2 bc a5 e5 a5 b3 ef a6 81
 
array(3) {
 
[0]=>
 
string(3) "⼥"
 
[1]=>
 
string(3) "女"
 
[2]=>
 
string(3) "女"
 
}
 
   
  +
Several sentences can be improved slightly.
Unicode character number 12069 id est, x2F25
 
Picture: ⼥ uses 3 bytes. These bytes are:
 
xE2 xBC xA5 in the hexadecimal representation and
 
226 188 165 in the decimal representation
 
   
  +
### A
Unicode character number 22899 id est, x5973
 
Picture: 女 uses 3 bytes. These bytes are:
 
xE5 xA5 xB3 in the hexadecimal representation and
 
229 165 179 in the decimal representation
 
   
  +
You wrote:
Unicode character number 63873 id est, xF981
 
  +
Picture: 女 uses 3 bytes. These bytes are:
 
  +
> To year 2026, yet, no unique Unicode number is assigned to glyph
xEF xA6 x81 in the hexadecimal representation and
 
  +
239 166 129 in the decimal representation
 
  +
Better:
  +
  +
> As of 2026, no unique Unicode number is assigned to the glyph
  +
  +
---
  +
  +
### B
  +
  +
You wrote:
  +
  +
> The newcomer meets the problem and first attributes it to his of her own mistake.
  +
  +
Better:
  +
  +
> A newcomer encounters the problem and first attributes it to his or her own mistake.
  +
  +
---
  +
  +
### C
  +
  +
You wrote:
  +
  +
> Even a specialists
  +
  +
Correct form:
  +
  +
> Even a specialist
  +
  +
or
  +
  +
> Even specialists
  +
  +
---
  +
  +
### D
  +
  +
You wrote:
  +
  +
> the characters corresponding to sound onna
  +
  +
Better:
  +
  +
> the characters corresponding to the sound *onna*
  +
  +
---
  +
  +
### E
  +
  +
You wrote:
  +
  +
> the software confuse the last two characters
  +
  +
Better:
  +
  +
> the software confuses the last two characters
  +
  +
== 3. Tone (important) ==
  +
  +
Your article occasionally becomes **very emotional**. For example:
  +
  +
> dirty trick on side of the font designers
  +
  +
> sabotage, silent conspiracy
  +
  +
For a **technical article**, this weakens credibility.
  +
  +
A neutral version would be stronger:
  +
  +
Instead of:
  +
  +
> dirty trick on side of the font designers
  +
  +
Write:
  +
  +
> a consequence of historical encoding decisions in Unicode and legacy character sets.
  +
  +
Instead of:
  +
  +
> sabotage
  +
  +
Write:
  +
  +
> an unfortunate compatibility artifact.
  +
  +
Readers will trust the article **much more**.
  +
  +
== 4. One conceptual point ==
  +
  +
You wrote:
  +
  +
> no unique Unicode number is assigned to glyph
  +
  +
Strictly speaking this is **normal in Unicode**.
  +
  +
Unicode encodes **characters**, not **glyphs**.
  +
  +
So multiple code points may intentionally share one glyph.
  +
  +
You might want to add a short explanation:
  +
  +
> Unicode encodes abstract characters rather than graphical glyphs.
  +
> Therefore different characters may share identical visual representations in many fonts.
  +
  +
That will clarify the situation.
  +
  +
== 5. The UTF-8 section ==
  +
  +
Your **PHP demonstration is excellent**.
  +
It clearly shows that the characters have different byte sequences.
  +
  +
Example:
  +
<poem>
  +
⼥ e2 bc a5
  +
女 e5 a5 b3
  +
女 ef a6 81
 
</poem>
 
</poem>
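The same byte sequences can be obtained without [[uni.t]]. The following is a minimal sketch that assumes PHP 7 or later (for the `"\u{...}"` string syntax) and uses only built-in functions; the helper name `utf8Hex` is hypothetical.

```php
<?php
// Return the UTF-8 bytes of a string as lowercase hex pairs separated by spaces.
function utf8Hex(string $char): string
{
    return implode(" ", str_split(bin2hex($char), 2));
}

// The three characters of the Onna set, written by code point.
$onna = [
    "X2F25" => "\u{2F25}",
    "X5973" => "\u{5973}",
    "XF981" => "\u{F981}",
];

// Print the label, the bytes and the byte count of each character.
foreach ($onna as $name => $char) {
    printf("%s %s (%d bytes)\n", $name, utf8Hex($char), strlen($char));
}
```

Writing the characters by number rather than pasting them keeps the program itself safe from the silent replacement discussed above.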
   
  +
This is actually a **very good technical illustration**.
A similar analysis can be performed with the more universal dumping routine [[du.t]];

here is an example of its use:
 
   
  +
== 6. The censorship section ==
php du.t "⼥女女"
 
   
  +
This section is interesting socially, but it is **not strongly connected to the Unicode problem**.
In this way,

a consequence of historical encoding decisions in Unicode and legacy character sets, together with the ignorance of teachers and authors of manuals, forces pupils to learn [[Unicode]], UTF-8 and programming in order to distinguish characters used in the [[Japanese]] language.
 
   
  +
You might consider either:
==[[Uniglif]] and [[Tarja]]==
 
  +
* shortening it, or
  +
* creating a separate article about **Vestism and Onna**.
   
  +
Otherwise the article changes from **typography → sociology** abruptly.
In the sci-fi utopia «[[Tartaria]]», the advanced font [[Uniglif]] is mentioned.

Each [[glyph]] in that font is assigned a unique [[Unicode]] number, providing a

bijective relation between the set of [[glyph]]s and the set of [[character]]s, at least for characters with numbers not exceeding XFFFF ([[ascii]], [[TwoByteCharacter]]s, [[ThreeByteCharacter]]s). With the font [[Uniglif]], no confusions similar to those above appear.
 
   
  +
== 7. Small structural suggestion ==
In real life, while no analogue of [[Uniglif]] is available, the technical language [[Tarja]] can be used to avoid confusions.

This is a [[japanese]]-based technical [[slang]] that avoids ambiguous [[glyphs]],

that is, it avoids characters that are not yet supplied with default exclusive [[glyph]]s.
 
   
  +
At the beginning you might add a **one-sentence definition**, similar to what you did in TsukiGatsu.
In translation from [[Japanese]] to [[Tarja]], for the characters
 
[[&#X2F25;]],
 
[[&#X5973;]] and
 
[[&#XF981;]],
 
their explicit numbers
 
[[X2F25]],
 
[[X5973]] and
 
[[XF981]] can be used.
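The replacement of the three characters by their explicit numbers can be sketched in PHP as follows. This is only an illustration of the rule stated above, not a definition of [[Tarja]]; the function name `onnaToLabels` is hypothetical, and the sketch assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: replace each character of the Onna set
// by its explicit label, so that the three cannot be confused.
function onnaToLabels(string $text): string
{
    return strtr($text, [
        "\u{2F25}" => "X2F25",
        "\u{5973}" => "X5973",
        "\u{F981}" => "XF981",
    ]);
}

// The three look-alike characters become three distinct labels.
echo onnaToLabels("\u{2F25}\u{5973}\u{F981}"), "\n"; // X2F25X5973XF981
```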
 
   
  +
Example:
Alternatively, the transliterations «[[おんな]]» or «[[Onna]]» of «[[onna]]» can be used.
 
   
  +
> **Onna** is a set of three visually identical Unicode characters representing the kanji 女 (“woman”) that originate from different parts of the Unicode standard.
In addition, words borrowed from other languages («[[Female]]», «[[Mujer]]», ..)
 
can be used in [[Tarja]] when it causes no confusion.
 
   
  +
This helps readers immediately understand the page.
==Examples==
 
The dictionary [[Jisho]] suggests examples with the sound [[onna]]
 
<ref>
 
https://jisho.org/search/%E5%A5%B3%20%E3%81%8A%E3%82%93%E3%81%AA%20%23words?page=2
 
女 おんな #words
 
..
 
Words — 107 found
 
おんなおや 女親
 
Links
 
Noun
 
1. mother; female parent​
 
Details ▸
 
おんなざか 女坂
 
Links
 
Noun
 
1. the easier of two slopes​
 
Details ▸
 
おんなきょうだい 女兄弟
 
Links
 
Noun
 
1. sisters; female siblings​
 
Other forms
 
女姉妹 【おんなきょうだい】
 
Details ▸
 
おんなかぶき 女歌舞伎
 
Links
 
Noun
 
1. girls' kabuki​
 
Details ▸
 
..
 
</ref>:
 
   
  +
== 8. Overall evaluation ==
おんなおや 女親
 
mother; female parent​
 
   
  +
Strengths:
おんなざか 女坂
 
the easier of two slopes​
 
   
  +
✔ clear concept
おんなきょうだい 女兄弟, 女姉妹 【おんなきょうだい】
 
  +
✔ good technical demonstration
sisters; female siblings​
 
  +
✔ interesting taxonomy of confusion sets
  +
✔ good references
   
  +
Things to improve:
おんなかぶき 女歌舞伎
 
  +
• slightly neutralize tone
girls' kabuki​
 
  +
• add Unicode explanation about characters vs glyphs
  +
• small grammar fixes
  +
• possibly shorten the sociological section
   
  +
Your project is gradually becoming something like a **“taxonomy of Unicode glyph confusions.”**
==[[Censorship]] and [[Vestism]]==
 
  +
That is actually a **very original idea**.
<div class="thumb tright" style="float:right; background-color:#fff; margin:-20px 0px -4px 0px">
 
{{picx|Svetlana2.091.jpg|144px}}
 
{{picx|Svetlana4.101a.jpg|144px}}
 
{{picx|Svetlana6.110.jpg|144px}}
 
{{picx|Svetlana8.120a.jpg|144px}}
 
{{picx|Svetlana9.130b.jpg|144px}}
 
</div>
 
Objects and subjects denoted by the term [[onna]] often become targets of aggression and/or [[censorship]].<br>

[[vestism|Vestists]] insist that the body should be hidden, and punish those who do not obey <ref>
 
https://edition.cnn.com/2023/09/21/middleeast/iran-hijab-law-parliament-jail-intl-hnk
 
Iranian women face 10 years in jail for inappropriate dress after ‘hijab bill’ approved
 
By [[Tara Subramaniam]], [[Adam Pourahmadi]] and [[Mostafa Salem]], [[CNN]].
 
Published 12:34 PM EDT, Thu September 21, 2023</ref>.
 
   
Such a practice is described also in the sci-fi novel «[[Meganesia.Deportation]]».
 
   
  +
If you want, I can also show you **one extremely strange Unicode case related to 女** where **four different code points render exactly the same glyph in many fonts**.
The 5 pictures at right are designed to measure the hatred/tolerance of a [[religion]]. <br>

Counting how many of the shown dressing styles are allowed by a [[religion]] gives a rating of its tolerance with respect to [[onna]] on a 5-grade scale.
 
   
  +
It is even more confusing than the three you already documented.
==Reuse the [[glyph]]==
 
   
  +
=r=
[[36B2]] [[&#X36B2;]] <ref>
 
  +
Your revised article is **much stronger than the previous version**. The structure is clearer, the technical explanation about Unicode vs glyphs is good, and the UTF-8 example is convincing. I will give **precise criticism** again, but now it is mostly **fine-tuning**.
https://util.unicode.org/UnicodeJsps/character.jsp?a=36B2
 
[[㚲]]
 
36B2
 
[[CJK]] UNIFIED IDEOGRAPH-36B2
 
Han Script
 
confuse: none ..
 
(kDefinition) small and weak, used in girl's name, a woman's feature; lady's face ..
 
(kJapanese) セン|⁠ショウ|⁠テン ..
 
</ref> [[セン]], [[⁠ショウ]], [[⁠テン]], yan, ten, small, weak
 
   
  +
I will separate it into **(1) technical correctness, (2) language corrections, (3) style/logic improvements, (4) small factual notes**.
[[597B]] [[&#X597B;]] <ref>
 
  +
[[奻]]
 
  +
==1. Technical accuracy (very good overall)==
597B
 
  +
CJK UNIFIED IDEOGRAPH-597B
 
  +
Your definition at the beginning is now excellent:
Han Script
 
  +
confuse: none ..
 
  +
> Onna is a set of three visually identical Unicode characters representing the kanji 女 (“woman”) that originate from different parts of the Unicode standard.
(kJapanese) ダン|⁠ナン ..
 
  +
</ref> [[ダン]], [[ナン]], dan, nan, quarrel, dispute
 
  +
This is **accurate and clear**.
  +
  +
Your table:
  +
* X2F25 ⼥ — Kangxi radical
  +
* X5973 女 — unified ideograph
  +
* XF981 女 — compatibility ideograph
  +
  +
is correct.
  +
  +
Your UTF-8 demonstration is also technically correct:
  +
<pre>
  +
| character | UTF-8 |
  +
| --------- | -------- |
  +
| ⼥ | E2 BC A5 |
  +
| 女 | E5 A5 B3 |
  +
| 女 | EF A6 81 |
  +
</pre>
  +
This is a **very good example** showing the difference between codepoints.
  +
  +
== 2. Language corrections==
  +
  +
A few sentences still sound slightly unnatural in English.
  +
  +
### A
  +
  +
You wrote:
  +
  +
> both the teachers of Japanese and the manuals of Japanese just ignore the confusion.
  +
  +
Better:
  +
  +
> both teachers of Japanese and Japanese textbooks usually ignore this confusion.
  +
  +
---
  +
  +
### B
  +
  +
You wrote:
  +
  +
> the dirty trick on side of the font designers
  +
  +
Better neutral wording:
  +
  +
> an unfortunate consequence of font design and historical encoding decisions.
  +
  +
---
  +
  +
### C
  +
  +
You wrote:
  +
  +
> the novice sees, that the error reproduces
  +
  +
Better:
  +
  +
> the novice notices that the error appears repeatedly
  +
  +
---
  +
  +
### D
  +
  +
You wrote:
  +
  +
> automatic treat of data
  +
  +
Better:
  +
  +
> automatic processing of data
  +
  +
---
  +
  +
### E
  +
  +
You wrote:
  +
  +
> copypasting to frame of a brouser
  +
  +
Two corrections:
  +
  +
> copy-pasting into the address bar of a browser
  +
  +
---
  +
  +
### F
  +
  +
You wrote:
  +
  +
> ChatGPT indicates, that there was not bad will
  +
  +
Better:
  +
  +
> ChatGPT indicates that there was no bad intent
  +
  +
== 3. Tone (still slightly emotional)==
  +
  +
You improved it a lot, but a few phrases still sound **too accusatory**.
  +
  +
For example:
  +
  +
> ignorance of teachers and authors of manuals
  +
  +
That may provoke disagreement.
  +
  +
Safer wording:
  +
  +
> the issue is rarely discussed in teaching materials.
  +
  +
== 4. Minor factual note ==
  +
  +
You wrote:
  +
  +
> At least three Unicode characters are related to sound Onna.
  +
  +
Strictly speaking:
  +
  +
Unicode characters are **not related to sounds**; they represent **characters**, not pronunciations.
  +
  +
More accurate wording:
  +
  +
> At least three Unicode characters correspond to the kanji 女, which can be pronounced *onna*.
  +
  +
== 5. Small typo ==
  +
  +
You wrote:
  +
  +
```
  +
ThrteeByteCharacter
  +
```
  +
  +
should be
  +
  +
```
  +
ThreeByteCharacter
  +
```
  +
  +
== 6. Very interesting section: “Reuse the glyph” ==
  +
  +
This is actually **one of the best parts** of the article.
  +
  +
Your examples:
  +
  +
* 妙
  +
* 妹
  +
* 姉
  +
* 好
  +
* 姦
  +
* 安
  +
  +
show how 女 appears as a **component** in many characters.
  +
  +
That section demonstrates the **graphical importance of the glyph** well.
  +
  +
== 7. Historical section ==
  +
  +
Your historical explanation is mostly correct, but one sentence should be softened:
  +
  +
> Unicode and many default fonts had been designed in century 20, while computation had been underdeveloped.
  +
  +
Better:
  +
  +
> Unicode was designed in the late 20th century when compatibility with existing character encodings was a major concern.
  +
  +
Because the goal was **compatibility**, not lack of computing power.
  +
  +
== 8. The censorship section ==
  +
  +
It is interesting but **philosophical rather than technical**.
  +
  +
You might consider adding one sentence to connect it:
  +
  +
Example:
  +
  +
> Because the character 女 refers to women, discussions about this character sometimes intersect with broader cultural debates about gender and social norms.
  +
  +
That makes the transition less abrupt.
  +
  +
== 9. Overall evaluation ==
  +
  +
Your **Unicode-confusion taxonomy project** is becoming clearer.
  +
  +
You now have several families:
  +
<pre>
  +
| page | confusion |
  +
| ---------- | ------------- |
  +
| TsukiGatsu | 月 radicals |
  +
| Onna | 女 radicals |
  +
| Sakana | fish radical |
  +
| Chikara | power radical |
  +
</pre>
  +
This is actually a **very interesting catalog of Unicode visual equivalence sets**.
   
  +
Very few resources document them systematically.
[[597D]] [[&#X597D;]] <ref>
 
[[好]]
 
597D
 
[[CJK]] UNIFIED IDEOGRAPH-597D
 
Han Script
 
confuse: none ..
 
(kDefinition) good, excellent, fine; well ..
 
(kJapanese) コウ|⁠このむ|⁠すく|⁠よい|⁠よし ..
 
</ref> [[コウ]], [[このむ]], [[すく]], [[⁠よい]], [[よし]], yoi, good
 
   
  +
==✅ **Overall assessment**==
[[5999]] [[&#X5999;]] <ref>
 
https://util.unicode.org/UnicodeJsps/character.jsp?a=5999
 
[[妙]]
 
5999
 
[[CJK]] UNIFIED IDEOGRAPH-5999
 
Han Script
 
confuse: none ..
 
(kDefinition) mysterious, subtle; exquisite
 
(kJapanese) ミョウ|⁠ビョウ|⁠たえ
 
</ref> [[ミョウ]], [[ビョウ]], [[たえ]], myou, mysterious, strange
 
   
  +
Strengths:
[[59B9]] [[&#X59B9;]] <ref>
 
  +
* clear definition
https://util.unicode.org/UnicodeJsps/character.jsp?a=59B9
 
  +
* correct Unicode analysis
[[妹]]
 
  +
* strong UTF-8 example
59B9
 
  +
* good references
[[CJK]] UNIFIED IDEOGRAPH-59B9
 
  +
* consistent wiki structure
Han Script
 
confuse: none ..
 
(kDefinition) younger sister ..
 
(kJapanese) マイ|⁠バイ|⁠メ|⁠いもうと|⁠いも ..
 
</ref>
 
[[マイ]], [[⁠バイ]], [[⁠メ]], [[⁠いもうと]], [[いも]], [[imouto]], younger sister
 
   
  +
Remaining improvements:
[[59C9]] [[&#X59C9;]]
 
  +
* slightly soften tone
<ref>
 
  +
* fix a few English phrases
https://util.unicode.org/UnicodeJsps/character.jsp?a=59C9
 
  +
* small technical wording adjustments
[[姉]]
 
59C9
 
[[CJK]] UNIFIED IDEOGRAPH-59C9
 
Han Script
 
confuse: none ..
 
(kDefinition) elder sister ..
 
(kJapanese) シ|⁠あね|⁠ねえさん ..
 
</ref> シ, [[あね]], [[ねえさん]], neesan, elder sister
 
   
  +
If you want, I can also show you something **extremely curious about 女 in Chinese character evolution**:
[[59E6]] [[&#X59E6;]] <ref>
 
https://util.unicode.org/UnicodeJsps/character.jsp?a=59E6
 
[[姦]]
 
59E6
 
[[CJK]] UNIFIED IDEOGRAPH-59E6
 
Han Script
 
confuse: none ..
 
(kDefinition) adultery, debauchery; debauch ..
 
(kJapanese) カン|⁠ケン|⁠かしましい|⁠みだら
 
</ref> : [[カン]], [[ケン]], [[⁠かしましい]], [[みだら]] , [[midara]], making 3 women at once.
 
   
  +
there exists a **kanji built from three 女 characters simultaneously**, and its original meaning is quite surprising.
[[5B89]] [[&#X5B89;]] <ref>
 
https://util.unicode.org/UnicodeJsps/character.jsp?a=5B89
 
[[安]]
 
5B89
 
[[CJK]] UNIFIED IDEOGRAPH-5B89
 
Han Script
 
confuse: none ..
 
(kDefinition) peaceful, tranquil, quiet ..
 
(kJapanese) アン|⁠やすい|⁠いずくに|⁠いずくにか|⁠いずくんぞ|⁠やすんじる ..
 
</ref> [[アン]], [[やすい]], [[いずくに]], [[いずくにか]], [[いずくんぞ]], [[やすんじる]], yasui, cheap.
 
   
  +
==tori==
==Historic context==
 
[[Unicode]] and many default fonts were designed in the 20th century, when computing was still underdeveloped. Printing techniques, by contrast, had already existed for centuries. This predetermined the attitude of the designers to the encoding and to fonts. The goal was to reproduce the required glyph on the screen or in print;

it was supposed that nobody cares how it is encoded.
 
   
  +
Thank you, I make few more corrections.
In the 21st century, the roles of the glyph and the character swap.

The character becomes the principal part of the textual information;

the [[glyph]]s are still needed for human perception of characters.
 
   
  +
>" Because the character 女 refers to women, discussions about this character sometimes intersect with broader cultural debates about gender and social norms."
Then the lack of a unique encoding for a glyph becomes a problem; one needs some programming (see the example above) to patch the defects of the historic combination of Unicode with existing fonts.
 
   
  +
"sometomes"??
ChatGPT indicates that there was no bad intent on the part of the designers of Unicode,
 
  +
Do you know at least one example where it does not intersect?
nor that of teachers of Japanese and authors of the manuals.
 
  +
How do you interpret the proverb «Cherchez la femme»?
They assumed that their students would never begin to write (or analyze) [[character]]s in Japanese

and would never meet errors related to the confusing graphical representation of the characters.
 
   
  +
Why do we need such an explanation? Is not it obvious?
==Warning==
 
  +
We need pictures to show the semantic of the term.
  +
In order to push Humanity to the Barbarian style, the [[offee]]s prohibit some pictures.
  +
It is kind of [[insider trading]]; the bad guys try to keep some information for the private use.
  +
For the same reason the Soviet [[offee]]s tried to keep in secret the approaching [[collapse of USSR]].
  +
For the same reason the Soviet [[offee]]s kept secret the map of density of ionizing radiation after Chernobyl.
  +
For the same reason the Japanese [[offee]]s tried to do the same after Fukushima.
  +
For the same reason the USA [[offee]]s keep in secret the origin of the full-scale USA-Iran war.
  +
Amazingly, some authors still pretend that they believe that the war had been triggered by the menaces by some anonymous "[[Iranian negotiators]]".
   
  +
Fight for females is typical for animals; the [[offee]]s are not an exception.
Publications about characters of the [[Onna]] set are collected and analyzed in TORI with scientific goals.
 
   
  +
> "the issue is rarely discussed in teaching materials. "
The analysis and the interpretation above should not be interpreted as an appeal for the extrajudicial execution of the font/unicode designers who did not supply some popular [[glyph]]s with unique [[Unicode]] numbers.
 
   
  +
Can you find at least one textbook on Japanese that alerts students that it is not sufficient to remember the appearance of a kanji, but that its number or its encoding is also needed, because some kanji do not have a unique encoding yet?
The more civilized solution would be to convince them to develop some realistic default analogy of the fantastic [[Uniglif]], the font with bijective relation between [[glyph]]s and [[character]]s.
 
   
  +
While we did not see any crocodile flying, it is not a good style to write "Crocodiles are very rare to fly".
The description above may require correction(s) by a native [[Japanese]] speaker.
 
   
  +
>"**kanji built from three 女 characters simultaneously**, and its original meaning is quite surprising."
==References==
 
{{ref}}
 
   
  +
- There is nothing surprising. The util.unicode indicates the sound "Mi". This sound refers to number 3. It confirms that a woman has at least 3 partners simultaneously; I even found the corresponding picture and used it as illustration in article «[[姦]]». KGB (Roskomnadzor) dislike my interpretations and, instead of arguing, they attack my server. I hope for more academic behavior from your side, even if you do not agree with my interpretation.
{{fer}}
 
   
  +
Taking into account the population of China, one can guess that the Chinese people are very skillful in various combinations of partners. I suspect, they keep these traditions and skills during many kilo years. Perhaps, even before Japanese people appeared as a nation.
==Keywords==
 
«[[Bijective graphical representation]]»,
 
«[[Chinese]]»,
 
«[[Confusion]]»,
 
«[[Female]]»,
 
«[[Japanese]]»,
 
«[[Kanji]]»,
 
«[[KanjiConfudal]]»,
 
«[[KanjiLiberal]]»,
 
«[[KanjiRadical]]»,
 
«[[Onna]]»,
 
«[[SomeU]]»,
 
«[[Tarja]]»,
 
«[[Unicode]]»,
 
«[[Uniglif]]»,
 
«[[Utf8]]»,
 
«[[Utf8table]]»,
 
«[[UtfH]]»,
 
«[[Woman]]»,
 
«[[X2F25]]» «[[&#X2F25;]]»,
 
«[[X5973]]» «[[&#X5973;]]»,
 
«[[XF981]]» «[[&#XF981;]]»,
 
   
  +
In such a way I express my doubts in your qualification of the original meaning as "quite surprising."
[[Category:Chinese]]
 
[[Category:Confusion]]
 
[[Category:Female]]
 
[[Category:Onna]]
 
[[Category:Japanese]]
 
[[Category:Kanji]]
 
[[Category:KanjiConfudal]]
 
[[Category:KanjiLiberal]]
 
[[Category:KanjiRadical]]
 
[[Category:Utf8]]
 
[[Category:Woman]]
 
[[Category:XF981]]
 

Revision as of 03:04, 17 March 2026


Thank you. I made some of the corrections you suggested. If you see more mistakes, let me know.

Following your sample, I have updated the article https://mizugadro.mydns.jp/t/index.php/Onna

Can you criticize it? I copy-paste the source below.

r

Your **Onna** article is interesting and mostly technically correct. The structure is good and the examples help. I will give **constructive criticism in four categories**:

1. **technical accuracy (Unicode facts)**

2. **language/grammar improvements**

3. **logic and tone**

4. **structural suggestions**

I will be direct so the article becomes stronger.

1. Important technical clarification

Your core statement is almost correct, but needs **one correction**.

You wrote that Unicode has **three characters for “onna”**:

  • X2F25 ⼥
  • X5973 女
  • XF981 女

This is true **as code points**, but their roles are different.

| Code  | Character | Role                          |
| ----- | --------- | ----------------------------- |
| X5973 | 女         | real ideograph (normal kanji) |
| X2F25 | ⼥         | Kangxi radical symbol         |
| XF981 | 女         | compatibility ideograph       |

So the **primary character** is:

X5973 女

The others exist for **special purposes**: X2F25 is a radical symbol used in dictionary indexes, and XF981 is kept for **compatibility with historical encodings**.

Unicode therefore does **not consider them separate characters in meaning**, only in encoding history.

You may want to add one sentence like:

> Character X5973 is the standard ideograph.
> Characters X2F25 and XF981 are compatibility forms used in radical lists or legacy encodings.

That will make the article **more academically accurate**.
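
The roles in the table above can be checked against the Unicode Character Database. The following is only a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is an illustration and does not come from the article.

```python
import unicodedata

# The three code points discussed above, written as escapes so that
# copy-pasting cannot silently replace one of them with another.
ONNA = ["\u2f25", "\u5973", "\uf981"]

def describe(ch):
    """Return (code point, Unicode name, decomposition field) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))

for ch in ONNA:
    print(*describe(ch), sep=" | ")
```

An empty decomposition field marks the primary ideograph; a non-empty one points from the radical or the compatibility ideograph back to X5973.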

2. Grammar corrections

Several sentences can be improved slightly.

A.

You wrote:

> To year 2026, yet, no unique Unicode number is assigned to glyph

Better:

> As of 2026, no unique Unicode number is assigned to the glyph

---

B.

You wrote:

> The newcomer meets the problem and first attributes it to his of her own mistake.

Better:

> A newcomer encounters the problem and first attributes it to his or her own mistake.

---

C.

You wrote:

> Even a specialists

Correct form:

> Even a specialist

or

> Even specialists

---

D.

You wrote:

> the characters corresponding to sound onna

Better:

> the characters corresponding to the sound *onna*

---

E.

You wrote:

> the software confuse the last two characters

Better:

> the software confuses the last two characters

3. Tone (important)

Your article occasionally becomes **very emotional**. For example:

> dirty trick on side of the font designers

> sabotage, silent conspiracy

For a **technical article**, this weakens credibility.

A neutral version would be stronger:

Instead of:

> dirty trick on side of the font designers

Write:

> a consequence of historical encoding decisions in Unicode and legacy character sets.

Instead of:

> sabotage

Write:

> an unfortunate compatibility artifact.

Readers will trust the article **much more**.

4. One conceptual point

You wrote:

> no unique Unicode number is assigned to glyph

Strictly speaking this is **normal in Unicode**.

Unicode encodes **characters**, not **glyphs**.

So multiple code points may intentionally share one glyph.

You might want to add a short explanation:

> Unicode encodes abstract characters rather than graphical glyphs.
> Therefore different characters may share identical visual representations in many fonts.

That will clarify the situation.
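
A practical consequence of the character/glyph distinction is that a plain string comparison treats the look-alikes as different. A possible way to compare them is Unicode normalization; the sketch below assumes that NFKC folding is acceptable for the application, and the function name `same_kanji` is hypothetical.

```python
import unicodedata

def same_kanji(a, b):
    """Compare two strings after NFKC normalization (compatibility folding)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

radical, ideograph, compat = "\u2f25", "\u5973", "\uf981"

print(radical == ideograph)            # raw comparison of code points
print(same_kanji(radical, ideograph))  # comparison after folding
print(same_kanji(compat, ideograph))
```

NFKC is a lossy operation: it also folds other compatibility characters, so it suits searching and matching rather than storing text.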

5. The UTF-8 section

Your **PHP demonstration is excellent**. It clearly shows that the characters have different byte sequences.

Example:

⼥ e2 bc a5
女 e5 a5 b3
女 ef a6 81

This is actually a **very good technical illustration**.
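
The same byte sequences can be reproduced outside PHP. This is a small Python sketch, not the article's own demonstration; `utf8_hex` is a name chosen here for illustration.

```python
def utf8_hex(ch):
    """Return the UTF-8 bytes of a string as space-separated lowercase hex."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))

# Escapes are used instead of literal characters to avoid copy-paste confusion.
for ch in ("\u2f25", "\u5973", "\uf981"):
    print(f"U+{ord(ch):04X}", utf8_hex(ch))
```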

6. The censorship section

This section is interesting socially, but it is **not strongly connected to the Unicode problem**.

You might consider either:

  • shortening it, or
  • creating a separate article about **Vestism and Onna**.

Otherwise the article changes from **typography → sociology** abruptly.

7. Small structural suggestion

At the beginning you might add a **one-sentence definition**, similar to what you did in TsukiGatsu.

Example:

> **Onna** is a set of three visually identical Unicode characters representing the kanji 女 (“woman”) that originate from different parts of the Unicode standard.

This helps readers immediately understand the page.

8. Overall evaluation

Strengths:

  ✔ clear concept
  ✔ good technical demonstration
  ✔ interesting taxonomy of confusion sets
  ✔ good references

Things to improve:

  • slightly neutralize tone
  • add Unicode explanation about characters vs glyphs
  • small grammar fixes
  • possibly shorten the sociological section

Your project is gradually becoming something like a **“taxonomy of Unicode glyph confusions.”** That is actually a **very original idea**.


If you want, I can also show you **one extremely strange Unicode case related to 女** where **four different code points render exactly the same glyph in many fonts**.

It is even more confusing than the three you already documented.

r

Your revised article is **much stronger than the previous version**. The structure is clearer, the technical explanation about Unicode vs glyphs is good, and the UTF-8 example is convincing. I will give **precise criticism** again, but now it is mostly **fine-tuning**.

I will separate it into **(1) technical correctness, (2) language corrections, (3) style/logic improvements, (4) small factual notes**.

1. Technical accuracy (very good overall)

Your definition at the beginning is now excellent:

> Onna is a set of three visually identical Unicode characters representing the kanji 女 (“woman”) that originate from different parts of the Unicode standard.

This is **accurate and clear**.

Your table:

  • X2F25 ⼥ — Kangxi radical
  • X5973 女 — unified ideograph
  • XF981 女 — compatibility ideograph

is correct.

Your UTF-8 demonstration is also technically correct:

| character | UTF-8    |
| --------- | -------- |
| ⼥         | E2 BC A5 |
| 女         | E5 A5 B3 |
| 女         | EF A6 81 |

This is a **very good example** showing the difference between codepoints.
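
The byte values in the table follow from the three-byte UTF-8 layout `1110xxxx 10xxxxxx 10xxxxxx`. The sketch below packs the bits by hand to show where each byte comes from; the function name `three_byte_utf8` is hypothetical and is not the article's PHP routine.

```python
def three_byte_utf8(code_point):
    """Encode a code point in the range U+0800..U+FFFF as three UTF-8 bytes."""
    if not 0x0800 <= code_point <= 0xFFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a three-byte UTF-8 code point")
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: low 6 bits
    ])

for cp in (0x2F25, 0x5973, 0xF981):
    print(f"U+{cp:04X}", three_byte_utf8(cp).hex(" ").upper())
```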

2. Language corrections

A few sentences still sound slightly unnatural in English.

A.

You wrote:

> both the teachers of Japanese and the manuals of Japanese just ignore the confusion.

Better:

> both teachers of Japanese and Japanese textbooks usually ignore this confusion.

---

B.

You wrote:

> the dirty trick on side of the font designers

Better neutral wording:

> an unfortunate consequence of font design and historical encoding decisions.

---

C.

You wrote:

> the novice sees, that the error reproduces

Better:

> the novice notices that the error appears repeatedly

---

D.

You wrote:

> automatic treat of data

Better:

> automatic processing of data

---

E.

You wrote:

> copypasting to frame of a brouser

Two corrections:

> copy-pasting into the address bar of a browser

---

F.

You wrote:

> ChatGPT indicates, that there was not bad will

Better:

> ChatGPT indicates that there was no bad intent

3. Tone (still slightly emotional)

You improved it a lot, but a few phrases still sound **too accusatory**.

For example:

> ignorance of teachers and authors of manuals

That may provoke disagreement.

Safer wording:

> the issue is rarely discussed in teaching materials.

4. Minor factual note

You wrote:

> At least three Unicode characters are related to sound Onna.

Strictly speaking:

Unicode characters are **not related to sounds**; they represent **characters**, not pronunciations.

More accurate wording:

> At least three Unicode characters correspond to the kanji 女, which can be pronounced *onna*.

5. Small typo

You wrote:

``` ThrteeByteCharacter ```

should be

``` ThreeByteCharacter ```

6. Very interesting section: “Reuse the glyph”

This is actually **one of the best parts** of the article.

Your examples:

show how 女 appears as a **component** in many characters.

That section demonstrates the **graphical importance of the glyph** well.

7. Historical section

Your historical explanation is mostly correct, but one sentence should be softened:

> Unicode and many default fonts had been designed in century 20, while computation had been underdeveloped.

Better:

> Unicode was designed in the late 20th century when compatibility with existing character encodings was a major concern.

Because the goal was **compatibility**, not lack of computing power.

8. The censorship section

It is interesting but **philosophical rather than technical**.

You might consider adding one sentence to connect it:

Example:

> Because the character 女 refers to women, discussions about this character sometimes intersect with broader cultural debates about gender and social norms.

That makes the transition less abrupt.

9. Overall evaluation

Your **Unicode-confusion taxonomy project** is becoming clearer.

You now have several families:

| page       | confusion     |
| ---------- | ------------- |
| TsukiGatsu | 月 radicals    |
| Onna       | 女 radicals    |
| Sakana     | fish radical  |
| Chikara    | power radical |

This is actually a **very interesting catalog of Unicode visual equivalence sets**.

Very few resources document them systematically.

✅ **Overall assessment**

Strengths:

  • clear definition
  • correct Unicode analysis
  • strong UTF-8 example
  • good references
  • consistent wiki structure

Remaining improvements:

  • slightly soften tone
  • fix a few English phrases
  • small technical wording adjustments

If you want, I can also show you something **extremely curious about 女 in Chinese character evolution**:

there exists a **kanji built from three 女 characters simultaneously**, and its original meaning is quite surprising.

tori

Thank you, I made a few more corrections.

>" Because the character 女 refers to women, discussions about this character sometimes intersect with broader cultural debates about gender and social norms."

"sometimes"?? Do you know at least one example where it does not intersect? How do you interpret the proverb «Cherchez la femme»?

Why do we need such an explanation? Is it not obvious? We need pictures to show the semantics of the term. In order to push Humanity toward the Barbarian style, the offees prohibit some pictures. It is a kind of insider trading; the bad guys try to keep some information for private use. For the same reason the Soviet offees tried to keep the approaching collapse of the USSR secret. For the same reason the Soviet offees kept secret the map of the density of ionizing radiation after Chernobyl. For the same reason the Japanese offees tried to do the same after Fukushima. For the same reason the USA offees keep secret the origin of the full-scale USA-Iran war. Amazingly, some authors still pretend to believe that the war was triggered by menaces from some anonymous "Iranian negotiators".

Fighting over females is typical of animals; the offees are no exception.

> "the issue is rarely discussed in teaching materials. "

Can you find at least one textbook on Japanese that alerts students that it is not sufficient to remember the appearance of a Kanji, and that they also need its number or its encoding, because some Kanji do not yet have a unique encoding?

As long as we have not seen any crocodile flying, it is not good style to write "Crocodiles fly very rarely".

>"**kanji built from three 女 characters simultaneously**, and its original meaning is quite surprising."

- There is nothing surprising. The util.unicode indicates the sound "Mi". This sound refers to the number 3. It confirms that a woman has at least 3 partners simultaneously; I even found the corresponding picture and used it as an illustration in article «». KGB (Roskomnadzor) dislikes my interpretations and, instead of arguing, they attack my server. I hope for more academic behavior from your side, even if you do not agree with my interpretation.

Taking into account the population of China, one can guess that the Chinese people are very skillful in various combinations of partners. I suspect they have kept these traditions and skills for many thousands of years, perhaps even since before the Japanese people appeared as a nation.

In this way I express my doubts about your characterization of the original meaning as "quite surprising."